diff --git a/docs/api/api_python/nn/mindspore.nn.Conv1d.rst b/docs/api/api_python/nn/mindspore.nn.Conv1d.rst
index abb89363572..6f020b138b4 100644
--- a/docs/api/api_python/nn/mindspore.nn.Conv1d.rst
+++ b/docs/api/api_python/nn/mindspore.nn.Conv1d.rst
@@ -13,7 +13,7 @@ mindspore.nn.Conv1d
         \sum_{k = 0}^{C_{in} - 1} \text{ccor}({\text{weight}(C_{\text{out}_j}, k), \text{X}(N_i, k)})
 
     其中， :math:`ccor` 为 `cross-correlation <https://en.wikipedia.org/wiki/Cross-correlation>`_ ， :math:`C_{in}` 为输入空间维度， :math:`out_{j}` 对应输出的第 :math:`j` 个空间维度，:math:`j` 的范围在 :math:`[0，C_{out}-1]` 内，
-    :math:`\text{weight}(C_{\text{out}_j}, k)` 是shape为 :math:`(kernel_size)` 的卷积核切片，其中 :math:`\text{kernel_size}` 是卷积核的宽度。 :math:`\text{bias}` 为偏置参数。
+    :math:`\text{weight}(C_{\text{out}_j}, k)` 是shape为 :math:`(\text{kernel_size})` 的卷积核切片，其中 :math:`\text{kernel_size}` 是卷积核的宽度。 :math:`\text{bias}` 为偏置参数， :math:`\text{X}` 为输入Tensor。
     完整卷积核的shape为 :math:`(C_{out}, C_{in} / \text{group}, \text{kernel_size})` ，其中 `group` 是在空间维度上分割输入 `x` 的组数。
     详细介绍请参考论文 `Gradient Based Learning Applied to Document Recognition <http://vision.stanford.edu/cs598_spring07/papers/Lecun98.pdf>`_ 。
     
@@ -22,18 +22,18 @@ mindspore.nn.Conv1d
     - **in_channels** (int) - Conv1d层输入Tensor的空间维度。
     - **out_channels** (int) - Conv1d层输出Tensor的空间维度。
     - **kernel_size** (int) - 指定一维卷积核的宽度。
-    - **stride** (int) - 卷积核的移动步长，默认值：1。
+    - **stride** (int) - 一维卷积核的移动步长，默认值：1。
     - **pad_mode** (str) - 指定填充模式。可选值为 "same"，"valid"，"pad"。默认值："same"。
 
       - same：输出的宽度与输入整除 `stride` 后的值相同。若设置该模式，`padding` 的值必须为0。
       - valid：在不填充的前提下返回有效计算所得的输出。不满足计算的多余像素会被丢弃。如果设置此模式，则 `padding` 的值必须为0。
-      - pad：对输入进行填充。在输入对两侧填充 `padding` 大小的0。如果设置此模式， `padding` 必须大于或等于0。
+      - pad：对输入进行填充。在输入的两侧填充 `padding` 大小的0。如果设置此模式， `padding` 的值必须大于或等于0。
 
     - **padding** (int) - 输入两侧填充的数量。值应该要大于等于0，默认值：0。
-    - **dilation** (int) - 卷积核膨胀尺寸。若 :math:`k > 1` ，则kernel间隔 `k` 个元素进行采样。 `k` 取值范围为[1, L]。默认值：1。
-    - **group** (int) - 将过滤器拆分为组， `in_ channels` 和 `out_channels` 必须可被 `group` 整除。默认值：1。
+    - **dilation** (int) - 一维卷积核膨胀尺寸。若 :math:`k > 1` ，则kernel间隔 `k` 个元素进行采样。 `k` 取值范围为[1, L]。默认值：1。
+    - **group** (int) - 将过滤器拆分为组， `in_channels` 和 `out_channels` 必须可被 `group` 整除。默认值：1。
     - **has_bias** (bool) - Conv1d层是否添加偏置参数。默认值：False。
-    - **weight_init** (Union[Tensor, str, Initializer, numbers.Number]) - 权重矩阵的初始化方法。它可以是Tensor，str，Initializer或numbers.Number。当使用str时，可选"TruncatedNormal"，"Normal"，"Uniform"，"HeUniform"和"XavierUniform"分布以及常量"One"和"Zero"分布的值，可接受别名"xavier_uniform"，"he_uniform"，"ones"和"zeros"。上述字符串大小写均可。更多细节请参考Initializer的值。默认值："normal"。
+    - **weight_init** (Union[Tensor, str, Initializer, numbers.Number]) - 权重参数的初始化方法。它可以是Tensor，str，Initializer或numbers.Number。当使用str时，可选"TruncatedNormal"，"Normal"，"Uniform"，"HeUniform"和"XavierUniform"分布以及常量"One"和"Zero"分布的值，可接受别名"xavier_uniform"，"he_uniform"，"ones"和"zeros"。上述字符串大小写均可。更多细节请参考Initializer的值。默认值："normal"。
     - **bias_init** (Union[Tensor, str, Initializer, numbers.Number]) - 偏置参数的初始化方法。可以使用的初始化方法与"weight_init"相同。更多细节请参考Initializer的值。默认值："zeros"。
 
     **输入：**
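Reviewer note (not part of the patch): the Conv1d text above states that the full kernel has shape `(C_out, C_in / group, kernel_size)` and that `in_channels` and `out_channels` must both be divisible by `group`. A minimal sketch of that rule in plain Python follows; the helper name `conv1d_weight_shape` is hypothetical and is not a MindSpore API.

```python
def conv1d_weight_shape(in_channels, out_channels, kernel_size, group=1):
    """Return the full kernel shape (C_out, C_in / group, kernel_size).

    Hypothetical helper for illustration only; it mirrors the rule stated
    in the Conv1d documentation and does not call MindSpore.
    """
    if group < 1:
        raise ValueError("group must be >= 1")
    # Both channel counts must be divisible by group.
    if in_channels % group != 0 or out_channels % group != 0:
        raise ValueError("in_channels and out_channels must be divisible by group")
    return (out_channels, in_channels // group, kernel_size)


shape_plain = conv1d_weight_shape(120, 240, 4)             # group = 1
shape_grouped = conv1d_weight_shape(120, 240, 4, group=3)  # 3 groups of 40 input channels
```

With `group=3`, each output channel only sees `120 / 3 = 40` input channels, which is why the second dimension of the kernel shrinks.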
diff --git a/docs/api/api_python/nn/mindspore.nn.Conv1dTranspose.rst b/docs/api/api_python/nn/mindspore.nn.Conv1dTranspose.rst
index 466bb336d21..a403ba53f41 100644
--- a/docs/api/api_python/nn/mindspore.nn.Conv1dTranspose.rst
+++ b/docs/api/api_python/nn/mindspore.nn.Conv1dTranspose.rst
@@ -16,18 +16,18 @@ mindspore.nn.Conv1dTranspose
     - **in_channels** (int) - Conv1dTranspose层输入Tensor的空间维度。
     - **out_channels** (int) - Conv1dTranspose层输出Tensor的空间维度。
     - **kernel_size** (int) - 指定一维卷积核的宽度。
-    - **stride** (int) - 卷积核的移动步长，默认值：1。
+    - **stride** (int) - 一维卷积核的移动步长，默认值：1。
     - **pad_mode** (str) - 指定填充模式。可选值为"same"、"valid"、"pad"。默认值："same"。
 
       - same：输出的宽度与输入整除 `stride` 后的值相同。若设置该模式， `padding` 的值必须为0。
       - valid：在不填充的前提下返回有效计算所得的输出。不满足计算的多余像素会被丢弃。如果设置此模式，则 `padding` 的值必须为0。
-      - pad：对输入进行填充。 在输入对两侧填充 `padding` 大小的0。如果设置此模式， `padding` 必须大于或等于0。
+      - pad：对输入进行填充。在输入的两侧填充 `padding` 大小的0。如果设置此模式， `padding` 的值必须大于或等于0。
 
     - **padding** (int) - 输入两侧填充的数量。默认值：0。
-    - **dilation** (int) - 卷积核膨胀尺寸。若 :math:`k > 1` ，则kernel间隔 `k` 个元素进行采样。 `k` 取值范围为[1, L]。默认值：1。
-    - **group** (int) - 将过滤器拆分为组， `in_ channels` 和 `out_channels` 必须可被 `group` 整除。当 `group` 大于1时，暂不支持Ascend平台。默认值：1。
+    - **dilation** (int) - 一维卷积核膨胀尺寸。若 :math:`k > 1` ，则kernel间隔 `k` 个元素进行采样。 `k` 取值范围为[1, L]。默认值：1。
+    - **group** (int) - 将过滤器拆分为组， `in_channels` 和 `out_channels` 必须可被 `group` 整除。当 `group` 大于1时，暂不支持Ascend平台。默认值：1。
     - **has_bias** (bool) - Conv1dTranspose层是否添加偏置参数。默认值：False。
-    - **weight_init** (Union[Tensor, str, Initializer, numbers.Number]) - 权重矩阵的初始化方法。它可以是Tensor，str，Initializer或numbers.Number。当使用str时，可选"TruncatedNormal"，"Normal"，"Uniform"，"HeUniform"和"XavierUniform"分布以及常量"One"和"Zero"分布的值，可接受别名"xavier_uniform"，"he_uniform"，"ones"和"zeros"。上述字符串大小写均可。更多细节请参考Initializer的值。默认值："normal"。
+    - **weight_init** (Union[Tensor, str, Initializer, numbers.Number]) - 权重参数的初始化方法。它可以是Tensor，str，Initializer或numbers.Number。当使用str时，可选"TruncatedNormal"，"Normal"，"Uniform"，"HeUniform"和"XavierUniform"分布以及常量"One"和"Zero"分布的值，可接受别名"xavier_uniform"，"he_uniform"，"ones"和"zeros"。上述字符串大小写均可。更多细节请参考Initializer的值。默认值："normal"。
     - **bias_init** (Union[Tensor, str, Initializer, numbers.Number]) - 偏置参数的初始化方法。可以使用的初始化方法与"weight_init"相同。更多细节请参考Initializer的值。默认值："zeros"。
 
     **输入：**
diff --git a/docs/api/api_python/nn/mindspore.nn.Conv3d.rst b/docs/api/api_python/nn/mindspore.nn.Conv3d.rst
index 20eb1dede24..bb41308a8c3 100644
--- a/docs/api/api_python/nn/mindspore.nn.Conv3d.rst
+++ b/docs/api/api_python/nn/mindspore.nn.Conv3d.rst
@@ -12,7 +12,7 @@ mindspore.nn.Conv3d
         \sum_{k = 0}^{C_{in} - 1} \text{ccor}({\text{weight}(C_{\text{out}_j}, k), \text{X}(N_i, k)})
 
     其中，:math:`ccor` 为 `cross-correlation <https://en.wikipedia.org/wiki/Cross-correlation>`_ ， :math:`C_{in}` 为输入空间维度， :math:`out_{j}` 对应输出的第 :math:`j` 个空间维度，:math:`j` 的范围在 :math:`[0，C_{out}-1]` 内，
-    :math:`\text{weight}(C_{\text{out}_j}, k)`是shape为 :math:`(\text{kernel_size[0]}, \text{kernel_size[1]}, \text{kernel_size[2]})` 的卷积核切片，其中 :math:`\text{kernel_size[0]}` , :math:`\text{kernel_size[1]}和 :math:`\text{kernel_size[2]}` 是卷积核的深度、高度和宽度。 :math:`\text{bias}` 为偏置参数。
+    :math:`\text{weight}(C_{\text{out}_j}, k)` 是shape为 :math:`(\text{kernel_size[0]}, \text{kernel_size[1]}, \text{kernel_size[2]})` 的卷积核切片，其中 :math:`\text{kernel_size[0]}` 、 :math:`\text{kernel_size[1]}` 和 :math:`\text{kernel_size[2]}` 分别是卷积核的深度、高度和宽度。 :math:`\text{bias}` 为偏置参数， :math:`\text{X}` 为输入Tensor。
     完整卷积核的shape为 :math:`(C_{out}, C_{in} / \text{group}, \text{kernel_size[0]}, \text{kernel_size[1]}, \text{kernel_size[2]})` ，其中 `group` 是在空间维度上分割输入 `x` 的组数。
     详细介绍请参考论文 `Gradient Based Learning Applied to Document Recognition <http://vision.stanford.edu/cs598_spring07/papers/Lecun98.pdf>`_ 。
 
@@ -21,18 +21,18 @@ mindspore.nn.Conv3d
     - **in_channels** (int) - Conv3d层输入Tensor的空间维度。
     - **out_channels** (int) - Conv3d层输出Tensor的空间维度。
     - **kernel_size** (Union[int, tuple[int]]) - 指定三维卷积核的深度、高度和宽度。数据类型为int或包含三个整数的tuple。一个整数表示卷积核的深度、高度和宽度均为该值该值。包含三个整数的tuple分别表示卷积核的深度、高度和宽度。
-    - **stride** (Union[int, tuple[int]]) - 卷积核的移动步长。数据类型为整型或三个整型的tuple。一个整数表示在深度、高度和宽度方向的移动步长均为该值。三个整数的tuple分别表示在深度、高度和宽度方向的移动步长。默认值：1。
+    - **stride** (Union[int, tuple[int]]) - 三维卷积核的移动步长。数据类型为整型或三个整型的tuple。一个整数表示在深度、高度和宽度方向的移动步长均为该值。三个整数的tuple分别表示在深度、高度和宽度方向的移动步长。默认值：1。
     - **pad_mode** (str) - 指定填充模式。可选值为"same"、"valid"、"pad"。默认值："same"。
 
       - same：输出的深度、高度和宽度分别与输入整除 `stride` 后的值相同。若设置该模式，`padding` 的值必须为0。
       - valid：在不填充的前提下返回有效计算所得的输出。不满足计算的多余像素会被丢弃。如果设置此模式，则 `padding` 的值必须为0。
-      - pad：对输入进行填充。 在输入的前后、垂直和左右方向上填充 `padding` 大小的0。如果设置此模式， `padding` 必须大于或等于0。
+      - pad：对输入进行填充。在输入的深度、高度和宽度方向上填充 `padding` 大小的0。如果设置此模式， `padding` 必须大于或等于0。
 
-    - **padding** (Union(int, tuple[int])) - 输入的前后、垂直和左右方向上填充的数量。数据类型为int或包含6个整数的tuple。如果 `padding` 是一个整数，则前部、后部、顶部，底部，左边和右边的填充都等于 `padding` 。如果 `padding` 是6个整数的tuple，则前部、尾部、顶部、底部、左边和右边的填充分别等于填充padding[0]、padding[1]、padding[2]、padding[3]、padding[4]和padding[5]。值应该要大于等于0，默认值：0。
-    - **dilation** (Union[int, tuple[int]]) - 卷积核膨胀尺寸。数据类型为int或三个整数的tuple。若 :math:`k > 1` ，则kernel间隔 `k` 个元素进行采样。前后、垂直和左右方向上的 ｀k｀ ，其取值范围分别为[1, D]、[1, H]和[1, W]。默认值：1。
-    - **group** (int) - 将过滤器拆分为组， `in_ channels` 和 `out_channels` 必须可被 `group` 整除。默认值：1。当前仅支持1个。
+    - **padding** (Union[int, tuple[int]]) - 输入的深度、高度和宽度方向上填充的数量。数据类型为int或包含6个整数的tuple。如果 `padding` 是一个整数，则前部、后部、顶部、底部、左边和右边的填充都等于 `padding` 。如果 `padding` 是6个整数的tuple，则前部、后部、顶部、底部、左边和右边的填充分别等于 `padding[0]` 、 `padding[1]` 、 `padding[2]` 、 `padding[3]` 、 `padding[4]` 和 `padding[5]` 。值必须大于或等于0，默认值：0。
+    - **dilation** (Union[int, tuple[int]]) - 三维卷积核膨胀尺寸。数据类型为int或三个整数的tuple。若 :math:`k > 1` ，则kernel间隔 `k` 个元素进行采样。深度、高度和宽度方向上的 `k` ，其取值范围分别为[1, D]、[1, H]和[1, W]。默认值：1。
+    - **group** (int) - 将过滤器拆分为组， `in_channels` 和 `out_channels` 必须可被 `group` 整除。默认值：1。当前仅支持1。
     - **has_bias** (bool) - Conv3d层是否添加偏置参数。默认值：False。
-    - **weight_init** (Union[Tensor, str, Initializer, numbers.Number]) - 权重矩阵的初始化方法。它可以是Tensor，str，Initializer或numbers.Number。当使用str时，可选"TruncatedNormal"，"Normal"，"Uniform"，"HeUniform"和"XavierUniform"分布以及常量"One"和"Zero"分布的值，可接受别名"xavier_uniform"，"he_uniform"，"ones"和"zeros"。上述字符串大小写均可。更多细节请参考Initializer的值。默认值："normal"。
+    - **weight_init** (Union[Tensor, str, Initializer, numbers.Number]) - 权重参数的初始化方法。它可以是Tensor，str，Initializer或numbers.Number。当使用str时，可选"TruncatedNormal"，"Normal"，"Uniform"，"HeUniform"和"XavierUniform"分布以及常量"One"和"Zero"分布的值，可接受别名"xavier_uniform"，"he_uniform"，"ones"和"zeros"。上述字符串大小写均可。更多细节请参考Initializer的值。默认值："normal"。
     - **bias_init** (Union[Tensor, str, Initializer, numbers.Number]) - 偏置参数的初始化方法。可以使用的初始化方法与"weight_init"相同。更多细节请参考Initializer的值。默认值："zeros"。
     - **data_format** (str) - 数据格式的可选值。目前仅支持"NCDHW"。
 
@@ -47,23 +47,35 @@ mindspore.nn.Conv3d
     pad_mode为"same"时：
 
     .. math::
-       D_{out} ＝ \left \lfloor{\frac{D_{in}}{\text{stride[0]}} + 1} \right \rfloor
-       H_{out} ＝ \left \lfloor{\frac{H_{in}}{\text{stride[1]}} + 1} \right \rfloor
-       W_{out} ＝ \left \lfloor{\frac{W_{in}}{\text{stride[2]}} + 1} \right \rfloor
+        \begin{array}{ll} \\
+            D_{out} = \left \lfloor{\frac{D_{in}}{\text{stride[0]}} + 1} \right \rfloor \\
+            H_{out} = \left \lfloor{\frac{H_{in}}{\text{stride[1]}} + 1} \right \rfloor \\
+            W_{out} = \left \lfloor{\frac{W_{in}}{\text{stride[2]}} + 1} \right \rfloor \\
+        \end{array}
 
     pad_mode为"valid"时：
 
     .. math::
-       D_{out} ＝ \left \lfloor{\frac{D_{in} - \text{dilation[0]} \times (\text{kernel_size[0]} - 1) }{\text{stride[0]}} + 1} \right \rfloor
-       H_{out} ＝ \left \lfloor{\frac{H_{in} - \text{dilation[1]} \times (\text{kernel_size[1]} - 1) }{\text{stride[1]}} + 1} \right \rfloor
-       W_{out} ＝ \left \lfloor{\frac{W_{in} - \text{dilation[2]} \times (\text{kernel_size[2]} - 1) }{\text{stride[2]}} + 1} \right \rfloor
+        \begin{array}{ll} \\
+            D_{out} = \left \lfloor{\frac{D_{in} - \text{dilation[0]} \times (\text{kernel_size[0]} - 1) }
+            {\text{stride[0]}} + 1} \right \rfloor \\
+            H_{out} = \left \lfloor{\frac{H_{in} - \text{dilation[1]} \times (\text{kernel_size[1]} - 1) }
+            {\text{stride[1]}} + 1} \right \rfloor \\
+            W_{out} = \left \lfloor{\frac{W_{in} - \text{dilation[2]} \times (\text{kernel_size[2]} - 1) }
+            {\text{stride[2]}} + 1} \right \rfloor \\
+        \end{array}
 
     pad_mode为"pad"时：
 
     .. math::
-       D_{out} ＝ \left \lfloor{\frac{D_{in} + padding[0] + padding[1] - (\text{dilation[0]} - 1) \times \text{kernel_size[0]} - 1 }{\text{stride[0]}} + 1} \right \rfloor
-       H_{out} ＝ \left \lfloor{\frac{H_{in} + padding[2] + padding[3] - (\text{dilation[1]} - 1) \times \text{kernel_size[1]} - 1 }{\text{stride[1]}} + 1} \right \rfloor
-       W_{out} ＝ \left \lfloor{\frac{W_{in} + padding[4] + padding[5] - (\text{dilation[2]} - 1) \times \text{kernel_size[2]} - 1 }{\text{stride[2]}} + 1} \right \rfloor
+        \begin{array}{ll} \\
+            D_{out} = \left \lfloor{\frac{D_{in} + padding[0] + padding[1] - (\text{dilation[0]} - 1) \times
+            \text{kernel_size[0]} - 1 }{\text{stride[0]}} + 1} \right \rfloor \\
+            H_{out} = \left \lfloor{\frac{H_{in} + padding[2] + padding[3] - (\text{dilation[1]} - 1) \times
+            \text{kernel_size[1]} - 1 }{\text{stride[1]}} + 1} \right \rfloor \\
+            W_{out} = \left \lfloor{\frac{W_{in} + padding[4] + padding[5] - (\text{dilation[2]} - 1) \times
+            \text{kernel_size[2]} - 1 }{\text{stride[2]}} + 1} \right \rfloor \\
+        \end{array}
 
     **异常：**
 
diff --git a/docs/api/api_python/nn/mindspore.nn.Conv3dTranspose.rst b/docs/api/api_python/nn/mindspore.nn.Conv3dTranspose.rst
index 52b92a20f18..31bf4eaba62 100644
--- a/docs/api/api_python/nn/mindspore.nn.Conv3dTranspose.rst
+++ b/docs/api/api_python/nn/mindspore.nn.Conv3dTranspose.rst
@@ -8,7 +8,7 @@ mindspore.nn.Conv3dTranspose
     计算三维转置卷积，可以视为Conv3d对输入求梯度，也称为反卷积（实际不是真正的反卷积）。
 
     输入的shape通常为 :math:`(N, C_{in}, D_{in}, H_{in}, W_{in})` ，其中 :math:`N` 为batch size， :math:`C` 是空间维度。:math:`D_{in}, H_{in}, W_{in}` 分别为特征层的深度、高度和宽度。
-    当Conv3d和ConvTranspose3d使用相同的参数初始化时，且 `pad_mode` 设置为"pad"，它们会在输入的前后、垂直和左右方向上填充 :math:`dilation * (kernel\_size - 1) - padding` 个零，这种情况下它们的输入和输出shape是互逆的。
+    当Conv3d和Conv3dTranspose使用相同的参数初始化时，且 `pad_mode` 设置为"pad"，它们会在输入的深度、高度和宽度方向上填充 :math:`dilation * (kernel\_size - 1) - padding` 个零，这种情况下它们的输入和输出shape是互逆的。
     然而，当 `stride` 大于1时，Conv3d会将多个输入的shape映射到同一个输出shape。反卷积网络可以参考 `Deconvolutional Networks <https://www.matthewzeiler.com/mattzeiler/deconvolutionalnetworks.pdf>`_ 。
 
     **参数：**
@@ -16,18 +16,18 @@ mindspore.nn.Conv3dTranspose
     - **in_channels** (int) - Conv3dTranspose层输入Tensor的空间维度。
     - **out_channels** (int) - Conv3dTranspose层输出Tensor的空间维度。
     - **kernel_size** (Union[int, tuple[int]]) - 指定三维卷积核的深度、高度和宽度。数据类型为int或包含三个整数的tuple。一个整数表示卷积核的深度、高度和宽度均为该值该值。包含三个整数的tuple分别表示卷积核的深度、高度和宽度。
-    - **stride** (Union[int, tuple[int]]) - 卷积核的移动步长。数据类型为整型或三个整型的tuple。一个整数表示在深度、高度和宽度方向的移动步长均为该值。三个整数的tuple分别表示在深度、高度和宽度方向的移动步长。默认值：1。
+    - **stride** (Union[int, tuple[int]]) - 三维卷积核的移动步长。数据类型为整型或三个整型的tuple。一个整数表示在深度、高度和宽度方向的移动步长均为该值。三个整数的tuple分别表示在深度、高度和宽度方向的移动步长。默认值：1。
     - **pad_mode** (str) - 指定填充模式。可选值为"same"、"valid"、"pad"。默认值："same"。
 
       - same：输出的深度、高度和宽度分别与输入整除 `stride` 后的值相同。若设置该模式，`padding` 的值必须为0。
       - valid：在不填充的前提下返回有效计算所得的输出。不满足计算的多余像素会被丢弃。如果设置此模式，则 `padding` 的值必须为0。
-      - pad：对输入进行填充。 在输入的前后、垂直和左右方向上填充 `padding` 大小的0。如果设置此模式， `padding` 必须大于或等于0。
+      - pad：对输入进行填充。在输入的深度、高度和宽度方向上填充 `padding` 大小的0。如果设置此模式， `padding` 必须大于或等于0。
 
-    - **padding** (Union(int, tuple[int])) - 输入的前后、垂直和左右方向上填充的数量。数据类型为int或包含6个整数的tuple。如果 `padding` 是一个整数，则前部、后部、顶部，底部，左边和右边的填充都等于 `padding` 。如果 `padding` 是6个整数的tuple，则前部、尾部、顶部、底部、左边和右边的填充分别等于填充padding[0]、padding[1]、padding[2]、padding[3]、padding[4]和padding[5]。值应该要大于等于0，默认值：0。
-    - **dilation** (Union[int, tuple[int]]) - 卷积核膨胀尺寸。数据类型为int或三个整数的tuple。若 :math:`k > 1` ，则kernel间隔 `k` 个元素进行采样。前后、垂直和左右方向上的 ｀k｀ ，其取值范围分别为[1, D]、[1, H]和[1, W]。默认值：1。
-    - **group** (int) - 将过滤器拆分为组， `in_ channels` 和 `out_channels` 必须可被 `group` 整除。当 `group` 大于1时，暂不支持Ascend平台。默认值：1。当前仅支持1个。
+    - **padding** (Union[int, tuple[int]]) - 输入的深度、高度和宽度方向上填充的数量。数据类型为int或包含6个整数的tuple。如果 `padding` 是一个整数，则前部、后部、顶部、底部、左边和右边的填充都等于 `padding` 。如果 `padding` 是6个整数的tuple，则前部、后部、顶部、底部、左边和右边的填充分别等于 `padding[0]` 、 `padding[1]` 、 `padding[2]` 、 `padding[3]` 、 `padding[4]` 和 `padding[5]` 。值必须大于或等于0，默认值：0。
+    - **dilation** (Union[int, tuple[int]]) - 三维卷积核膨胀尺寸。数据类型为int或三个整数的tuple。若 :math:`k > 1` ，则kernel间隔 `k` 个元素进行采样。深度、高度和宽度方向上的 `k` ，其取值范围分别为[1, D]、[1, H]和[1, W]。默认值：1。
+    - **group** (int) - 将过滤器拆分为组， `in_channels` 和 `out_channels` 必须可被 `group` 整除。当 `group` 大于1时，暂不支持Ascend平台。默认值：1。当前仅支持1。
     - **has_bias** (bool) - Conv3dTranspose层是否添加偏置参数。默认值：False。
-    - **weight_init** (Union[Tensor, str, Initializer, numbers.Number]) - 权重矩阵的初始化方法。它可以是Tensor，str，Initializer或numbers.Number。当使用str时，可选"TruncatedNormal"，"Normal"，"Uniform"，"HeUniform"和"XavierUniform"分布以及常量"One"和"Zero"分布的值，可接受别名"xavier_uniform"，"he_uniform"，"ones"和"zeros"。上述字符串大小写均可。更多细节请参考Initializer的值。默认值："normal"。
+    - **weight_init** (Union[Tensor, str, Initializer, numbers.Number]) - 权重参数的初始化方法。它可以是Tensor，str，Initializer或numbers.Number。当使用str时，可选"TruncatedNormal"，"Normal"，"Uniform"，"HeUniform"和"XavierUniform"分布以及常量"One"和"Zero"分布的值，可接受别名"xavier_uniform"，"he_uniform"，"ones"和"zeros"。上述字符串大小写均可。更多细节请参考Initializer的值。默认值："normal"。
     - **bias_init** (Union[Tensor, str, Initializer, numbers.Number]) - 偏置参数的初始化方法。可以使用的初始化方法与"weight_init"相同。更多细节请参考Initializer的值。默认值："zeros"。
     - **data_format** (str) - 数据格式的可选值。目前仅支持"NCDHW"。
 
@@ -42,23 +42,35 @@ mindspore.nn.Conv3dTranspose
     pad_mode为"same"时：
 
     .. math::
-       D_{out} ＝ \left \lfloor{\frac{D_{in}}{\text{stride[0]}} + 1} \right \rfloor
-       H_{out} ＝ \left \lfloor{\frac{H_{in}}{\text{stride[1]}} + 1} \right \rfloor
-       W_{out} ＝ \left \lfloor{\frac{W_{in}}{\text{stride[2]}} + 1} \right \rfloor
+        \begin{array}{ll} \\
+            D_{out} = \left \lfloor{\frac{D_{in}}{\text{stride[0]}} + 1} \right \rfloor \\
+            H_{out} = \left \lfloor{\frac{H_{in}}{\text{stride[1]}} + 1} \right \rfloor \\
+            W_{out} = \left \lfloor{\frac{W_{in}}{\text{stride[2]}} + 1} \right \rfloor \\
+        \end{array}
 
     pad_mode为"valid"时：
 
     .. math::
-       D_{out} ＝ \left \lfloor{\frac{D_{in} - \text{dilation[0]} \times (\text{kernel_size[0]} - 1) }{\text{stride[0]}} + 1} \right \rfloor
-       H_{out} ＝ \left \lfloor{\frac{H_{in} - \text{dilation[1]} \times (\text{kernel_size[1]} - 1) }{\text{stride[1]}} + 1} \right \rfloor
-       W_{out} ＝ \left \lfloor{\frac{W_{in} - \text{dilation[2]} \times (\text{kernel_size[2]} - 1) }{\text{stride[2]}} + 1} \right \rfloor
+        \begin{array}{ll} \\
+            D_{out} = \left \lfloor{\frac{D_{in} - \text{dilation[0]} \times (\text{kernel_size[0]} - 1) }
+            {\text{stride[0]}} + 1} \right \rfloor \\
+            H_{out} = \left \lfloor{\frac{H_{in} - \text{dilation[1]} \times (\text{kernel_size[1]} - 1) }
+            {\text{stride[1]}} + 1} \right \rfloor \\
+            W_{out} = \left \lfloor{\frac{W_{in} - \text{dilation[2]} \times (\text{kernel_size[2]} - 1) }
+            {\text{stride[2]}} + 1} \right \rfloor \\
+        \end{array}
 
     pad_mode为"pad"时：
 
     .. math::
-       D_{out} ＝ \left \lfloor{\frac{D_{in} + padding[0] + padding[1] - (\text{dilation[0]} - 1) \times \text{kernel_size[0]} - 1 }{\text{stride[0]}} + 1} \right \rfloor
-       H_{out} ＝ \left \lfloor{\frac{H_{in} + padding[2] + padding[3] - (\text{dilation[1]} - 1) \times \text{kernel_size[1]} - 1 }{\text{stride[1]}} + 1} \right \rfloor
-       W_{out} ＝ \left \lfloor{\frac{W_{in} + padding[4] + padding[5] - (\text{dilation[2]} - 1) \times \text{kernel_size[2]} - 1 }{\text{stride[2]}} + 1} \right \rfloor
+        \begin{array}{ll} \\
+            D_{out} = \left \lfloor{\frac{D_{in} + padding[0] + padding[1] - (\text{dilation[0]} - 1) \times
+            \text{kernel_size[0]} - 1 }{\text{stride[0]}} + 1} \right \rfloor \\
+            H_{out} = \left \lfloor{\frac{H_{in} + padding[2] + padding[3] - (\text{dilation[1]} - 1) \times
+            \text{kernel_size[1]} - 1 }{\text{stride[1]}} + 1} \right \rfloor \\
+            W_{out} = \left \lfloor{\frac{W_{in} + padding[4] + padding[5] - (\text{dilation[2]} - 1) \times
+            \text{kernel_size[2]} - 1 }{\text{stride[2]}} + 1} \right \rfloor \\
+        \end{array}
 
     **异常：**
 
diff --git a/docs/api/api_python/nn/mindspore.nn.conv2d.rst b/docs/api/api_python/nn/mindspore.nn.conv2d.rst
index e6b6bc50466..77523de0da5 100644
--- a/docs/api/api_python/nn/mindspore.nn.conv2d.rst
+++ b/docs/api/api_python/nn/mindspore.nn.conv2d.rst
@@ -12,7 +12,7 @@ mindspore.nn.Conv2d
         \sum_{k = 0}^{C_{in} - 1} \text{ccor}({\text{weight}(C_{\text{out}_j}, k), \text{X}(N_i, k)})
 
    其中，:math:`ccor` 为 `cross-correlation <https://en.wikipedia.org/wiki/Cross-correlation>`_ ， :math:`C_{in}` 为输入空间维度， :math:`out_{j}` 对应输出的第 :math:`j` 个空间维度，:math:`j` 的范围在 :math:`[0，C_{out}-1]` 内，
-   :math:`\text{weight}(C_{\text{out}_j}, k)`是shape为 :math:`(\text{kernel_size[0]}, \text{kernel_size[1]})` 的卷积核切片，其中 :math:`\text{kernel_size[0]}` 和 :math:`\text{kernel_size[1]}` 是卷积核的高度和宽度。 :math:`\text{bias}` 为偏置参数。
+   :math:`\text{weight}(C_{\text{out}_j}, k)` 是shape为 :math:`(\text{kernel_size[0]}, \text{kernel_size[1]})` 的卷积核切片，其中 :math:`\text{kernel_size[0]}` 和 :math:`\text{kernel_size[1]}` 分别是卷积核的高度和宽度。 :math:`\text{bias}` 为偏置参数， :math:`\text{X}` 为输入Tensor。
    完整卷积核的shape为 :math:`(C_{out}, C_{in} / \text{group}, \text{kernel_size[0]}, \text{kernel_size[1]})` ，其中 `group` 是在空间维度上分割输入 `x` 的组数。
    详细介绍请参考论文 `Gradient Based Learning Applied to Document Recognition <http://vision.stanford.edu/cs598_spring07/papers/Lecun98.pdf>`_ 。
 
@@ -21,18 +21,18 @@ mindspore.nn.Conv2d
      - **in_channels** (`int`) – Conv2d层输入Tensor的空间维度。
      - **out_channels** (`dict`) - Conv2d层输出Tensor的空间维度。
      - **kernel_size** (`Union[int, tuple[int]]`) – 指定二维卷积核的高度和宽度。数据类型为整型或两个整型的tuple。一个整数表示卷积核的高度和宽度均为该值。两个整数的tuple分别表示卷积核的高度和宽度。
-     - **stride** (`Union[int, tuple[int]]`) – 卷积核的移动步长。数据类型为整型或两个整型的tuple。一个整数表示在高度和宽度方向的移动步长均为该值。两个整数的tuple分别表示在高度和宽度方向的移动步长。默认值：1。
+     - **stride** (`Union[int, tuple[int]]`) – 二维卷积核的移动步长。数据类型为整型或两个整型的tuple。一个整数表示在高度和宽度方向的移动步长均为该值。两个整数的tuple分别表示在高度和宽度方向的移动步长。默认值：1。
      - **pad_mode** (`str`) – 指定填充模式。可选值为"same"、"valid"、"pad"。默认值："same"。
 
        - **same**：输出的高度和宽度分别与输入整除 `stride` 后的值相同。若设置该模式，`padding` 的值必须为0。
        - **valid**：在不填充的前提下返回有效计算所得的输出。不满足计算的多余像素会被丢弃。如果设置此模式，则 `padding` 的值必须为0。
-       - **pad**：对输入进行填充。 在输入的垂直和水平方向上填充 `padding` 大小的0。如果设置此模式， `padding` 必须大于或等于0。
+       - **pad**：对输入进行填充。在输入的高度和宽度方向上填充 `padding` 大小的0。如果设置此模式， `padding` 必须大于或等于0。
 
-     - **padding** (`Union[int, tuple[int]]`) –  输入的垂直和水平方向上填充的数量。数据类型为int或包含4个整数的tuple。如果 `padding` 是一个整数，那么上、下、左、右的填充都等于 `padding` 。如果 `padding` 是一个有4个整数的tuple，那么上、下、左、右的填充分别等于 `padding[0]` 、 `padding[1]` 、 `padding[2]` 和 `padding[3]` 。值应该要大于等于0，默认值：0。
-     - **dilation** (`Union[int, tuple[int]]`) –  卷积核膨胀尺寸。数据类型为整型或具有两个整型的tuple。若 :math:`k > 1` ，则kernel间隔 `k` 个元素进行采样。垂直和水平方向上的 ｀k｀ ，其取值范围分别为[1, H]和[1, W]。默认值：1。
-     - **group** (`int`) –  将过滤器拆分为组， `in_ channels` 和 `out_channels` 必须可被 `group` 整除。如果组数等于 `in_channels` 和 `out_channels` ，这个二维卷积层也被称为二维深度卷积层。默认值：1.
+     - **padding** (`Union[int, tuple[int]]`) –  输入的高度和宽度方向上填充的数量。数据类型为int或包含4个整数的tuple。如果 `padding` 是一个整数，那么上、下、左、右的填充都等于 `padding` 。如果 `padding` 是一个有4个整数的tuple，那么上、下、左、右的填充分别等于 `padding[0]` 、 `padding[1]` 、 `padding[2]` 和 `padding[3]` 。值应该要大于等于0，默认值：0。
+     - **dilation** (`Union[int, tuple[int]]`) –  二维卷积核膨胀尺寸。数据类型为整型或具有两个整型的tuple。若 :math:`k > 1` ，则kernel间隔 `k` 个元素进行采样。高度和宽度方向上的 `k` ，其取值范围分别为[1, H]和[1, W]。默认值：1。
+     - **group** (`int`) –  将过滤器拆分为组， `in_channels` 和 `out_channels` 必须可被 `group` 整除。如果组数等于 `in_channels` 和 `out_channels` ，这个二维卷积层也被称为二维深度卷积层。默认值：1。
      - **has_bias** (`bool`) –  Conv2d层是否添加偏置参数。默认值：False。
-     - **weight_init** (Union[Tensor, str, Initializer, numbers.Number]) - 权重矩阵的初始化方法。它可以是Tensor，str，Initializer或numbers.Number。当使用str时，可选"TruncatedNormal"，"Normal"，"Uniform"，"HeUniform"和"XavierUniform"分布以及常量"One"和"Zero"分布的值，可接受别名"xavier_uniform"，"he_uniform"，"ones"和"zeros"。上述字符串大小写均可。更多细节请参考Initializer的值。默认值："normal"。
+     - **weight_init** (Union[Tensor, str, Initializer, numbers.Number]) - 权重参数的初始化方法。它可以是Tensor，str，Initializer或numbers.Number。当使用str时，可选"TruncatedNormal"，"Normal"，"Uniform"，"HeUniform"和"XavierUniform"分布以及常量"One"和"Zero"分布的值，可接受别名"xavier_uniform"，"he_uniform"，"ones"和"zeros"。上述字符串大小写均可。更多细节请参考Initializer的值。默认值："normal"。
      - **bias_init** (Union[Tensor, str, Initializer, numbers.Number]) - 偏置参数的初始化方法。可以使用的初始化方法与"weight_init"相同。更多细节请参考Initializer的值。默认值："zeros"。
      - **data_format** (`str`) –  数据格式的可选值有"NHWC"，"NCHW"。默认值："NCHW"。
 
@@ -47,20 +47,30 @@ mindspore.nn.Conv2d
    pad_mode为"same"时：
 
    .. math::
-       H_{out} ＝ \left \lfloor{\frac{H_{in}}{\text{stride[0]}} + 1} \right \rfloor
-       W_{out} ＝ \left \lfloor{\frac{W_{in}}{\text{stride[1]}} + 1} \right \rfloor
+        \begin{array}{ll} \\
+            H_{out} = \left \lfloor{\frac{H_{in}}{\text{stride[0]}} + 1} \right \rfloor \\
+            W_{out} = \left \lfloor{\frac{W_{in}}{\text{stride[1]}} + 1} \right \rfloor \\
+        \end{array}
 
    pad_mode为"valid"时：
 
    .. math::
-       H_{out} ＝ \left \lfloor{\frac{H_{in} - \text{dilation[0]} \times (\text{kernel_size[0]} - 1) }{\text{stride[0]}} + 1} \right \rfloor
-       W_{out} ＝ \left \lfloor{\frac{W_{in} - \text{dilation[1]} \times (\text{kernel_size[1]} - 1) }{\text{stride[1]}} + 1} \right \rfloor
+        \begin{array}{ll} \\
+            H_{out} = \left \lfloor{\frac{H_{in} - \text{dilation[0]} \times (\text{kernel_size[0]} - 1) }
+            {\text{stride[0]}} + 1} \right \rfloor \\
+            W_{out} = \left \lfloor{\frac{W_{in} - \text{dilation[1]} \times (\text{kernel_size[1]} - 1) }
+            {\text{stride[1]}} + 1} \right \rfloor \\
+        \end{array}
 
    pad_mode为"pad"时：
 
    .. math::
-       H_{out} ＝ \left \lfloor{\frac{H_{in} + padding[0] + padding[1] - (\text{dilation[0]} - 1) \times \text{kernel_size[0]} - 1 }{\text{stride[0]}} + 1} \right \rfloor
-       W_{out} ＝ \left \lfloor{\frac{W_{in} + padding[2] + padding[3] - (\text{dilation[1]} - 1) \times \text{kernel_size[1]} - 1 }{\text{stride[1]}} + 1} \right \rfloor
+        \begin{array}{ll} \\
+            H_{out} = \left \lfloor{\frac{H_{in} + padding[0] + padding[1] - (\text{dilation[0]} - 1) \times
+            \text{kernel_size[0]} - 1 }{\text{stride[0]}} + 1} \right \rfloor \\
+            W_{out} = \left \lfloor{\frac{W_{in} + padding[2] + padding[3] - (\text{dilation[1]} - 1) \times
+            \text{kernel_size[1]} - 1 }{\text{stride[1]}} + 1} \right \rfloor \\
+        \end{array}
 
    **异常：**
 
diff --git a/docs/api/api_python/nn/mindspore.nn.conv2dtranspose.rst b/docs/api/api_python/nn/mindspore.nn.conv2dtranspose.rst
index d176f949f85..5a268df9c4e 100644
--- a/docs/api/api_python/nn/mindspore.nn.conv2dtranspose.rst
+++ b/docs/api/api_python/nn/mindspore.nn.conv2dtranspose.rst
@@ -8,7 +8,7 @@ mindspore.nn.Conv2dTranspose
    计算二维转置卷积，可以视为Conv2d对输入求梯度，也称为反卷积（实际不是真正的反卷积）。
 
    输入的shape通常为 :math:`(N, C, H, W)` ，其中 :math:`N` 是batch size，:math:`C` 是空间维度， :math:`H_{in}, W_{in}` 分别为特征层的高度和宽度。
-   当Conv2d和ConvTranspose2d使用相同的参数初始化时，且 `pad_mode` 设置为"pad"，它们会在输入的垂直和水平方向上填充 :math:`dilation * (kernel\_size - 1) - padding` 个零，这种情况下它们的输入和输出shape是互逆的。
+   当Conv2d和Conv2dTranspose使用相同的参数初始化时，且 `pad_mode` 设置为"pad"，它们会在输入的高度和宽度方向上填充 :math:`dilation * (kernel\_size - 1) - padding` 个零，这种情况下它们的输入和输出shape是互逆的。
    然而，当 `stride` 大于1时，Conv2d会将多个输入的shape映射到同一个输出shape。反卷积网络可以参考 `Deconvolutional Networks <https://www.matthewzeiler.com/mattzeiler/deconvolutionalnetworks.pdf>`_ 。
 
    **参数：**
@@ -16,18 +16,18 @@ mindspore.nn.Conv2dTranspose
    - **in_channels** (`int`) – Conv2dTranspose层输入Tensor的空间维度。
    - **out_channels** (`dict`) - Conv2dTranspose层输出Tensor的空间维度。
    - **kernel_size** (`Union[int, tuple[int]]`) – 指定二维卷积核的高度和宽度。数据类型为整型或两个整型的tuple。一个整数表示卷积核的高度和宽度均为该值。两个整数的tuple分别表示卷积核的高度和宽度。
-   - **stride** (`Union[int, tuple[int]]`) – 卷积核的移动步长。数据类型为整型或两个整型的tuple。一个整数表示在高度和宽度方向的移动步长均为该值。两个整数的tuple分别表示在高度和宽度方向的移动步长。默认值：1。
+   - **stride** (`Union[int, tuple[int]]`) – 二维卷积核的移动步长。数据类型为整型或两个整型的tuple。一个整数表示在高度和宽度方向的移动步长均为该值。两个整数的tuple分别表示在高度和宽度方向的移动步长。默认值：1。
    - **pad_mode** (`str`) – 指定填充模式。可选值为"same"、"valid"、"pad"。默认值："same"。
 
        - **same**：输出的高度和宽度分别与输入整除 `stride` 后的值相同。若设置该模式，`padding` 的值必须为0。
        - **valid**：在不填充的前提下返回有效计算所得的输出。不满足计算的多余像素会被丢弃。如果设置此模式，则 `padding` 的值必须为0。
-       - **pad**：对输入进行填充。 在输入的垂直和水平方向上填充 `padding` 大小的0。如果设置此模式， `padding` 必须大于或等于0。
+       - **pad**：对输入进行填充。在输入的高度和宽度方向上填充 `padding` 大小的0。如果设置此模式， `padding` 必须大于或等于0。
 
-   - **padding** (`Union[int, tuple[int]]`) –  输入的垂直和水平方向上填充的数量。数据类型为int或包含4个整数的tuple。如果 `padding` 是一个整数，那么上、下、左、右的填充都等于 `padding` 。如果 `padding` 是一个有4个整数的tuple，那么上、下、左、右的填充分别等于 `padding[0]` 、 `padding[1]` 、 `padding[2]` 和 `padding[3]` 。值应该要大于等于0，默认值：0。
-   - **dilation** (`Union[int, tuple[int]]`) –  卷积核膨胀尺寸。数据类型为整型或具有两个整型的tuple。若 :math:`k > 1` ，则kernel间隔 `k` 个元素进行采样。垂直和水平方向上的 ｀k｀ ，其取值范围分别为[1, H]和[1, W]。默认值：1。
-   - **group** (`int`) –  将过滤器拆分为组， `in_ channels` 和 `out_channels` 必须可被 `group` 整除。如果组数等于 `in_channels` 和 `out_channels` ，这个二维卷积层也被称为二维深度卷积层。默认值：1.
+   - **padding** (`Union[int, tuple[int]]`) –  输入的高度和宽度方向上填充的数量。数据类型为整型或包含四个整数的tuple。如果 `padding` 是一个整数，那么上、下、左、右的填充都等于 `padding` 。如果 `padding` 是一个有四个整数的tuple，那么上、下、左、右的填充分别等于 `padding[0]` 、 `padding[1]` 、 `padding[2]` 和 `padding[3]` 。值应该要大于等于0，默认值：0。
+   - **dilation** (`Union[int, tuple[int]]`) –  二维卷积核膨胀尺寸。数据类型为整型或具有两个整型的tuple。若 :math:`k > 1` ，则kernel间隔 `k` 个元素进行采样。高度和宽度方向上的 `k` ，其取值范围分别为[1, H]和[1, W]。默认值：1。
+   - **group** (`int`) –  将过滤器拆分为组， `in_channels` 和 `out_channels` 必须可被 `group` 整除。如果组数等于 `in_channels` 和 `out_channels` ，这个二维卷积层也被称为二维深度卷积层。默认值：1。
    - **has_bias** (`bool`) –  Conv2dTranspose层是否添加偏置参数。默认值：False。
-   - **weight_init** (Union[Tensor, str, Initializer, numbers.Number]) - 权重矩阵的初始化方法。它可以是Tensor，str，Initializer或numbers.Number。当使用str时，可选"TruncatedNormal"，"Normal"，"Uniform"，"HeUniform"和"XavierUniform"分布以及常量"One"和"Zero"分布的值，可接受别名"xavier_uniform"，"he_uniform"，"ones"和"zeros"。上述字符串大小写均可。更多细节请参考Initializer的值。默认值："normal"。
+   - **weight_init** (Union[Tensor, str, Initializer, numbers.Number]) - 权重参数的初始化方法。它可以是Tensor，str，Initializer或numbers.Number。当使用str时，可选"TruncatedNormal"，"Normal"，"Uniform"，"HeUniform"和"XavierUniform"分布以及常量"One"和"Zero"分布的值，可接受别名"xavier_uniform"，"he_uniform"，"ones"和"zeros"。上述字符串大小写均可。更多细节请参考Initializer的值。默认值："normal"。
    - **bias_init** (Union[Tensor, str, Initializer, numbers.Number]) - 偏置参数的初始化方法。可以使用的初始化方法与"weight_init"相同。更多细节请参考Initializer的值。默认值："zeros"。
 
    **输入：**
@@ -41,20 +41,30 @@ mindspore.nn.Conv2dTranspose
    pad_mode为"same"时：
 
    .. math::
-       H_{out} ＝ \left \lfloor{\frac{H_{in}}{\text{stride[0]}} + 1} \right \rfloor
-       W_{out} ＝ \left \lfloor{\frac{W_{in}}{\text{stride[1]}} + 1} \right \rfloor
+        \begin{array}{ll} \\
+            H_{out} = \left \lfloor{\frac{H_{in}}{\text{stride[0]}} + 1} \right \rfloor \\
+            W_{out} = \left \lfloor{\frac{W_{in}}{\text{stride[1]}} + 1} \right \rfloor \\
+        \end{array}
 
    pad_mode为"valid"时：
 
    .. math::
-       H_{out} ＝ \left \lfloor{\frac{H_{in} - \text{dilation[0]} \times (\text{kernel_size[0]} - 1) }{\text{stride[0]}} + 1} \right \rfloor
-       W_{out} ＝ \left \lfloor{\frac{W_{in} - \text{dilation[1]} \times (\text{kernel_size[1]} - 1) }{\text{stride[1]}} + 1} \right \rfloor
+        \begin{array}{ll} \\
+            H_{out} = \left \lfloor{\frac{H_{in} - \text{dilation[0]} \times (\text{kernel_size[0]} - 1) }
+            {\text{stride[0]}} + 1} \right \rfloor \\
+            W_{out} = \left \lfloor{\frac{W_{in} - \text{dilation[1]} \times (\text{kernel_size[1]} - 1) }
+            {\text{stride[1]}} + 1} \right \rfloor \\
+        \end{array}
 
    pad_mode为"pad"时：
 
    .. math::
-       H_{out} ＝ \left \lfloor{\frac{H_{in} + padding[0] + padding[1] - (\text{dilation[0]} - 1) \times \text{kernel_size[0]} - 1 }{\text{stride[0]}} + 1} \right \rfloor
-       W_{out} ＝ \left \lfloor{\frac{W_{in} + padding[2] + padding[3] - (\text{dilation[1]} - 1) \times \text{kernel_size[1]} - 1 }{\text{stride[1]}} + 1} \right \rfloor
+        \begin{array}{ll} \\
+            H_{out} = \left \lfloor{\frac{H_{in} + padding[0] + padding[1] - (\text{dilation[0]} - 1) \times
+            \text{kernel_size[0]} - 1 }{\text{stride[0]}} + 1} \right \rfloor \\
+            W_{out} = \left \lfloor{\frac{W_{in} + padding[2] + padding[3] - (\text{dilation[1]} - 1) \times
+            \text{kernel_size[1]} - 1 }{\text{stride[1]}} + 1} \right \rfloor \\
+        \end{array}
 
    **Raises:**
 
diff --git a/docs/api/api_python/ops/mindspore.ops.Gather.rst b/docs/api/api_python/ops/mindspore.ops.Gather.rst
index 72894c1a6b9..33358ca3a99 100644
--- a/docs/api/api_python/ops/mindspore.ops.Gather.rst
+++ b/docs/api/api_python/ops/mindspore.ops.Gather.rst
@@ -5,15 +5,21 @@ mindspore.ops.Gather
 
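     Returns the slice of the input Tensor made up of the elements at the `input_indices` indices along the specified `axis`.
 
+    The following figure shows a typical Gather computation:
+
+    .. image:: api_img/Gather.png
+
+    Here, params denotes the input `input_params`, and indices denotes the slicing indices `input_indices`.
+
+    .. note::
+        The values of input_indices must be in the range `[0, input_params.shape[axis])`. The result is undefined for values outside this range.
+
     **Inputs:**
 
     - **input_params** (Tensor) - The original Tensor, with shape :math:`(x_1, x_2, ..., x_R)`.
     - **input_indices** (Tensor) - The index Tensor, with shape :math:`(y_1, y_2, ..., y_S)`. Specifies the indices of the elements to slice from the original Tensor. The data type must be int32 or int64.
     - **axis** (int) - Specifies the dimension index along which to slice.
 
-    .. note::
-        The values of input_indices must be in the range `[0, input_param.shape[axis])`. An error is raised for values outside this range.
-
     **Outputs:**
 
     Tensor, with shape :math:`input\_params.shape[:axis] + input\_indices.shape + input\_params.shape[axis + 1:]`.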
@@ -21,4 +27,5 @@ mindspore.ops.Gather
     **Raises:**
 
     - **TypeError** - `axis` is not an int.
+    - **TypeError** - `input_params` is not a Tensor.
     - **TypeError** - `input_indices` is not a Tensor of int type.
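
As a rough illustration of the output-shape rule and the index range described in the Gather documentation above, here is a minimal pure-Python sketch. The helper names `gather_output_shape` and `gather_axis0` are hypothetical and are not part of MindSpore. Raising on an out-of-range index is a choice made for this sketch only, since the updated note states that the result is undefined in that case.

```python
def gather_output_shape(params_shape, indices_shape, axis):
    """Documented rule: params_shape[:axis] + indices_shape + params_shape[axis + 1:]."""
    if not 0 <= axis < len(params_shape):
        # Negative axes are deliberately left out of this sketch.
        raise ValueError("axis must be in [0, rank)")
    return tuple(params_shape[:axis]) + tuple(indices_shape) + tuple(params_shape[axis + 1:])


def gather_axis0(params, indices):
    """Gather along axis 0 of a nested list, using a flat list of indices."""
    size = len(params)
    for i in indices:
        # Assumption of this sketch: fail loudly instead of returning an undefined result.
        if not 0 <= i < size:
            raise IndexError(f"index {i} is outside [0, {size})")
    return [params[i] for i in indices]


params = [[1, 2], [3, 4], [5, 6]]                  # shape (3, 2)
print(gather_axis0(params, [2, 0]))                # [[5, 6], [1, 2]]
print(gather_output_shape((3, 2), (2,), 0))        # (2, 2)
print(gather_output_shape((3, 4, 5), (2, 6), 1))   # (3, 2, 6, 5)
```

The shape helper only concatenates tuples, so it can be used to sanity-check an expected output shape without building any tensors.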
diff --git a/docs/api_img/Gather.png b/docs/api_img/Gather.png
new file mode 100644
index 00000000000..1d45e8271fd
Binary files /dev/null and b/docs/api_img/Gather.png differ
diff --git a/mindspore/python/mindspore/nn/layer/combined.py b/mindspore/python/mindspore/nn/layer/combined.py
index a094900e2fd..19d8f30319b 100644
--- a/mindspore/python/mindspore/nn/layer/combined.py
+++ b/mindspore/python/mindspore/nn/layer/combined.py
@@ -48,7 +48,7 @@ class Conv2dBnAct(Cell):
         dilation (int): Specifies the dilation rate to use for dilated convolution. If set to be :math:`k > 1`,
             there will be :math:`k - 1` pixels skipped for each sampling location. Its value must be greater than
             or equal to 1 and lower than any one of the height and width of the `x`. Default: 1.
-        group (int): Splits filter into groups, `in_ channels` and `out_channels` must be
+        group (int): Splits filter into groups, `in_channels` and `out_channels` must be
             divisible by the number of groups. Default: 1.
         has_bias (bool): Specifies whether the layer uses a bias vector. Default: False.
         weight_init (Union[Tensor, str, Initializer, numbers.Number]): Initializer for the convolution kernel.
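
To make the divisibility requirement on `group` concrete, the following minimal sketch checks it and derives the full kernel shape `(C_out, C_in / group, kernel_size[0], kernel_size[1])` that the Conv2d docstring in conv.py states. `grouped_kernel_shape` is a hypothetical helper written for illustration, not a MindSpore API.

```python
def grouped_kernel_shape(in_channels, out_channels, kernel_size, group=1):
    """Return (C_out, C_in / group, kH, kW) after validating `group`."""
    if group < 1:
        raise ValueError("group must be a positive integer")
    if in_channels % group != 0 or out_channels % group != 0:
        raise ValueError("in_channels and out_channels must be divisible by group")
    if isinstance(kernel_size, int):
        # A single integer is used for both the height and the width.
        kernel_size = (kernel_size, kernel_size)
    return (out_channels, in_channels // group) + tuple(kernel_size)


print(grouped_kernel_shape(4, 8, 3, group=2))        # (8, 2, 3, 3)
print(grouped_kernel_shape(6, 6, (3, 5), group=6))   # (6, 1, 3, 5)
```

The second call has `group` equal to both `in_channels` and `out_channels`, which is the case the Conv2d docstring calls a 2D depthwise convolution layer.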
diff --git a/mindspore/python/mindspore/nn/layer/conv.py b/mindspore/python/mindspore/nn/layer/conv.py
index 8bf97a067bf..994d00ee0d8 100644
--- a/mindspore/python/mindspore/nn/layer/conv.py
+++ b/mindspore/python/mindspore/nn/layer/conv.py
@@ -114,79 +114,74 @@ class Conv2d(_Conv):
     r"""
     2D convolution layer.
 
-    Applies a 2D convolution over an input tensor which is typically of shape :math:`(N, C_{in}, H_{in}, W_{in})`,
-    where :math:`N` is batch size, :math:`C_{in}` is channel number, and :math:`H_{in}, W_{in}` are height and width.
-    For each batch of shape :math:`(C_{in}, H_{in}, W_{in})`, the formula is defined as:
+    Calculates the 2D convolution on the input tensor which is typically of shape :math:`(N, C_{in}, H_{in}, W_{in})`,
+    where :math:`N` is the batch size, :math:`C_{in}` is the number of channels, and
+    :math:`H_{in}, W_{in}` are the height and width of the feature layer respectively.
+    For each batch, the tensor has shape :math:`(C_{in}, H_{in}, W_{in})`, and the formula is defined as:
 
     .. math::
 
-        out_j = \sum_{i=0}^{C_{in} - 1} ccor(W_{ij}, X_i) + b_j,
+        \text{out}(N_i, C_{\text{out}_j}) = \text{bias}(C_{\text{out}_j}) +
+        \sum_{k = 0}^{C_{in} - 1} \text{ccor}({\text{weight}(C_{\text{out}_j}, k), \text{X}(N_i, k)})
 
-    where :math:`ccor` is the cross-correlation operator, :math:`C_{in}` is the input channel number, :math:`j` ranges
-    from :math:`0` to :math:`C_{out} - 1`, :math:`W_{ij}` corresponds to the :math:`i`-th channel of the :math:`j`-th
-    filter and :math:`out_{j}` corresponds to the :math:`j`-th channel of the output. :math:`W_{ij}` is a slice
-    of kernel and it has shape :math:`(\text{kernel_size[0]}, \text{kernel_size[1]})`,
-    where :math:`\text{kernel_size[0]}` and :math:`\text{kernel_size[1]}` are the height and width of
-    the convolution kernel. The full kernel has shape
-    :math:`(C_{out}, C_{in} // \text{group}, \text{kernel_size[0]}, \text{kernel_size[1]})`,
-    where group is the group number to split the input `x` in the channel dimension.
+    where :math:`ccor` is the `cross-correlation <https://en.wikipedia.org/wiki/Cross-correlation>`_,
+    :math:`C_{in}` is the channel number of the input, :math:`out_{j}` corresponds to the jth channel of
+    the output and :math:`j` is in the range of :math:`[0, C_{out}-1]`. :math:`\text{weight}(C_{\text{out}_j}, k)`
+    is a convolution kernel slice with shape :math:`(\text{kernel_size[0]}, \text{kernel_size[1]})`,
+    where :math:`\text{kernel_size[0]}` and :math:`\text{kernel_size[1]}` are the height and width of the convolution
+    kernel respectively. :math:`\text{bias}` is the bias parameter and :math:`\text{X}` is the input tensor.
+    The shape of the full convolution kernel is
+    :math:`(C_{out}, C_{in} / \text{group}, \text{kernel_size[0]}, \text{kernel_size[1]})`,
+    where `group` is the number of groups to split the input `x` in the channel dimension.
 
-    If the 'pad_mode' is set to be "valid", the output height and width will be
-    :math:`\left \lfloor{1 + \frac{H_{in} + \text{padding[0]} + \text{padding[1]} - \text{kernel_size[0]} -
-    (\text{kernel_size[0]} - 1) \times (\text{dilation[0]} - 1) }{\text{stride[0]}}} \right \rfloor`    and
-    :math:`\left \lfloor{1 + \frac{W_{in} + \text{padding[2]} + \text{padding[3]} - \text{kernel_size[1]} -
-    (\text{kernel_size[1]} - 1) \times (\text{dilation[1]} - 1) }{\text{stride[1]}}} \right \rfloor`    respectively.
-
-    The first introduction can be found in paper `Gradient Based Learning Applied to Document Recognition
-    <http://vision.stanford.edu/cs598_spring07/papers/Lecun98.pdf>`_.
+    For more details, please refer to the paper `Gradient Based Learning Applied to Document
+    Recognition <http://vision.stanford.edu/cs598_spring07/papers/Lecun98.pdf>`_.
 
     Args:
-        in_channels (int): The number of input channel :math:`C_{in}`.
-        out_channels (int): The number of output channel :math:`C_{out}`.
-        kernel_size (Union[int, tuple[int]]): The data type is int or a tuple of 2 integers. Specifies the height
-            and width of the 2D convolution window. Single int means the value is for both the height and the width of
-            the kernel. A tuple of 2 ints means the first value is for the height and the other is for the
-            width of the kernel.
-        stride (Union[int, tuple[int]]): The distance of kernel moving, an int number that represents
-            the height and width of movement are both strides, or a tuple of two int numbers that
-            represent height and width of movement respectively. Default: 1.
+        in_channels (int): The channel number of the input tensor of the Conv2d layer.
+        out_channels (int): The channel number of the output tensor of the Conv2d layer.
+        kernel_size (Union[int, tuple[int]]): Specifies the height and width of the 2D convolution kernel.
+            The data type is an integer or a tuple of two integers. An integer represents the height
+            and width of the convolution kernel. A tuple of two integers represents the height
+            and width of the convolution kernel respectively.
+        stride (Union[int, tuple[int]]): The movement stride of the 2D convolution kernel.
+            The data type is an integer or a tuple of two integers. An integer represents the movement step size
+            in both height and width directions. A tuple of two integers represents the movement step size in the height
+            and width directions respectively. Default: 1.
         pad_mode (str): Specifies padding mode. The optional values are
             "same", "valid", "pad". Default: "same".
 
-            - same: Adopts the way of completion. The height and width of the output will be the same as
-              the input `x`. The total number of padding will be calculated in horizontal and vertical
-              directions and evenly distributed to top and bottom, left and right if possible. Otherwise, the
-              last extra padding will be done from the bottom and the right side. If this mode is set, `padding`
-              must be 0.
+            - same: The height and width of the output are the same as those of the input
+              divided by `stride`. If this mode is set, the value of `padding` must be 0.
 
-            - valid: Adopts the way of discarding. The possible largest height and width of output will be returned
-              without padding. Extra pixels will be discarded. If this mode is set, `padding`
-              must be 0.
+            - valid: Returns a valid calculated output without padding. Excess pixels that do not satisfy the
+              calculation will be discarded. If this mode is set, the value of `padding` must be 0.
 
-            - pad: Implicit paddings on both sides of the input `x`. The number of `padding` will be padded to the input
-              Tensor borders. `padding` must be greater than or equal to 0.
+            - pad: Pads the input with zeros, adding `padding` zeros on both sides of the input.
+              If this mode is set, the value of `padding` must be greater than or equal to 0.
 
-        padding (Union[int, tuple[int]]): Implicit paddings on both sides of the input `x`. If `padding` is one integer,
-                    the paddings of top, bottom, left and right are the same, equal to padding. If `padding` is a tuple
-                    with four integers, the paddings of top, bottom, left and right will be equal to padding[0],
-                    padding[1], padding[2], and padding[3] accordingly. Default: 0.
-        dilation (Union[int, tuple[int]]): The data type is int or a tuple of 2 integers. Specifies the dilation rate
-                                      to use for dilated convolution. If set to be :math:`k > 1`, there will
-                                      be :math:`k - 1` pixels skipped for each sampling location. Its value must
-                                      be greater or equal to 1 and bounded by the height and width of the
-                                      input `x`. Default: 1.
-        group (int): Splits filter into groups, `in_ channels` and `out_channels` must be
-            divisible by the number of groups. If the group is equal to `in_channels` and `out_channels`,
+        padding (Union[int, tuple[int]]): The amount of padding in the height and width directions of the input.
+            The data type is an integer or a tuple of four integers. If `padding` is an integer,
+            then the top, bottom, left, and right padding are all equal to `padding`.
+            If `padding` is a tuple of four integers, then the top, bottom, left, and right padding
+            are equal to `padding[0]`, `padding[1]`, `padding[2]`, and `padding[3]` respectively.
+            The value should be greater than or equal to 0. Default: 0.
+        dilation (Union[int, tuple[int]]): Dilation size of 2D convolution kernel.
+            The data type is an integer or a tuple of two integers. If :math:`k > 1`, the kernel is sampled
+            every `k` elements. The value of `k` on the height and width directions is in range of [1, H]
+            and [1, W] respectively. Default: 1.
+        group (int): Splits filter into groups, `in_channels` and `out_channels` must be
+            divisible by `group`. If the group is equal to `in_channels` and `out_channels`,
             this 2D convolution layer also can be called 2D depthwise convolution layer. Default: 1.
-        has_bias (bool): Specifies whether the layer uses a bias vector. Default: False.
-        weight_init (Union[Tensor, str, Initializer, numbers.Number]): Initializer for the convolution kernel.
-            It can be a Tensor, a string, an Initializer or a number. When a string is specified,
+        has_bias (bool): Whether the Conv2d layer has a bias parameter. Default: False.
+        weight_init (Union[Tensor, str, Initializer, numbers.Number]): Initialization method of weight parameter.
+            It can be a Tensor, a string, an Initializer or a numbers.Number. When a string is specified,
             values from 'TruncatedNormal', 'Normal', 'Uniform', 'HeUniform' and 'XavierUniform' distributions as well
             as constant 'One' and 'Zero' distributions are possible. Alias 'xavier_uniform', 'he_uniform', 'ones'
             and 'zeros' are acceptable. Uppercase and lowercase are both acceptable. Refer to the values of
             Initializer for more details. Default: 'normal'.
-        bias_init (Union[Tensor, str, Initializer, numbers.Number]): Initializer for the bias vector. Possible
-            Initializer and string are the same as 'weight_init'. Refer to the values of
+        bias_init (Union[Tensor, str, Initializer, numbers.Number]): Initialization method of bias parameter.
+            Available initialization methods are the same as 'weight_init'. Refer to the values of
             Initializer for more details. Default: 'zeros'.
         data_format (str): The optional value for data format, is 'NHWC' or 'NCHW'.
             Default: 'NCHW'.
@@ -198,6 +193,34 @@ class Conv2d(_Conv):
     Outputs:
         Tensor of shape :math:`(N, C_{out}, H_{out}, W_{out})` or :math:`(N, H_{out}, W_{out}, C_{out})`.
 
+        pad_mode is 'same':
+
+        .. math::
+            \begin{array}{ll} \\
+                H_{out} = \left \lfloor{\frac{H_{in}}{\text{stride[0]}} + 1} \right \rfloor \\
+                W_{out} = \left \lfloor{\frac{W_{in}}{\text{stride[1]}} + 1} \right \rfloor \\
+            \end{array}
+
+        pad_mode is 'valid':
+
+        .. math::
+            \begin{array}{ll} \\
+                H_{out} = \left \lfloor{\frac{H_{in} - \text{dilation[0]} \times (\text{kernel_size[0]} - 1) }
+                {\text{stride[0]}} + 1} \right \rfloor \\
+                W_{out} = \left \lfloor{\frac{W_{in} - \text{dilation[1]} \times (\text{kernel_size[1]} - 1) }
+                {\text{stride[1]}} + 1} \right \rfloor \\
+            \end{array}
+
+        pad_mode is 'pad':
+
+        .. math::
+            \begin{array}{ll} \\
+                H_{out} = \left \lfloor{\frac{H_{in} + padding[0] + padding[1] - (\text{dilation[0]} - 1) \times
+                \text{kernel_size[0]} - 1 }{\text{stride[0]}} + 1} \right \rfloor \\
+                W_{out} = \left \lfloor{\frac{W_{in} + padding[2] + padding[3] - (\text{dilation[1]} - 1) \times
+                \text{kernel_size[1]} - 1 }{\text{stride[1]}} + 1} \right \rfloor \\
+            \end{array}
+
     Raises:
         TypeError: If `in_channels`, `out_channels` or `group` is not an int.
         TypeError: If `kernel_size`, `stride`, `padding` or `dilation` is neither an int nor a tuple.
@@ -298,75 +321,82 @@ class Conv1d(_Conv):
     r"""
     1D convolution layer.
 
-    Applies a 1D convolution over an input tensor which is typically of shape :math:`(N, C_{in}, W_{in})`,
-    where :math:`N` is batch size and :math:`C_{in}` is channel number. For each batch of shape
-    :math:`(C_{in}, W_{in})`, the formula is defined as:
+    Calculates the 1D convolution on the input tensor which is typically of shape :math:`(N, C_{in}, L_{in})`,
+    where :math:`N` is the batch size, :math:`C_{in}` is the number of channels and :math:`L_{in}` is the
+    sequence length. For each batch, the tensor has shape :math:`(C_{in}, L_{in})`, and the formula is defined as:
 
     .. math::
 
-        out_j = \sum_{i=0}^{C_{in} - 1} ccor(W_{ij}, X_i) + b_j,
+        \text{out}(N_i, C_{\text{out}_j}) = \text{bias}(C_{\text{out}_j}) +
+        \sum_{k = 0}^{C_{in} - 1} \text{ccor}({\text{weight}(C_{\text{out}_j}, k), \text{X}(N_i, k)})
 
-    where :math:`ccor` is the cross correlation operator, :math:`C_{in}` is the input channel number, :math:`j` ranges
-    from :math:`0` to :math:`C_{out} - 1`, :math:`W_{ij}` corresponds to the :math:`i`-th channel of the :math:`j`-th
-    filter and :math:`out_{j}` corresponds to the :math:`j`-th channel of the output. :math:`W_{ij}` is a slice
-    of kernel and it has shape :math:`(\text{ks_w})`, where :math:`\text{ks_w}` is the width of the convolution kernel.
-    The full kernel has shape :math:`(C_{out}, C_{in} // \text{group}, \text{ks_w})`, where group is the group number
-    to split the input `x` in the channel dimension.
+    where :math:`ccor` is the `cross-correlation <https://en.wikipedia.org/wiki/Cross-correlation>`_,
+    :math:`C_{in}` is the channel number of the input, :math:`out_{j}` corresponds to the jth channel of
+    the output and :math:`j` is in the range of :math:`[0, C_{out}-1]`. :math:`\text{weight}(C_{\text{out}_j}, k)`
+    is a convolution kernel slice with shape :math:`(\text{kernel_size})`, where :math:`\text{kernel_size}` is the
+    width of the convolution kernel. :math:`\text{bias}` is the bias parameter, and :math:`\text{X}` is the input
+    tensor. The shape of the full convolution kernel is
+    :math:`(C_{out}, C_{in} / \text{group}, \text{kernel_size})`,
+    where `group` is the number of groups to split the input `x` in the channel dimension.
 
-    If the 'pad_mode' is set to be "valid", the output width will be
-    :math:`\left \lfloor{1 + \frac{W_{in} + 2 \times \text{padding} - \text{ks_w} -
-    (\text{ks_w} - 1) \times (\text{dilation} - 1) }{\text{stride}}} \right \rfloor`    respectively.
-
-    The first introduction of convolution layer can be found in paper `Gradient Based Learning Applied to Document
+    For more details, please refer to the paper `Gradient Based Learning Applied to Document
     Recognition <http://vision.stanford.edu/cs598_spring07/papers/Lecun98.pdf>`_.
 
     Args:
-        in_channels (int): The number of input channel :math:`C_{in}`.
-        out_channels (int): The number of output channel :math:`C_{out}`.
-        kernel_size (int): The data type is int. Specifies the
-            width of the 1D convolution window.
-        stride (int): The distance of kernel moving, an int number that represents
-            the width of movement. Default: 1.
+        in_channels (int): The channel number of the input tensor of the Conv1d layer.
+        out_channels (int): The channel number of the output tensor of the Conv1d layer.
+        kernel_size (int): Specifies the width of the 1D convolution kernel.
+        stride (int): The movement stride of the 1D convolution kernel. Default: 1.
         pad_mode (str): Specifies padding mode. The optional values are
             "same", "valid", "pad". Default: "same".
 
-            - same: Adopts the way of completion. The output width will be the same as the input `x`.
-              The total number of padding will be calculated in the horizontal
-              direction and evenly distributed to left and right if possible. Otherwise, the
-              last extra padding will be done from the bottom and the right side. If this mode is set, `padding`
-              must be 0.
+            - same: The length of the output is the same as the length of the input divided by `stride`.
+              If this mode is set, the value of `padding` must be 0.
 
-            - valid: Adopts the way of discarding. The possible largest width of the output will be returned
-              without padding. Extra pixels will be discarded. If this mode is set, `padding`
-              must be 0.
+            - valid: Returns a valid calculated output without padding. Excess pixels that do not satisfy the
+              calculation will be discarded. If this mode is set, the value of `padding` must be 0.
 
-            - pad: Implicit paddings on both sides of the input `x`. The number of `padding` will be padded to the input
-              Tensor borders. `padding` must be greater than or equal to 0.
+            - pad: Pads the input with zeros, adding `padding` zeros on both sides of the input.
+              If this mode is set, the value of `padding` must be greater than or equal to 0.
 
-        padding (int): Implicit paddings on both sides of the input `x`. Default: 0.
-        dilation (int): The data type is int. Specifies the dilation rate
-                                      to use for dilated convolution. If set to be :math:`k > 1`, there will
-                                      be :math:`k - 1` pixels skipped for each sampling location. Its value must
-                                      be greater or equal to 1 and bounded by the height and width of the
-                                      input `x`. Default: 1.
-        group (int): Splits filter into groups, `in_ channels` and `out_channels` must be
-            divisible by the number of groups. Default: 1.
-        has_bias (bool): Specifies whether the layer uses a bias vector. Default: False.
-        weight_init (Union[Tensor, str, Initializer, numbers.Number]): An initializer for the convolution kernel.
-            It can be a Tensor, a string, an Initializer or a number. When a string is specified,
+        padding (int): The amount of padding on both sides of the input.
+            The value should be greater than or equal to 0. Default: 0.
+        dilation (int): Dilation size of 1D convolution kernel. If :math:`k > 1`, the kernel is sampled
+            every `k` elements. The value of `k` is in range of [1, L]. Default: 1.
+        group (int): Splits filter into groups, `in_channels` and `out_channels` must be
+            divisible by `group`. Default: 1.
+        has_bias (bool): Whether the Conv1d layer has a bias parameter. Default: False.
+        weight_init (Union[Tensor, str, Initializer, numbers.Number]): Initialization method of weight parameter.
+            It can be a Tensor, a string, an Initializer or a numbers.Number. When a string is specified,
             values from 'TruncatedNormal', 'Normal', 'Uniform', 'HeUniform' and 'XavierUniform' distributions as well
             as constant 'One' and 'Zero' distributions are possible. Alias 'xavier_uniform', 'he_uniform', 'ones'
             and 'zeros' are acceptable. Uppercase and lowercase are both acceptable. Refer to the values of
             Initializer for more details. Default: 'normal'.
-        bias_init (Union[Tensor, str, Initializer, numbers.Number]): Initializer for the bias vector. Possible
-            Initializer and string are the same as 'weight_init'. Refer to the values of
+        bias_init (Union[Tensor, str, Initializer, numbers.Number]): Initialization method of bias parameter.
+            Available initialization methods are the same as 'weight_init'. Refer to the values of
             Initializer for more details. Default: 'zeros'.
 
     Inputs:
-        - **x** (Tensor) - Tensor of shape :math:`(N, C_{in}, W_{in})`.
+        - **x** (Tensor) - Tensor of shape :math:`(N, C_{in}, L_{in})`.
 
     Outputs:
-        Tensor of shape :math:`(N, C_{out}, W_{out})`.
+        Tensor of shape :math:`(N, C_{out}, L_{out})`.
+
+        pad_mode is 'same':
+
+        .. math::
+            L_{out} = \left \lfloor{\frac{L_{in}}{\text{stride}} + 1} \right \rfloor
+
+        pad_mode is 'valid':
+
+        .. math::
+            L_{out} = \left \lfloor{\frac{L_{in} - \text{dilation} \times (\text{kernel_size} - 1) }
+            {\text{stride}} + 1} \right \rfloor
+
+        pad_mode is 'pad':
+
+        .. math::
+            L_{out} = \left \lfloor{\frac{L_{in} + 2 \times padding - (\text{dilation} - 1) \times
+            \text{kernel_size} - 1 }{\text{stride}} + 1} \right \rfloor
 
     Raises:
         TypeError: If `in_channels`, `out_channels`, `kernel_size`, `stride`, `padding` or `dilation` is not an int.
@@ -487,75 +517,76 @@ class Conv3d(_Conv):
     r"""
     3D convolution layer.
 
-    Applies a 3D convolution over an input tensor which is typically of shape
-    :math:`(N, C_{in}, D_{in}, H_{in}, W_{in})` and output shape
-    :math:`(N, C_{out}, D_{out}, H_{out}, W_{out})`. where :math:`N` is batch size. :math:`C` is channel number.
-    the formula is defined as:
+    Calculates the 3D convolution on the input tensor which is typically of shape
+    :math:`(N, C_{in}, D_{in}, H_{in}, W_{in})`,
+    where :math:`N` is the batch size, :math:`C_{in}` is the number of channels, and
+    :math:`D_{in}, H_{in}, W_{in}` are the depth, height and width of the feature layer respectively.
+    For each batch, the tensor has shape :math:`(C_{in}, D_{in}, H_{in}, W_{in})`, and the formula is defined as:
 
     .. math::
 
-        \operatorname{out}\left(N_{i}, C_{\text {out}_j}\right)=\operatorname{bias}\left(C_{\text {out}_j}\right)+
-        \sum_{k=0}^{C_{in}-1} ccor(\text {weight}\left(C_{\text {out}_j}, k\right),
-        \operatorname{input}\left(N_{i}, k\right))
+        \text{out}(N_i, C_{\text{out}_j}) = \text{bias}(C_{\text{out}_j}) +
+        \sum_{k = 0}^{C_{in} - 1} \text{ccor}({\text{weight}(C_{\text{out}_j}, k), \text{X}(N_i, k)})
 
-    where :math:`ccor` is the cross-correlation operator.
+    where :math:`ccor` is the `cross-correlation <https://en.wikipedia.org/wiki/Cross-correlation>`_,
+    :math:`C_{in}` is the channel number of the input, :math:`out_{j}` corresponds to the jth channel of
+    the output and :math:`j` is in the range of :math:`[0, C_{out}-1]`. :math:`\text{weight}(C_{\text{out}_j}, k)`
+    is a convolution kernel slice with shape
+    :math:`(\text{kernel_size[0]}, \text{kernel_size[1]}, \text{kernel_size[2]})`,
+    where :math:`\text{kernel_size[0]}`, :math:`\text{kernel_size[1]}` and :math:`\text{kernel_size[2]}` are
+    the depth, height and width of the convolution kernel respectively. :math:`\text{bias}` is the bias parameter
+    and :math:`\text{X}` is the input tensor.
+    The shape of the full convolution kernel is
+    :math:`(C_{out}, C_{in} / \text{group}, \text{kernel_size[0]}, \text{kernel_size[1]}, \text{kernel_size[2]})`,
+    where `group` is the number of groups to split the input `x` in the channel dimension.
 
-    If the 'pad_mode' is set to be "valid", the output depth, height and width will be
-    :math:`\left \lfloor{1 + \frac{D_{in} + \text{padding[0]} + \text{padding[1]} - \text{kernel_size[0]} -
-    (\text{kernel_size[0]} - 1) \times (\text{dilation[0]} - 1) }{\text{stride[0]}}} \right \rfloor` and
-    :math:`\left \lfloor{1 + \frac{H_{in} + \text{padding[2]} + \text{padding[3]} - \text{kernel_size[1]} -
-    (\text{kernel_size[1]} - 1) \times (\text{dilation[1]} - 1) }{\text{stride[1]}}} \right \rfloor` and
-    :math:`\left \lfloor{1 + \frac{W_{in} + \text{padding[4]} + \text{padding[5]} - \text{kernel_size[2]} -
-    (\text{kernel_size[2]} - 1) \times (\text{dilation[2]} - 1) }{\text{stride[2]}}} \right \rfloor` respectively.
+    For more details, please refer to the paper `Gradient Based Learning Applied to Document
+    Recognition <http://vision.stanford.edu/cs598_spring07/papers/Lecun98.pdf>`_.
 
     Args:
-        in_channels (int): The number of input channel :math:`C_{in}`.
-        out_channels (int): The number of output channel :math:`C_{out}`.
-        kernel_size (Union[int, tuple[int]]): The data type is int or a tuple of 3 integers.
-            Specifies the depth, height and width of the 3D convolution window.
-            Single int means the value is for the depth, height and the width of the kernel.
-            A tuple of 3 ints means the first value is for the depth, the second value is for the height and the
-            other is for the width of the kernel.
-        stride (Union[int, tuple[int]]): The distance of kernel moving, an int number that represents
-            the depth, height and width of movement are both strides, or a tuple of three int numbers that
-            represent depth, height and width of movement respectively. Default: 1.
+        in_channels (int): The channel number of the input tensor of the Conv3d layer.
+        out_channels (int): The channel number of the output tensor of the Conv3d layer.
+        kernel_size (Union[int, tuple[int]]): Specifies the depth, height and width of the 3D convolution kernel.
+            The data type is an integer or a tuple of three integers. An integer represents the depth, height
+            and width of the convolution kernel. A tuple of three integers represents the depth, height
+            and width of the convolution kernel respectively.
+        stride (Union[int, tuple[int]]): The movement stride of the 3D convolution kernel.
+            The data type is an integer or a tuple of three integers. An integer represents the movement step size
+            in depth, height and width directions. A tuple of three integers represents the movement step size
+            in the depth, height and width directions respectively. Default: 1.
         pad_mode (str): Specifies padding mode. The optional values are
             "same", "valid", "pad". Default: "same".
 
-            - same: Adopts the way of completion. The depth, height and width of the output will be the same as
-              the input `x`. The total number of padding will be calculated in depth, horizontal and vertical
-              directions and evenly distributed to head and tail, top and bottom, left and right if possible.
-              Otherwise, the last extra padding will be done from the tail, bottom and the right side.
-              If this mode is set, `padding` must be 0.
+            - same: The depth, height and width of the output are the same as those of the input
+              divided by `stride`. If this mode is set, the value of `padding` must be 0.
 
-            - valid: Adopts the way of discarding. The possible largest depth, height and width of output
-              will be returned without padding. Extra pixels will be discarded. If this mode is set, `padding`
-              must be 0.
+            - valid: Returns a valid calculated output without padding. Excess pixels that do not satisfy the
+              calculation will be discarded. If this mode is set, the value of `padding` must be 0.
 
-            - pad: Implicit paddings on both sides of the input `x` in depth, height, width. The number of `padding`
-              will be padded to the input Tensor borders. `padding` must be greater than or equal to 0.
+            - pad: Pads the input with zeros, adding `padding` zeros on both sides of the input.
+              If this mode is set, the value of `padding` must be greater than or equal to 0.
 
-        padding (Union(int, tuple[int])): Implicit paddings on both sides of the input `x`.
-            The data type is int or a tuple of 6 integers. Default: 0. If `padding` is an integer,
-            the paddings of head, tail, top, bottom, left and right are the same, equal to padding.
-            If `paddings` is a tuple of six integers, the padding of head, tail, top, bottom, left and right equal to
-            padding[0], padding[1], padding[2], padding[3], padding[4] and padding[5] correspondingly.
-        dilation (Union[int, tuple[int]]): The data type is int or a tuple of 3 integers
-            :math:`(dilation_d, dilation_h, dilation_w)`. Currently, dilation on depth only supports the case of 1.
-            Specifies the dilation rate to use for dilated convolution. If set to be :math:`k > 1`,
-            there will be :math:`k - 1` pixels skipped for each sampling location.
-            Its value must be greater or equal to 1 and bounded by the height and width of the input `x`. Default: 1.
-        group (int): Splits filter into groups, `in_ channels` and `out_channels` must be
-            divisible by the number of groups. Default: 1. Only 1 is currently supported.
-        has_bias (bool): Specifies whether the layer uses a bias vector. Default: False.
-        weight_init (Union[Tensor, str, Initializer, numbers.Number]): Initializer for the convolution kernel.
-            It can be a Tensor, a string, an Initializer or a number. When a string is specified,
+        padding (Union[int, tuple[int]]): The amount of padding in the depth, height and width directions of the input.
+            The data type is an integer or a tuple of six integers. If `padding` is an integer,
+            then the head, tail, top, bottom, left, and right padding are all equal to `padding`.
+            If `padding` is a tuple of six integers, then the head, tail, top, bottom, left, and right padding
+            are equal to `padding[0]`, `padding[1]`, `padding[2]`, `padding[3]`, `padding[4]` and `padding[5]`
+            respectively. The value should be greater than or equal to 0. Default: 0.
+        dilation (Union[int, tuple[int]]): Dilation size of the 3D convolution kernel.
+            The data type is an integer or a tuple of three integers. If :math:`k > 1`, the kernel is sampled
+            every `k` elements. The value of `k` in the depth, height and width directions is in the range
+            [1, D], [1, H] and [1, W] respectively. Default: 1.
+        group (int): Splits filter into groups, `in_channels` and `out_channels` must be
+            divisible by `group`. Default: 1. Only 1 is currently supported.
+        has_bias (bool): Whether the Conv3d layer has a bias parameter. Default: False.
+        weight_init (Union[Tensor, str, Initializer, numbers.Number]): Initialization method of weight parameter.
+            It can be a Tensor, a string, an Initializer or a numbers.Number. When a string is specified,
             values from 'TruncatedNormal', 'Normal', 'Uniform', 'HeUniform' and 'XavierUniform' distributions as well
             as constant 'One' and 'Zero' distributions are possible. Alias 'xavier_uniform', 'he_uniform', 'ones'
             and 'zeros' are acceptable. Uppercase and lowercase are both acceptable. Refer to the values of
             Initializer for more details. Default: 'normal'.
-        bias_init (Union[Tensor, str, Initializer, numbers.Number]): Initializer for the bias vector. Possible
-            Initializer and string are the same as 'weight_init'. Refer to the values of
+        bias_init (Union[Tensor, str, Initializer, numbers.Number]): Initialization method of bias parameter.
+            Available initialization methods are the same as 'weight_init'. Refer to the values of
             Initializer for more details. Default: 'zeros'.
         data_format (str): The optional value for data format. Currently only support "NCDHW".
 
@@ -564,7 +595,41 @@ class Conv3d(_Conv):
           Currently input data type only support float16 and float32.
 
     Outputs:
-        Tensor, the value that applied 3D convolution. The shape is :math:`(N, C_{out}, D_{out}, H_{out}, W_{out})`.
+        Tensor of shape :math:`(N, C_{out}, D_{out}, H_{out}, W_{out})`.
+
+        pad_mode is 'same':
+
+        .. math::
+            \begin{array}{ll} \\
+                D_{out} = \left \lceil{\frac{D_{in}}{\text{stride[0]}}} \right \rceil \\
+                H_{out} = \left \lceil{\frac{H_{in}}{\text{stride[1]}}} \right \rceil \\
+                W_{out} = \left \lceil{\frac{W_{in}}{\text{stride[2]}}} \right \rceil \\
+            \end{array}
+
+
+        pad_mode is 'valid':
+
+        .. math::
+            \begin{array}{ll} \\
+                D_{out} = \left \lfloor{\frac{D_{in} - \text{dilation[0]} \times (\text{kernel_size[0]} - 1) - 1}
+                {\text{stride[0]}} + 1} \right \rfloor \\
+                H_{out} = \left \lfloor{\frac{H_{in} - \text{dilation[1]} \times (\text{kernel_size[1]} - 1) - 1}
+                {\text{stride[1]}} + 1} \right \rfloor \\
+                W_{out} = \left \lfloor{\frac{W_{in} - \text{dilation[2]} \times (\text{kernel_size[2]} - 1) - 1}
+                {\text{stride[2]}} + 1} \right \rfloor \\
+            \end{array}
+
+        pad_mode is 'pad':
+
+        .. math::
+            \begin{array}{ll} \\
+                D_{out} = \left \lfloor{\frac{D_{in} + padding[0] + padding[1] - \text{dilation[0]} \times
+                (\text{kernel_size[0]} - 1) - 1 }{\text{stride[0]}} + 1} \right \rfloor \\
+                H_{out} = \left \lfloor{\frac{H_{in} + padding[2] + padding[3] - \text{dilation[1]} \times
+                (\text{kernel_size[1]} - 1) - 1 }{\text{stride[1]}} + 1} \right \rfloor \\
+                W_{out} = \left \lfloor{\frac{W_{in} + padding[4] + padding[5] - \text{dilation[2]} \times
+                (\text{kernel_size[2]} - 1) - 1 }{\text{stride[2]}} + 1} \right \rfloor \\
+            \end{array}
 
     Raises:
         TypeError: If `in_channels`, `out_channels` or `group` is not an int.
@@ -663,102 +728,64 @@ class Conv3d(_Conv):
 
 class Conv3dTranspose(_Conv):
     r"""
-    Compute a 3D transposed convolution, which is also known as a deconvolution
-    (although it is not an actual deconvolution).
-    The transposed convolution operator multiplies each input value element-wise by a learnable kernel,
-    and sums over the outputs from all input feature planes.
-    This module can be seen as the gradient of Conv3d with respect to its input.
+    3D transposed convolution layer.
 
-    `x` is typically of shape :math:`(N, C, D, H, W)`, where :math:`N` is batch size, :math:`C` is channel number,
-    :math:`D` is the characteristic depth, :math:`H` is the height of the characteristic layer,
-    and :math:`W` is the width of the characteristic layer.
-    The calculation process of transposed convolution is equivalent to the reverse calculation of convolution.
+    Calculates a 3D transposed convolution, which can be seen as the gradient of Conv3d with respect to its input.
+    It is also called deconvolution (although it is not an actual deconvolution).
 
-    The pad_mode argument effectively adds :math:`dilation * (kernel\_size - 1) - padding` amount of zero padding
-    to both sizes of the input. So that when a Conv3d and a ConvTranspose3d are initialized with same parameters,
-    they are inverses of each other in regard to the input and output shapes.
-    However, when stride > 1, Conv3d maps multiple input shapes to the same output shape.
-    ConvTranspose3d provide padding argument to  increase the calculated output shape on one or more side.
+    The input is typically of shape :math:`(N, C, D, H, W)`, where :math:`N` is the batch size, :math:`C` is the number
+    of channels, and :math:`D_{in}, H_{in}, W_{in}` are the depth, height and width of the feature layer respectively.
 
-    The height and width of output are defined as:
-
-    if the 'pad_mode' is set to be "pad",
-
-    .. math::
-        D_{out} = (D_{in} - 1) \times \text{stride_d} - 2 \times \text{padding_d} + \text{dilation_d} \times
-        (\text{kernel_size_d} - 1) + \text{output_padding_d} + 1
-
-        H_{out} = (H_{in} - 1) \times \text{stride_h} - 2 \times \text{padding_h} + \text{dilation_h} \times
-        (\text{kernel_size_h} - 1) + \text{output_padding_h} + 1
-
-        W_{out} = (W_{in} - 1) \times \text{stride_w} - 2 \times \text{padding_w} + \text{dilation_w} \times
-        (\text{kernel_size_w} - 1) + \text{output_padding_w} + 1
-
-    if the 'pad_mode' is set to be "SAME",
-
-    .. math::
-
-        D_{out} = (D_{in} + \text{stride_d} - 1)/\text{stride_d} \\
-        H_{out} = (H_{in} + \text{stride_h} - 1)/\text{stride_h} \\
-        W_{out} = (W_{in} + \text{stride_w} - 1)/\text{stride_w}
-
-    if the 'pad_mode' is set to be "VALID",
-
-    .. math::
-
-        D_{out} = (D_{in} - 1) \times \text{stride_d} + \text{dilation_d} \times
-        (\text{kernel_size_d} - 1) + 1 \\
-        H_{out} = (H_{in} - 1) \times \text{stride_h} + \text{dilation_h} \times
-        (\text{kernel_size_h} - 1) + 1 \\
-        W_{out} = (W_{in} - 1) \times \text{stride_w} + \text{dilation_w} \times
-        (\text{kernel_size_w} - 1) + 1
+    When Conv3d and Conv3dTranspose are initialized with the same parameters, and `pad_mode` is set to 'pad',
+    :math:`dilation * (kernel\_size - 1) - padding` zeros are padded along the depth, height and width directions
+    of the input, so the two layers are inverses of each other with regard to the input and output shapes.
+    However, when `stride` > 1, Conv3d maps multiple input shapes to the same output shape. For details, refer to
+    `Deconvolutional Networks <https://www.matthewzeiler.com/matzeiler/deconvolutionalnetworks.pdf>`_.
 
     Args:
-        in_channels (int): The number of input channel :math:`C_{in}`.
-        out_channels (int): The number of output channel :math:`C_{out}`.
-        kernel_size (Union[int, tuple[int]]): The kernel size of the 3D convolution.
-        stride (Union[int, tuple[int]]): The distance of kernel moving, an int number that represents
-            the depth, height and width of movement are both strides, or a tuple of three int numbers that
-            represent depth, height and width of movement respectively. Its value must be equal to or greater than 1.
-            Default: 1.
-        pad_mode (str): Select the mode of the pad. The optional values are
-            "pad", "same", "valid". Default: "same".
+        in_channels (int): The channel number of the input tensor of the Conv3dTranspose layer.
+        out_channels (int): The channel number of the output tensor of the Conv3dTranspose layer.
+        kernel_size (Union[int, tuple[int]]): Specifies the depth, height and width of the 3D convolution kernel.
+            The data type is an integer or a tuple of three integers. An integer represents the depth, height
+            and width of the convolution kernel. A tuple of three integers represents the depth, height
+            and width of the convolution kernel respectively.
+        stride (Union[int, tuple[int]]): The movement stride of the 3D convolution kernel.
+            The data type is an integer or a tuple of three integers. An integer represents the movement step size
+            in depth, height and width directions. A tuple of three integers represents the movement step size
+            in the depth, height and width directions respectively. Default: 1.
+        pad_mode (str): Specifies padding mode. The optional values are
+            "same", "valid", "pad". Default: "same".
 
-            - same: Adopts the way of completion. The depth, height and width of the output will be the same as
-              the input `x`. The total number of padding will be calculated in depth, horizontal and vertical
-              directions and evenly distributed to head and tail, top and bottom, left and right if possible.
-              Otherwise, the last extra padding will be done from the tail, bottom and the right side.
-              If this mode is set, `padding` and `output_padding` must be 0.
+            - same: The size of the output in each dimension equals the input size multiplied by `stride`.
+              If this mode is set, the value of `padding` must be 0.
 
-            - valid: Adopts the way of discarding. The possible largest depth, height and width of output
-              will be returned without padding. Extra pixels will be discarded. If this mode is set, `padding`
-              and `output_padding` must be 0.
+            - valid: Returns a valid calculated output without padding. Excess pixels that do not satisfy the
+              calculation will be discarded. If this mode is set, the value of `padding` must be 0.
 
-            - pad: Implicit paddings on both sides of the input `x` in depth, height, width. The number of `pad` will
-              be padded to the input Tensor borders. `padding` must be greater than or equal to 0.
+            - pad: Pads the input. `padding` zeros are added on both sides of the input.
+              If this mode is set, the value of `padding` must be greater than or equal to 0.
 
-        padding (Union(int, tuple[int])): The pad value to be filled. Default: 0. If `padding` is an integer,
-            the paddings of head, tail, top, bottom, left and right are the same, equal to padding.
-            If `padding` is a tuple of six integers, the padding of head, tail, top, bottom, left and right equal to
-            padding[0], padding[1], padding[2], padding[3], padding[4] and padding[5] correspondingly.
-        dilation (Union(int, tuple[int])): The data type is int or a tuple of 3 integers
-            :math:`(dilation_d, dilation_h, dilation_w)`. Currently, dilation on depth only supports the case of 1.
-            Specifies the dilation rate to use for dilated convolution. If set to be :math:`k > 1`,
-            there will be :math:`k - 1` pixels skipped for each sampling location.
-            Its value must be greater or equal to 1 and bounded by the height and width of the input `x`. Default: 1.
-        group (int): Splits filter into groups, `in_ channels` and `out_channels` must be
-            divisible by the number of groups. Default: 1. Only 1 is currently supported.
-        output_padding (Union(int, tuple[int])): Add extra size to each dimension of the output. Default: 0.
-            Must be greater than or equal to 0.
-        has_bias (bool): Specifies whether the layer uses a bias vector. Default: False.
-        weight_init (Union[Tensor, str, Initializer, numbers.Number]): Initializer for the convolution kernel.
-            It can be a Tensor, a string, an Initializer or a number. When a string is specified,
+        padding (Union[int, tuple[int]]): The amount of padding in the depth, height and width directions of the input.
+            The data type is an integer or a tuple of six integers. If `padding` is an integer,
+            then the head, tail, top, bottom, left, and right padding are all equal to `padding`.
+            If `padding` is a tuple of six integers, then the head, tail, top, bottom, left, and right padding
+            are equal to `padding[0]`, `padding[1]`, `padding[2]`, `padding[3]`, `padding[4]` and `padding[5]`
+            respectively. The value should be greater than or equal to 0. Default: 0.
+        dilation (Union[int, tuple[int]]): Dilation size of the 3D convolution kernel.
+            The data type is an integer or a tuple of three integers. If :math:`k > 1`, the kernel is sampled
+            every `k` elements. The value of `k` in the depth, height and width directions is in the range
+            [1, D], [1, H] and [1, W] respectively. Default: 1.
+        group (int): Splits filter into groups, `in_channels` and `out_channels` must be
+            divisible by `group`. Default: 1. Only 1 is currently supported.
+        has_bias (bool): Whether the Conv3dTranspose layer has a bias parameter. Default: False.
+        weight_init (Union[Tensor, str, Initializer, numbers.Number]): Initialization method of weight parameter.
+            It can be a Tensor, a string, an Initializer or a numbers.Number. When a string is specified,
             values from 'TruncatedNormal', 'Normal', 'Uniform', 'HeUniform' and 'XavierUniform' distributions as well
             as constant 'One' and 'Zero' distributions are possible. Alias 'xavier_uniform', 'he_uniform', 'ones'
             and 'zeros' are acceptable. Uppercase and lowercase are both acceptable. Refer to the values of
             Initializer for more details. Default: 'normal'.
-        bias_init (Union[Tensor, str, Initializer, numbers.Number]): Initializer for the bias vector. Possible
-            Initializer and string are the same as 'weight_init'. Refer to the values of
+        bias_init (Union[Tensor, str, Initializer, numbers.Number]): Initialization method of bias parameter.
+            Available initialization methods are the same as 'weight_init'. Refer to the values of
             Initializer for more details. Default: 'zeros'.
         data_format (str): The optional value for data format. Currently only support 'NCDHW'.
 
@@ -769,6 +796,40 @@ class Conv3dTranspose(_Conv):
     Outputs:
         Tensor, the shape is :math:`(N, C_{out}, D_{out}, H_{out}, W_{out})`.
 
+        pad_mode is 'same':
+
+        .. math::
+            \begin{array}{ll} \\
+                D_{out} = D_{in} \times \text{stride[0]} \\
+                H_{out} = H_{in} \times \text{stride[1]} \\
+                W_{out} = W_{in} \times \text{stride[2]} \\
+            \end{array}
+
+
+        pad_mode is 'valid':
+
+        .. math::
+            \begin{array}{ll} \\
+                D_{out} = (D_{in} - 1) \times \text{stride[0]} + \text{dilation[0]} \times
+                (\text{kernel_size[0]} - 1) + 1 \\
+                H_{out} = (H_{in} - 1) \times \text{stride[1]} + \text{dilation[1]} \times
+                (\text{kernel_size[1]} - 1) + 1 \\
+                W_{out} = (W_{in} - 1) \times \text{stride[2]} + \text{dilation[2]} \times
+                (\text{kernel_size[2]} - 1) + 1 \\
+            \end{array}
+
+        pad_mode is 'pad':
+
+        .. math::
+            \begin{array}{ll} \\
+                D_{out} = (D_{in} - 1) \times \text{stride[0]} - (padding[0] + padding[1]) +
+                \text{dilation[0]} \times (\text{kernel_size[0]} - 1) + \text{output_padding[0]} + 1 \\
+                H_{out} = (H_{in} - 1) \times \text{stride[1]} - (padding[2] + padding[3]) +
+                \text{dilation[1]} \times (\text{kernel_size[1]} - 1) + \text{output_padding[1]} + 1 \\
+                W_{out} = (W_{in} - 1) \times \text{stride[2]} - (padding[4] + padding[5]) +
+                \text{dilation[2]} \times (\text{kernel_size[2]} - 1) + \text{output_padding[2]} + 1 \\
+            \end{array}
+
     Supported Platforms:
         ``Ascend`` ``GPU``
 
@@ -890,89 +951,62 @@ class Conv2dTranspose(_Conv):
     r"""
     2D transposed convolution layer.
 
-    Compute a 2D transposed convolution, which is also known as a deconvolution
-    (although it is not an actual deconvolution).
-    This module can be seen as the gradient of Conv2d with respect to its input.
+    Calculates a 2D transposed convolution, which can be seen as the gradient of Conv2d with respect to its input.
+    It is also called deconvolution (although it is not an actual deconvolution).
 
-    `x` is typically of shape :math:`(N, C, H, W)`, where :math:`N` is batch size, :math:`C` is channel number,
-    :math:`H` is the height of the characteristic layer and :math:`W` is the width of the characteristic layer.
+    The input is typically of shape :math:`(N, C, H, W)`, where :math:`N` is the batch size, :math:`C` is the number
+    of channels, and :math:`H_{in}, W_{in}` are the height and width of the feature layer respectively.
 
-    The pad_mode argument effectively adds :math:`dilation * (kernel\_size - 1) - padding` amount of zero padding
-    to both sizes of the input. So that when a Conv2d and a ConvTranspose2d are initialized with same parameters,
-    they are inverses of each other in regard to the input and output shapes.
-    However, when stride > 1, Conv2d maps multiple input shapes to the same output shape.
-    ConvTranspose2d provide padding argument to  increase the calculated output shape on one or more side.
-
-    The height and width of output are defined as:
-
-    if the 'pad_mode' is set to be "pad",
-
-    .. math::
-
-        H_{out} = (H_{in} - 1) \times \text{stride[0]} - \left (\text{padding[0]} + \text{padding[1]}\right ) +
-        \text{dilation[0]} \times (\text{kernel_size[0]} - 1) + 1
-
-        W_{out} = (W_{in} - 1) \times \text{stride[1]} - \left (\text{padding[2]} + \text{padding[3]}\right ) +
-        \text{dilation[1]} \times (\text{kernel_size[1]} - 1) + 1
-
-    if the 'pad_mode' is set to be "SAME",
-
-    .. math::
-
-        H_{out} = (H_{in} + \text{stride[0]} - 1)/\text{stride[0]} \\
-        W_{out} = (W_{in} + \text{stride[1]} - 1)/\text{stride[1]}
-
-    if the 'pad_mode' is set to be "VALID",
-
-    .. math::
-
-        H_{out} = (H_{in} - 1) \times \text{stride[0]} + \text{dilation[0]} \times
-        (\text{ks_w[0]} - 1) + 1 \\
-        W_{out} = (W_{in} - 1) \times \text{stride[1]} + \text{dilation[1]} \times
-        (\text{ks_w[1]} - 1) + 1
-
-    where :math:`\text{kernel_size[0]}` is the height of the convolution kernel and :math:`\text{kernel_size[1]}`
-    is the width of the convolution kernel.
+    When Conv2d and Conv2dTranspose are initialized with the same parameters, and `pad_mode` is set to 'pad',
+    :math:`dilation * (kernel\_size - 1) - padding` zeros are padded along the height and width directions
+    of the input, so the two layers are inverses of each other with regard to the input and output shapes.
+    However, when `stride` > 1, Conv2d maps multiple input shapes to the same output shape. For details, refer to
+    `Deconvolutional Networks <https://www.matthewzeiler.com/matzeiler/deconvolutionalnetworks.pdf>`_.
 
     Args:
-        in_channels (int): The number of channels in the input space.
-        out_channels (int): The number of channels in the output space.
-        kernel_size (Union[int, tuple]): int or a tuple of 2 integers, which specifies the height
-            and width of the 2D convolution window. Single int means the value is for both the height and the width of
-            the kernel. A tuple of 2 ints means the first value is for the height and the other is for the
-            width of the kernel.
-        stride (Union[int, tuple[int]]): The distance of kernel moving, an int number that represents
-            the height and width of movement are both strides, or a tuple of two int numbers that
-            represent height and width of movement respectively. Its value must be equal to or greater than 1.
+        in_channels (int): The channel number of the input tensor of the Conv2dTranspose layer.
+        out_channels (int): The channel number of the output tensor of the Conv2dTranspose layer.
+        kernel_size (Union[int, tuple]): Specifies the height and width of the 2D convolution kernel.
+            The data type is an integer or a tuple of two integers. An integer represents the height
+            and width of the convolution kernel. A tuple of two integers represents the height
+            and width of the convolution kernel respectively.
+        stride (Union[int, tuple[int]]): The movement stride of the 2D convolution kernel.
+            The data type is an integer or a tuple of two integers. An integer represents the movement step size
+            in both height and width directions. A tuple of two integers represents the movement step size in the height
+            and width directions respectively. Default: 1.
+        pad_mode (str): Specifies padding mode. The optional values are
+            "same", "valid", "pad". Default: "same".
+
+            - same: The height and width of the output equal those of the input multiplied by `stride`.
+              If this mode is set, the value of `padding` must be 0.
+
+            - valid: Returns a valid calculated output without padding. Excess pixels that do not satisfy the
+              calculation will be discarded. If this mode is set, the value of `padding` must be 0.
+
+            - pad: Pads the input. `padding` zeros are added on both sides of the input.
+              If this mode is set, the value of `padding` must be greater than or equal to 0.
+
+        padding (Union[int, tuple[int]]): The amount of padding in the height and width directions of the input.
+            The data type is an integer or a tuple of four integers. If `padding` is an integer,
+            then the top, bottom, left, and right padding are all equal to `padding`.
+            If `padding` is a tuple of four integers, then the top, bottom, left, and right padding
+            are equal to `padding[0]`, `padding[1]`, `padding[2]`, and `padding[3]` respectively.
+            The value should be greater than or equal to 0. Default: 0.
+        dilation (Union[int, tuple[int]]): Dilation size of the 2D convolution kernel.
+            The data type is an integer or a tuple of two integers. If :math:`k > 1`, the kernel is sampled
+            every `k` elements. The value of `k` in the height and width directions is in the range [1, H]
+            and [1, W] respectively. Default: 1.
+        group (int): Splits filter into groups, `in_channels` and `out_channels` must be divisible by `group`.
             Default: 1.
-        pad_mode (str): Select the mode of the pad. The optional values are
-            "pad", "same", "valid". Default: "same".
-
-            - pad: Implicit paddings on both sides of the input `x`.
-
-            - same: Adopted the way of completion.
-
-            - valid: Adopted the way of discarding.
-        padding (Union[int, tuple[int]]): Implicit paddings on both sides of the input `x`. If `padding` is one integer,
-                    the paddings of top, bottom, left and right are the same, equal to padding. If `padding` is a tuple
-                    with four integers, the paddings of top, bottom, left and right will be equal to padding[0],
-                    padding[1], padding[2], and padding[3] accordingly. Default: 0.
-        dilation (Union[int, tuple[int]]): The data type is int or a tuple of 2 integers. Specifies the dilation rate
-                                      to use for dilated convolution. If set to be :math:`k > 1`, there will
-                                      be :math:`k - 1` pixels skipped for each sampling location. Its value must
-                                      be greater than or equal to 1 and bounded by the height and width of the
-                                      input `x`. Default: 1.
-        group (int): Splits filter into groups, `in_channels` and `out_channels` must be
-            divisible by the number of groups. This does not support for Davinci devices when group > 1. Default: 1.
-        has_bias (bool): Specifies whether the layer uses a bias vector. Default: False.
-        weight_init (Union[Tensor, str, Initializer, numbers.Number]): Initializer for the convolution kernel.
-            It can be a Tensor, a string, an Initializer or a number. When a string is specified,
+        has_bias (bool): Whether the Conv2dTranspose layer has a bias parameter. Default: False.
+        weight_init (Union[Tensor, str, Initializer, numbers.Number]): Initialization method of weight parameter.
+            It can be a Tensor, a string, an Initializer or a numbers.Number. When a string is specified,
             values from 'TruncatedNormal', 'Normal', 'Uniform', 'HeUniform' and 'XavierUniform' distributions as well
             as constant 'One' and 'Zero' distributions are possible. Alias 'xavier_uniform', 'he_uniform', 'ones'
             and 'zeros' are acceptable. Uppercase and lowercase are both acceptable. Refer to the values of
             Initializer for more details. Default: 'normal'.
-        bias_init (Union[Tensor, str, Initializer, numbers.Number]): Initializer for the bias vector. Possible
-            Initializer and string are the same as 'weight_init'. Refer to the values of
+        bias_init (Union[Tensor, str, Initializer, numbers.Number]): Initialization method of bias parameter.
+            Available initialization methods are the same as 'weight_init'. Refer to the values of
             Initializer for more details. Default: 'zeros'.
 
     Inputs:
@@ -981,6 +1015,34 @@ class Conv2dTranspose(_Conv):
     Outputs:
         Tensor of shape :math:`(N, C_{out}, H_{out}, W_{out})`.
 
+        pad_mode is 'same':
+
+        .. math::
+            \begin{array}{ll} \\
+                H_{out} = H_{in} \times \text{stride[0]} \\
+                W_{out} = W_{in} \times \text{stride[1]} \\
+            \end{array}
+
+        pad_mode is 'valid':
+
+        .. math::
+            \begin{array}{ll} \\
+                H_{out} = (H_{in} - 1) \times \text{stride[0]} + \text{dilation[0]} \times
+                (\text{kernel_size[0]} - 1) + 1 \\
+                W_{out} = (W_{in} - 1) \times \text{stride[1]} + \text{dilation[1]} \times
+                (\text{kernel_size[1]} - 1) + 1 \\
+            \end{array}
+
+        pad_mode is 'pad':
+
+        .. math::
+            \begin{array}{ll} \\
+                H_{out} = (H_{in} - 1) \times \text{stride[0]} - (padding[0] + padding[1]) +
+                \text{dilation[0]} \times (\text{kernel_size[0]} - 1) + 1 \\
+                W_{out} = (W_{in} - 1) \times \text{stride[1]} - (padding[2] + padding[3]) +
+                \text{dilation[1]} \times (\text{kernel_size[1]} - 1) + 1 \\
+            \end{array}
+
     Raises:
         TypeError: If `in_channels`, `out_channels` or `group` is not an int.
         TypeError: If `kernel_size`, `stride`, `padding` or `dilation` is neither an int not a tuple.
@@ -1099,70 +1161,74 @@ class Conv1dTranspose(_Conv):
     r"""
     1D transposed convolution layer.
 
-    Compute a 1D transposed convolution, which is also known as a deconvolution
-    (although it is not an actual deconvolution).
-    This module can be seen as the gradient of Conv1d with respect to its input.
+    Calculates a 1D transposed convolution, which can be seen as the gradient of Conv1d with respect to its input.
+    It is also called deconvolution (although it is not an actual deconvolution).
 
-    `x` is typically of shape :math:`(N, C, W)`, where :math:`N` is batch size, :math:`C` is channel number and
-    :math:`W` is the characteristic length.
+    The input is typically of shape :math:`(N, C, L)`, where :math:`N` is the batch size, :math:`C` is the number of
+    channels and :math:`L_{in}` is the sequence length.
 
-    The padding argument effectively adds :math:`dilation * (kernel\_size - 1) - padding` amount of zero padding to
-    both sizes of the input. So that when a Conv1d and a ConvTranspose1d are initialized with same parameters,
-    they are inverses of each other in regard to the input and output shapes. However, when stride > 1,
-    Conv1d maps multiple input shapes to the same output shape.
-
-    The width of output is defined as:
-
-    .. math::
-
-        W_{out} = \begin{cases}
-        (W_{in} - 1) \times \text{stride} - 2 \times \text{padding} + \text{dilation} \times
-        (\text{ks_w} - 1) + 1, & \text{if pad_mode='pad'}\\
-        (W_{in} + \text{stride} - 1)/\text{stride}, & \text{if pad_mode='same'}\\
-        (W_{in} - 1) \times \text{stride} + \text{dilation} \times
-        (\text{ks_w} - 1) + 1, & \text{if pad_mode='valid'}
-        \end{cases}
-
-    where :math:`\text{ks_w}` is the width of the convolution kernel.
+    When Conv1d and Conv1dTranspose are initialized with the same parameters, and `pad_mode` is set to 'pad',
+    :math:`dilation * (kernel\_size - 1) - padding` zeros are padded to both sides of the input,
+    so the two layers are inverses of each other with regard to the input and output shapes.
+    However, when `stride` > 1, Conv1d maps multiple input shapes to the same output shape. For details, refer to
+    `Deconvolutional Networks <https://www.matthewzeiler.com/matzeiler/deconvolutionalnetworks.pdf>`_.
 
     Args:
-        in_channels (int): The number of channels in the input space.
-        out_channels (int): The number of channels in the output space.
-        kernel_size (int): int, which specifies the width of the 1D convolution window.
-        stride (int): The distance of kernel moving, an int number that represents
-            the width of movement. Default: 1.
-        pad_mode (str): Select the mode of the pad. The optional values are
-            "pad", "same", "valid". Default: "same".
+        in_channels (int): The channel number of the input tensor of the Conv1dTranspose layer.
+        out_channels (int): The channel number of the output tensor of the Conv1dTranspose layer.
+        kernel_size (int): Specifies the width of the 1D convolution kernel.
+        stride (int): The movement stride of the 1D convolution kernel. Default: 1.
+        pad_mode (str): Specifies padding mode. The optional values are
+            "same", "valid", "pad". Default: "same".
 
-            - pad: Implicit paddings on both sides of the input `x`.
+            - same: The width of the output is the same as the value of the input divided by `stride`.
+              If this mode is set, the value of `padding` must be 0.
 
-            - same: Adopted the way of completion.
+            - valid: Returns a valid calculated output without padding. Excess pixels that do not satisfy the
+              calculation will be discarded. If this mode is set, the value of `padding` must be 0.
 
-            - valid: Adopted the way of discarding.
-        padding (int): Implicit paddings on both sides of the input `x`. Default: 0.
-        dilation (int): The data type is int. Specifies the dilation rate
-                                      to use for dilated convolution. If set to be :math:`k > 1`, there will
-                                      be :math:`k - 1` pixels skipped for each sampling location. Its value must
-                                      be greater or equal to 1 and bounded by the width of the
-                                      input `x`. Default: 1.
+            - pad: Pads the input with `padding` zeros on both sides.
+              If this mode is set, the value of `padding` must be greater than or equal to 0.
+
+        padding (int): The number of zeros padded on both sides of the input.
+            The value should be greater than or equal to 0. Default: 0.
+        dilation (int): Dilation size of 1D convolution kernel. If :math:`k > 1`, the kernel is sampled
+            every `k` elements. The value of `k` is in range of [1, L]. Default: 1.
         group (int): Splits filter into groups, `in_channels` and `out_channels` must be
-            divisible by the number of groups. This is not support for Davinci devices when group > 1. Default: 1.
-        has_bias (bool): Specifies whether the layer uses a bias vector. Default: False.
-        weight_init (Union[Tensor, str, Initializer, numbers.Number]): Initializer for the convolution kernel.
+            divisible by `group`. `group` > 1 is not yet supported on the Ascend platform. Default: 1.
+        has_bias (bool): Whether the Conv1dTranspose layer has a bias parameter. Default: False.
+        weight_init (Union[Tensor, str, Initializer, numbers.Number]): Initialization method of weight parameter.
             It can be a Tensor, a string, an Initializer or a numbers.Number. When a string is specified,
             values from 'TruncatedNormal', 'Normal', 'Uniform', 'HeUniform' and 'XavierUniform' distributions as well
             as constant 'One' and 'Zero' distributions are possible. Alias 'xavier_uniform', 'he_uniform', 'ones'
             and 'zeros' are acceptable. Uppercase and lowercase are both acceptable. Refer to the values of
             Initializer for more details. Default: 'normal'.
-        bias_init (Union[Tensor, str, Initializer, numbers.Number]): Initializer for the bias vector. Possible
-            Initializer and string are the same as 'weight_init'. Refer to the values of
+        bias_init (Union[Tensor, str, Initializer, numbers.Number]): Initialization method of bias parameter.
+            Available initialization methods are the same as 'weight_init'. Refer to the values of
             Initializer for more details. Default: 'zeros'.
 
     Inputs:
-        - **x** (Tensor) - Tensor of shape :math:`(N, C_{in}, W_{in})`.
+        - **x** (Tensor) - Tensor of shape :math:`(N, C_{in}, L_{in})`.
 
     Outputs:
-        Tensor of shape :math:`(N, C_{out}, W_{out})`.
+        Tensor of shape :math:`(N, C_{out}, L_{out})`.
+
+        pad_mode is 'same':
+
+        .. math::
+            L_{out} = \left \lfloor{\frac{L_{in}}{\text{stride}} + 1} \right \rfloor
+
+        pad_mode is 'valid':
+
+        .. math::
+            L_{out} = \left \lfloor{\frac{L_{in} - \text{dilation} \times (\text{kernel_size} - 1) }
+            {\text{stride}} + 1} \right \rfloor
+
+        pad_mode is 'pad':
+
+        .. math::
+            L_{out} = \left \lfloor{\frac{L_{in} + 2 \times padding - (\text{dilation} - 1) \times
+            \text{kernel_size} - 1 }{\text{stride}} + 1} \right \rfloor
 
     Raises:
         TypeError: If `in_channels`, `out_channels`, `kernel_size`, `stride`, `padding` or `dilation` is not an int.
diff --git a/mindspore/python/mindspore/nn/layer/quant.py b/mindspore/python/mindspore/nn/layer/quant.py
index 37f04f3cdeb..a5ee6b158ed 100644
--- a/mindspore/python/mindspore/nn/layer/quant.py
+++ b/mindspore/python/mindspore/nn/layer/quant.py
@@ -611,7 +611,7 @@ class Conv2dBnFoldQuantOneConv(Cell):
         pad_mode (str): Specifies padding mode. The optional values are "same", "valid", "pad". Default: "same".
         padding (Union[int, tuple[int]]): Implicit paddings on both sides of the `x`. Default: 0.
         dilation (Union[int, tuple[int]]): Specifies the dilation rate to use for dilated convolution. Default: 1.
-        group (int): Splits filter into groups, `in_ channels` and `out_channels` must be
+        group (int): Splits filter into groups, `in_channels` and `out_channels` must be
             divisible by the number of groups. Default: 1.
         eps (float): Parameters for Batch Normalization. Default: 1e-5.
         momentum (float): Parameters for Batch Normalization op. Default: 0.997.
@@ -849,7 +849,7 @@ class Conv2dBnFoldQuant(Cell):
         pad_mode (str): Specifies padding mode. The optional values are "same", "valid", "pad". Default: "same".
         padding (Union[int, tuple[int]]): Implicit paddings on both sides of the `x`. Default: 0.
         dilation (Union[int, tuple[int]]): Specifies the dilation rate to use for dilated convolution. Default: 1.
-        group (int): Splits filter into groups, `in_ channels` and `out_channels` must be
+        group (int): Splits filter into groups, `in_channels` and `out_channels` must be
             divisible by the number of groups. Default: 1.
         eps (float): Parameters for Batch Normalization. Default: 1e-5.
         momentum (float): Parameters for Batch Normalization op. Default: 0.997.
diff --git a/mindspore/python/mindspore/ops/operations/array_ops.py b/mindspore/python/mindspore/ops/operations/array_ops.py
index 593f60690a0..47d20bc6cdc 100755
--- a/mindspore/python/mindspore/ops/operations/array_ops.py
+++ b/mindspore/python/mindspore/ops/operations/array_ops.py
@@ -845,7 +845,17 @@ class Unique(Primitive):
 
 class Gather(Primitive):
     r"""
-    Returns the slice of the input Tensor corresponding to the elements of `input_indices` on the specified `axis`.
+    Returns the slice of the input tensor corresponding to the elements of `input_indices` on the specified `axis`.
+
+    The following figure shows the typical calculation process of Gather:
+
+    .. image:: api_img/Gather.png
+
+    where params represents the input `input_params`, and indices represents the indices to gather, `input_indices`.
+
+    .. note::
+         The value of `input_indices` must be in the range `[0, input_params.shape[axis])`. The result is undefined
+         if any index is out of this range.
 
     Inputs:
         - **input_params** (Tensor) - The original Tensor. The shape of tensor is :math:`(x_1, x_2, ..., x_R)`.
@@ -853,36 +863,50 @@ class Gather(Primitive):
           Specifies the indices of elements of the original Tensor. The data type can be int32 or int64.
         - **axis** (int) - Specifies the dimension index to gather indices.
 
-    .. note::
-         The value of input_indices must be in the range of `[0, input_param.shape[axis])`, and report an error if it
-         exceeds this range.
-
     Outputs:
         Tensor, the shape of tensor is
         :math:`input\_params.shape[:axis] + input\_indices.shape + input\_params.shape[axis + 1:]`.
 
     Raises:
         TypeError: If `axis` is not an int.
-        TypeError: If `input_indices` is not an int type Tensor.
-        TypeError: If `input_indices` is not an int.
+        TypeError: If `input_params` is not a tensor.
+        TypeError: If `input_indices` is not a tensor of type int.
 
     Supported Platforms:
         ``Ascend`` ``GPU`` ``CPU``
 
     Examples:
-        >>> input_params = Tensor(np.array([[1, 2, 7, 42], [3, 4, 54, 22], [2, 2, 55, 3]]), mindspore.float32)
-        >>> input_indices = Tensor(np.array([1, 2]), mindspore.int32)
-        >>> axis = 1
-        >>> output = ops.Gather()(input_params, input_indices, axis)
-        >>> print(output)
-        [[ 2.  7.]
-         [ 4. 54.]
-         [ 2. 55.]]
+        >>> # case1: input_indices is a Tensor with shape (5, ).
+        >>> input_params = Tensor(np.array([1, 2, 3, 4, 5, 6, 7]), mindspore.float32)
+        >>> input_indices = Tensor(np.array([0, 2, 4, 2, 6]), mindspore.int32)
         >>> axis = 0
         >>> output = ops.Gather()(input_params, input_indices, axis)
         >>> print(output)
-        [[3. 4. 54. 22.]
-         [2. 2. 55.  3.]]
+        [1. 3. 5. 3. 7.]
+        >>> # case2: input_indices is a Tensor with shape (2, 2). When the input_params has one dimension, the output shape is equal to the input_indices shape.
+        >>> input_indices = Tensor(np.array([[0, 2], [2, 6]]), mindspore.int32)
+        >>> axis = 0
+        >>> output = ops.Gather()(input_params, input_indices, axis)
+        >>> print(output)
+        [[1. 3.]
+         [3. 7.]]
+        >>> # case3: input_indices is a Tensor with shape (2, ). input_params is a Tensor with shape (3, 4) and axis is 0.
+        >>> input_params = Tensor(np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]), mindspore.float32)
+        >>> input_indices = Tensor(np.array([0, 2]), mindspore.int32)
+        >>> axis = 0
+        >>> output = ops.Gather()(input_params, input_indices, axis)
+        >>> print(output)
+        [[ 1.  2.  3.  4.]
+         [ 9. 10. 11. 12.]]
+        >>> # case4: input_indices is a Tensor with shape (2, ). input_params is a Tensor with shape (3, 4) and axis is 1.
+        >>> input_params = Tensor(np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]), mindspore.float32)
+        >>> input_indices = Tensor(np.array([0, 2]), mindspore.int32)
+        >>> axis = 1
+        >>> output = ops.Gather()(input_params, input_indices, axis)
+        >>> print(output)
+        [[ 1.  3.]
+         [ 5.  7.]
+         [ 9. 11.]]
     """
 
     @prim_attr_register