!39748 Add Warning Clear
Merge pull request !39748 from huangxinjing/fix_warning
Commit 23835793b3
@@ -21,21 +21,24 @@ mindspore.communication
     Initialize the distributed backend required by the communication service, e.g. the `HCCL` or `NCCL` service.

     .. note::
-        The full name of HCCL is Huawei Collective Communication Library and the full name of NCCL is NVIDIA Collective Communication Library. The `init` method should be used after the `set_context` method.
+        - The full name of HCCL is Huawei Collective Communication Library and the full name of NCCL is NVIDIA Collective Communication Library.
+        - The `init` method should be used after the `set_context` method.
+        - Before running the following examples, the user needs to preset the communication environment variables; please see the docstring of mindspore.communication.

     Args:
         - **backend_name** (str) - Name of the distributed backend, HCCL or NCCL. If it is not set, it is inferred from the hardware platform type (device_target). Default: None.

     Raises:
         - **TypeError** - If the parameter `backend_name` is not a string.
-        - **RuntimeError** - 1) The hardware device type is invalid; 2) the backend service is invalid; 3) distributed initialization fails; 4) the HCCL service is initialized without the environment variable `RANK_ID` or `MINDSPORE_HCCL_CONFIG_PATH` being set.
+        - **RuntimeError** - 1) The hardware device type is invalid; 2) the backend service is invalid; 3) distributed initialization fails; 4) when the backend is HCCL, the HCCL service is initialized without the environment variable `RANK_ID` or `MINDSPORE_HCCL_CONFIG_PATH` being set.

 .. py:function:: mindspore.communication.release()

     Release distributed resources, e.g. the `HCCL` or `NCCL` service.

     .. note::
-        The `release` method should be used after the `init` method. Before running the following examples, the user needs to preset the communication environment variables; please see the docstring of mindspore.communication.
+        - The `release` method should be used after the `init` method.
+        - Before running the following examples, the user needs to preset the communication environment variables; please see the docstring of mindspore.communication.

     Raises:
         - **RuntimeError** - If releasing distributed resources fails.
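For reference, a minimal usage sketch of the APIs documented above, assuming an Ascend job where the communication environment variables (RANK_ID, RANK_TABLE_FILE / MINDSPORE_HCCL_CONFIG_PATH, etc.) are already exported; the 8-card setup is only an assumption:

    # Sketch: initialize the backend, query the rank, then release the resources.
    from mindspore import context
    from mindspore.communication import init, get_rank, get_group_size, release

    context.set_context(device_target="Ascend")  # init() must come after set_context
    init()                                       # backend inferred from device_target
    print(get_rank(), get_group_size())          # e.g. 0 and 8 on a hypothetical 8-card job
    release()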
@@ -45,7 +48,8 @@ mindspore.communication
     Get the rank ID of the current device in the specified communication group.

     .. note::
-        The `get_rank` method should be used after the `init` method. Before running the following examples, the user needs to preset the communication environment variables; please see the docstring of mindspore.communication.
+        - The `get_rank` method should be used after the `init` method.
+        - Before running the following examples, the user needs to preset the communication environment variables; please see the docstring of mindspore.communication.

     Args:
         - **group** (str) - Name of the communication group, normally created by the `create_group` method; otherwise the default group is used. Default: `GlobalComm.WORLD_COMM_GROUP`.

@@ -62,7 +66,9 @@ mindspore.communication

     Get the rank_size of the specified communication group instance.

-    .. note:: The `get_group_size` method should be used after the `init` method. The user needs to configure the communication environment variables before running the test cases.
+    .. note::
+        - The `get_group_size` method should be used after the `init` method.
+        - Before running the following examples, the user needs to preset the communication environment variables; please see the docstring of mindspore.communication.

     Args:
         - **group** (str) - Name of the specified group instance (created by the create_group method); the supported data type is str. Default: `WORLD_COMM_GROUP`.
@@ -80,9 +86,10 @@ mindspore.communication
     Get the global rank ID in the communication cluster from the rank ID in the specified communication group.

     .. note::
-        - The GPU version of MindSpore does not support this method;
-        - The parameter `group` cannot be `hccl_world_group`;
-        - The `get_world_rank_from_group_rank` method should be used after the `init` method. Before running the following examples, the user needs to preset the communication environment variables; please see the docstring of mindspore.communication.
+        - The GPU version of MindSpore does not support this method.
+        - The parameter `group` cannot be `hccl_world_group`.
+        - The `get_world_rank_from_group_rank` method should be used after the `init` method.
+        - Before running the following examples, the user needs to preset the communication environment variables; please see the docstring of mindspore.communication.

     Args:
         - **group** (str) - Name of the communication group, normally created by the `create_group` method.

@@ -94,16 +101,17 @@ mindspore.communication
     Raises:
         - **TypeError** - If the parameter `group` is not a string or the parameter `group_rank_id` is not a number.
         - **ValueError** - If the parameter `group` is `hccl_world_group` or the backend is unavailable.
-        - **RuntimeError** - If the `HCCL` or `NCCL` service is unavailable, or the CPU version of MindSpore is used.
+        - **RuntimeError** - If the `HCCL` service is unavailable, or the GPU version of MindSpore is used.

 .. py:function:: mindspore.communication.get_group_rank_from_world_rank(world_rank_id, group)

     Get the rank ID in the specified user communication group from the global rank ID in the communication cluster.

     .. note::
-        - The GPU version of MindSpore does not support this method;
-        - The parameter `group` cannot be `hccl_world_group`;
+        - The GPU version of MindSpore does not support this method.
+        - The parameter `group` cannot be `hccl_world_group`.
+        - The `get_group_rank_from_world_rank` method should be used after the `init` method.
+        - Before running the following examples, the user needs to preset the communication environment variables; please see the docstring of mindspore.communication.

     Args:
         - **world_rank_id** (`int`) - The global rank ID in the communication cluster.
@@ -115,18 +123,19 @@ mindspore.communication
     Raises:
         - **TypeError** - If the parameter `group_rank_id` is not a number or the parameter `group` is not a string.
         - **ValueError** - If the parameter `group` is `hccl_world_group` or the backend is unavailable.
-        - **RuntimeError** - If the `HCCL` or `NCCL` service is unavailable, or the GPU version of MindSpore is used.
+        - **RuntimeError** - If the `HCCL` service is unavailable, or the GPU version of MindSpore is used.

 .. py:function:: mindspore.communication.create_group(group, rank_ids)

     Create a user-defined communication group instance.

     .. note::
-        - The GPU version of MindSpore does not support this method;
-        - The length of the list rank_ids should be larger than 1;
-        - The list rank_ids must not contain duplicate data;
-        - The `create_group` method should be used after the `init` method;
+        - The GPU version of MindSpore does not support this method.
+        - The length of the list rank_ids should be larger than 1.
+        - The list rank_ids must not contain duplicate data.
+        - The `create_group` method should be used after the `init` method.
         - Only a single global communication group is supported in PyNative mode.
+        - Before running the following examples, the user needs to preset the communication environment variables; please see the docstring of mindspore.communication.

     Args:
         - **group** (str) - Name of the user-defined communication group instance; the supported data type is str.
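A hedged sketch of create_group together with the rank-conversion helpers described above (Ascend/HCCL only; the group name and rank ids are made up for illustration, and init() is assumed to have been called with the required environment variables set):

    from mindspore.communication import (init, create_group,
                                         get_world_rank_from_group_rank,
                                         get_group_rank_from_world_rank)

    init()
    group = "group_0_3"                                # hypothetical user-defined group name
    create_group(group, [0, 1, 2, 3])                  # rank_ids: length > 1, no duplicates
    print(get_world_rank_from_group_rank(group, 1))    # global rank of group rank 1
    print(get_group_rank_from_world_rank(2, group))    # group rank of global rank 2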
@@ -135,15 +144,16 @@ mindspore.communication
     Raises:
         - **TypeError** - If the parameter `group_rank_id` is not a number or the parameter `group` is not a string.
         - **ValueError** - If the length of the list rank_ids is smaller than 1, the list rank_ids contains duplicate data, or the backend is invalid.
-        - **RuntimeError** - If the `HCCL` or `NCCL` service is unavailable, or the CPU version of MindSpore is used.
+        - **RuntimeError** - If the `HCCL` service is unavailable, or the GPU version of MindSpore is used.

 .. py:function:: mindspore.communication.get_local_rank(group=GlobalComm.WORLD_COMM_GROUP)

     Get the local rank ID of the current device in the specified communication group.

     .. note::
-        - The GPU version of MindSpore does not support this method;
-        - The `get_local_rank` method should be used after the `init` method. Before running the following examples, the user needs to preset the communication environment variables; please see the docstring of mindspore.communication.
+        - The GPU version of MindSpore does not support this method.
+        - The `get_local_rank` method should be used after the `init` method.
+        - Before running the following examples, the user needs to preset the communication environment variables; please see the docstring of mindspore.communication.

     Args:
         - **group** (`str`) - Name of the communication group, normally created by the `create_group` method; otherwise the default group name is used. Default: `WORLD_COMM_GROUP`.

@@ -154,15 +164,16 @@ mindspore.communication
     Raises:
         - **TypeError** - If the parameter `group` is not a string.
         - **ValueError** - If the backend is unavailable.
-        - **RuntimeError** - If the `HCCL` or `NCCL` service is unavailable.
+        - **RuntimeError** - If the `HCCL` service is unavailable, or the GPU version of MindSpore is used.

 .. py:function:: mindspore.communication.get_local_rank_size(group=GlobalComm.WORLD_COMM_GROUP)

     Get the number of local devices in the specified communication group.

     .. note::
-        - The GPU version of MindSpore does not support this method;
+        - The GPU version of MindSpore does not support this method.
+        - The `get_local_rank_size` method should be used after the `init` method.
+        - Before running the following examples, the user needs to preset the communication environment variables; please see the docstring of mindspore.communication.

     Args:
         - **group** (str) - Name of the communication group, normally created by the `create_group` method, or `WORLD_COMM_GROUP` by default.
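A sketch of the local-rank queries documented above (Ascend only; the 2-server x 8-device topology in the comment is an assumption):

    from mindspore.communication import init, get_local_rank, get_local_rank_size

    init()
    # On a hypothetical 2-server x 8-device cluster, global rank 9 would report
    # local rank 1 and a local rank size of 8.
    print(get_local_rank(), get_local_rank_size())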
@@ -173,15 +184,15 @@ mindspore.communication
     Raises:
         - **TypeError** - If the parameter `group` is not a string.
         - **ValueError** - If the backend is unavailable.
-        - **RuntimeError** - If the `HCCL` or `NCCL` service is unavailable.
+        - **RuntimeError** - If the `HCCL` service is unavailable, or the GPU version of MindSpore is used.

 .. py:function:: mindspore.communication.destroy_group(group)

     Destroy the user communication group.

     .. note::
-        - The GPU version of MindSpore does not support this method;
-        - The parameter `group` cannot be `hccl_world_group`;
+        - The GPU version of MindSpore does not support this method.
+        - The parameter `group` cannot be `hccl_world_group`.
         - The `destroy_group` method should be used after the `init` method.

     Args:

@@ -190,7 +201,7 @@ mindspore.communication
     Raises:
         - **TypeError** - If the parameter `group` is not a string.
         - **ValueError** - If the parameter `group` is `hccl_world_group` or the backend is unavailable.
-        - **RuntimeError** - If the `HCCL` or `NCCL` service is unavailable.
+        - **RuntimeError** - If the `HCCL` service is unavailable, or the GPU version of MindSpore is used.

 .. py:data:: mindspore.communication.HCCL_WORLD_COMM_GROUP
@@ -116,8 +116,9 @@ double MatMulCost::GetBackwardCommCost(const std::vector<TensorInfo> &inputs, co
       used_device_num *= input1_shape[i] / input1_slice_shape[i];
     }

-    if (total_device_num != LongToSize(used_device_num))
+    if (total_device_num != LongToSize(used_device_num)) {
       result += ListProduct(input1_slice_shape) * static_cast<double>(inputs_type_lengths_[1]);
+    }
   }

   return result;

@@ -161,8 +162,9 @@ double MatMulCost::GetBackwardComputationCost(const std::vector<TensorInfo> &inp
       used_device_num *= input1_shape[i] / input1_slice_shape[i];
     }

-    if (total_device_num != LongToSize(used_device_num))
+    if (total_device_num != LongToSize(used_device_num)) {
       result += ListProduct(input1_slice_shape) * static_cast<double>(inputs_type_lengths_[1]);
+    }
   }

   return result;

@@ -197,7 +199,7 @@ void MatMulCost::CalculateInputsInMemory(const std::map<size_t, bool> &prev_outp
 }

 // return the per device communication cost in the forward phase.
-double BatchNormCost::GetForwardCommCost(const std::vector<TensorInfo> &inputs, const std::vector<TensorInfo> &outputs,
+double BatchNormCost::GetForwardCommCost(const std::vector<TensorInfo> &inputs, const std::vector<TensorInfo> &,
                                          int64_t) const {
   TensorInfo input0 = inputs[0];
   Shape input0_shape = input0.shape();

@@ -258,8 +260,8 @@ double BatchNormCost::GetForwardComputationCost(const std::vector<TensorInfo> &i

 // Return the per device computation cost in the forward phase. The cost is calculated according to the bytes
 // this operator uses
-double BatchNormCost::GetBackwardComputationCost(const std::vector<TensorInfo> &inputs, const std::vector<TensorInfo> &,
-                                                 int64_t stage_id) const {
+double BatchNormCost::GetBackwardComputationCost(const std::vector<TensorInfo> &, const std::vector<TensorInfo> &,
+                                                 int64_t) const {
   return 0.0;
 }

@@ -831,8 +833,9 @@ double SubCost::GetBackwardComputationCost(const std::vector<TensorInfo> &inputs
       used_device_num *= input_a_shape[i] / input_a_slice_shape[i];
     }

-    if (total_device_num != LongToSize(used_device_num))
+    if (total_device_num != LongToSize(used_device_num)) {
       result += ListProduct(input_a_slice_shape) * static_cast<double>(inputs_type_lengths_[0]);
+    }
   }

   if (is_parameter_[1]) {

@@ -844,8 +847,9 @@ double SubCost::GetBackwardComputationCost(const std::vector<TensorInfo> &inputs
       used_device_num *= input_b_shape[i] / input_b_slice_shape[i];
     }

-    if (total_device_num != LongToSize(used_device_num))
+    if (total_device_num != LongToSize(used_device_num)) {
       result += ListProduct(input_b_slice_shape) * static_cast<double>(inputs_type_lengths_[1]);
+    }
   }
   return result;
 }

@@ -866,8 +870,9 @@ double SubCost::GetBackwardCommCost(const std::vector<TensorInfo> &inputs, const
       used_device_num *= input_a_shape[i] / input_a_slice_shape[i];
     }

-    if (total_device_num != LongToSize(used_device_num))
+    if (total_device_num != LongToSize(used_device_num)) {
       result += ListProduct(input_a_slice_shape) * static_cast<double>(inputs_type_lengths_[0]);
+    }
   }

   if (is_parameter_[1]) {

@@ -879,8 +884,9 @@ double SubCost::GetBackwardCommCost(const std::vector<TensorInfo> &inputs, const
       used_device_num *= input_b_shape[i] / input_b_slice_shape[i];
     }

-    if (total_device_num != LongToSize(used_device_num))
+    if (total_device_num != LongToSize(used_device_num)) {
       result += ListProduct(input_b_slice_shape) * static_cast<double>(inputs_type_lengths_[1]);
+    }
   }

   return result;

@@ -1205,8 +1211,9 @@ double ReduceSumCost::GetBackwardCommCost(const std::vector<TensorInfo> &inputs,
       used_device_num *= input_shape[i] / input_slice_shape[i];
     }

-    if (total_device_num != LongToSize(used_device_num))
+    if (total_device_num != LongToSize(used_device_num)) {
       result += ListProduct(input_slice_shape) * static_cast<double>(inputs_type_lengths_[0]);
+    }
   }

   return result;

@@ -1432,7 +1439,7 @@ double DSDMatmulCost::GetForwardComputationCost(const std::vector<TensorInfo> &i

 void DSDMatmulCost::CalculateOutputInMemory() {
   is_output_should_in_memory_ =
-    (std::find(is_parameter_involve_.begin(), is_parameter_involve_.end(), true) != is_parameter_involve_.end());
+    (std::find(is_parameter_involve_.cbegin(), is_parameter_involve_.cend(), true) != is_parameter_involve_.cend());
 }

 void DSDMatmulCost::CalculateInputsInMemory(const std::map<size_t, bool> &) {

@@ -1867,7 +1874,7 @@ double MatmulDDSCost::GetForwardComputationCost(const std::vector<TensorInfo> &i
 // Not taking account of output
 void MatmulDDSCost::CalculateOutputInMemory() {
   is_output_should_in_memory_ =
-    (std::find(is_parameter_involve_.begin(), is_parameter_involve_.end(), true) != is_parameter_involve_.end());
+    (std::find(is_parameter_involve_.cbegin(), is_parameter_involve_.cend(), true) != is_parameter_involve_.cend());
 }

 // Taking account of input
@@ -24,13 +24,13 @@

 namespace mindspore {
 namespace parallel {
-#define MAXIMUM_INPUT_NUMBER 100
-#define DEFAULT_DATA_TYPE_LENGTH 4
-#define DROPOUT_COST_RATE 1.125  // the DropoutGenMask need 12.5% memory
-#define GATHERV2_COST_WEIGHT0 3
-#define GATHERV2_COST_WEIGHT1 7
-#define GATHERV2_COST_WEIGHT2 2
-#define GATHERV2_COST_WEIGHT3 6
+constexpr size_t MAXIMUM_INPUT_NUMBER = 100;
+constexpr size_t DEFAULT_DATA_TYPE_LENGTH = 4;
+constexpr double DROPOUT_COST_RATE = 1.125;  // the DropoutGenMask need 12.5% memory
+constexpr size_t GATHERV2_COST_WEIGHT0 = 3;
+constexpr size_t GATHERV2_COST_WEIGHT1 = 7;
+constexpr size_t GATHERV2_COST_WEIGHT2 = 2;
+constexpr size_t GATHERV2_COST_WEIGHT3 = 6;

 class OperatorCost;
 using OperatorCostPtr = std::shared_ptr<OperatorCost>;

@@ -92,7 +92,7 @@ class OperatorCost {
   // Contributing the output part for 'GetMemoryCost'
   double GetOutputMemoryCost(const std::vector<TensorInfo> &inputs, const std::vector<TensorInfo> &outputs) const;
   // per device memory cost in a inference phase
-  double GetMemoryCostForInference(const std::vector<TensorInfo> &, const std::vector<TensorInfo> &) const;
+  double GetMemoryCostForInference(const std::vector<TensorInfo> &, const std::vector<TensorInfo> &outputs) const;

  protected:
   // For each input in 'inputs_', a bool variable is true if the corresponding one is a parameter or a output of

@@ -153,7 +153,7 @@ class BatchNormCost : public OperatorCost {
                          int64_t stage_id) const override {
     return GetForwardCommCost(inputs, outputs, stage_id) + GetBackwardCommCost(inputs, outputs, stage_id);
   }
-  double GetForwardCommCost(const std::vector<TensorInfo> &inputs, const std::vector<TensorInfo> &outputs,
+  double GetForwardCommCost(const std::vector<TensorInfo> &inputs, const std::vector<TensorInfo> &,
                             int64_t stage_id) const override;
   double GetBackwardCommCost(const std::vector<TensorInfo> &inputs, const std::vector<TensorInfo> &outputs,
                              int64_t stage_id) const override;

@@ -165,8 +165,8 @@ class BatchNormCost : public OperatorCost {
   }
   double GetForwardComputationCost(const std::vector<TensorInfo> &inputs, const std::vector<TensorInfo> &outputs,
                                    int64_t stage_id) const override;
-  double GetBackwardComputationCost(const std::vector<TensorInfo> &inputs, const std::vector<TensorInfo> &outputs,
-                                    int64_t stage_id) const override;
+  double GetBackwardComputationCost(const std::vector<TensorInfo> &, const std::vector<TensorInfo> &,
+                                    int64_t) const override;
   void CalculateOutputInMemory() override;
   void CalculateInputsInMemory(const std::map<size_t, bool> &prev_output_in_mem) override;
 };

@@ -399,7 +399,8 @@ class BatchParallelCost : public OperatorCost {
   double GetForwardCommCost(const std::vector<TensorInfo> &, const std::vector<TensorInfo> &, int64_t) const override {
     return 0.0;
   }
-  double GetBackwardCommCost(const std::vector<TensorInfo> &, const std::vector<TensorInfo> &, int64_t) const override;
+  double GetBackwardCommCost(const std::vector<TensorInfo> &, const std::vector<TensorInfo> &,
+                             int64_t stage_id) const override;
   double GetComputationCost(const std::vector<TensorInfo> &inputs, const std::vector<TensorInfo> &outputs,
                             int64_t stage_id) const override {
     return GetForwardComputationCost(inputs, outputs, stage_id) + GetBackwardComputationCost(inputs, outputs, stage_id);

@@ -629,7 +630,8 @@ class SubCost : public OperatorCost {
   double GetForwardCommCost(const std::vector<TensorInfo> &, const std::vector<TensorInfo> &, int64_t) const override {
     return 0.0;
   }
-  double GetBackwardCommCost(const std::vector<TensorInfo> &, const std::vector<TensorInfo> &, int64_t) const override;
+  double GetBackwardCommCost(const std::vector<TensorInfo> &, const std::vector<TensorInfo> &,
+                             int64_t stage_id) const override;

   double GetComputationCost(const std::vector<TensorInfo> &inputs, const std::vector<TensorInfo> &outputs,
                             int64_t stage_id) const override {

@@ -852,10 +854,12 @@ class GetNextCost : public OperatorCost {
                          int64_t stage_id) const override {
     return GetForwardCommCost(inputs, outputs, stage_id) + GetBackwardCommCost(inputs, outputs, stage_id);
   }
-  double GetForwardCommCost(const std::vector<TensorInfo> &, const std::vector<TensorInfo> &, int64_t) const override {
+  double GetForwardCommCost(const std::vector<TensorInfo> &, const std::vector<TensorInfo> &,
+                            int64_t stage_id) const override {
     return 0.0;
   }
-  double GetBackwardCommCost(const std::vector<TensorInfo> &, const std::vector<TensorInfo> &, int64_t) const override {
+  double GetBackwardCommCost(const std::vector<TensorInfo> &, const std::vector<TensorInfo> &,
+                             int64_t stage_id) const override {
     return 0.0;
   }
   double GetComputationCost(const std::vector<TensorInfo> &inputs, const std::vector<TensorInfo> &outputs,

@@ -1056,7 +1060,7 @@ class UniqueCost : public OperatorCost {
   double GetForwardComputationCost(const std::vector<TensorInfo> &inputs, const std::vector<TensorInfo> &outputs,
                                    int64_t stage_id) const override;
   double GetBackwardComputationCost(const std::vector<TensorInfo> &inputs, const std::vector<TensorInfo> &outputs,
-                                    int64_t) const override;
+                                    int64_t stage_id) const override;
   // Taking account of output
   void CalculateOutputInMemory() override;
   // Not Taking account of input
@@ -180,8 +180,8 @@ double CostMatMul::GetMaxCostIn(const OperatorRec &op) {
 }

 // Chose strategy for MatMul
-StrategyRec CostMatMul::ChoseStr(const std::vector<double> &cost_op, StrategyRec str) {
-  uint64_t min_position = min_element(cost_op.begin(), cost_op.end()) - cost_op.begin();
+StrategyRec CostMatMul::ChoseStr(const std::vector<double> &cost_op, StrategyRec str) const {
+  uint64_t min_position = LongToUlong(min_element(cost_op.begin(), cost_op.end()) - cost_op.begin());
   if (cost_op[min_position] > (DOUBLE_MAX - 0.1)) {
     return str;
   }

@@ -318,8 +318,8 @@ double CostConvolution::GetMinCostIn(const Graph::NodeType &node) {
 }

 // Chose strategy for Conv
-StrategyRec CostConvolution::ChoseStr(const std::vector<double> &cost_op, StrategyRec str) {
-  uint64_t min_position = min_element(cost_op.begin(), cost_op.end()) - cost_op.begin();
+StrategyRec CostConvolution::ChoseStr(const std::vector<double> &cost_op, StrategyRec str) const {
+  uint64_t min_position = LongToUlong(min_element(cost_op.begin(), cost_op.end()) - cost_op.begin());
   if (cost_op[min_position] > (DOUBLE_MAX - 0.1)) {
     return str;
   }

@@ -381,7 +381,7 @@ StrategyRec CostConvolution::ChoseStr(const std::vector<double> &cost_op, Strate
 // Get optimal strategy for Pooling
 StrategyRec CostPooling::GetOptimalStr(const Graph::NodeType &node,
                                        const std::vector<std::pair<std::string, StrategyRec>> &node_name_to_strategy,
-                                       const Graph &graph) {
+                                       const Graph &graph) const {
   int64_t tensor_n = static_cast<int64_t>(node.tensor_parm.tensor_shape.shape_n * node.tensor_parm.tensor_str.str_n);
   int64_t tensor_c = static_cast<int64_t>(node.tensor_parm.tensor_shape.shape_c * node.tensor_parm.tensor_str.str_c);

@@ -408,8 +408,8 @@ StrategyRec CostPooling::GetOptimalStr(const Graph::NodeType &node,
 }

 // Chose strategy for Pooling
-StrategyRec CostPooling::ChoseStr(const std::vector<double> &cost_op, StrategyRec str) {
-  uint64_t min_position = min_element(cost_op.begin(), cost_op.end()) - cost_op.begin();
+StrategyRec CostPooling::ChoseStr(const std::vector<double> &cost_op, StrategyRec str) const {
+  uint64_t min_position = LongToUlong(min_element(cost_op.begin(), cost_op.end()) - cost_op.begin());
   if (cost_op[min_position] > (DOUBLE_MAX - 0.1)) {
     return str;
   }

@@ -451,7 +451,7 @@ StrategyRec CostPooling::ChoseStr(const std::vector<double> &cost_op, StrategyRe

 // Chose strategy for Add
 StrategyRec CostTensorAdd::ChoseStr(const std::vector<double> &cost_op, StrategyRec str) {
-  uint64_t min_position = min_element(cost_op.begin(), cost_op.end()) - cost_op.begin();
+  uint64_t min_position = LongToUlong(min_element(cost_op.begin(), cost_op.end()) - cost_op.begin());
   if (cost_op[min_position] > (DOUBLE_MAX - 0.1)) {
     return str;
   }

@@ -502,7 +502,7 @@ StrategyRec CostReshape::ChoseStr(StrategyRec str) const { return str; }

 // Chose strategy for BiasAdd
 StrategyRec CostBiasAdd::ChoseStr(const std::vector<double> &cost_op, StrategyRec str) {
-  uint64_t min_position = min_element(cost_op.begin(), cost_op.end()) - cost_op.begin();
+  uint64_t min_position = LongToUlong(min_element(cost_op.begin(), cost_op.end()) - cost_op.begin());
   if (cost_op[min_position] > (DOUBLE_MAX - 0.1)) {
     return str;
   }

@@ -588,7 +588,7 @@ StrategyRec CostCommon::GetOptimalStr(const Graph::NodeType &node,

 // Chose strategy for Common op
 StrategyRec CostCommon::ChoseStr(const std::vector<double> &cost_op, StrategyRec str) {
-  uint64_t min_position = min_element(cost_op.begin(), cost_op.end()) - cost_op.begin();
+  uint64_t min_position = LongToUlong(min_element(cost_op.begin(), cost_op.end()) - cost_op.begin());
   if (cost_op[min_position] > (DOUBLE_MAX - 0.1)) {
     return str;
   }

@@ -667,7 +667,7 @@ StrategyRec CostBatchParallel::GetOptimalStr(const Graph::NodeType &node) {

 // Chose strategy for BatchParallel op
 StrategyRec CostBatchParallel::ChoseStr(const std::vector<double> &cost_op, StrategyRec str) {
-  uint64_t min_position = min_element(cost_op.begin(), cost_op.end()) - cost_op.begin();
+  uint64_t min_position = LongToUlong(min_element(cost_op.begin(), cost_op.end()) - cost_op.begin());
   if (cost_op[min_position] > (DOUBLE_MAX - 0.1)) {
     return str;
   }

@@ -709,7 +709,7 @@ StrategyRec CostBatchParallel::ChoseStr(const std::vector<double> &cost_op, Stra

 // Chose strategy for CostSoftmaxCrossEntropyWithLogits
 StrategyRec CostSoftmaxCrossEntropyWithLogits::ChoseStr(const std::vector<double> &cost_op, StrategyRec str) {
-  uint64_t min_position = min_element(cost_op.begin(), cost_op.end()) - cost_op.begin();
+  uint64_t min_position = LongToUlong(min_element(cost_op.begin(), cost_op.end()) - cost_op.begin());
   if (cost_op[min_position] > (DOUBLE_MAX - 0.1)) {
     return str;
   }
@@ -25,12 +25,13 @@

 #include "frontend/parallel/auto_parallel/rec_core/rec_graph.h"
 #include "frontend/parallel/auto_parallel/rec_core/rec_strategy.h"
+#include "utils/check_convert_utils.h"

 namespace mindspore {
 namespace parallel {
 #define DOUBLE_MAX (std::numeric_limits<double>::max)()
-#define MATMUL_MEM_COEF 0.25
-#define REDIS_COEF 16
+constexpr double MATMUL_MEM_COEF = 0.25;
+constexpr size_t REDIS_COEF = 16;

 double CostRedis(const Graph::NodeType &node,
                  const std::vector<std::pair<std::string, StrategyRec>> &node_name_to_strategy,

@@ -69,7 +70,7 @@ class CostMatMul {
     return cost_in_k_;
   }

-  StrategyRec ChoseStr(const std::vector<double> &cost_op, StrategyRec str);
+  StrategyRec ChoseStr(const std::vector<double> &cost_op, StrategyRec str) const;

   double cost_in_i_ = 0;

@@ -130,7 +131,7 @@ class CostConvolution {
     return cost_in_q_;
   }

-  StrategyRec ChoseStr(const std::vector<double> &cost_op, StrategyRec str);
+  StrategyRec ChoseStr(const std::vector<double> &cost_op, StrategyRec str) const;

   double cost_in_b_ = 0;

@@ -152,12 +153,12 @@ class CostPooling {
  public:
   StrategyRec GetOptimalStr(const Graph::NodeType &node,
                            const std::vector<std::pair<std::string, StrategyRec>> &node_name_to_strategy,
-                           const Graph &graph);
+                           const Graph &graph) const;

   double GetMinCostIn() const { return cost_in_; }

  private:
-  StrategyRec ChoseStr(const std::vector<double> &cost_op, StrategyRec str);
+  StrategyRec ChoseStr(const std::vector<double> &cost_op, StrategyRec str) const;

   double cost_in_ = 0;
 };  // class CostPooling is used to compute the cost of Pooling operator.
@@ -129,7 +129,7 @@ Strategies PrepareStridedSlice(const std::vector<std::shared_ptr<OperatorInfo>>
 }

 Strategies PrepareSoftMax(const std::vector<std::shared_ptr<OperatorInfo>> &ops, const size_t iter_ops,
-                          Dimensions basic_stra) {
+                          const Dimensions &basic_stra) {
   Strategies strategies;
   strategies.push_back(basic_stra);
   std::vector<int64_t> axis_list;

@@ -218,7 +218,7 @@ Strategies PrepareGatherV2(const std::vector<std::shared_ptr<OperatorInfo>> &ops
              return (output_shape[LongToSize(a + 1)] > output_shape[LongToSize(b + 1)]);
            });
  std::transform(std::begin(index), std::end(index), std::begin(index), [](int64_t x) { return x + 1; });
-  index.insert(index.begin(), 0);
+  (void)index.insert(index.cbegin(), 0);

  Dimensions strategie(output_shape.size(), 1);
  size_t num_device = g_device_manager->DeviceNum();

@@ -282,7 +282,7 @@ Dimensions PrepareGatherV2OutputStrategy(const std::vector<std::shared_ptr<Opera
  std::sort(index.begin(), index.end(),
            [&output_shape](const size_t &a, const size_t &b) { return (output_shape[a + 1] > output_shape[b + 1]); });
  std::transform(std::begin(index), std::end(index), std::begin(index), [](int64_t x) { return x + 1; });
-  index.insert(index.begin(), 0);
+  (void)index.insert(index.cbegin(), 0);

  Dimensions strategie(output_shape.size(), 1);
  size_t num_device = g_device_manager->DeviceNum();

@@ -450,7 +450,7 @@ Strategies MakeDataParallelStrategy(const std::shared_ptr<Graph> &graph,
   StrategyPtr origin_strategy = ops[iter_ops]->strategy();
   Strategies strategies;
   size_t max_device_num = g_device_manager->DeviceNum();
-  size_t target_tensor_batch = ops[iter_ops]->inputs_tensor_info()[0].shape()[0];
+  size_t target_tensor_batch = LongToUlong(ops[iter_ops]->inputs_tensor_info()[0].shape()[0]);
   for (size_t iter_op_inputs = 0; iter_op_inputs < ops[iter_ops]->inputs_tensor_info().size(); iter_op_inputs++) {
     if (iter_op_inputs >= origin_strategy->GetInputDim().size()) {
       MS_LOG(EXCEPTION) << "Failure: Strategy's InputDim out of range.";

@@ -623,7 +623,7 @@ void ModifyParamSharingOpsStrategy(const std::vector<std::shared_ptr<OperatorInf
         str1 = str_j;
         size_t num_device_used = 1;
         for (size_t i = 0; i < str_j.size(); i++) {
-          num_device_used *= str_j[i];
+          num_device_used *= LongToSize(str_j[i]);
         }
         str2.push_back(g_device_manager->DeviceNum() / num_device_used);
         str2.push_back(1);

@@ -676,14 +676,14 @@ float CheckVirtualDatasetStrategy(const std::shared_ptr<Graph> &graph, const siz

 Dimensions CopyVirtualDataset(const std::shared_ptr<Graph> &graph,
                               const std::vector<std::shared_ptr<OperatorInfo>> &ops, const size_t iter_ops,
-                              const size_t iter_graph) {
+                              const size_t iter_graph, float epsilon = 0.00005f) {
   Dimensions s;
   auto input_stra_dim = ops[iter_ops]->inputs_tensor_info()[0].shape().size();
   auto virtual_dataset_str = CheckVirtualDatasetStrategy(graph, iter_graph);
   if (input_stra_dim == 0) {
     return s;
   } else {
-    if (virtual_dataset_str == 0) {
+    if (std::fabs(virtual_dataset_str) < epsilon) {
       s.push_back(1);
     } else {
       s.push_back(FloatToLong(1 / virtual_dataset_str));

@@ -805,7 +805,7 @@ Dimensions PrepareReshapeOutputStrategy(const std::vector<std::shared_ptr<Operat
     for (size_t j = LongToSize(tmp_index); j < input_shape.size(); j++) {
       tmp_prod *= strategy->GetInputDim()[0][j];
       tmp_index++;
-      if (mapping[i] == (int64_t)j) {
+      if (mapping[i] == SizeToLong(j)) {
         s.push_back(tmp_prod);
         tmp_prod = 1;
         break;

@@ -839,12 +839,15 @@ Dimensions PrepareExpandDimsOutputStrategy(const std::vector<std::shared_ptr<Ope
   // The strategy of the expanded dimesion will be assigned 1, the others take the strategies of corresponding
   // dimensions.
   for (size_t i = 0; i < ops[incoming_op_index]->inputs_tensor_info()[0].shape().size() + 1; i++) {
-    if ((int64_t)i == axis_input) {
+    if (UlongToLong(i) == axis_input) {
       s.push_back(1);
       already_expand = true;
-    } else if ((int64_t)i != axis_input && !already_expand) {
+    } else if (UlongToLong(i) != axis_input && !already_expand) {
       s.push_back(strategy->GetInputDim()[0][i]);
     } else {
+      if (i < 1) {
+        MS_LOG(EXCEPTION) << "The index i -1 is less than 0. Please check the situation.";
+      }
       s.push_back(strategy->GetInputDim()[0][i - 1]);
     }
   }

@@ -904,7 +907,7 @@ Dimensions PrepareIncomingOperatorInputStrategy(const std::vector<std::shared_pt
   } else if (ops[incoming_op_index]->type() == EXPAND_DIMS) {
     return PrepareExpandDimsOutputStrategy(ops, incoming_op_index);
   }
-  for (size_t i = 0; i < (size_t)ops[incoming_op_index]->inputs_tensor_info().size(); i++) {
+  for (size_t i = 0; i < ops[incoming_op_index]->inputs_tensor_info().size(); i++) {
     if (ops[incoming_op_index]->inputs_tensor_info()[i].shape().size() == 0) {
       continue;
     }

@@ -955,10 +958,10 @@ Dimensions ModifyStrategyIfSqueezeIncoming(const std::vector<std::shared_ptr<Ope
     if (ops[incoming_op_index]->inputs_tensor_info()[0].shape()[LongToSize(axis)] != 1) {
       MS_LOG(EXCEPTION) << "Failure: Removed dimension's shape is not 1." << std::endl;
     }
-    stra_dim_list.erase(it);
+    (void)stra_dim_list.erase(it);
   }

-  for (size_t i = 0; i < (size_t)stra_dim_list.size(); i++) {
+  for (size_t i = 0; i < stra_dim_list.size(); i++) {
     s_Squeeze.push_back(s[LongToSize(stra_dim_list[i])]);
   }
   return s_Squeeze;

@@ -1020,10 +1023,10 @@ Dimensions ModifyStrategyIfReduceIncoming(const std::vector<std::shared_ptr<Oper
     if (it == axis_list.end()) {
       MS_LOG(EXCEPTION) << "Failure: Can not find dimension indexes in Axis." << std::endl;
     }
-    axis_list.erase(it);
+    (void)axis_list.erase(it);
   }

-  for (size_t i = 0; i < (size_t)axis_list.size(); i++) {
+  for (size_t i = 0; i < axis_list.size(); i++) {
     s_Reduce.push_back(s[LongToSize(axis_list[i])]);
   }
   return s_Reduce;

@@ -1076,10 +1079,10 @@ Dimensions ModifyStrategyIfArgIncoming(const std::vector<std::shared_ptr<Operato
     if (it == axis_list.end()) {
       MS_LOG(EXCEPTION) << "Failure: Can not find dimension indexes in Axis." << std::endl;
     }
-    axis_list.erase(it);
+    (void)axis_list.erase(it);
   }

-  for (size_t i = 0; i < (size_t)axis_list.size(); i++) {
+  for (size_t i = 0; i < axis_list.size(); i++) {
     s_Arg.push_back(s[LongToSize(axis_list[i])]);
   }
   return s_Arg;

@@ -1117,8 +1120,7 @@ Strategies GenerateStrategiesFromStrategy(const std::vector<std::shared_ptr<Oper
   }

   if (basic_stra.size() == 0) {
-    for (size_t iter_op_inputs = 0; iter_op_inputs < (size_t)ops[iter_ops]->inputs_tensor_info().size();
-         iter_op_inputs++) {
+    for (size_t iter_op_inputs = 0; iter_op_inputs < ops[iter_ops]->inputs_tensor_info().size(); iter_op_inputs++) {
       stra.push_back(basic_stra);
     }
     return stra;

@@ -1158,7 +1160,7 @@ Strategies GenerateStrategiesFromStrategy(const std::vector<std::shared_ptr<Oper

 // Function to deal with ops with broadcasting, like TensorAdd/Sub/Mul/Div etc.
 Strategies CheckBroadcast(const std::vector<std::shared_ptr<OperatorInfo>> &ops, const size_t iter_ops,
-                          const Dimensions s) {
+                          const Dimensions &s) {
   Strategies stra;

   size_t first_tensor_dim = ops[iter_ops]->inputs_tensor_info()[0].shape().size();

@@ -1257,13 +1259,12 @@ Dimensions ApplyBroadcast(const std::vector<std::shared_ptr<OperatorInfo>> &ops,

 // Check whether the operator can be divided by the current strategy.
 Strategies CheckDivisible(const std::vector<std::shared_ptr<OperatorInfo>> &ops, const size_t iter_ops,
-                          const Dimensions basic_stra) {
+                          const Dimensions &basic_stra) {
   Dimensions s_empty = {};
   Strategies stra;

   // For all the input tensors.
-  for (size_t iter_op_inputs = 0; iter_op_inputs < (size_t)ops[iter_ops]->inputs_tensor_info().size();
-       iter_op_inputs++) {
+  for (size_t iter_op_inputs = 0; iter_op_inputs < ops[iter_ops]->inputs_tensor_info().size(); iter_op_inputs++) {
     // If input tensor is empty, return strategy as void.
     if (ops[iter_ops]->inputs_tensor_info()[iter_op_inputs].shape().size() == 0) {
       stra.push_back(s_empty);

@@ -1274,7 +1275,7 @@ Strategies CheckDivisible(const std::vector<std::shared_ptr<OperatorInfo>> &ops,
     bool modified = false;

     // Make sure each tensor's dim shape is greater than 1. If not, push back strategy as 1 instead.
-    for (size_t j = 0; j < (size_t)ops[iter_ops]->inputs_tensor_info()[iter_op_inputs].shape().size(); j++) {
+    for (size_t j = 0; j < ops[iter_ops]->inputs_tensor_info()[iter_op_inputs].shape().size(); j++) {
       if (ops[iter_ops]->inputs_tensor_info()[iter_op_inputs].shape()[j] == 1) {
         tmp_stra[j] = 1;
         modified = true;

@@ -1336,8 +1337,8 @@ Dimensions ModifyStrategyIfSqueezeOutgoing(const std::vector<std::shared_ptr<Ope
   auto axis_list = GetAxisList(ops, SizeToLong(iter_ops));
   size_t s_index = 0;
   size_t axis_list_index = 0;
-  for (size_t i = 0; i < (size_t)(s.size() + axis_list.size()); i++) {
-    if (i == (size_t)axis_list[axis_list_index]) {
+  for (size_t i = 0; i < s.size() + axis_list.size(); i++) {
+    if (i == LongToSize(axis_list[axis_list_index])) {
       s_Squeeze.push_back(1);
       axis_list_index++;
     } else {
@@ -31,7 +31,7 @@ void GenerateStrategy(const std::shared_ptr<Graph> &graph, const std::vector<std
                       const std::shared_ptr<std::vector<std::vector<size_t>>> &eli_list,
                       const std::vector<std::vector<std::string>> &input_tensor_names,
                       const std::shared_ptr<std::vector<size_t>> &index_list, bool is_training,
-                      const std::vector<std::vector<size_t>> &shared_tensors_ops);
+                      const std::vector<std::vector<size_t>> &param_users_ops_index);
 Dimensions PrepareMatMulStrategy(const std::shared_ptr<Graph> &graph, const size_t iter_graph, bool transpose_a,
                                  bool transpose_b, size_t iter_op_inputs);
 Strategies PrepareMatMul(const std::shared_ptr<Graph> &graph, const std::vector<std::shared_ptr<OperatorInfo>> &ops,

@@ -40,7 +40,7 @@ Strategies PrepareBiasAdd(const std::shared_ptr<Dimensions> &s);
 Strategies PrepareStridedSlice(const std::vector<std::shared_ptr<OperatorInfo>> &ops, const size_t iter_ops,
                                Dimensions basic_stra);
 Strategies PrepareSoftMax(const std::vector<std::shared_ptr<OperatorInfo>> &ops, const size_t iter_ops,
-                          Dimensions basic_stra);
+                          const Dimensions &basic_stra);
 Strategies PrepareOneHot(const std::vector<std::shared_ptr<OperatorInfo>> &ops, const size_t iter_ops, Dimensions s);
 Strategies PrepareAxisRelatedStrategy(const std::shared_ptr<Graph> &graph,
                                       const std::vector<std::shared_ptr<OperatorInfo>> &ops, const size_t iter_graph,

@@ -53,10 +53,12 @@ Strategies PrepareL2Normalize(const std::vector<std::shared_ptr<OperatorInfo>> &
 Strategies MakeRecSearchStrategy(const std::shared_ptr<Graph> &graph,
                                  const std::vector<std::shared_ptr<OperatorInfo>> &ops, const size_t iter_graph,
                                  const size_t iter_ops);
-Strategies CheckBroadcast(const std::vector<std::shared_ptr<OperatorInfo>> &ops, const size_t iter_ops, Dimensions s);
+Strategies CheckBroadcast(const std::vector<std::shared_ptr<OperatorInfo>> &ops, const size_t iter_ops,
+                          const Dimensions &s);
 Dimensions ApplyBroadcast(const std::vector<std::shared_ptr<OperatorInfo>> &ops, const size_t iter_ops, Dimensions s,
                           size_t first_tensor_dim, size_t second_tensor_dim, bool broadcast_first_tensor);
-Strategies CheckDivisible(const std::vector<std::shared_ptr<OperatorInfo>> &ops, const size_t iter_ops, Dimensions s);
+Strategies CheckDivisible(const std::vector<std::shared_ptr<OperatorInfo>> &ops, const size_t iter_ops,
+                          const Dimensions &s);
 Strategies MakeDataParallelStrategy(const std::shared_ptr<Graph> &graph,
                                     const std::vector<std::shared_ptr<OperatorInfo>> &ops, const size_t iter_graph,
                                     const size_t iter_ops);

@@ -118,7 +120,7 @@ void GenerateRemainingOperatorStrategy(const std::shared_ptr<Graph> &graph,
                                        const std::shared_ptr<std::vector<size_t>> &index_list,
                                        const std::shared_ptr<std::vector<size_t>> &no_stra_op_list);
 void ModifyParamSharingOpsStrategy(const std::vector<std::shared_ptr<OperatorInfo>> &ops,
-                                   const std::vector<std::vector<size_t>> &shared_tensors_ops);
+                                   const std::vector<std::vector<size_t>> &param_users_ops_index);
 }  // namespace parallel
 }  // namespace mindspore
 #endif  // PARALLEL_AUTO_PARALLEL_REC_GENERATE_STRATEGY_H_
@@ -34,8 +34,8 @@ namespace {
 constexpr char INPUTS[] = "inputs";
 constexpr char ATTRS[] = "attrs";
 using FuncGraphNameMap = const std::unordered_map<FuncGraphPtr, std::string>;
-static std::unordered_map<std::string, size_t> op_count;
-static std::unordered_map<CNodePtr, std::string> name_map;
+static std::unordered_map<std::string, size_t> op_count = {};
+static std::unordered_map<CNodePtr, std::string> name_map = {};

 // Extract the op name and the topology number of the same node in the graph
 // e.g, Default/Mul-op32 -> Mul-op0, Default/Mul-op35 -> Mul-op1
@@ -65,8 +65,8 @@ Status BroadcastToInfo::CheckStrategy(const StrategyPtr &strategy) {
     MS_LOG(ERROR) << name_ << ": Invalid strategy";
     return FAILED;
   }

-  auto stra = strategy->GetInputDim().at(0);
+  auto input_dim = strategy->GetInputDim();
+  auto stra = input_dim.at(0);
   auto in_shape = inputs_shape_.at(0);
   for (size_t i = 0; i < stra.size(); ++i) {
     if ((in_shape[i] == 1) && (stra[i] != 1)) {
@@ -241,8 +241,9 @@ Status GatherInfo::CheckManualSplit(const Strategies &strategy) {
 }

 Status GatherInfo::CheckSplitAxisStrategy(const StrategyPtr &strategy) {
-  auto param_strategy = strategy->GetInputDim().at(0);
-  auto index_strategy = strategy->GetInputDim().at(1);
+  auto input_dim = strategy->GetInputDim();
+  auto param_strategy = input_dim.at(0);
+  auto index_strategy = input_dim.at(1);
   // param_strategy(axis) != 1, index can't be split
   auto product_i = std::accumulate(index_strategy.begin(), index_strategy.end(), 1, std::multiplies<int64_t>());
   if ((param_strategy.at(LongToSize(axis_)) != 1) && (product_i != 1)) {

@@ -302,7 +303,8 @@ bool GatherInfo::ShardBatchAndAxis(const Strategies &strategy) const {
 }

 void GatherInfo::SetAttribute(const StrategyPtr &strategy) {
-  auto param_strategy = strategy->GetInputDim().at(0);
+  auto input_dim = strategy->GetInputDim();
+  auto param_strategy = input_dim.at(0);
   // axis=0, index_shape(0)%param_strategy(0) must be 0
   Shape index_shape = inputs_shape_.at(1);
   if ((axis_ == 0) && (index_shape.at(0) % param_strategy.at(0) != 0) && !dynamic_shape_indices_) {

@@ -335,7 +337,8 @@ Status GatherInfo::CheckStrategy(const StrategyPtr &strategy) {

   // param slice shape need 32Byte aligned
   auto param_shape = inputs_shape_.at(0);
-  auto param_strategy = strategy->GetInputDim().at(0);
+  auto input_dim = strategy->GetInputDim();
+  auto param_strategy = input_dim.at(0);
   auto slice_shape = param_shape.at(param_shape.size() - 1) / param_strategy.at(param_strategy.size() - 1);
   if ((target_ != CPU) && (slice_shape % 8 != 0) && (slice_shape != 1)) {
     ReportError(name_ + ": Last dim of param slice shape need 32Byte aligned.");

@@ -480,8 +483,9 @@ Status GatherInfo::InferDevMatrixShape() {
   dev_matrix_shape_.clear();
   out_dev_matrix_shape_.clear();
   // infer input dev_matrix_shape
-  auto param_strategy = strategy_->GetInputDim().at(0);
-  auto index_strategy = strategy_->GetInputDim().at(1);
+  auto param_stra = strategy_->GetInputDim();
+  auto param_strategy = param_stra.at(0);
+  auto index_strategy = param_stra.at(1);

   if (manual_split_) {
     dev_matrix_shape_ = param_strategy;

@@ -545,7 +549,8 @@ void GatherInfo::InferInputsTensorMap() {
   size_t total_size = param_size + index_size;
   Shape tensor_map_index;
   Shape tensor_map_params;
-  auto param_strategy = strategy_->GetInputDim().at(0);
+  auto input_dim = strategy_->GetInputDim();
+  auto param_strategy = input_dim.at(0);
   if (param_strategy.at(LongToSize(axis_)) != 1) {
     (void)tensor_map_index.insert(tensor_map_index.begin(), index_size, MAP_NONE);
     for (size_t i = 0; i < param_size; ++i) {

@@ -600,7 +605,8 @@ void GatherInfo::InferOutputsTensorMap() {
   size_t index_size = inputs_shape_.at(1).size();
   size_t total_size = param_size + index_size;
   Shape tensor_map_out;
-  auto param_strategy = strategy_->GetInputDim().at(0);
+  auto input_dim = strategy_->GetInputDim();
+  auto param_strategy = input_dim.at(0);
   if (param_strategy.at(LongToSize(axis_)) == 1) {
     // param_strategy(axis) is 1
     for (size_t i = 0; i < param_size; ++i) {

@@ -835,7 +841,8 @@ Status GatherInfo::InferForwardCommunication() {
   }

   forward_op_.clear();
-  auto param_strategy = strategy_->GetInputDim().at(0);
+  auto input_dim = strategy_->GetInputDim();
+  auto param_strategy = input_dim.at(0);
   // don't split axis or target is not CPU, no need forward communication
   if (target_ != CPU || param_strategy.at(LongToSize(axis_)) == 1) {
     return SUCCESS;

@@ -934,8 +941,8 @@ ReplaceGraphPtr GatherInfo::replace_graph(const CNodePtr &cnode) {
   }
   return replace_graph_;
 }

-  auto param_strategy = strategy_->GetInputDim().at(0);
+  auto input_dim = strategy_->GetInputDim();
+  auto param_strategy = input_dim.at(0);
   // target_ == CPU, no need to replace graph
   if (target_ == CPU) {
     return nullptr;
@@ -85,11 +85,11 @@ def init(backend_name=None):
     Initialize distributed backend, e.g. HCCL/NCCL, it is required before using the communication service.

     Note:
-        The full name of HCCL is Huawei Collective Communication Library.
-        The full name of NCCL is NVIDIA Collective Communication Library.
-        The full name of MCCL is MindSpore Collective Communication Library.
-        This method should be used after set_context. The user needs to preset communication environment variables
-        before running the following example, please see the docstring of the mindspore.management.
+        - The full name of HCCL is Huawei Collective Communication Library.
+        - The full name of NCCL is NVIDIA Collective Communication Library.
+        - The full name of MCCL is MindSpore Collective Communication Library.
+        - This method should be used after set_context. The user needs to preset communication environment variables
+          before running the following example, please see the docstring of the mindspore.communication.

     Args:
         backend_name (str): Backend, using HCCL/NCCL/MCCL. If the `backend_name` is None, system will recognize
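A small sketch of the backend inference described in this Note (the GPU target is only an example of the inference rule, not part of this patch):

    from mindspore import context
    from mindspore.communication import init

    context.set_context(device_target="GPU")  # NCCL is picked automatically
    init()                                    # equivalent to init("nccl") here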
@@ -167,7 +167,7 @@ def release():

     Note:
         This method should be used after init(). The user needs to preset communication environment variables
-        before running the following example, please see the docstring of the mindspore.managerment.
+        before running the following example, please see the docstring of the mindspore.communication.

     Raises:
         RuntimeError: If failed to release distributed resource.

@@ -189,7 +189,7 @@ def get_rank(group=GlobalComm.WORLD_COMM_GROUP):

     Note:
         This method should be used after init(). The user needs to preset communication environment variables
-        before running the following example, please see the docstring of the mindspore.managerment.
+        before running the following example, please see the docstring of the mindspore.communication.

     Args:
         group (str): The communication group to work on. Normally, the group should be created by create_group,

@@ -226,7 +226,7 @@ def get_local_rank(group=GlobalComm.WORLD_COMM_GROUP):
     Note:
         GPU version of MindSpore doesn't support this method.
         This method should be used after init(). The user needs to preset communication environment variables
-        before running the following example, please see the docstring of the mindspore.managerment.
+        before running the following example, please see the docstring of the mindspore.communication.

     Args:
         group (str): The communication group to work on. Normally, the group should be created by create_group,

@@ -266,7 +266,7 @@ def get_group_size(group=GlobalComm.WORLD_COMM_GROUP):

     Note:
         This method should be used after init(). The user needs to preset communication environment variables before
-        running the following example, please see the docstring of the mindspore.managerment.
+        running the following example, please see the docstring of the mindspore.communication.

     Args:
         group (str): The communication group to work on. Normally, the group should be created by create_group,

@@ -305,7 +305,7 @@ def get_local_rank_size(group=GlobalComm.WORLD_COMM_GROUP):
     Note:
         GPU version of MindSpore doesn't support this method.
         This method should be used after init(). The user needs to preset communication environment variables before
-        running the following example, please see the docstring of the mindspore.managerment.
+        running the following example, please see the docstring of the mindspore.communication.

     Args:
         group (str): The communication group to work on. The group is created by create_group

@@ -347,7 +347,7 @@ def get_world_rank_from_group_rank(group, group_rank_id):
         GPU version of MindSpore doesn't support this method.
         The parameter group should not be "hccl_world_group".
         This method should be used after init(). The user needs to preset communication environment variables
-        before running the following example, please see the docstring of the mindspore.managerment.
+        before running the following example, please see the docstring of the mindspore.communication.

     Args:
         group (str): The communication group to work on. The group is created by create_group.
@@ -607,9 +607,10 @@ class _LogActionOnce:
     """
     __is_logged__ = dict()

-    def __init__(self, logger, key):
+    def __init__(self, logger, key, no_warning=False):
         self.logger = logger
         self.key = key
+        self.no_warning = no_warning

     def __call__(self, func):
         def wrapper(*args, **kwargs):

@@ -617,7 +618,7 @@ class _LogActionOnce:
                 return func(*args, **kwargs)

             _old_ = self.logger.warning
-            if self.key in _LogActionOnce.__is_logged__:
+            if self.no_warning or self.key in _LogActionOnce.__is_logged__:
                 self.logger.warning = lambda x: x
             else:
                 _LogActionOnce.__is_logged__[self.key] = True
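A hedged sketch of how the new no_warning flag is meant to behave (the demo keys and functions below are hypothetical; _LogActionOnce is an internal helper, not public API):

    from mindspore import log as logger
    from mindspore.log import _LogActionOnce

    @_LogActionOnce(logger=logger, key='demo', no_warning=True)
    def build():
        logger.warning("sharding propagation hint")  # swallowed: no_warning=True

    @_LogActionOnce(logger=logger, key='demo2')
    def build_twice():
        logger.warning("printed once, then suppressed")

    build()        # nothing is logged
    build_twice()  # warning appears
    build_twice()  # suppressed on the repeat call because key 'demo2' is already recorded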
@@ -26,6 +26,8 @@ from mindspore.nn.loss.loss import _check_is_tensor
 from mindspore.parallel._utils import _get_parallel_mode, _is_sharding_propagation
 from mindspore.context import ParallelMode
 from mindspore.parallel._utils import _get_device_num, _get_pipeline_stages
+from mindspore.log import _LogActionOnce
+from mindspore import log as logger
 from .layers import _check_input_dtype, _check_input_shape
 from .op_parallel_config import default_dpmp_config, OpParallelConfig

@@ -181,7 +183,8 @@ class CrossEntropyLoss(Cell):
         >>> print(output.shape)
         (1,)
     """

+    @_LogActionOnce(logger=logger, key='CrossEntropyLoss',
+                    no_warning=_get_parallel_mode() in (ParallelMode.STAND_ALONE,))
     def __init__(self, parallel_config=default_dpmp_config):
         super(CrossEntropyLoss, self).__init__()
         if not isinstance(parallel_config, OpParallelConfig):
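As a consequence, constructing the decorated cell in stand-alone mode is expected to stay silent in the logs; a hedged check (the import path of CrossEntropyLoss is an assumption based on the surrounding files):

    from mindspore import context
    from mindspore.parallel.nn import CrossEntropyLoss

    context.set_context(mode=context.GRAPH_MODE)
    # STAND_ALONE is the default parallel mode, so no_warning evaluates to True
    # and the construction below should not emit the parallel-config warning.
    loss = CrossEntropyLoss()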
@@ -31,6 +31,7 @@ from mindspore._checkparam import Validator
 from mindspore import log as logger
 from mindspore.parallel._utils import _get_parallel_mode, _is_sharding_propagation
 from mindspore.context import ParallelMode
+from mindspore.log import _LogActionOnce
 from .layers import _LayerNorm, _Linear, _check_input_shape, \
     _args_type_validator_check, _valid_type_checks, _valid_value_checks, \
     _check_shape_equal, _check_past_none_input_none, _check_input_dtype, _check_input_shape_value

@@ -390,7 +391,8 @@ class FeedForward(Cell):
         >>> print(output.shape)
         (2, 20, 15)
     """

+    @_LogActionOnce(logger=logger, key='FeedForward',
+                    no_warning=_get_parallel_mode() in (ParallelMode.STAND_ALONE,))
     @_args_type_validator_check(hidden_size=Validator.check_positive_int,
                                 ffn_hidden_size=Validator.check_positive_int,
                                 dropout_rate=Validator.check_non_negative_float,

@@ -578,7 +580,8 @@ class AttentionMask(Cell):
          [1. 1. 1. 0]
          [0. 0. 0. 0]]]
     """

+    @_LogActionOnce(logger=logger, key='AttentionMask',
+                    no_warning=_get_parallel_mode() in (ParallelMode.STAND_ALONE,))
     @_args_type_validator_check(seq_length=Validator.check_positive_int,
                                 parallel_config=_valid_type_checks([OpParallelConfig], "AttentionMask"))
     def __init__(self, seq_length, parallel_config=default_dpmp_config):

@@ -667,7 +670,8 @@ class VocabEmbedding(Cell):
         >>> print(table.shape)
         (30, 30)
     """

+    @_LogActionOnce(logger=logger, key='VocabEmbedding',
+                    no_warning=_get_parallel_mode() in (ParallelMode.STAND_ALONE,))
     @_args_type_validator_check(vocab_size=Validator.check_positive_int,
                                 embedding_size=Validator.check_positive_int,
                                 parallel_config=_valid_type_checks([EmbeddingOpParallelConfig], "VocabEmbedding"))

@@ -821,7 +825,8 @@ class MultiHeadAttention(Cell):
         >>> print(past[1].shape)
         (2, 3, 20, 5)
     """

+    @_LogActionOnce(logger=logger, key='MultiHeadAttention',
+                    no_warning=_get_parallel_mode() in (ParallelMode.STAND_ALONE,))
     @_args_type_validator_check(batch_size=Validator.check_positive_int,
                                 hidden_size=Validator.check_positive_int,
                                 num_heads=Validator.check_positive_int,

@@ -1420,7 +1425,8 @@ class TransformerEncoderLayer(Cell):
         >>> print(past[1].shape)
         (2, 2, 16, 4)
     """

+    @_LogActionOnce(logger=logger, key='TransformerEncoderLayer',
+                    no_warning=_get_parallel_mode() in (ParallelMode.STAND_ALONE,))
     @_args_type_validator_check(batch_size=Validator.check_positive_int,
                                 hidden_size=Validator.check_positive_int,
                                 num_heads=Validator.check_positive_int,

@@ -1804,7 +1810,8 @@ class TransformerDecoderLayer(Cell):
         >>> print(past[3].shape)
         (2, 2, 20, 32)
     """

+    @_LogActionOnce(logger=logger, key='TransformerDecoderLayer',
+                    no_warning=_get_parallel_mode() in (ParallelMode.STAND_ALONE,))
     @_args_type_validator_check(batch_size=Validator.check_positive_int,
                                 hidden_size=Validator.check_positive_int,
                                 num_heads=Validator.check_positive_int,

@@ -2344,7 +2351,8 @@ class TransformerEncoder(Cell):
         >>> print(past[0][1].shape)
         (2, 2, 16, 4)
     """

+    @_LogActionOnce(logger=logger, key='TransformerEncoder',
+                    no_warning=_get_parallel_mode() in (ParallelMode.STAND_ALONE,))
     @_args_type_validator_check(batch_size=Validator.check_positive_int,
                                 hidden_size=Validator.check_positive_int,
                                 num_heads=Validator.check_positive_int,

@@ -2575,7 +2583,8 @@ class TransformerDecoder(Cell):
         >>> print(past[0][3].shape)
         (2, 2, 20, 32)
     """

+    @_LogActionOnce(logger=logger, key='TransformerDecoder',
+                    no_warning=_get_parallel_mode() in (ParallelMode.STAND_ALONE,))
     @_args_type_validator_check(batch_size=Validator.check_positive_int,
                                 hidden_size=Validator.check_positive_int,
                                 num_heads=Validator.check_positive_int,

@@ -2842,7 +2851,8 @@ class Transformer(Cell):
         >>> print(de_past[0][3].shape)
         (2, 2, 20, 32)
     """

+    @_LogActionOnce(logger=logger, key='Transformer',
+                    no_warning=_get_parallel_mode() in (ParallelMode.STAND_ALONE,))
     @_args_type_validator_check(batch_size=Validator.check_positive_int,
                                 hidden_size=Validator.check_positive_int,
                                 num_heads=Validator.check_positive_int,
@@ -13,8 +13,13 @@
 # limitations under the License.
 # ============================================================================
 """ test transformer"""
+import os
+import shutil
 import numpy as np
 import pytest

+import mindspore
 from mindspore import Tensor
 from mindspore.common import dtype
 from mindspore.parallel.nn import MultiHeadAttention, FeedForward, TransformerEncoderLayer, TransformerEncoder, \

@@ -271,3 +276,60 @@ def test_sparse_attention():
     v = Tensor(np.ones((2, 1024, 512)), dtype.float16)
     mask = Tensor(np.ones((2, 1024, 1024)), dtype.float32)
     _cell_graph_executor.compile(model, q, k, v, mask)
+
+
+class TestBasicWarningValidator:
+    log_envs = dict(GLOG_v=None, GLOG_logtostderr=None, GLOG_log_dir=None, logger_maxBytes=None,
+                    logger_backupCount=None)
+    log_path = './TestBasicWarningValidator'
+
+    def setup_method(self):
+        for env in self.log_envs:
+            self.log_envs[env] = os.environ.get(env, None)
+        os.environ['GLOG_log_dir'] = self.log_path
+        os.environ['GLOG_v'] = '1'
+        os.environ['GLOG_logtostderr'] = '0'
+        # Force to generate the logger again
+        # pylint: disable=W0212
+        mindspore.log._global_logger = None
+
+    def teardown_method(self):
+        for env in self.log_envs:
+            if self.log_envs.get(env, False):
+                os.environ[env] = self.log_envs.get(env, "False")
+        shutil.rmtree(os.path.join(self.log_path))
+
+    def check_warning_log(self):
+        cmd = f'cd {self.log_path} && grep WARNING rank_0/logs/mindspore.log.* |wc -l'
+        file_count = os.popen(cmd).read().strip()
+        assert file_count == "0"
+
+    def test_cross_entory_no_warning(self):
+        """
+        Feature: Test the warning log
+        Description: Test a forward compile has no warning error
+        Expectation: To compile passed
+        """
+        # Force to rebuild the logger
+        test_cross_entroy()
+        self.check_warning_log()
+
+    def test_transformer_encoder_no_warning(self):
+        """
+        Feature: Test the warning log
+        Description: Test a forward compile has no warning error
+        Expectation: To compile passed
+        """
+        # Force to rebuild the logger
+        test_transformer_encoder_only()
+        self.check_warning_log()
+
+    def test_transformer_decoder_no_warning(self):
+        """
+        Feature: Test the warning log
+        Description: Test a forward compile has no warning error
+        Expectation: To compile passed
+        """
+        # Force to rebuild the logger
+        test_transformer_decoder()
+        self.check_warning_log()
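To run only the new warning checks locally, something like the following should work (the test file path is a hypothetical example; adjust it to wherever this test module lives in your checkout):

    import pytest

    # Runs the warning-validation class added by this PR and reports quietly.
    pytest.main(["-q", "tests/ut/python/parallel/test_transformer.py::TestBasicWarningValidator"])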