fix release notes for 2.3.0rc2

yanghaoran 2024-05-11 19:31:15 +08:00
parent c6a1400a90
commit 058a3f4b8f
3 changed files with 22 additions and 20 deletions


@@ -59,4 +59,6 @@ https://mindspore.cn/search/en?inputValue=Index%20values
https://mindspore.cn*/r2.3.q1/*
https://www.mindspore.cn*/r2.3.q1/*
https://mindspore.cn*/r2.3.0rc2/*
https://www.mindspore.cn*/r2.3.0rc2/*
http://sox.sourceforge.net/sox.html


@@ -9,29 +9,29 @@
#### AutoParallel
- [STABLE] Transpose/Sub/Add/Mul/Div/ReLU/Softmax/Sigmoid supports layout configuration.
- [STABLE] The collective communication precision will affect network convergence. The configuration item force_fp32_communication is provided in the interface mindspore.set_auto_parallel_context. When set to True, the communication type of the reduce communication operator can be forced to be converted to float32.
- [STABLE] Collective communication precision affects network convergence. The configuration item [force_fp32_communication](https://www.mindspore.cn/docs/en/r2.3.0rc2/api_python/mindspore/mindspore.set_auto_parallel_context.html) is provided in the interface mindspore.set_auto_parallel_context. When set to True, the communication type of reduce communication operators is forced to float32.
- [BETA] Pipeline parallel support Interleave. Optimize the performance when micro batch is limited.
- [BETA] Optimize checkpoint transformation speed when using pipeline parallel, support single stage transform.
- [BETA] Pynative mode supports long sequence parallel of RingAttention. Optimizes long sequence training performance.
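For the force_fp32_communication item above, a minimal sketch of how the option would be enabled through mindspore.set_auto_parallel_context; the parallel mode and the rest of the setup are illustrative assumptions, not part of the release note:

```python
import mindspore as ms

# Illustrative parallel configuration; only force_fp32_communication is the
# option described in the release note above.
ms.set_auto_parallel_context(
    parallel_mode="semi_auto_parallel",  # assumed example mode
    force_fp32_communication=True,       # force reduce-type communication ops to communicate in float32
)
```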
#### PyNative
- [STABLE] Support recompute on PyNative mode
- [STABLE] Support register_hook on PyNative mode
- [BETA] Support [recompute](https://www.mindspore.cn/docs/en/r2.3.0rc2/api_python/mindspore/mindspore.recompute.html) in PyNative mode.
- [STABLE] Support [register_hook](https://www.mindspore.cn/docs/en/r2.3.0rc2/api_python/mindspore/Tensor/mindspore.Tensor.register_hook.html#mindspore.Tensor.register_hook) in PyNative mode.
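As a rough illustration of the register_hook item, the sketch below assumes the hook receives the gradient flowing into the hooked tensor during PyNative backward and that a returned value replaces that gradient (see the linked Tensor.register_hook page for the authoritative semantics):

```python
import mindspore as ms
from mindspore import Tensor

ms.set_context(mode=ms.PYNATIVE_MODE)  # register_hook is a PyNative-mode feature

def scale_grad(grad):
    # Assumed hook behavior: the returned tensor replaces the incoming gradient.
    return grad * 2

def forward(x):
    y = x * x
    y.register_hook(scale_grad)  # hook an intermediate tensor
    return y.sum()

x = Tensor([1.0, 2.0, 3.0], ms.float32)
print(ms.grad(forward)(x))  # gradients through y are doubled by the hook
```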
### API Change
Add timeout environment variables in dynamic networking scenarios:
Add timeout environment variables in [dynamic networking](https://www.mindspore.cn/tutorials/experts/en/r2.3.0rc2/parallel/dynamic_cluster.html) scenarios:
- MS_TOPO_TIMEOUT: Cluster networking phase timeout time in seconds.
- MS_NODE_TIMEOUT: Node heartbeat timeout in seconds.
- MS_RECEIVE_MSG_TIMEOUT: Node timeout for receiving messages in seconds.
- `MS_TOPO_TIMEOUT`: Cluster networking phase timeout time in seconds.
- `MS_NODE_TIMEOUT`: Node heartbeat timeout in seconds.
- `MS_RECEIVE_MSG_TIMEOUT`: Node timeout for receiving messages in seconds.
Added new environment variable MS_ENABLE_LCCL to support the use of LCCL communication library.
Added a new environment variable `MS_ENABLE_LCCL` to support the use of the LCCL communication library.
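A minimal sketch of how these variables might be supplied to a dynamic-networking job, assuming they are placed in the environment before communication is initialized; the concrete timeout values, the `on` switch for MS_ENABLE_LCCL, and the init placement are assumptions for illustration:

```python
import os

# Illustrative timeout values; tune to the cluster size.
os.environ["MS_TOPO_TIMEOUT"] = "600"         # cluster networking phase timeout (seconds)
os.environ["MS_NODE_TIMEOUT"] = "300"         # node heartbeat timeout (seconds)
os.environ["MS_RECEIVE_MSG_TIMEOUT"] = "300"  # message-receiving timeout (seconds)
os.environ["MS_ENABLE_LCCL"] = "on"           # assumed switch value for enabling the LCCL library

import mindspore as ms
from mindspore.communication import init

ms.set_context(device_target="Ascend")
init()  # the variables above must be set before communication init
```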
### Bug Fixes
- [#I9CR96] Fixed the issue where an insufficient timeout caused dynamic networking startup to fail in large-scale clusters.
- [#I94AQQ] Fixed the problem of incorrect output shape of ops.Addcdiv operator in graph mode.
### Contributors


@@ -9,29 +9,29 @@
#### AutoParallel
- [STABLE] The Transpose/Sub/Add/Mul/Div/ReLU/Softmax/Sigmoid operators support layout configuration.
- [STABLE] Collective communication precision affects network convergence. The configuration item force_fp32_communication is provided in the interface mindspore.set_auto_parallel_context; when set to True, the communication type of reduce communication operators is forced to float32.
- [BETA] Pipeline parallelism supports Interleave scheduling, optimizing model performance when the micro batch size is limited.
- [BETA] Improved model transformation speed in pipeline parallel scenarios; a single stage can be transformed independently.
- [BETA] PyNative mode supports long-sequence parallelism with RingAttention, optimizing long-sequence training performance.
- [STABLE] Collective communication precision affects network convergence. The configuration item [force_fp32_communication](https://www.mindspore.cn/docs/zh-CN/r2.3.0rc2/api_python/mindspore/mindspore.set_auto_parallel_context.html) is provided in the interface mindspore.set_auto_parallel_context; when set to True, the communication type of reduce communication operators is forced to float32.
- [BETA] Pipeline parallelism supports Interleave scheduling, optimizing model performance when the micro batch size is limited.
- [BETA] Improved model transformation speed in pipeline parallel scenarios; a single stage can be transformed independently.
#### PyNative
- [STABLE] Recompute is supported in PyNative mode.
- [STABLE] register_hook is supported in PyNative mode.
- [BETA] [Recompute](https://www.mindspore.cn/docs/zh-CN/r2.3.0rc2/api_python/mindspore/mindspore.recompute.html) is supported in PyNative mode.
- [STABLE] [register_hook](https://www.mindspore.cn/docs/zh-CN/r2.3.0rc2/api_python/mindspore/Tensor/mindspore.Tensor.register_hook.html#mindspore.Tensor.register_hook) is supported in PyNative mode.
### API Change
Added timeout environment variables for dynamic networking scenarios:
Added timeout environment variables for [dynamic networking](https://www.mindspore.cn/tutorials/experts/zh-CN/r2.3.0rc2/parallel/dynamic_cluster.html) scenarios:
- MS_TOPO_TIMEOUT: Cluster networking phase timeout, in seconds.
- MS_NODE_TIMEOUT: Node heartbeat timeout, in seconds.
- MS_RECEIVE_MSG_TIMEOUT: Node message-receiving timeout, in seconds.
- `MS_TOPO_TIMEOUT`: Cluster networking phase timeout, in seconds.
- `MS_NODE_TIMEOUT`: Node heartbeat timeout, in seconds.
- `MS_RECEIVE_MSG_TIMEOUT`: Node message-receiving timeout, in seconds.
Added the environment variable MS_ENABLE_LCCL to support using the LCCL communication library in single-node multi-card scenarios on the Ascend backend.
Added the environment variable `MS_ENABLE_LCCL` to support using the LCCL communication library in single-node multi-card scenarios on the Ascend backend.
### Bug Fixes
- [#I9CR96] Fixed the issue where an insufficient timeout caused cluster startup to fail with the dynamic networking startup mode in large-scale clusters.
- [#I94AQQ] Fixed the issue of incorrect output shape of the ops.Addcdiv operator in graph mode.
### Contributors