From 1ec424ddbff1af485ed19cc4c4f5643c9c6c81a5 Mon Sep 17 00:00:00 2001 From: liangzhibo Date: Mon, 14 Mar 2022 14:16:38 +0800 Subject: [PATCH] Change introduction of autodiff in readme --- README.md | 11 +++++------ README_CN.md | 11 +++++------ 2 files changed, 10 insertions(+), 12 deletions(-) diff --git a/README.md b/README.md index c5d43ae4674..0fa5d8f0d6d 100644 --- a/README.md +++ b/README.md @@ -48,15 +48,14 @@ For more details please check out our [Architecture Guide](https://www.mindspore ### Automatic Differentiation -There are currently three automatic differentiation techniques in mainstream deep learning frameworks: +Currently, there are two automatic differentiation techniques in mainstream deep learning frameworks: -- **Conversion based on static compute graph**: Convert the network into a static data flow graph at compile time, then turn the chain rule into a data flow graph to implement automatic differentiation. -- **Conversion based on dynamic compute graph**: Record the operation trajectory of the network during forward execution in an operator overloaded manner, then apply the chain rule to the dynamically generated data flow graph to implement automatic differentiation. -- **Conversion based on source code**: This technology is evolving from the functional programming framework and performs automatic differential transformation on the intermediate expression (the expression form of the program during the compilation process) in the form of just-in-time compilation (JIT), supporting complex control flow scenarios, higher-order functions and closures. +- **Operator Overloading (OO)**: Overloading the basic operators of the programming language to encapsulate their gradient rules. Record the operation trajectory of the network during forward execution in an operator overloaded manner, then apply the chain rule to the dynamically generated data flow graph to implement automatic differentiation. +- **Source Transformation (ST)**: This technology is evolving from the functional programming framework and performs automatic differential transformation on the intermediate expression (the expression form of the program during the compilation process) in the form of just-in-time compilation (JIT), supporting complex control flow scenarios, higher-order functions and closures. -TensorFlow adopted static calculation diagrams in the early days, whereas PyTorch used dynamic calculation diagrams. Static maps can utilize static compilation technology to optimize network performance, however, building a network or debugging it is very complicated. The use of dynamic graphics is very convenient, but it is difficult to achieve extreme optimization in performance. +PyTorch used OO. Compared to ST, OO generates gradient graph in runtime, so it does not need to take function call and control flow into consideration, which makes it easier to develop. However, OO can not perform gradient graph optimization in compilation time and the control flow has to be unfolded in runtime, so it is difficult to achieve extreme optimization in performance. -But MindSpore finds another way, automatic differentiation based on source code conversion. On the one hand, it supports automatic differentiation of automatic control flow, so it is quite convenient to build models like PyTorch. On the other hand, MindSpore can perform static compilation optimization on neural networks to achieve great performance. +MindSpore implemented automatic differentiation based on ST. On the one hand, it supports automatic differentiation of automatic control flow, so it is quite convenient to build models like PyTorch. On the other hand, MindSpore can perform static compilation optimization on neural networks to achieve great performance. Automatic Differentiation diff --git a/README_CN.md b/README_CN.md index 9457cf99932..46aa98611c7 100644 --- a/README_CN.md +++ b/README_CN.md @@ -45,15 +45,14 @@ MindSpore提供了友好的设计和高效的执行,旨在提升数据科学 ### 自动微分 -当前主流深度学习框架中有三种自动微分技术: +当前主流深度学习框架中有两种自动微分技术: -- **基于静态计算图的转换**:编译时将网络转换为静态数据流图,将链式法则应用于数据流图,实现自动微分。 -- **基于动态计算图的转换**:记录算子过载正向执行时网络的运行轨迹,对动态生成的数据流图应用链式法则,实现自动微分。 -- **基于源码的转换**:该技术是从功能编程框架演进而来,以即时编译(Just-in-time Compilation,JIT)的形式对中间表达式(程序在编译过程中的表达式)进行自动差分转换,支持复杂的控制流场景、高阶函数和闭包。 +- **操作符重载法**: 通过操作符重载对编程语言中的基本操作语义进行重定义,封装其微分规则。 在程序运行时记录算子过载正向执行时网络的运行轨迹,对动态生成的数据流图应用链式法则,实现自动微分。 +- **代码变换法**: 该技术是从功能编程框架演进而来,以即时编译(Just-in-time Compilation,JIT)的形式对中间表达式(程序在编译过程中的表达式)进行自动差分转换,支持复杂的控制流场景、高阶函数和闭包。 -TensorFlow早期采用的是静态计算图,PyTorch采用的是动态计算图。静态映射可以利用静态编译技术来优化网络性能,但是构建网络或调试网络非常复杂。动态图的使用非常方便,但很难实现性能的极限优化。 +PyTorch采用的是操作符重载法。相较于代码变换法,操作符重载法是在运行时生成微分计算图的, 无需考虑函数调用与控制流等情况, 开发更为简单。 但该方法不能在编译时刻做微分图的优化, 控制流也需要根据运行时的信息来展开, 很难实现性能的极限优化。 -MindSpore找到了另一种方法,即基于源代码转换的自动微分。一方面,它支持自动控制流的自动微分,因此像PyTorch这样的模型构建非常方便。另一方面,MindSpore可以对神经网络进行静态编译优化,以获得更好的性能。 +MindSpore则采用的是代码变换法。一方面,它支持自动控制流的自动微分,因此像PyTorch这样的模型构建非常方便。另一方面,MindSpore可以对神经网络进行静态编译优化,以获得更好的性能。 Automatic Differentiation