forked from OSchip/llvm-project
917d95fc8a
The default lowering of vector transpose operations generates a long sequence of scalar extract/insert operations, one pair for each scalar element in the input vector. In other words, the vector transpose is scalarized. However, in some transpose patterns one or more adjacent high-order (trailing) dimensions are not transposed. For example, in the transpose pattern [1, 0, 2, 3], dimensions 2 and 3 are not transposed. This patch improves the lowering of those cases by not scalarizing them. Instead, it extracts and inserts a full n-D vector, where 'n' is the number of adjacent high-order dimensions that are not transposed. By doing so, we avoid scalarizing the code and generate a more performant vector version.

Paradoxically, this patch shouldn't improve the performance of transpose operations if we are using LLVM. The LLVM pipeline is able to optimize away some of the extract/insert operations, and the SLP vectorizer converts the scalar operations back into their vector form. However, scalarizing a vector version of the code in MLIR and relying on the SLP vectorizer to reconstruct the vector code is highly undesirable for several reasons.

Reviewed By: nicolasvasilache, ThomasRaoux

Differential Revision: https://reviews.llvm.org/D120601
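The difference between the two lowerings described in the commit message can be shown with a small model. The Python sketch below is not MLIR code and uses no MLIR API. The names `num_untransposed_trailing_dims`, `extract_insert_pairs`, `transpose_scalarized` and `transpose_subvectors` are hypothetical.

The sketch makes three assumptions:

- An n-D vector is modeled as nested lists.
- It follows `vector.transpose` semantics, where result dimension `i` is source dimension `perm[i]`.
- A dimension `d` counts as "not transposed" when `perm[d] == d` for every dimension from `d` to the last one.

Under those assumptions, the model counts one extract/insert pair per scalar for the default lowering. For the improved lowering, it counts one pair per trailing n-D subvector.

```python
from math import prod
from typing import Any, Callable, Sequence, Tuple


def num_untransposed_trailing_dims(perm: Sequence[int]) -> int:
    """Count trailing dimensions d with perm[d] == d (assumed 'not transposed')."""
    n = 0
    for d in reversed(range(len(perm))):
        if perm[d] != d:
            break
        n += 1
    return n


def extract_insert_pairs(shape: Sequence[int], perm: Sequence[int]) -> Tuple[int, int]:
    """Return (scalarized, improved) extract/insert pair counts for this model."""
    scalarized = prod(shape)  # one pair per scalar element
    n = num_untransposed_trailing_dims(perm)
    improved = prod(shape[: len(shape) - n])  # one pair per trailing n-D subvector
    return scalarized, improved


def build(shape: Sequence[int], fn: Callable[[Tuple[int, ...]], Any],
          prefix: Tuple[int, ...] = ()) -> Any:
    """Build a nested list of `shape` whose entry at index `idx` is fn(idx)."""
    if not shape:
        return fn(prefix)
    return [build(shape[1:], fn, prefix + (i,)) for i in range(shape[0])]


def get(vec: Any, idx: Sequence[int]) -> Any:
    """Index a nested list with a full or partial multi-index."""
    for i in idx:
        vec = vec[i]
    return vec


def transpose_scalarized(vec: Any, shape: Sequence[int], perm: Sequence[int]) -> Any:
    """Model of the default lowering: every scalar is moved individually."""
    out_shape = [shape[p] for p in perm]

    def src(out_idx: Tuple[int, ...]) -> Any:
        # Result dimension i corresponds to source dimension perm[i].
        src_idx = [0] * len(perm)
        for i, p in enumerate(perm):
            src_idx[p] = out_idx[i]
        return get(vec, src_idx)  # one scalar extract/insert pair

    return build(out_shape, src)


def transpose_subvectors(vec: Any, shape: Sequence[int], perm: Sequence[int]) -> Any:
    """Model of the improved lowering: whole trailing n-D subvectors are moved."""
    n = num_untransposed_trailing_dims(perm)
    k = len(perm) - n  # only the leading k dimensions are permuted
    lead_out_shape = [shape[p] for p in perm[:k]]

    def src(out_idx: Tuple[int, ...]) -> Any:
        src_idx = [0] * k
        for i, p in enumerate(perm[:k]):
            src_idx[p] = out_idx[i]
        # One n-D subvector extract/insert pair. The sublist is shared, not
        # copied: this model only tracks how many moves happen.
        return get(vec, src_idx)

    return build(lead_out_shape, src)


shape = [2, 3, 4, 5]
v = build(shape, lambda idx: idx)  # each entry holds its own source index
perm = [1, 0, 2, 3]
assert transpose_scalarized(v, shape, perm) == transpose_subvectors(v, shape, perm)
print(extract_insert_pairs(shape, perm))  # (120, 6)
```

For the `[1, 0, 2, 3]` pattern on a 2x3x4x5 vector, the two models produce the same result. The scalarized model needs 120 extract/insert pairs, while moving whole 4x5 subvectors needs only 6.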
| Name |
|---|
| Analysis |
| Bindings/Python |
| CAPI |
| Conversion |
| Dialect |
| ExecutionEngine |
| IR |
| Interfaces |
| Parser |
| Pass |
| Reducer |
| Rewrite |
| Support |
| TableGen |
| Target |
| Tools |
| Transforms |
| CMakeLists.txt |