forked from OSchip/llvm-project
ee01c7a740
Default vector.contract lowering essentially yields a series of sdot/ddot operations. However, for some layouts, a series of saxpy/daxpy operations chained through FMA is more efficient. This CL introduces a choice between the two lowering paths. A heuristic for picking the default will follow.

Some preliminary AVX2 performance numbers for matrix-times-vector are below. Here, dot performs best for 64x64 A x b and saxpy for 64x64 A^T x b.

```
------------------------------------------------------------
          A x b                     A^T x b
------------------------------------------------------------
GFLOPS    sdot (reassoc)   saxpy    sdot (reassoc)   saxpy
------------------------------------------------------------
1x1       0.6              0.9      0.6              0.9
2x2       2.5              3.2      2.4              3.5
4x4       6.4              8.4      4.9              11.8
8x8       11.7             6.1      5.0              29.6
16x16     20.7             10.8     7.3              43.3
32x32     29.3             7.9      6.4              51.8
64x64     38.9                                       79.3
128x128   32.4                                       40.7
------------------------------------------------------------
```

Reviewed By: nicolasvasilache, ftynse

Differential Revision: https://reviews.llvm.org/D83012
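For readers unfamiliar with the two lowering styles, here is a minimal plain-C sketch of matrix-times-vector written both ways. It is not the MLIR lowering code; the function names (`matvec_dot`, `matvec_saxpy`, `matTvec_saxpy`) are hypothetical, and A is assumed to be an m x n float matrix stored row-major.

```c
#include <stddef.h>

/* Illustrative sketch only (hypothetical names, not MLIR code).
   A is assumed to be m x n, stored row-major as a[i * n + j]. */

/* Dot form: x[i] = sum_j A[i][j] * b[j]. Each output element is one
   inner product: elementwise multiplies followed by a reduction over j. */
static void matvec_dot(size_t m, size_t n, const float *a, const float *b,
                       float *x) {
  for (size_t i = 0; i < m; ++i) {
    float acc = 0.0f;
    for (size_t j = 0; j < n; ++j)
      acc += a[i * n + j] * b[j];
    x[i] = acc;
  }
}

/* Saxpy form: x = sum_j b[j] * A[:, j]. Each step j is an axpy
   (x += alpha * column), i.e. a chain of multiply-adds into the same
   output vector, with no per-element reduction. For row-major A the
   column A[:, j] is strided in memory. */
static void matvec_saxpy(size_t m, size_t n, const float *a, const float *b,
                         float *x) {
  for (size_t i = 0; i < m; ++i)
    x[i] = 0.0f;
  for (size_t j = 0; j < n; ++j)
    for (size_t i = 0; i < m; ++i)
      x[i] += b[j] * a[i * n + j];
}

/* Saxpy form for x = A^T * b (x has n entries, b has m): the vectors
   being scaled are rows of A, which are contiguous for row-major A. */
static void matTvec_saxpy(size_t m, size_t n, const float *a, const float *b,
                          float *x) {
  for (size_t j = 0; j < n; ++j)
    x[j] = 0.0f;
  for (size_t i = 0; i < m; ++i)
    for (size_t j = 0; j < n; ++j)
      x[j] += b[i] * a[i * n + j];
}
```

In this sketch the dot form needs one reduction per output element, while the saxpy form only accumulates into the output vector. `matTvec_saxpy` scales rows of A, which are contiguous under the row-major assumption; that is one plausible reason saxpy fares better for A^T x b in the table above, though the numbers alone do not establish the cause.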