2017-09-28 23:10:22 +08:00

.. include:: <isonum.txt>

==================================================
Performance
==================================================

High-Performance Generalized Matrix Multiplication
--------------------------------------------------

Polly automatically detects and optimizes generalized matrix multiplication,
the computation C |larr| α ⊗ C ⊕ β ⊗ A ⊗ B, where A, B, and C are three appropriately sized
matrices, ⊕ and ⊗ are the addition and multiplication operations of the corresponding matrix
semiring, and α and β are constants with β not equal to zero. The optimization yields code
structured similarly to the expert GEMM implementations found in GotoBLAS and its successors.
The performance evaluation of GEMM is shown in the following figure.

.. image:: images/GEMM_double.png
   :align: center
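
To make the formula concrete, the following Python sketch evaluates α ⊗ C ⊕ β ⊗ A ⊗ B over an arbitrary semiring whose ⊕ and ⊗ are passed in as functions. It is only a reference model of what the computation means, not Polly's implementation or the code Polly generates; the name ``semiring_gemm`` and its signature are hypothetical, and the sketch assumes that ⊗ distributes over ⊕ so that β can be applied to each product term individually.

```python
from operator import add, mul
from typing import Any, Callable, List

Matrix = List[List[Any]]
Op = Callable[[Any, Any], Any]


# Hypothetical helper for illustration only; not part of Polly.
def semiring_gemm(C: Matrix, A: Matrix, B: Matrix,
                  alpha: Any, beta: Any, plus: Op, times: Op) -> Matrix:
    """Return alpha ⊗ C ⊕ beta ⊗ (A ⊗ B), with ⊕ = plus and ⊗ = times.

    A is n x m, B is m x p and C is n x p. Assuming ⊗ distributes over ⊕,
    beta ⊗ (⊕_k A[i][k] ⊗ B[k][j]) is accumulated term by term as
    ⊕_k (beta ⊗ A[i][k] ⊗ B[k][j]).
    """
    n, m, p = len(A), len(B), len(B[0])
    assert all(len(row) == m for row in A), "A must be n x m"
    assert len(C) == n and all(len(row) == p for row in C), "C must be n x p"
    result = []
    for i in range(n):
        row = []
        for j in range(p):
            acc = times(alpha, C[i][j])  # alpha ⊗ C[i][j]
            for k in range(m):
                # acc ⊕ (beta ⊗ (A[i][k] ⊗ B[k][j]))
                acc = plus(acc, times(beta, times(A[i][k], B[k][j])))
            row.append(acc)
        result.append(row)
    return result


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]

# Ordinary (+, *) semiring with alpha = beta = 1: C + A * B.
print(semiring_gemm([[1, 1], [1, 1]], A, B, 1, 1, add, mul))
# [[20, 23], [44, 51]]

# Min-plus (tropical) semiring: ⊕ = min, ⊗ = +. Here alpha = 0 is the
# ⊗-identity (C is kept as-is) and beta = 1 adds 1 to every product term.
print(semiring_gemm([[10, 0], [10, 10]], A, B, 0, 1, min, add))
# [[7, 0], [9, 10]]
```

With ``add`` and ``mul`` the call reduces to the familiar BLAS-style update, while ``min`` and ``add`` give the min-plus product used, for example, in shortest-path algorithms. The naive triple loop only defines the result; it does not reproduce the blocked, GotoBLAS-like structure that the optimization targets.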

Compile Time Impact of Polly
----------------------------

Clang+LLVM+Polly are compiled using Clang on an Intel(R) Core(TM) i7-7700-based system. The
experiment is run twice, once with and once without Polly enabled, to measure its compile-time impact.

The following versions are used:

- Polly (git hash 0db98a4837b6f233063307bb9184374175401922)
- Clang (git hash 3e1d04a92b51ed36163995c96c31a0e4bbb1561d)
- LLVM (git hash 0265ec7ebad69a47f5c899d95295b5eb41aba68e)

`ninja <https://ninja-build.org/>`_ is used as the build system.

In both configurations, the full build was performed five times. The compile times, in seconds,
are shown in the following table.

+--------------------+-------------------+
| Polly Disabled (s) | Polly Enabled (s) |
+====================+===================+
| 964                | 977               |
+--------------------+-------------------+
| 964                | 980               |
+--------------------+-------------------+
| 967                | 981               |
+--------------------+-------------------+
| 967                | 981               |
+--------------------+-------------------+
| 968                | 982               |
+--------------------+-------------------+

The median compile time is 967 seconds without Polly and 981 seconds with Polly enabled,
so the overhead of enabling Polly is (981 - 967) / 967, or about 1.4%.
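
The median and overhead follow directly from the five measurements per configuration. A minimal Python sketch of that arithmetic, using the standard library's ``statistics.median`` (the variable names are illustrative):

```python
from statistics import median

# Compile times in seconds from the table above (five full builds each).
polly_disabled = [964, 964, 967, 967, 968]
polly_enabled = [977, 980, 981, 981, 982]

# With five values, the median is the middle element of the sorted list.
median_disabled = median(polly_disabled)
median_enabled = median(polly_enabled)

# Relative compile-time overhead of enabling Polly, in percent.
overhead_pct = (median_enabled - median_disabled) / median_disabled * 100

print(median_disabled, median_enabled, f"{overhead_pct:.1f}%")
# 967 981 1.4%
```

Using the median rather than the mean makes the comparison less sensitive to a single unusually slow or fast build.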