forked from OSchip/llvm-project
Depends On D104780

Recursive work splitting instead of sequential async task submission gives a ~20%-30% speedup in microbenchmarks.

Algorithm outline:
1. Collapse scf.parallel dimensions into a single dimension
2. Compute the block size for the parallel operations from the 1-D problem size
3. Launch parallel tasks
4. Each parallel task reconstructs its own bounds in the original multi-dimensional iteration space
5. Each parallel task computes the original parallel operation body using an scf.for loop nest

Reviewed By: herhut

Differential Revision: https://reviews.llvm.org/D104850
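The algorithm outline above can be modeled in plain C++. This is a rough sketch under stated assumptions, not the MLIR lowering itself: every dimension is assumed to start at 0 with step 1, `std::async` stands in for the async runtime's task launch, and all names (`parallelFor`, `runBlocks`, `runBlock`, `numWorkers`, and so on) are hypothetical. `runBlocks` shows the recursive splitting: instead of a single loop submitting every block in sequence, each call hands the upper half of its block range to a new task and keeps the lower half.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <future>
#include <vector>

// Hypothetical names throughout; this is not the MLIR/async-runtime API.
// Assumes every dimension has lower bound 0 and step 1, so a parallel
// loop is described only by its upper bounds `ub`.
using Index = std::int64_t;
using Body = std::function<void(const std::vector<Index>&)>;

// Step 1: collapse all dimensions into one; the 1-D trip count is the product.
Index collapsedTripCount(const std::vector<Index>& ub) {
  Index n = 1;
  for (Index u : ub) n *= u;
  return n;
}

// Step 2: block size from the 1-D problem size (ceil division, so at most
// `numWorkers` blocks are produced).
Index computeBlockSize(Index tripCount, Index numWorkers) {
  assert(tripCount > 0 && numWorkers > 0);
  return (tripCount + numWorkers - 1) / numWorkers;
}

// Step 4 helper: map a linear index back to row-major coordinates.
std::vector<Index> delinearize(Index linear, const std::vector<Index>& ub) {
  std::vector<Index> coords(ub.size());
  for (std::size_t d = ub.size(); d-- > 0;) {
    coords[d] = linear % ub[d];
    linear /= ub[d];
  }
  return coords;
}

// Steps 4 + 5: a task reconstructs the coordinates of its first iteration,
// then walks its block with an odometer-style increment (standing in for
// the scf.for loop nest).
void runBlock(Index block, Index blockSize, Index tripCount,
              const std::vector<Index>& ub, const Body& body) {
  Index begin = block * blockSize;
  Index end = std::min(begin + blockSize, tripCount);
  std::vector<Index> idx = delinearize(begin, ub);
  for (Index i = begin; i < end; ++i) {
    body(idx);
    for (std::size_t d = ub.size(); d-- > 0;) {  // innermost dim first
      if (++idx[d] < ub[d]) break;
      idx[d] = 0;  // carry into the next outer dimension
    }
  }
}

// Step 3 with recursive work splitting: launch the upper half of the block
// range as a new task and recurse on the lower half in the current task.
void runBlocks(Index first, Index last, Index blockSize, Index tripCount,
               const std::vector<Index>& ub, const Body& body) {
  if (last - first <= 0) return;
  if (last - first == 1) {
    runBlock(first, blockSize, tripCount, ub, body);
    return;
  }
  Index mid = first + (last - first) / 2;
  auto upper = std::async(std::launch::async, [&] {
    runBlocks(mid, last, blockSize, tripCount, ub, body);
  });
  runBlocks(first, mid, blockSize, tripCount, ub, body);
  upper.get();  // join before the captured locals go out of scope
}

// Entry point: `body` may run concurrently and must be thread-safe.
void parallelFor(const std::vector<Index>& ub, Index numWorkers,
                 const Body& body) {
  Index tripCount = collapsedTripCount(ub);
  if (tripCount <= 0) return;  // empty iteration space
  Index blockSize = computeBlockSize(tripCount, numWorkers);
  Index numBlocks = (tripCount + blockSize - 1) / blockSize;
  runBlocks(0, numBlocks, blockSize, tripCount, ub, body);
}
```

For example, `parallelFor({3, 4, 5}, 4, body)` collapses a 3x4x5 space into 60 iterations, picks a block size of 15, and runs four blocks, each starting from the delinearized coordinates of its first iteration. Because of the recursive split, task submission is spread over a tree of depth about log2(numBlocks) rather than being done one block at a time by a single caller.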
| Name |
|---|
| cmake/modules |
| docs |
| examples |
| include |
| lib |
| python |
| test |
| tools |
| unittests |
| utils |
| .clang-format |
| .clang-tidy |
| CMakeLists.txt |
| LICENSE.TXT |
| README.md |
README.md
Multi-Level Intermediate Representation
See https://mlir.llvm.org/ for more information.