llvm-project/mlir/lib
William S. Moses 973cb2c326 [MLIR][OMP] Ensure nested scf.parallel execute all iterations
Presently, the lowering of nested scf.parallel loops to OpenMP creates a single omp.parallel region with two nested OpenMP worksharing loops inside it. When lowered to LLVM and executed, this produces incorrect results. The reason is as follows:

An OpenMP parallel region runs the enclosed code with however many threads are available to OpenMP. Within a parallel region, a worksharing loop divides the requested iterations among the available threads and distributes them accordingly. For a single worksharing loop in a parallel region, this works as intended.
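
As a rough C analogue (a sketch for illustration, not the actual generated code; the two-thread count and the contiguous static split are assumptions), a single worksharing loop in a parallel region divides the work like this:

#include <omp.h>
#include <stdio.h>

int main(void) {
  /* With 2 threads and a typical static schedule, the worksharing
     construct splits the 10 iterations contiguously: thread 0 runs
     i = 0..4 and thread 1 runs i = 5..9. */
  #pragma omp parallel num_threads(2)
  {
    #pragma omp for
    for (int i = 0; i < 10; i++)
      printf("thread %d runs i = %d\n", omp_get_thread_num(), i);
  }
  return 0;
}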

Now consider nested worksharing loops as follows:

omp.parallel {
   A: omp.wsloop %i = 0...10 {
      B: omp.wsloop %j = 0...10 {
          code(%i, %j)
      }
   }
}

Suppose we ran this on two threads. The first worksharing loop would assign iterations 0, 1, 2, 3, 4 to thread 0 and iterations 5, 6, 7, 8, 9 to thread 1. The second worksharing loop would make the same assignment for its own iterations. Thread 0 would thus execute i in [0, 5) and j in [0, 5), while thread 1 would execute i in [5, 10) and j in [5, 10). The iterations i in [5, 10), j in [0, 5) and i in [0, 5), j in [5, 10) would never be executed, which is clearly wrong.
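
In C terms, the old lowering corresponds roughly to the following sketch (code() is a hypothetical placeholder for the loop body). Note that closely nesting one worksharing loop inside another is non-conforming OpenMP, which is exactly why the work is divided this way:

#include <omp.h>

void code(int i, int j);  /* placeholder for the loop body */

void broken(void) {
  #pragma omp parallel
  {
    #pragma omp for
    for (int i = 0; i < 10; i++) {
      /* Non-conforming: a worksharing loop closely nested inside
         another worksharing region. With 2 threads and a static
         schedule, thread 0 only reaches (i, j) in [0,5) x [0,5)
         and thread 1 only (i, j) in [5,10) x [5,10). */
      #pragma omp for
      for (int j = 0; j < 10; j++)
        code(i, j);
    }
  }
}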

This suggests two options for a remedy:
1) Change the semantics of omp.wsloop to be distinct from those of the OpenMP runtime call (or, equivalently, #pragma omp for). A later lowering transformation could then fix the issue described above. I don't think this is desirable from an abstraction standpoint.
2) When lowering an scf.parallel, always surround the wsloop with a new parallel region, so that the innermost wsloop distributes its iterations over only the threads available to its own region.

This PR implements the latter change.
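
A minimal C sketch of the fixed structure (again with code() as a hypothetical placeholder, and assuming nested parallelism is enabled, e.g. via omp_set_max_active_levels): each worksharing loop now sits in its own parallel region, so every (i, j) pair executes exactly once.

#include <omp.h>

void code(int i, int j);  /* placeholder for the loop body */

void fixed(void) {
  #pragma omp parallel
  {
    #pragma omp for
    for (int i = 0; i < 10; i++) {
      /* The inner parallel region gives the inner worksharing loop
         its own thread team, which redistributes all 10 j
         iterations regardless of how the i iterations were split. */
      #pragma omp parallel
      {
        #pragma omp for
        for (int j = 0; j < 10; j++)
          code(i, j);
      }
    }
  }
}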

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D108426
2021-08-20 19:06:28 -04:00
Analysis [mlir][scf] Simplify affine.min ops after loop peeling 2021-08-19 17:24:53 +09:00
Bindings/Python [MLIR] [Python] Add `owner` to `mlir.ir.Block` 2021-08-19 00:02:09 -07:00
CAPI Remove libMLIRPublicAPI DSO. 2021-07-20 17:58:28 -07:00
Conversion [MLIR][OMP] Ensure nested scf.parallel execute all iterations 2021-08-20 19:06:28 -04:00
Dialect [mlir][linalg] Finish refactor of TC ops to YAML 2021-08-20 12:35:04 -07:00
ExecutionEngine [mlir][sparse] add dense to sparse conversion implementation 2021-08-09 12:12:39 -07:00
IR [mlir][scf] Simplify affine.min ops after loop peeling 2021-08-19 17:24:53 +09:00
Interfaces [mlir] Enable specifying querying function in ValueShapeRange 2021-08-10 11:44:20 -07:00
Parser [mlir] Set the namespace of the BuiltinDialect to 'builtin' 2021-07-28 21:00:10 +00:00
Pass [mlir] Set the namespace of the BuiltinDialect to 'builtin' 2021-07-28 21:00:10 +00:00
Reducer [mlir-reduce] Fix the memory leak and recycle unused modules. 2021-07-08 20:03:47 +08:00
Rewrite [mlir] Add support for filtering patterns based on debug names and labels 2021-06-02 12:05:25 -07:00
Support [mlir] Fix CMake linker rules for ViewOpGraph.cpp 2021-08-04 19:25:15 +09:00
TableGen [mlir-tblgen] Support binding multi-results of NativeCodeCall 2021-07-21 11:23:22 +08:00
Target [OMPIRBuilder] Clarify CanonicalLoopInfo. NFC. 2021-08-12 21:02:19 -05:00
Tools [mlir-lsp-server] Only use one MLIRContext per MLIRTextFile 2021-08-04 20:09:07 +00:00
Transforms [mlir][Analysis][NFC] FlatAffineConstraints: Use BoundType enum in functions 2021-08-19 10:33:42 +09:00
Translation [mlir] run the verifier before translating a module 2021-07-28 18:15:58 +02:00
CMakeLists.txt Re-engineer MLIR python build support. 2021-07-27 15:54:58 +00:00