Commit Graph

4 Commits

Author SHA1 Message Date
Alex Zinenko 1ce752b741 [mlir] support reductions in SCF to OpenMP conversion
OpenMP reductions need a neutral element, so we match some known reduction
kinds (integer add/mul/or/and/xor, float add/mul, integer and float min/max) to
define the neutral element and the atomic version when possible to express
using atomicrmw (everything except float mul). The SCF-to-OpenMP pass becomes a
module pass because it now needs to introduce new symbols for reduction
declarations in the module.

Reviewed By: chelini

Differential Revision: https://reviews.llvm.org/D107549
2021-09-09 13:04:27 +02:00
William S. Moses 973cb2c326 [MLIR][OMP] Ensure nested scf.parallel execute all iterations
Presently, the lowering of nested scf.parallel loops to OpenMP creates one omp.parallel region, with two (nested) OpenMP worksharing loops on the inside. When lowered to LLVM and executed, this results in incorrect results. The reason for this is as follows:

An OpenMP parallel region results in the code being run with whatever number of threads available to OpenMP. Within a parallel region a worksharing loop divides up the total number of requested iterations by the available number of threads, and distributes accordingly. For a single ws loop in a parallel region, this works as intended.

Now consider nested ws loops as follows:

omp.parallel {
   A: omp.ws %i = 0...10 {
      B: omp.ws %j = 0...10 {
          code(%i, %j)
      }
   }
}

Suppose we ran this on two threads. The first workshare loop would decide to execute iterations 0, 1, 2, 3, 4 on thread 0, and iterations 5, 6, 7, 8, 9 on thread 1. The second workshare loop would decide the same for its iteration. This means thread 0 would execute i \in [0, 5) and j \in [0, 5). Thread 1 would execute i \in [5, 10) and j \in [5, 10). This means that iterations i in [5, 10), j in [0, 5) and i in [0, 5), j in [5, 10) never get executed, which is clearly wrong.

This permits two options for a remedy:
1) Change the semantics of the omp.wsloop to be distinct from that of the OpenMP runtime call or equivalently #pragma omp for. This could then allow some lowering transformation to remedy the aforementioned issue. I don't think this is desirable for an abstraction standpoint.
2) When lowering an scf.parallel always surround the wsloop with a new parallel region (thereby causing the innermost wsloop to use the number of threads available only to it).

This PR implements the latter change.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D108426
2021-08-20 19:06:28 -04:00
David Truby de155f4af2 [MLIR][OpenMP] Pretty printer and parser for omp.wsloop
Co-authored-by: Kiran Chandramohan <kiran.chandramohan@arm.com>

Reviewed By: ftynse

Differential Revision: https://reviews.llvm.org/D92327
2021-03-18 13:37:01 +00:00
Alex Zinenko 119545f433 [mlir] Add conversion from SCF parallel loops to OpenMP
Introduce a conversion pass from SCF parallel loops to OpenMP dialect
constructs - parallel region and workshare loop. Loops with reductions are not
supported because the OpenMP dialect cannot model them yet.

The conversion currently targets only one level of parallelism, i.e. only
one top-level `omp.parallel` operation is produced even if there are nested
`scf.parallel` operations that could be mapped to `omp.wsloop`. Nested
parallelism support is left for future work.

Reviewed By: kiranchandramohan

Differential Revision: https://reviews.llvm.org/D91982
2020-11-24 21:12:56 +01:00