Commit Graph

997 Commits

Author SHA1 Message Date
Alexey Bataev 8bcc69c054 [OPENMP][NVPTX]Extend number of constructs executed in SPMD mode.
If the statements between target|teams|distribute directives does not
require execution in master thread, like constant expressions, null
statements, simple declarations, etc., such construct can be xecuted in
SPMD mode.

llvm-svn: 346551
2018-11-09 20:03:19 +00:00
Alexey Bataev 09c9eea78f [OPENMP][NVPTX]Allow to use shared memory for the
target|teams|distribute variables.

If the total size of the variables, declared in target|teams|distribute
regions, is less than the maximal size of shared memory available, the
buffer is allocated in the shared memory.

llvm-svn: 346507
2018-11-09 16:18:04 +00:00
Alexey Bataev 969dbc02c2 [OPENMP]Make lambda mapping follow reqs for PTR_AND_OBJ mapping.
The base pointer for the lambda mapping must point to the lambda capture
placement and pointer must point to the captured variable itself. Patch
fixes this problem.

llvm-svn: 346408
2018-11-08 15:47:39 +00:00
Alexey Bataev 2a6f3f5fa2 [OPENMP]Fix handling of the globals during compilation for the device.
Fixed lookup for the target regions in unused virtual functions + fixed
processing of the global variables not marked as declare target but
emitted during debug info emission.

llvm-svn: 346343
2018-11-07 19:11:14 +00:00
Alexey Bataev 1fc1f8e819 [OPENMP][NVPTX]Use __kmpc_data_sharing_coalesced_push_stack function.
Coalesced memory access requires use of the new function
`__kmpc_data_sharing_coalesced_push_stack` instead of the
`__kmpc_data_sharing_push_stack`.

llvm-svn: 345991
2018-11-02 16:08:31 +00:00
Alexey Bataev 2dc07d0cf3 [OPENMP]Change the mapping type for lambda captures.
The previously used combination `PTR_AND_OBJ | PRIVATE` could be used for mapping of some data in Fortran. Changed it to `PTR_AND_OBJ | LITERAL`.

llvm-svn: 345982
2018-11-02 15:25:06 +00:00
Alexey Bataev e40901806f [OPENMP][NVPTX]Improve emission of the globalized variables for
target/teams/distribute regions.

Target/teams/distribute regions exist for all the time the kernel is
executed. Thus, if the variable is declared in their context and then
escape it, we can allocate global memory statically instead of
allocating it dynamically.
Patch captures all the globalized variables in target/teams/distribute
contexts, merges them into the records, one per each target region.
Those records are then joined into the union, one per compilation unit
(to save the global memory). Those units are organized into
2 x dimensional arrays, where the first dimension is
the number of blocks per SM and the second one is the number of SMs.
Runtime functions manage this global memory space between the executing
teams.

llvm-svn: 345978
2018-11-02 14:54:07 +00:00
Patrick Lyster 7a2a27c4a4 Add support for 'atomic_default_mem_order' clause on 'requires' directive. Also renamed test files relating to 'requires'. Differntial review: https://reviews.llvm.org/D53513
llvm-svn: 345967
2018-11-02 12:18:11 +00:00
Alexey Bataev 6070542296 [OPENMP] Support for mapping of the lambdas in target regions.
Added support for mapping of lambdas in the target regions. It scans all
the captures by reference in the lambda, implicitly maps those variables
in the target region and then later reinstate the addresses of
references in lambda to the correct addresses of the captured|privatized
variables.

llvm-svn: 345609
2018-10-30 15:50:12 +00:00
Alexey Bataev f07946e101 [OPENMP]Fix PR39372: Does not complain about loop bound variable not
being shared.

According to the standard, the variables with unspecified data-sharing
attributes in presence of `default(none)` clause must be reported to
users. Compiler did not generate error reports for the variables used in
other OpenMP regions. Patch fixes this.

llvm-svn: 345533
2018-10-29 20:17:42 +00:00
Gheorghe-Teodor Bercea 9d6341ff51 [OpenMP] Fix condition.
Summary: Iteration variable must be strictly less than the number of iterations. This fixes a bug introduced by previous patch D53448.

Reviewers: ABataev, caomhin

Reviewed By: ABataev

Subscribers: guansong, cfe-commits

Differential Revision: https://reviews.llvm.org/D53827

llvm-svn: 345527
2018-10-29 19:44:25 +00:00
Gheorghe-Teodor Bercea e92567601b [OpenMP][NVPTX] Use single loops when generating code for distribute parallel for
Summary: This patch adds a new code generation path for bound sharing directives containing distribute parallel for. The new code generation scheme applies to chunked schedules on distribute and parallel for directives. The scheme simplifies the code that is being generated by eliminating the need for an outer for loop over chunks for both distribute and parallel for directives. In the case of distribute it applies to any sized chunk while in the parallel for case it only applies when chunk size is 1.

Reviewers: ABataev, caomhin

Reviewed By: ABataev

Subscribers: jholewinski, guansong, cfe-commits

Differential Revision: https://reviews.llvm.org/D53448

llvm-svn: 345509
2018-10-29 15:45:47 +00:00
Gheorghe-Teodor Bercea 669dbde7a5 [OpenMP][NVPTX] Enable default scheduling for parallel for in non-SPMD cases.
Summary: This patch enables the choosing of the default schedule for parallel for loops even in non-SPMD cases.

Reviewers: ABataev, caomhin

Reviewed By: ABataev

Subscribers: jholewinski, guansong, cfe-commits

Differential Revision: https://reviews.llvm.org/D53443

llvm-svn: 345507
2018-10-29 15:23:23 +00:00
Alexey Bataev 6ab5bb115a [OPENMP] Do not capture private loop counters.
If the loop counter is not declared in the context of the loop and it is
private, such loop counters should not be captured in the outlined
regions.

llvm-svn: 345505
2018-10-29 15:01:58 +00:00
Gheorghe-Teodor Bercea a6cb25676e [NFC][OpenMP] Add new test for parallel for code generation.
Summary:
This is a simple test of the parallel for code generation. It will be used to showcase the change introduced by patch D53443.


Reviewers: ABataev, caomhin

Reviewed By: ABataev

Subscribers: guansong, cfe-commits

Differential Revision: https://reviews.llvm.org/D53772

llvm-svn: 345417
2018-10-26 18:59:52 +00:00
Alexey Bataev 8fc7b5f922 [OPENMP]Fix PR39422: variables are not firstprivatized in task context.
According to the OpenMP standard, In a task generating construct, if no
default clause is present, a variable for which the data-sharing
attribute is not determined by the rules above is firstprivatized.
Compiler tries to implement this, but if the variable is not directly
used in the task context, this variable may not be firstprivatized.
Patch fixes this problem.

llvm-svn: 345277
2018-10-25 15:35:27 +00:00
Alexey Bataev ac6e4de714 Do not always request an implicit taskgroup region inside the kmpc_taskloop function
Summary:
For the following code:
```
    int i;
    #pragma omp taskloop
    for (i = 0; i < 100; ++i)
    {}

    #pragma omp taskloop nogroup
    for (i = 0; i < 100; ++i)
    {}
```

Clang emits the following LLVM IR:

```
 ...
  call void @__kmpc_taskgroup(%struct.ident_t* @0, i32 %0)
  %2 = call i8* @__kmpc_omp_task_alloc(%struct.ident_t* @0, i32 %0, i32 1, i64 80, i64 8, i32 (i32, i8*)* bitcast (i32 (i32, %struct.kmp_task_t_with_privates*)* @.omp_task_entry. to i32 (i32, i8*)*))
  ...
  call void @__kmpc_taskloop(%struct.ident_t* @0, i32 %0, i8* %2, i32 1, i64* %8, i64* %9, i64 %13, i32 0, i32 0, i64 0, i8* null)
  call void @__kmpc_end_taskgroup(%struct.ident_t* @0, i32 %0)

  ...
  %15 = call i8* @__kmpc_omp_task_alloc(%struct.ident_t* @0, i32 %0, i32 1, i64 80, i64 8, i32 (i32, i8*)* bitcast (i32 (i32, %struct.kmp_task_t_with_privates.1*)* @.omp_task_entry..2 to i32 (i32, i8*)*))
  ...
  call void @__kmpc_taskloop(%struct.ident_t* @0, i32 %0, i8* %15, i32 1, i64* %21, i64* %22, i64 %26, i32 0, i32 0, i64 0, i8* null)

```

The first set of instructions corresponds to the first taskloop construct. It is important to note that the implicit taskgroup region associated with the taskloop construct has been materialized in our IR:  the `__kmpc_taskloop` occurs inside a taskgroup region. Note also that this taskgroup region does not exist in our second taskloop because we are using the `nogroup` clause.

The issue here is the 4th argument of the kmpc_taskloop call, starting from the end,  is always a zero. Checking the LLVM OpenMP RT implementation, we see that this argument corresponds to the nogroup parameter:

```
void __kmpc_taskloop(ident_t *loc, int gtid, kmp_task_t *task, int if_val,
                     kmp_uint64 *lb, kmp_uint64 *ub, kmp_int64 st, int nogroup,
                     int sched, kmp_uint64 grainsize, void *task_dup);
```

So basically we always tell to the RT to do another taskgroup region. For the first taskloop, this means that we create two taskgroup regions. For the second example, it means that despite the fact we had a nogroup clause we are going to have a taskgroup region, so we unnecessary wait until all descendant tasks have been executed.

Reviewers: ABataev

Reviewed By: ABataev

Subscribers: rogfer01, cfe-commits

Differential Revision: https://reviews.llvm.org/D53636

llvm-svn: 345180
2018-10-24 19:06:37 +00:00
Alexey Bataev b40e0520e1 [OPENMP]Fix PR39366: do not try to private field if it is not captured.
The compiler is crashing if we trying to post-capture the fields
implicitly captured inside of the task constructs. Seems, this kind of
processing is not supported and such fields should not be
firstprivatized.

llvm-svn: 345177
2018-10-24 18:53:12 +00:00
Sean Fertile d900dd0c23 Revert "[CodeGenCXX] Treat 'this' as noalias in constructors"
This reverts commit https://reviews.llvm.org/rL344150 which causes
MachineOutliner related failures on the ppc64le multistage buildbot.

llvm-svn: 344526
2018-10-15 15:43:00 +00:00
Alexey Bataev 4ac58d1a4b [OPENMP][NVPTX]Reduce memory usage in target region.
Additional reduction of the global memory usage in the target regions
without parallel regions.

llvm-svn: 344413
2018-10-12 20:19:59 +00:00
Alexey Bataev 9bfe91da3d [OPENMP][NVPTX]Reduce memory usage in orphaned functions.
if the function has globalized variables and called in context of
target/teams/distribute regions, it does not need to globalize 32
copies of the same variables for memory coalescing, it is enough to
have just one copy, because there is parallel region.
Patch does this by adding call for `__kmpc_parallel_level` function and
checking its return value. If the code sees that the parallel level is
0, then only one variable is allocated, not 32.

llvm-svn: 344356
2018-10-12 16:04:20 +00:00
Alexey Bataev ff23bb6622 [OPENMP][NVPTX]Reduce memory use for globalized vars in
target/teams/distribute regions.

Previously introduced globalization scheme that uses memory coalescing
scheme may increase memory usage fr the variables that are devlared in
target/teams/distribute contexts. We don't need 32 copies of such
variables, just 1. Patch reduces memory use in this case.

llvm-svn: 344273
2018-10-11 18:30:31 +00:00
Patrick Lyster 3fe9e396f4 Add support for 'dynamic_allocators' clause on 'requires' directive. Differential Revision: https://reviews.llvm.org/D53079
llvm-svn: 344249
2018-10-11 14:41:10 +00:00
Anton Bikineev cc7e74753a [CodeGenCXX] Treat 'this' as noalias in constructors
This is currently a clang extension and a resolution
of the defect report in the C++ Standard.

Differential Revision: https://reviews.llvm.org/D46441

llvm-svn: 344150
2018-10-10 16:14:51 +00:00
Alexey Bataev 9ea3c38597 [OPENMP][NVPTX] Support memory coalescing for globalized variables.
Added support for memory coalescing for better performance for
globalized variables. From now on all the globalized variables are
represented as arrays of 32 elements and each thread accesses these
elements using `tid & 31` as index.

llvm-svn: 344049
2018-10-09 14:49:00 +00:00
Alexey Bataev 6bc2732f71 [OPENMP][NVPTX] Fix emission of __kmpc_global_thread_num() for non-SPMD
mode.

__kmpc_global_thread_num() should be called before initialization of the
runtime.

llvm-svn: 343857
2018-10-05 15:27:47 +00:00
Alexey Bataev fd006c44ee [OPENMP] Fix emission of the __kmpc_global_thread_num.
Fixed emission of the __kmpc_global_thread_num() so that it is not
messed up with alloca instructions anymore. Plus, fixes emission of the
__kmpc_global_thread_num() functions in the target outlined regions so
that they are not called before runtime is initialized.

llvm-svn: 343856
2018-10-05 15:08:53 +00:00
Patrick Lyster 6bdf63bd32 [OPENMP] Add reverse_offload clause to requires directive
llvm-svn: 343711
2018-10-03 20:07:58 +00:00
Jonas Hahnfeld 3ca4701d35 [OpenMP][NVPTX] Simplify codegen for orphaned parallel, NFCI.
Worker threads fork off to the compiler generated worker function
directly after entering the kernel function. Hence, there is no
need to check whether the current thread is the master if we are
outside of a parallel region (neither SPMD nor parallel_level > 0).

Differential Revision: https://reviews.llvm.org/D52732

llvm-svn: 343618
2018-10-02 19:12:54 +00:00
Alexey Bataev e79451a517 [OPENMP][NVPTX] Handle `requires datasharing` flag correctly with
lightweight runtime.

The datasharing flag must be set to `1` when executing SPMD-mode compatible directive with reduction|lastprivate clauses.

llvm-svn: 343492
2018-10-01 16:20:57 +00:00
Patrick Lyster 4a370b9f63 Add support for unified_shared_memory clause on requires directive
llvm-svn: 343472
2018-10-01 13:47:43 +00:00
Alexey Bataev bc52967835 [OPENMP]Fix PR39084: Check datasharing attributes of reduction variables only.
According to OpenMP, the reduction item must be shared in parent region.
But the item can be an array section or array subscript. In this case,
we should not check for the datasharing of the base declaration.

llvm-svn: 343356
2018-09-28 19:33:14 +00:00
Gheorghe-Teodor Bercea 8233af90e1 [OpenMP] Make default parallel for schedule in NVPTX target regions in SPMD mode achieve coalescing
Summary: Set default schedule for parallel for loops to schedule(static, 1) when using SPMD mode on the NVPTX device offloading toolchain to ensure coalescing.

Reviewers: ABataev, Hahnfeld, caomhin

Reviewed By: ABataev

Subscribers: jholewinski, guansong, cfe-commits

Differential Revision: https://reviews.llvm.org/D52629

llvm-svn: 343260
2018-09-27 20:29:00 +00:00
Gheorghe-Teodor Bercea 02650d4c2c [OpenMP] Make default distribute schedule for NVPTX target regions in SPMD mode achieve coalescing
Summary: For the OpenMP NVPTX toolchain choose a default distribute schedule that ensures coalescing on the GPU when in SPMD mode. This significantly increases the performance of offloaded target code and reduces the number of registers used on the GPU side.

Reviewers: ABataev, caomhin, Hahnfeld

Reviewed By: ABataev, Hahnfeld

Subscribers: Hahnfeld, jholewinski, guansong, cfe-commits

Differential Revision: https://reviews.llvm.org/D52434

llvm-svn: 343253
2018-09-27 19:22:56 +00:00
Kelvin Li 1408f91a25 [OPENMP] Add support for OMP5 requires directive + unified_address clause
Add support for OMP5.0 requires directive and unified_address clause.
Patches to follow will include support for additional clauses.

Differential Revision: https://reviews.llvm.org/D52359

llvm-svn: 343063
2018-09-26 04:28:39 +00:00
Alexey Bataev f898d1de4a [OPENMP] Disable emission of the class with vptr if they are not used in
target constructs.

Prevent compilation of the classes with the virtual tables when
compiling for the device.

llvm-svn: 342741
2018-09-21 14:55:26 +00:00
Alexey Bataev 2adecff1aa [OPENMP][NVPTX] Enable support for lastprivates in SPMD constructs.
Previously we could not use lastprivates in SPMD constructs, patch
allows supporting lastprivates in SPMD with uninitialized runtime.

llvm-svn: 342738
2018-09-21 14:22:53 +00:00
Alexey Bataev e82445f5a9 [OPENMP] Add support for mapping memory pointed by member pointer.
Added support for map(s, s.ptr[0:1]) kind of mapping.

llvm-svn: 342648
2018-09-20 13:54:02 +00:00
Alexey Bataev e6aa4694de [OPENMP] Fix PR38903: Crash on instantiation of the non-dependent
declare reduction.

If the declare reduction construct with the non-dependent type is
defined in the template construct, the compiler might crash on the
template instantition. Reworked the whole instantiation scheme for the
declare reduction constructs to fix this problem correctly.

llvm-svn: 342151
2018-09-13 16:54:05 +00:00
Alexey Bataev 43b90b73e6 [OPENMP] Fix PR38902: support ADL for declare reduction constructs.
Added support for argument-dependent lookup when trying to find the
required declare reduction decl.

llvm-svn: 342062
2018-09-12 16:31:59 +00:00
Alexey Bataev 30a7821760 [OPENMP] Simplified checks for declarations in declare target regions.
Sema analysis should not mark functions as an implicit declare target,
it may break codegen. Simplified semantic analysis and removed extra
code for implicit declare target functions.

llvm-svn: 341939
2018-09-11 13:59:10 +00:00
Kelvin Li bc38e63718 [OpenMP] Add support for nested 'declare target' directives
Add the capability to nest multiple declare target directives 
- including header files within a declare target region.

Differential Revision: https://reviews.llvm.org/D51378

Patch by Patrick Lyster

llvm-svn: 341766
2018-09-10 02:07:09 +00:00
Alexey Bataev 47b2ed2e16 Revert "[OPENMP][NVPTX] Disable runtime-type info for CUDA devices."
Still need the RTTI for NVPTX target to pass sema checks.

llvm-svn: 341668
2018-09-07 14:50:25 +00:00
Alexey Bataev d2c1dd5902 [OPENMP] Fix PR38823: Do not emit vtable if it is not used in target
context.

If the explicit template instantiation definition defined outside of the
target context, its vtable should not be marked as used. This is true
for other situations where the compiler want to emit vtables
unconditionally.

llvm-svn: 341570
2018-09-06 17:56:28 +00:00
Alexey Bataev 33c137bf0c [OPENMP][NVPTX] Disable runtime-type info for CUDA devices.
RTTI is not supported by the NVPTX target.

llvm-svn: 341483
2018-09-05 17:10:30 +00:00
Alexey Bataev bd8ff9bd70 [OPENMP] Fix PR38710: static functions are not emitted as implicitly
'declare target'.

All the functions, referenced in implicit|explicit target regions must
be emitted during code emission for the device.

llvm-svn: 341093
2018-08-30 18:56:11 +00:00
Alexey Bataev 80a9a61ded [OPENMP][NVPTX] Add options -f[no-]openmp-cuda-force-full-runtime.
Added options -f[no-]openmp-cuda-force-full-runtime to [not] force use
of the full runtime for OpenMP offloading to CUDA devices.

llvm-svn: 341073
2018-08-30 14:45:24 +00:00
Alexey Bataev b4dd6d24d7 [OPENMP] Do not create offloading entry for declare target variables
declarations.

We should not create offloading entries for declare target var
declarations as it causes compiler crash.

llvm-svn: 340968
2018-08-29 20:41:37 +00:00
Alexey Bataev 8d8e1235ab [OPENMP][NVPTX] Add support for lightweight runtime.
If the target construct can be executed in SPMD mode + it is a loop
based directive with static scheduling, we can use lightweight runtime
support.

llvm-svn: 340953
2018-08-29 18:32:21 +00:00
Mike Rice e1ca7b614f [OPENMP] Create non-const ident_t objects.
Currently ident_t objects are created const when debug info is not
enabled, but the libittnotify libray in the OpenMP runtime writes to
the reserved_2 field (See __kmp_itt_region_forking in
openmp/runtime/src/kmp_itt.inl).  Now create ident_t objects non-const.

Differential Revision: https://reviews.llvm.org/D51331

llvm-svn: 340934
2018-08-29 15:45:11 +00:00