Commit Graph

180 Commits

Author SHA1 Message Date
Alexey Bataev a44b216036 [OPENMP][NVPTX]Mark barrier functions calls as convergent.
Added convergent attribute to the barrier functions calls for correct
optimizations.

llvm-svn: 366437
2019-07-18 13:49:24 +00:00
Yaxun Liu 6add24adaf [HIP] Add GPU arch gfx1010, gfx1011, and gfx1012
Differential Revision: https://reviews.llvm.org/D64364

llvm-svn: 365799
2019-07-11 17:50:09 +00:00
Stanislav Mekhanoshin 0cfd75a07d [AMDGPU] gfx908 clang target
Differential Revision: https://reviews.llvm.org/D64430

llvm-svn: 365528
2019-07-09 18:19:00 +00:00
Gheorghe-Teodor Bercea 66cdbb47d2 [OpenMP] Add support for registering requires directives with the runtime
Summary:
This patch adds support for the registration of the requires directives with the runtime.

Each requires directive clause will enable a particular flag to be set.

The set of flags is passed to the runtime to be checked for compatibility with other such flags coming from other object files.

The registration function is called whenever OpenMP is present even if a requires directive is not present. This helps detect cases in which requires directives are used inconsistently.

Reviewers: ABataev, AlexEichenberger, caomhin

Reviewed By: ABataev, AlexEichenberger

Subscribers: jholewinski, guansong, jfb, jdoerfert, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D60568

llvm-svn: 361298
2019-05-21 19:42:01 +00:00
Alexey Bataev 8c5555c39a [OPENMP][NVPTX]Mark more functions as always_inline for better
performance.

Internally generated functions must be marked as always_inlines in most
cases. Patch marks some extra reduction function + outlined parallel
functions as always_inline for better performance, but only if the
optimization is requested.

llvm-svn: 361269
2019-05-21 15:11:58 +00:00
Fangrui Song 899d13926d Use llvm::stable_sort
llvm-svn: 359098
2019-04-24 14:43:05 +00:00
Alexey Bataev 1472e32cd7 [OPENMP][NVPTX] target [teams distribute] simd maybe run without
runtime.

target [teams distribute] simd costructs do not require full runtime for
the correct execution, we can run them without full runtime.

llvm-svn: 358766
2019-04-19 16:48:38 +00:00
Alexey Bataev dc9e7dcbb0 [OPENMP][NVPTX]Run combined constructs with if clause in SPMD mode.
All target-parallel-based constructs can be run in SPMD mode from now
on. Even if num_threads clauses or if clauses are used, such constructs
can be executed in SPMD mode.

llvm-svn: 358595
2019-04-17 16:53:08 +00:00
Alexey Bataev 5e2879320d [OPENMP][NVPTX]Run combined constructs with if clause in SPMD mode.
Combined constructs with parallel and if clauses without modifiers may
be executed in SPMD mode since if the condition is true for the target
region, it is also true for parallel region and the threads must be run
in parallel.

llvm-svn: 358503
2019-04-16 15:39:12 +00:00
Alexey Bataev e0eb13135f [OPENMP][NVPTX]Run parallel regions with num_threads clauses in SPMD
mode.

After the previous patch with the more correct handling of the number of
threads in parallel regions, the parallel regions with num_threads
clauses can be executed in SPMD mode.

llvm-svn: 358445
2019-04-15 20:38:10 +00:00
Alexey Bataev 5c4273620d [OPENMP]Improve detection of number of teams, threads in target
regions.

Added more complex analysis for number of teams and number of threads in
the target regions, also merged related common code between CGOpenMPRuntime
and CGOpenMPRuntimeNVPTX classes.

llvm-svn: 358126
2019-04-10 19:11:33 +00:00
Alexey Bataev 1db9bfeba5 [OPENMP][NVPTX]Fixed processing of memory management directives.
Added special processing of the memory management directives/clauses for
NVPTX target. For private locals, omp_default_mem_alloc and
omp_thread_mem_alloc result in allocation in local memory.
omp_const_mem_alloc allocates const memory, omp_teams_mem_alloc
allocates shared memory, and omp_cgroup_mem_alloc and
omp_large_cap_mem_alloc allocate global memory.

llvm-svn: 357923
2019-04-08 16:53:57 +00:00
Alexey Bataev d2565d2126 [OPENMP]Fix a warning about unused variable, NFC.
llvm-svn: 356715
2019-03-21 20:52:04 +00:00
Alexey Bataev 084b0c2f03 [OPENMP] Simplify codegen for allocate directive on local variables.
Simplified codegen for the allocate directive for local variables,
initial implementation of the codegen for NVPTX target.

llvm-svn: 356710
2019-03-21 20:36:16 +00:00
Alexey Bataev c56872589f [OPENMP]Codegen support for allocate directive on global variables.
For the global variables the allocate directive must specify only the
predefined allocator. This allocator must be translated into the correct
form of the address space for the targets that support different address
spaces.

llvm-svn: 356702
2019-03-21 19:35:27 +00:00
Alexey Bataev 982a35eb1d [OPENMP]Remove unused parameter, NFC.
Parameter CodeGenModule &CGM is not required for CGOpenMPRuntime member
functions, since class holds the reference to the CGM.

llvm-svn: 356480
2019-03-19 17:09:52 +00:00
Alexey Bataev 4f680db257 [OPENMP] Codegen for local variables with the allocate pragma.
Added initial codegen for the local variables with the #pragma omp
allocate directive. Instead of allocating the variables on the stack,
__kmpc_alloc|__kmpc_free functions are used for memory (de-)allocation.

llvm-svn: 356472
2019-03-19 16:41:16 +00:00
Alexey Bataev 7b3eabdcd2 [OPENMP][NVPTX]Fix PR40893: Size doesn't match for
'_openmp_teams_reductions_buffer_$_.

nvlink does not handle weak linkage correctly, same symbols with the
different sizes are reported as erroneous though the largest size must
be chosen instead. Patch fixes this problem by using Internal linkage
instead of the Common.

llvm-svn: 356072
2019-03-13 18:21:10 +00:00
Alexey Bataev 5b68c72f77 [OPENMP]Remove debug service variable.
Removed not required service variable for the debug info.

llvm-svn: 355729
2019-03-08 20:48:54 +00:00
Alexey Bataev 25ed0c07c1 [OPENMP 5.0]Add initial support for 'allocate' directive.
Added parsing/sema analysis/serialization/deserialization support for
'allocate' directive.

llvm-svn: 355614
2019-03-07 17:54:44 +00:00
Alexey Bataev 1af5bd54a8 [OPENMP]Target region: emit const firstprivates as globals with constant
memory.

If the variable with the constant non-scalar type is firstprivatized in
the target region, the local copy is created with the data copying.
Instead, we allocate the copy in the constant memory and avoid extra
copying in the outlined target regions. This global copy is used in the
target regions without loss of the performance.

llvm-svn: 355418
2019-03-05 17:47:18 +00:00
Alexey Bataev 8061acd501 [OPENMP][NVPTX]Use faster teams reduction algorithm.
A faster way to reduce the values in teams reductions was found, the
codegen is updated to use this faster algorithm and new runtime functions.

llvm-svn: 354479
2019-02-20 16:36:22 +00:00
James Y Knight 751fe286dc [opaque pointer types] Cleanup CGBuilder's Create*GEP.
The various EltSize, Offset, DataLayout, and StructLayout arguments
are all computable from the Address's element type and the DataLayout
which the CGBuilder already has access to.

After having previously asserted that the computed values are the same
as those passed in, now remove the redundant arguments from
CGBuilder's Create*GEP functions.

Differential Revision: https://reviews.llvm.org/D57767

llvm-svn: 353629
2019-02-09 22:22:28 +00:00
James Y Knight 9871db064d [opaque pointer types] Pass function types for runtime function calls.
Emit{Nounwind,}RuntimeCall{,OrInvoke} have been modified to take a
FunctionCallee as an argument, and CreateRuntimeFunction has been
modified to return a FunctionCallee. All callers have been updated.

Additionally, CreateBuiltinFunction is removed, as it was redundant
with CreateRuntimeFunction after some previous changes.

Differential Revision: https://reviews.llvm.org/D57668

llvm-svn: 353184
2019-02-05 16:42:33 +00:00
Michael Kruse 251e1488e1 [OpenMP 5.0] Parsing/sema support for "omp declare mapper" directive.
This patch implements parsing and sema for "omp declare mapper"
directive. User defined mapper, i.e., declare mapper directive, is a new
feature in OpenMP 5.0. It is introduced to extend existing map clauses
for the purpose of simplifying the copy of complex data structures
between host and device (i.e., deep copy). An example is shown below:

    struct S {  int len;  int *d; };
    #pragma omp declare mapper(struct S s) map(s, s.d[0:s.len]) // Memory region that d points to is also mapped using this mapper.

Contributed-by: Lingda Li <lildmh@gmail.com>

Differential Revision: https://reviews.llvm.org/D56326

llvm-svn: 352906
2019-02-01 20:25:04 +00:00
Alexey Bataev e4e9ba2bea [OPENMP][NVPTX]Emit service debug variable for NVPTX.
In case of the empty module, the ptxas tool may emit error message about
empty debug info sections. This patch fixes this bug.

llvm-svn: 352421
2019-01-28 20:03:02 +00:00
Chandler Carruth 2946cd7010 Update the file headers across all of the LLVM projects in the monorepo
to reflect the new license.

We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.

Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.

llvm-svn: 351636
2019-01-19 08:50:56 +00:00
Alexey Bataev 7bb3353f6a [OPENMP]Add call to __kmpc_push_target_tripcount() function.
Each we create the target regions with the teams distribute inner
region, we can better estimate number of the teams required to execute
the target region. Function __kmpc_push_target_tripcount() is used for
purpose, which accepts device_id and the number of the iterations,
performed by the associated loop.

llvm-svn: 350571
2019-01-07 21:30:43 +00:00
Alexey Bataev 25d3de8a0a [OPENMP][NVPTX]Reduce number of barriers in reductions.
After the fix for the syncthreads we don't need to generate extra
barriers for the parallel reductions.

llvm-svn: 350530
2019-01-07 15:45:09 +00:00
Alexey Bataev 8e009036c9 [OPENMP][NVPTX]Use new functions from the runtime library.
Updated codegen to use the new functions from the runtime library.

llvm-svn: 350415
2019-01-04 17:25:09 +00:00
Alexey Bataev a3924b517e [OPENMP][NVPTX]Use __kmpc_barrier_simple_spmd(nullptr, 0) instead of
nvvm_barrier0.

Use runtime functions instead of the direct call to the nvvm intrinsics.
It allows to prevent some dangerous LLVM optimizations, that breaks the
code for the NVPTX target.

llvm-svn: 350328
2019-01-03 16:25:35 +00:00
Alexey Bataev 6a1b06bcd4 [OPENMP][NVPTX]Emit shared memory buffer for reduction as 128 bytes
buffer.

Seems to me, nvlink has a bug with the proper support of the weakly
linked symbols. It does not allow to define several shared memory buffer
with the different sizes even with the weak linkage. Instead we always
use 128 bytes buffer to prevent nvlink from the error message emission.

llvm-svn: 349540
2018-12-18 21:01:42 +00:00
Alexey Bataev 29d47fcb30 [OPENMP][NVPTX]Added extra sync point to the inter-warp copy function.
The parallel reduction operation requires an extra synchronization point
in the inter-warp copy function to avoid divergence.

llvm-svn: 349525
2018-12-18 19:20:15 +00:00
Alexey Bataev ae51b96f99 [OPENMP][NVPTX]Improved interwarp copy function.
Inlined runtime with the current implementation of the interwarp copy
function leads to the undefined behavior because of the not quite
correct implementation of the barriers. Start using generic
__kmpc_barier function instead of the custom made barriers.

llvm-svn: 349192
2018-12-14 21:00:58 +00:00
Raphael Isemann b23ccecbb0 Misc typos fixes in ./lib folder
Summary: Found via `codespell -q 3 -I ../clang-whitelist.txt -L uint,importd,crasher,gonna,cant,ue,ons,orign,ned`

Reviewers: teemperor

Reviewed By: teemperor

Subscribers: teemperor, jholewinski, jvesely, nhaehnle, whisperity, jfb, cfe-commits

Differential Revision: https://reviews.llvm.org/D55475

llvm-svn: 348755
2018-12-10 12:37:46 +00:00
Alexey Bataev 6393eb7ec6 [OPENMP][NVPTX] Fix globalization of the mapped array sections.
If the array section is based on pointer and this sections is mapped in
target region + then it is used in the inner parallel region, it also
must be globalized as the pointer itself is passed by value, not by
reference.

llvm-svn: 348492
2018-12-06 15:35:13 +00:00
Alexey Bataev 2c1ff9dda7 [OPENMP][NVPTX]Fixed emission of the critical region.
Critical regions in NVPTX are the constructs, which, generally speaking,
are not supported by the NVPTX target. Instead we're using special
technique to handle the critical regions. Currently they are supported
only within the loop and all the threads in the loop must execute the
same critical region.
Inside of this special regions the regions still must be emitted as
critical, to avoid possible data races between the teams +
synchronization must use __kmpc_barrier functions.

llvm-svn: 348272
2018-12-04 15:25:01 +00:00
Alexey Bataev c3028cac24 [OPENMP][NVPTX]Mark __kmpc_barrier functions as convergent.
__kmpc_barrier runtime functions must be marked as convergent to prevent
some dangerous optimizations. Also, for NVPTX target all barriers must
be emitted as simple barriers.

llvm-svn: 348271
2018-12-04 15:03:25 +00:00
Alexey Bataev 3ce5d827fc [OPENMP][NVPTX]Call get __kmpc_global_thread_num in worker after
initialization.

Function __kmpc_global_thread_num  should be called only after
initialization, not earlier.

llvm-svn: 347919
2018-11-29 21:21:32 +00:00
Gheorghe-Teodor Bercea 2b40470c61 [OpenMP] Add a new version of the SPMD deinit kernel function
Summary: This patch adds a new runtime for the SPMD deinit kernel function which replaces the previous function. The new function takes as argument the flag which signals whether the runtime is required or not. This enables the compiler to optimize out the part of the deinit function which are not needed.

Reviewers: ABataev, caomhin

Reviewed By: ABataev

Subscribers: jholewinski, guansong, cfe-commits

Differential Revision: https://reviews.llvm.org/D54970

llvm-svn: 347915
2018-11-29 20:53:49 +00:00
Alexey Bataev a116602475 [OPENMP][NVPTX]Basic support for reductions across the teams.
Added basic codegen support for the reductions across the teams.

llvm-svn: 347715
2018-11-27 21:24:54 +00:00
Alexey Bataev e8ad4b7124 [OPENMP][NVPTX]Emit default locations with the correct Exec|Runtime
modes.

If the region is inside target|teams|distribute region, we can emit the
locations with the correct info for execution mode and runtime mode.
Patch adds this ability to the NVPTX codegen to help the optimizer to
produce better code.

llvm-svn: 347583
2018-11-26 18:37:09 +00:00
Alexey Bataev ceeaa48052 [OPENMP][NVPTX]Emit default locations as constant with undefined mode.
For the NVPTX target default locations should be emitted as constants +
additional info must be emitted in the reserved_2 field of the ident_t
structure. The 1st bit controls the execution mode and the 2nd bit
controls use of the lightweight runtime. The combination of the bits for
Non-SPMD mode + lightweight runtime represents special undefined mode,
used outside of the target regions for orphaned directives or functions.
Should allow and additional optimization inside of the target regions.

llvm-svn: 347425
2018-11-21 21:04:34 +00:00
Patrick Lyster 8f7f586e53 [OpenMP] Check target architecture supports unified shared memory for requires directive. Differential Review: https://reviews.llvm.org/D54493
llvm-svn: 347214
2018-11-19 15:09:33 +00:00
David L. Jones 085ec01d6b Fix unused variable warning.
llvm-svn: 347133
2018-11-17 04:48:54 +00:00
Alexey Bataev f2f39be9ed [OPENMP][NVPTX]Emit correct reduction code for teams/parallel
reductions.

Fixed previously committed code for the reduction support in
teams/parallel constructs taking into account new design of the NVPTX
support in the compiler. Teams reduction are not fully functional yet,
it is going to be fixed in the following patches.

llvm-svn: 347081
2018-11-16 19:38:21 +00:00
Alexey Bataev 8bcc69c054 [OPENMP][NVPTX]Extend number of constructs executed in SPMD mode.
If the statements between target|teams|distribute directives does not
require execution in master thread, like constant expressions, null
statements, simple declarations, etc., such construct can be xecuted in
SPMD mode.

llvm-svn: 346551
2018-11-09 20:03:19 +00:00
Alexey Bataev 09c9eea78f [OPENMP][NVPTX]Allow to use shared memory for the
target|teams|distribute variables.

If the total size of the variables, declared in target|teams|distribute
regions, is less than the maximal size of shared memory available, the
buffer is allocated in the shared memory.

llvm-svn: 346507
2018-11-09 16:18:04 +00:00
Alexey Bataev 1fc1f8e819 [OPENMP][NVPTX]Use __kmpc_data_sharing_coalesced_push_stack function.
Coalesced memory access requires use of the new function
`__kmpc_data_sharing_coalesced_push_stack` instead of the
`__kmpc_data_sharing_push_stack`.

llvm-svn: 345991
2018-11-02 16:08:31 +00:00
Alexey Bataev e40901806f [OPENMP][NVPTX]Improve emission of the globalized variables for
target/teams/distribute regions.

Target/teams/distribute regions exist for all the time the kernel is
executed. Thus, if the variable is declared in their context and then
escape it, we can allocate global memory statically instead of
allocating it dynamically.
Patch captures all the globalized variables in target/teams/distribute
contexts, merges them into the records, one per each target region.
Those records are then joined into the union, one per compilation unit
(to save the global memory). Those units are organized into
2 x dimensional arrays, where the first dimension is
the number of blocks per SM and the second one is the number of SMs.
Runtime functions manage this global memory space between the executing
teams.

llvm-svn: 345978
2018-11-02 14:54:07 +00:00