Commit Graph

28 Commits

Author SHA1 Message Date
hyeongyukim aacfbb953e [Clang/Test]: Rename enable_noundef_analysis to disable-noundef-analysis and turn it off by default
Turning on `enable_noundef_analysis` flag allows better codegen by removing freeze instructions.
I modified clang by renaming `enable_noundef_analysis` flag to `disable-noundef-analysis` and turning it off by default.

Test updates are made as a separate patch: D108453

Reviewed By: eugenis

Differential Revision: https://reviews.llvm.org/D105169

[Clang/Test]: Rename enable_noundef_analysis to disable-noundef-analysis and turn it off by default (2)

This patch updates test files after D105169.
Autogenerated test codes are changed by `utils/update_cc_test_checks.py,` and non-autogenerated test codes are changed as follows:

(1) I wrote a python script that (partially) updates the tests using regex: {F18594904} The script is not perfect, but I believe it gives hints about which patterns are updated to have `noundef` attached.

(2) The remaining tests are updated manually.

Reviewed By: eugenis

Differential Revision: https://reviews.llvm.org/D108453

Resolve lit failures in clang after 8ca4b3e's land

Fix lit test failures in clang-ppc* and clang-x64-windows-msvc

Fix missing failures in clang-ppc64be* and retry fixing clang-x64-windows-msvc

Fix internal_clone(aarch64) inline assembly
2021-11-06 19:19:22 +09:00
Juneyoung Lee 89ad2822af Revert "[Clang/Test]: Rename enable_noundef_analysis to disable-noundef-analysis and turn it off by default"
This reverts commit 7584ef766a.
2021-11-06 15:39:19 +09:00
Juneyoung Lee 7584ef766a [Clang/Test]: Rename enable_noundef_analysis to disable-noundef-analysis and turn it off by default
Turning on `enable_noundef_analysis` flag allows better codegen by removing freeze instructions.
I modified clang by renaming `enable_noundef_analysis` flag to `disable-noundef-analysis` and turning it off by default.

Test updates are made as a separate patch: D108453

Reviewed By: eugenis

Differential Revision: https://reviews.llvm.org/D105169
2021-11-06 15:36:42 +09:00
Juneyoung Lee f193bcc701 Revert D105169 due to the two-stage failure in ASAN
This reverts the following commits:
37ca7a795b
9aa6c72b92
705387c507
8ca4b3ef19
80dba72a66
2021-10-18 23:52:46 +09:00
Juneyoung Lee 8ca4b3ef19 [Clang/Test]: Rename enable_noundef_analysis to disable-noundef-analysis and turn it off by default (2)
This patch updates test files after D105169.
Autogenerated test codes are changed by `utils/update_cc_test_checks.py,` and non-autogenerated test codes are changed as follows:

(1) I wrote a python script that (partially) updates the tests using regex: {F18594904} The script is not perfect, but I believe it gives hints about which patterns are updated to have `noundef` attached.

(2) The remaining tests are updated manually.

Reviewed By: eugenis

Differential Revision: https://reviews.llvm.org/D108453
2021-10-16 12:01:41 +09:00
Shilei Tian 423d34f74a [OpenMP][Offloading] Change `bool IsSPMD` to `int8_t Mode` in `__kmpc_target_init` and `__kmpc_target_deinit`
This is a follow-up of D110029, which uses bitset to indicate execution mode. This patches makes the changes in the function call.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D110279
2021-09-22 17:16:41 -04:00
Shilei Tian ca999f7191 [OpenMP][Offloading] Use bitset to indicate execution mode instead of value
The execution mode of a kernel is stored in a global variable, whose value means:
- 0 - SPMD mode
- 1 - indicates generic mode
- 2 - SPMD mode execution with generic mode semantics

We are going to add support for SIMD execution mode. It will be come with another
execution mode, such as SIMD-generic mode. As a result, this value-based indicator
is not flexible.

This patch changes to bitset based solution to encode execution mode. Each
position is:
[0] - generic mode
[1] - SPMD mode
[2] - SIMD mode (will be added later)

In this way, `0x1` is generic mode, `0x2` is SPMD mode, and `0x3` is SPMD mode
execution with generic mode semantics. In the future after we add the support for
SIMD mode, `0b1xx` will be in SIMD mode.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D110029
2021-09-22 11:40:52 -04:00
Johannes Doerfert e2cfbfcc0c [OpenMP] Unified entry point for SPMD & generic kernels in the device RTL
In the spirit of TRegions [0], this patch provides a simpler and uniform
interface for a kernel to set up the device runtime. The OMPIRBuilder is
used for reuse in Flang. A custom state machine will be generated in the
follow up patch.

The "surplus" threads of the "master warp" will not exit early anymore
so we need to use non-aligned barriers. The new runtime will not have an
extra warp but also require these non-aligned barriers.

[0] https://link.springer.com/chapter/10.1007/978-3-030-28596-8_11

This was in parts extracted from D59319.

Reviewed By: ABataev, JonChesterfield

Differential Revision: https://reviews.llvm.org/D101976
2021-07-10 17:53:56 -05:00
Joseph Huber 68d133a3e8 [OpenMP] Simplify GPU memory globalization
Summary:
Memory globalization is required to maintain OpenMP standard semantics for data sharing between
worker and master threads. The GPU cannot share data between its threads so must allocate global or
shared memory to store the data in. Currently this is implemented fully in the frontend using the
`__kmpc_data_sharing_push_stack` and __kmpc_data_sharing_pop_stack` functions to emulate standard
CPU stack sharing. The front-end scans the target region for variables that escape the region and
must be shared between the threads. Each variable then has a field created for it in a global record
type.

This patch replaces this functinality with a single allocation command, effectively mimicing an
alloca instruction for the variables that must be shared between the threads. This will be much
slower than the current solution, but makes it much easier to optimize as we can analyze each
variable independently and determine if it is not captured. In the future, we can replace these
calls with an `alloca` and small allocations can be pushed to shared memory.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D97680
2021-06-22 10:52:46 -04:00
JonChesterfield 5d02ca49a2 [libomptarget][nvptx] Undef, weak shared variables
[libomptarget][nvptx] Undef, weak shared variables

Shared variables on nvptx, and LDS on amdgcn, are uninitialized at
the start of kernel execution. Therefore create the variables with
undef instead of zeros, motivated in part by the amdgcn back end
rejecting LDS+initializer.

Common is zero initialized, which seems incompatible with shared. Thus
change them to weak, following the direction of
https://reviews.llvm.org/rG7b3eabdcd215

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D90248
2020-10-28 14:25:36 +00:00
Pushpinder Singh 3a12ff0dac [OpenMP][RTL] Remove dead code
RequiresDataSharing was always 0, resulting dead code in device runtime library.

Reviewed By: jdoerfert, JonChesterfield

Differential Revision: https://reviews.llvm.org/D88829
2020-10-06 05:43:47 -04:00
Tim Northover a009a60a91 IR: print value numbers for unnamed function arguments
For consistency with normal instructions and clarity when reading IR,
it's best to print the %0, %1, ... names of function arguments in
definitions.

Also modifies the parser to accept IR in that form for obvious reasons.

llvm-svn: 367755
2019-08-03 14:28:34 +00:00
Alexey Bataev 25d3de8a0a [OPENMP][NVPTX]Reduce number of barriers in reductions.
After the fix for the syncthreads we don't need to generate extra
barriers for the parallel reductions.

llvm-svn: 350530
2019-01-07 15:45:09 +00:00
Alexey Bataev 8e009036c9 [OPENMP][NVPTX]Use new functions from the runtime library.
Updated codegen to use the new functions from the runtime library.

llvm-svn: 350415
2019-01-04 17:25:09 +00:00
Alexey Bataev 29d47fcb30 [OPENMP][NVPTX]Added extra sync point to the inter-warp copy function.
The parallel reduction operation requires an extra synchronization point
in the inter-warp copy function to avoid divergence.

llvm-svn: 349525
2018-12-18 19:20:15 +00:00
Alexey Bataev ae51b96f99 [OPENMP][NVPTX]Improved interwarp copy function.
Inlined runtime with the current implementation of the interwarp copy
function leads to the undefined behavior because of the not quite
correct implementation of the barriers. Start using generic
__kmpc_barier function instead of the custom made barriers.

llvm-svn: 349192
2018-12-14 21:00:58 +00:00
Gheorghe-Teodor Bercea 2b40470c61 [OpenMP] Add a new version of the SPMD deinit kernel function
Summary: This patch adds a new runtime for the SPMD deinit kernel function which replaces the previous function. The new function takes as argument the flag which signals whether the runtime is required or not. This enables the compiler to optimize out the part of the deinit function which are not needed.

Reviewers: ABataev, caomhin

Reviewed By: ABataev

Subscribers: jholewinski, guansong, cfe-commits

Differential Revision: https://reviews.llvm.org/D54970

llvm-svn: 347915
2018-11-29 20:53:49 +00:00
Alexey Bataev f2f39be9ed [OPENMP][NVPTX]Emit correct reduction code for teams/parallel
reductions.

Fixed previously committed code for the reduction support in
teams/parallel constructs taking into account new design of the NVPTX
support in the compiler. Teams reduction are not fully functional yet,
it is going to be fixed in the following patches.

llvm-svn: 347081
2018-11-16 19:38:21 +00:00
Alexey Bataev 8bcc69c054 [OPENMP][NVPTX]Extend number of constructs executed in SPMD mode.
If the statements between target|teams|distribute directives does not
require execution in master thread, like constant expressions, null
statements, simple declarations, etc., such construct can be xecuted in
SPMD mode.

llvm-svn: 346551
2018-11-09 20:03:19 +00:00
Alexey Bataev 8d8e1235ab [OPENMP][NVPTX] Add support for lightweight runtime.
If the target construct can be executed in SPMD mode + it is a loop
based directive with static scheduling, we can use lightweight runtime
support.

llvm-svn: 340953
2018-08-29 18:32:21 +00:00
Gheorghe-Teodor Bercea ad4e579407 [OpenMP] Initialize data sharing stack for SPMD case
Summary: In the SPMD case, we need to initialize the data sharing and globalization infrastructure. This covers the case when an SPMD region calls a function in a different compilation unit.

Reviewers: ABataev, carlo.bertolli, caomhin

Reviewed By: ABataev

Subscribers: Hahnfeld, jholewinski, guansong, cfe-commits

Differential Revision: https://reviews.llvm.org/D49188

llvm-svn: 337015
2018-07-13 16:18:24 +00:00
Alexey Bataev 12c62908b5 [OPENMP, NVPTX] Fix reduction of the big data types/structures.
If the shuffle is required for the reduced structures/big data type,
current code may cause compiler crash because of the loading of the
aggregate values. Patch fixes this problem.

llvm-svn: 335377
2018-06-22 19:10:38 +00:00
Alexey Bataev e372710d30 [OPENMP] Code cleanup and code improvements.
llvm-svn: 330270
2018-04-18 15:57:46 +00:00
Carlo Bertolli 79712097c7 [OpenMP] Extend NVPTX SPMD implementation of combined constructs
Differential Revision: https://reviews.llvm.org/D43852

This patch extends the SPMD implementation to all target constructs and guards this implementation under a new flag.

llvm-svn: 326368
2018-02-28 20:48:35 +00:00
Alexey Bataev b2575930b3 [OPENMP] Fix casting in NVPTX support library.
If the reduction required shuffle in the NVPTX codegen, we may need to
cast the reduced value to the integer type. This casting was implemented
incorrectly and may cause compiler crash. Patch fixes this problem.

llvm-svn: 321818
2018-01-04 20:18:55 +00:00
Arpith Chacko Jacob 101e8fb1f3 [OpenMP] Parallel reduction on the NVPTX device.
This patch implements codegen for the reduction clause on
any parallel construct for elementary data types.  An efficient
implementation requires hierarchical reduction within a
warp and a threadblock.  It is complicated by the fact that
variables declared in the stack of a CUDA thread cannot be
shared with other threads.

The patch creates a struct to hold reduction variables and
a number of helper functions.  The OpenMP runtime on the GPU
implements reduction algorithms that uses these helper
functions to perform reductions within a team.  Variables are
shared between CUDA threads using shuffle intrinsics.

An implementation of reductions on the NVPTX device is
substantially different to that of CPUs.  However, this patch
is written so that there are minimal changes to the rest of
OpenMP codegen.

The implemented design allows the compiler and runtime to be
decoupled, i.e., the runtime does not need to know of the
reduction operation(s), the type of the reduction variable(s),
or the number of reductions.  The design also allows reuse of
host codegen, with appropriate specialization for the NVPTX
device.

While the patch does introduce a number of abstractions, the
expected use case calls for inlining of the GPU OpenMP runtime.
After inlining and optimizations in LLVM, these abstractions
are unwound and performance of OpenMP reductions is comparable
to CUDA-canonical code.

Patch by Tian Jin in collaboration with Arpith Jacob

Reviewers: ABataev
Differential Revision: https://reviews.llvm.org/D29758

llvm-svn: 295333
2017-02-16 16:20:16 +00:00
Arpith Chacko Jacob bd6344c0be Revert r295319 while investigating buildbot failure.
llvm-svn: 295323
2017-02-16 14:25:35 +00:00
Arpith Chacko Jacob 8e170fc857 [OpenMP] Parallel reduction on the NVPTX device.
This patch implements codegen for the reduction clause on
any parallel construct for elementary data types.  An efficient
implementation requires hierarchical reduction within a
warp and a threadblock.  It is complicated by the fact that
variables declared in the stack of a CUDA thread cannot be
shared with other threads.

The patch creates a struct to hold reduction variables and
a number of helper functions.  The OpenMP runtime on the GPU
implements reduction algorithms that uses these helper
functions to perform reductions within a team.  Variables are
shared between CUDA threads using shuffle intrinsics.

An implementation of reductions on the NVPTX device is
substantially different to that of CPUs.  However, this patch
is written so that there are minimal changes to the rest of
OpenMP codegen.

The implemented design allows the compiler and runtime to be
decoupled, i.e., the runtime does not need to know of the
reduction operation(s), the type of the reduction variable(s),
or the number of reductions.  The design also allows reuse of
host codegen, with appropriate specialization for the NVPTX
device.

While the patch does introduce a number of abstractions, the
expected use case calls for inlining of the GPU OpenMP runtime.
After inlining and optimizations in LLVM, these abstractions
are unwound and performance of OpenMP reductions is comparable
to CUDA-canonical code.

Patch by Tian Jin in collaboration with Arpith Jacob

Reviewers: ABataev
Differential Revision: https://reviews.llvm.org/D29758

llvm-svn: 295319
2017-02-16 14:03:36 +00:00