Commit Graph

58 Commits

Author SHA1 Message Date
Atmn Patel 737c4a2673 [clang][openmp][NFC] Remove arch-specific CGOpenMPRuntimeGPU files
The existing CGOpenMPRuntimeAMDGCN and CGOpenMPRuntimeNVPTX classes are
just code bloat. By removing them, the codebase gets a bit cleaner.

Reviewed By: jdoerfert, JonChesterfield, tianshilei1992

Differential Revision: https://reviews.llvm.org/D113421
2021-11-09 15:11:05 -05:00
Atmn Patel ef717f3852 Revert "[clang][openmp][NFC] Remove arch-specific CGOpenMPRuntimeGPU files"
This reverts commit 81a7cad2ff.
2021-11-09 02:10:42 -05:00
Atmn Patel 81a7cad2ff [clang][openmp][NFC] Remove arch-specific CGOpenMPRuntimeGPU files
The existing CGOpenMPRuntimeAMDGCN and CGOpenMPRuntimeNVPTX classes are
just code bloat. By removing them, the codebase gets a bit cleaner.

Reviewed By: jdoerfert, JonChesterfield, tianshilei1992

Differential Revision: https://reviews.llvm.org/D113421
2021-11-09 01:52:52 -05:00
hyeongyu kim fd9b099906 Revert "[Clang/Test]: Rename enable_noundef_analysis to disable-noundef-analysis and turn it off by default"
This reverts commit aacfbb953e.

Revert "Fix lit test failures in CodeGenCoroutines"

This reverts commit 63fff0f5bf.
2021-11-09 02:15:55 +09:00
hyeongyukim aacfbb953e [Clang/Test]: Rename enable_noundef_analysis to disable-noundef-analysis and turn it off by default
Turning on `enable_noundef_analysis` flag allows better codegen by removing freeze instructions.
I modified clang by renaming `enable_noundef_analysis` flag to `disable-noundef-analysis` and turning it off by default.

Test updates are made as a separate patch: D108453

Reviewed By: eugenis

Differential Revision: https://reviews.llvm.org/D105169

[Clang/Test]: Rename enable_noundef_analysis to disable-noundef-analysis and turn it off by default (2)

This patch updates test files after D105169.
Autogenerated test codes are changed by `utils/update_cc_test_checks.py,` and non-autogenerated test codes are changed as follows:

(1) I wrote a python script that (partially) updates the tests using regex: {F18594904} The script is not perfect, but I believe it gives hints about which patterns are updated to have `noundef` attached.

(2) The remaining tests are updated manually.

Reviewed By: eugenis

Differential Revision: https://reviews.llvm.org/D108453

Resolve lit failures in clang after 8ca4b3e's land

Fix lit test failures in clang-ppc* and clang-x64-windows-msvc

Fix missing failures in clang-ppc64be* and retry fixing clang-x64-windows-msvc

Fix internal_clone(aarch64) inline assembly
2021-11-06 19:19:22 +09:00
Juneyoung Lee 89ad2822af Revert "[Clang/Test]: Rename enable_noundef_analysis to disable-noundef-analysis and turn it off by default"
This reverts commit 7584ef766a.
2021-11-06 15:39:19 +09:00
Juneyoung Lee 7584ef766a [Clang/Test]: Rename enable_noundef_analysis to disable-noundef-analysis and turn it off by default
Turning on `enable_noundef_analysis` flag allows better codegen by removing freeze instructions.
I modified clang by renaming `enable_noundef_analysis` flag to `disable-noundef-analysis` and turning it off by default.

Test updates are made as a separate patch: D108453

Reviewed By: eugenis

Differential Revision: https://reviews.llvm.org/D105169
2021-11-06 15:36:42 +09:00
Juneyoung Lee f193bcc701 Revert D105169 due to the two-stage failure in ASAN
This reverts the following commits:
37ca7a795b
9aa6c72b92
705387c507
8ca4b3ef19
80dba72a66
2021-10-18 23:52:46 +09:00
Juneyoung Lee 8ca4b3ef19 [Clang/Test]: Rename enable_noundef_analysis to disable-noundef-analysis and turn it off by default (2)
This patch updates test files after D105169.
Autogenerated test codes are changed by `utils/update_cc_test_checks.py,` and non-autogenerated test codes are changed as follows:

(1) I wrote a python script that (partially) updates the tests using regex: {F18594904} The script is not perfect, but I believe it gives hints about which patterns are updated to have `noundef` attached.

(2) The remaining tests are updated manually.

Reviewed By: eugenis

Differential Revision: https://reviews.llvm.org/D108453
2021-10-16 12:01:41 +09:00
hsmahesha f7de6962c8 [CFE][Codegen][In-progress] Remove CodeGenFunction::InitTempAlloca()
Sequel patch to https://reviews.llvm.org/D111293.

Remove call to CodeGenFunction::InitTempAlloca() from OpenMP related
codegen part.

Also remove the metadata `!llvm.access.group` from the updated lit
tests.

Reviewed By: rjmccall

Differential Revision: https://reviews.llvm.org/D111316
2021-10-12 10:01:46 +05:30
Joseph Huber bad44d5f39 [OpenMP] Add RTL function for getting number of threads in block.
This patch adds support for the
`__kmpc_get_hardware_num_threads_in_block` function that returns the
number of threads. This was missing in the new runtime and was used by
the AMDGPU plugin which prevented it from using the new runtime. This
patchs also unified the interface for getting the thread numbers in the
frontend.

Originally authored by jdoerfert.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D111475
2021-10-08 22:21:59 -04:00
Shilei Tian 423d34f74a [OpenMP][Offloading] Change `bool IsSPMD` to `int8_t Mode` in `__kmpc_target_init` and `__kmpc_target_deinit`
This is a follow-up of D110029, which uses bitset to indicate execution mode. This patches makes the changes in the function call.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D110279
2021-09-22 17:16:41 -04:00
Giorgis Georgakoudis ac90dfc43a Revert "[OpenMP] Codegen aggregate for outlined function captures"
This reverts commit 1d66649adf.

Revert to fix AMG GPU issue.
2021-09-21 13:20:39 -07:00
Giorgis Georgakoudis 1d66649adf [OpenMP] Codegen aggregate for outlined function captures
Parallel regions are outlined as functions with capture variables explicitly generated as distinct parameters in the function's argument list. That complicates the fork_call interface in the OpenMP runtime: (1) the fork_call is variadic since there is a variable number of arguments to forward to the outlined function, (2) wrapping/unwrapping arguments happens in the OpenMP runtime, which is sub-optimal, has been a source of ABI bugs, and has a hardcoded limit (16) in the number of arguments, (3)  forwarded arguments must cast to pointer types, which complicates debugging. This patch avoids those issues by aggregating captured arguments in a struct to pass to the fork_call.

Reviewed By: jdoerfert, jhuber6

Differential Revision: https://reviews.llvm.org/D102107
2021-09-21 10:50:04 -07:00
Jon Chesterfield 21d91a8ef3 [libomptarget][devicertl] Replace lanemask with uint64 at interface
Use uint64_t for lanemask on all GPU architectures at the interface
with clang. Updates tests. The deviceRTL is always linked as IR so the zext
and trunc introduced for wave32 architectures will fold after inlining.

Simplification partly motivated by amdgpu gfx10 which will be wave32 and
is awkward to express in the current arch-dependant typedef interface.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D108317
2021-08-18 20:47:33 +01:00
Joseph Huber 754eb1c210 [OpenMP] Change `__kmpc_free_shared` to include the paired allocation size
This patch changes `__kmpc_free_shared` to take an additional argument
corresponding to the associated allocation's size. This makes it easier to
implement the allocator in the runtime.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D106496
2021-07-21 20:56:21 -04:00
Giorgis Georgakoudis fb0cf01795 Revert "[OpenMP] Codegen aggregate for outlined function captures"
This reverts commit e9c7291cb2.

Fix failing tests
2021-07-19 07:54:26 -07:00
Giorgis Georgakoudis e9c7291cb2 [OpenMP] Codegen aggregate for outlined function captures
Parallel regions are outlined as functions with capture variables explicitly generated as distinct parameters in the function's argument list. That complicates the fork_call interface in the OpenMP runtime: (1) the fork_call is variadic since there is a variable number of arguments to forward to the outlined function, (2) wrapping/unwrapping arguments happens in the OpenMP runtime, which is sub-optimal, has been a source of ABI bugs, and has a hardcoded limit (16) in the number of arguments, (3)  forwarded arguments must cast to pointer types, which complicates debugging. This patch avoids those issues by aggregating captured arguments in a struct to pass to the fork_call.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D102107
2021-07-16 23:27:44 -07:00
Johannes Doerfert e2cfbfcc0c [OpenMP] Unified entry point for SPMD & generic kernels in the device RTL
In the spirit of TRegions [0], this patch provides a simpler and uniform
interface for a kernel to set up the device runtime. The OMPIRBuilder is
used for reuse in Flang. A custom state machine will be generated in the
follow up patch.

The "surplus" threads of the "master warp" will not exit early anymore
so we need to use non-aligned barriers. The new runtime will not have an
extra warp but also require these non-aligned barriers.

[0] https://link.springer.com/chapter/10.1007/978-3-030-28596-8_11

This was in parts extracted from D59319.

Reviewed By: ABataev, JonChesterfield

Differential Revision: https://reviews.llvm.org/D101976
2021-07-10 17:53:56 -05:00
Joseph Huber 68d133a3e8 [OpenMP] Simplify GPU memory globalization
Summary:
Memory globalization is required to maintain OpenMP standard semantics for data sharing between
worker and master threads. The GPU cannot share data between its threads so must allocate global or
shared memory to store the data in. Currently this is implemented fully in the frontend using the
`__kmpc_data_sharing_push_stack` and __kmpc_data_sharing_pop_stack` functions to emulate standard
CPU stack sharing. The front-end scans the target region for variables that escape the region and
must be shared between the threads. Each variable then has a field created for it in a global record
type.

This patch replaces this functinality with a single allocation command, effectively mimicing an
alloca instruction for the variables that must be shared between the threads. This will be much
slower than the current solution, but makes it much easier to optimize as we can analyze each
variable independently and determine if it is not captured. In the future, we can replace these
calls with an `alloca` and small allocations can be pushed to shared memory.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D97680
2021-06-22 10:52:46 -04:00
Giorgis Georgakoudis 207b08a913 [OpenMP][NFC] Refactor Clang OpenMP tests using update_cc_test_checks
This patch refactors a subset of Clang OpenMP tests, generating checklines using the update_cc_test_checks script. This refactoring facilitates updating the Clang OpenMP code generation codebase by automating test generation.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D101849
2021-05-05 20:08:38 -07:00
Giorgis Georgakoudis 78a7d8c4dd [Utils][NFC] Rename replace-function-regex in update_cc_test_checks
This patch renames the replace-function-regex to replace-value-regex to indicate that the existing regex replacement functionality can replace any IR value besides functions.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D101934
2021-05-05 14:19:30 -07:00
Giorgis Georgakoudis f016c06abb Revert "[OpenMP][NFC] Refactor Clang OpenMP tests using update_cc_test_checks"
This reverts commit 956cae2f09.
2021-05-04 17:12:32 -07:00
Giorgis Georgakoudis 956cae2f09 [OpenMP][NFC] Refactor Clang OpenMP tests using update_cc_test_checks
This patch refactors a subset of Clang OpenMP tests, generating checklines using the update_cc_test_checks script. This refactoring facilitates updating the Clang OpenMP code generation codebase by automating test generation.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D101849
2021-05-04 16:58:45 -07:00
Giorgis Georgakoudis a2dbfb6b72 [OpenMP] Simplify offloading parallel call codegen
This revision simplifies Clang codegen for parallel regions in OpenMP GPU target offloading and corresponding changes in libomptarget: SPMD/non-SPMD parallel calls are unified under a single `kmpc_parallel_51` runtime entry point for parallel regions (which will be commonized between target, host-side parallel regions), data sharing is internalized to the runtime. Tests have been auto-generated using `update_cc_test_checks.py`. Also, the revision contains changes to OpenMPOpt for remark creation on target offloading regions.

Reviewed By: jdoerfert, Meinersbur

Differential Revision: https://reviews.llvm.org/D95976
2021-04-21 18:46:07 -07:00
Johannes Doerfert 03cc8a1ba0 [OpenMP][NFC] Move the `noinline` to the parallel entry point
The `noinline` for non-SPMD parallel functions is probably not necessary
but as long as we use it we should put it on the outermost parallel
function, which is the wrapper, not the actual outlined function.

Resolves PR49752

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D99506
2021-03-30 01:12:45 -05:00
JonChesterfield 5d02ca49a2 [libomptarget][nvptx] Undef, weak shared variables
[libomptarget][nvptx] Undef, weak shared variables

Shared variables on nvptx, and LDS on amdgcn, are uninitialized at
the start of kernel execution. Therefore create the variables with
undef instead of zeros, motivated in part by the amdgcn back end
rejecting LDS+initializer.

Common is zero initialized, which seems incompatible with shared. Thus
change them to weak, following the direction of
https://reviews.llvm.org/rG7b3eabdcd215

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D90248
2020-10-28 14:25:36 +00:00
Joseph Huber 3cc1f1fc1d [OpenMP] Replace OpenMP RTL Functions With OMPIRBuilder and OMPKinds.def
Summary:
Replace the OpenMP Runtime Library functions used in CGOpenMPRuntimeGPU
for OpenMP device code generation with ones in OMPKinds.def and use
OMPIRBuilder for generating runtime calls. This allows us to
consolidate more OpenMP code generation into the OMPIRBuilder. Future
additions to the GPU runtime functions should now go in OMPKinds.def

Reviewers: jdoerfert

Subscribers: aaron.ballman cfe-commits guansong llvm-commits sstefan1 yaxunl

Tags: #OpenMP #LLVM #clang

Differential Revision: https://reviews.llvm.org/D88430
2020-10-08 14:00:22 -04:00
Joseph Huber 1b60f63e4f Revert "[OpenMP] Replace OpenMP RTL Functions With OMPIRBuilder and OMPKinds.def"
Failing tests on Arm due to the tests automatically populating
incomatible pointer width architectures. Reverting until the tests are
updated. Failing tests:

OpenMP/distribute_parallel_for_num_threads_codegen.cpp
OpenMP/distribute_parallel_for_if_codegen.cpp
OpenMP/distribute_parallel_for_simd_if_codegen.cpp
OpenMP/distribute_parallel_for_simd_num_threads_codegen.cpp
OpenMP/target_teams_distribute_parallel_for_if_codegen.cpp
OpenMP/target_teams_distribute_parallel_for_simd_if_codegen.cpp
OpenMP/teams_distribute_parallel_for_if_codegen.cpp
OpenMP/teams_distribute_parallel_for_simd_if_codegen.cpp

This reverts commit 90eaedda9b.
2020-09-30 15:12:21 -04:00
Joseph Huber 90eaedda9b [OpenMP] Replace OpenMP RTL Functions With OMPIRBuilder and OMPKinds.def
Summary:
Replace the OpenMP Runtime Library functions used in CGOpenMPRuntimeGPU
for OpenMP device code generation with ones in OMPKinds.def and use
OMPIRBuilder for generating runtime calls. This allows us to consolidate
more OpenMP code generation into the OMPIRBuilder. This patch also
invalidates specifying target architectures with conflicting pointer
sizes.

Reviewers: jdoerfert

Subscribers: aaron.ballman cfe-commits guansong llvm-commits sstefan1 yaxunl

Tags: #OpenMP #Clang #LLVM

Differential Revision: https://reviews.llvm.org/D88430
2020-09-30 14:00:01 -04:00
Johannes Doerfert c98699582a [OpenMP][NFC] Remove unused (always fixed) arguments
There are various runtime calls in the device runtime with unused, or
always fixed, arguments. This is bad for all sorts of reasons. Clean up
two before as we match them in OpenMPOpt now.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D83268
2020-07-11 00:51:51 -05:00
Alexey Bataev 32ea3397be [OPENMP]Dynamic globalization for parallel target regions.
Summary:
Added support for dynamic memory allocation for globalized variables in
case if execution of target regions in parallel is required.

Reviewers: jdoerfert

Subscribers: jholewinski, yaxunl, guansong, sstefan1, cfe-commits, caomhin

Tags: #clang

Differential Revision: https://reviews.llvm.org/D82324
2020-06-25 08:25:24 -04:00
Eli Friedman 62f3ef2b53 [CGCall] Annotate references with "align" attribute.
If we're going to assume references are dereferenceable, we should also
assume they're aligned: otherwise, we can't actually dereference them.

See also D80072.

Differential Revision: https://reviews.llvm.org/D80166
2020-05-19 20:21:30 -07:00
Matt Arsenault 56a503bdba OpenMP: Add convergent to more runtime functions
Several of these other functions are probably also convergent, but
these two seem obviously convergent.
2019-10-27 21:26:55 -07:00
Alexey Bataev 2cd7fafc11 [OPENMP][NVPTX]Fix critical region codegen.
Summary:
Previously critical regions were emitted with the barrier making it a
worksharing construct though it is not. Also, it leads to incorrect
behavior in Cuda9+. Patch fixes this problem.

Reviewers: ABataev, jdoerfert

Subscribers: jholewinski, guansong, cfe-commits, grokos

Tags: #clang

Differential Revision: https://reviews.llvm.org/D66673

llvm-svn: 369946
2019-08-26 19:07:48 +00:00
Alexey Bataev a44b216036 [OPENMP][NVPTX]Mark barrier functions calls as convergent.
Added convergent attribute to the barrier functions calls for correct
optimizations.

llvm-svn: 366437
2019-07-18 13:49:24 +00:00
Alexey Bataev 8c5555c39a [OPENMP][NVPTX]Mark more functions as always_inline for better
performance.

Internally generated functions must be marked as always_inlines in most
cases. Patch marks some extra reduction function + outlined parallel
functions as always_inline for better performance, but only if the
optimization is requested.

llvm-svn: 361269
2019-05-21 15:11:58 +00:00
Alexey Bataev 8e009036c9 [OPENMP][NVPTX]Use new functions from the runtime library.
Updated codegen to use the new functions from the runtime library.

llvm-svn: 350415
2019-01-04 17:25:09 +00:00
Alexey Bataev a3924b517e [OPENMP][NVPTX]Use __kmpc_barrier_simple_spmd(nullptr, 0) instead of
nvvm_barrier0.

Use runtime functions instead of the direct call to the nvvm intrinsics.
It allows to prevent some dangerous LLVM optimizations, that breaks the
code for the NVPTX target.

llvm-svn: 350328
2019-01-03 16:25:35 +00:00
Alexey Bataev 6a1b06bcd4 [OPENMP][NVPTX]Emit shared memory buffer for reduction as 128 bytes
buffer.

Seems to me, nvlink has a bug with the proper support of the weakly
linked symbols. It does not allow to define several shared memory buffer
with the different sizes even with the weak linkage. Instead we always
use 128 bytes buffer to prevent nvlink from the error message emission.

llvm-svn: 349540
2018-12-18 21:01:42 +00:00
Alexey Bataev 2c1ff9dda7 [OPENMP][NVPTX]Fixed emission of the critical region.
Critical regions in NVPTX are the constructs, which, generally speaking,
are not supported by the NVPTX target. Instead we're using special
technique to handle the critical regions. Currently they are supported
only within the loop and all the threads in the loop must execute the
same critical region.
Inside of this special regions the regions still must be emitted as
critical, to avoid possible data races between the teams +
synchronization must use __kmpc_barrier functions.

llvm-svn: 348272
2018-12-04 15:25:01 +00:00
Alexey Bataev c3028cac24 [OPENMP][NVPTX]Mark __kmpc_barrier functions as convergent.
__kmpc_barrier runtime functions must be marked as convergent to prevent
some dangerous optimizations. Also, for NVPTX target all barriers must
be emitted as simple barriers.

llvm-svn: 348271
2018-12-04 15:03:25 +00:00
Alexey Bataev f2f39be9ed [OPENMP][NVPTX]Emit correct reduction code for teams/parallel
reductions.

Fixed previously committed code for the reduction support in
teams/parallel constructs taking into account new design of the NVPTX
support in the compiler. Teams reduction are not fully functional yet,
it is going to be fixed in the following patches.

llvm-svn: 347081
2018-11-16 19:38:21 +00:00
Alexey Bataev 09c9eea78f [OPENMP][NVPTX]Allow to use shared memory for the
target|teams|distribute variables.

If the total size of the variables, declared in target|teams|distribute
regions, is less than the maximal size of shared memory available, the
buffer is allocated in the shared memory.

llvm-svn: 346507
2018-11-09 16:18:04 +00:00
Alexey Bataev e40901806f [OPENMP][NVPTX]Improve emission of the globalized variables for
target/teams/distribute regions.

Target/teams/distribute regions exist for all the time the kernel is
executed. Thus, if the variable is declared in their context and then
escape it, we can allocate global memory statically instead of
allocating it dynamically.
Patch captures all the globalized variables in target/teams/distribute
contexts, merges them into the records, one per each target region.
Those records are then joined into the union, one per compilation unit
(to save the global memory). Those units are organized into
2 x dimensional arrays, where the first dimension is
the number of blocks per SM and the second one is the number of SMs.
Runtime functions manage this global memory space between the executing
teams.

llvm-svn: 345978
2018-11-02 14:54:07 +00:00
Alexey Bataev 4ac58d1a4b [OPENMP][NVPTX]Reduce memory usage in target region.
Additional reduction of the global memory usage in the target regions
without parallel regions.

llvm-svn: 344413
2018-10-12 20:19:59 +00:00
Alexey Bataev 9ea3c38597 [OPENMP][NVPTX] Support memory coalescing for globalized variables.
Added support for memory coalescing for better performance for
globalized variables. From now on all the globalized variables are
represented as arrays of 32 elements and each thread accesses these
elements using `tid & 31` as index.

llvm-svn: 344049
2018-10-09 14:49:00 +00:00
Alexey Bataev 4065b9ae48 [OPENMP, NVPTX] Fix globalization of the variables passed to orphaned
parallel region.

If the current construct requires sharing of the local variable in the
inner parallel region, this variable must be globalized to avoid
runtime crash.

llvm-svn: 335285
2018-06-21 20:26:33 +00:00
Alexey Bataev bf5c84861c [OPENMP, NVPTX] Initial support for L2 parallelism in SPMD mode.
Added initial support for L2 parallelism in SPMD mode. Note, though,
that the orphaned parallel directives are not currently supported in
SPMD mode.

llvm-svn: 332016
2018-05-10 18:32:08 +00:00
Alexey Bataev c661186cca [OPENMP, NVPTX] Small test fix, NFC.
llvm-svn: 331654
2018-05-07 17:38:13 +00:00