Commit Graph

29 Commits

Author SHA1 Message Date
Brendon Cahoon 294efbbd3e Reland "[AMDGPU] Add gfx1013 target"
This reverts commit 211e584fa2.

Fixed a use-after-free error that caused the sanitizers to fail.
2021-06-08 21:15:35 -04:00
Brendon Cahoon 211e584fa2 Revert "[AMDGPU] Add gfx1013 target"
This reverts commit ea10a86984.

A sanitizer buildbot reports an error.
2021-06-08 16:29:41 -04:00
Brendon Cahoon ea10a86984 [AMDGPU] Add gfx1013 target
Differential Revision: https://reviews.llvm.org/D103663
2021-06-08 12:49:49 -04:00
Michael Kruse 83ff0ff463 [Clang][OpenMP] Allow unified_shared_memory for Pascal-generation GPUs.
The Pascal architecture supports the page migration engine required for
unified_shared_memory, as indicated by NVIDIA:
 * https://developer.nvidia.com/blog/unified-memory-cuda-beginners/
 * https://developer.nvidia.com/blog/beyond-gpu-memory-limits-unified-memory-pascal/
 * https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#um-requirements

The limitation was introduced in D54493, which justified the cut-off by
the requirement for unified addressing. However, Unified Virtual
Addressing (UVA) is already available with sm_20 (Fermi, Kepler,
Maxwell):
 * https://docs.nvidia.com/cuda/gpudirect-rdma/index.html#basics-of-uva-cuda-memory-management

Unified shared memory might even be possible with these, but with
migration of entire allocations on kernel startup.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D101595
2021-05-13 17:15:34 -05:00
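
For the unified_shared_memory change above, a minimal offloading example that should now be accepted when compiling for a Pascal GPU; the target triple and `sm_60` architecture flag mentioned below are assumptions for illustration, not part of the patch:

```cpp
// Sketch: host memory used directly inside a target region under
// unified_shared_memory; the page migration engine moves data on demand,
// so no map clauses are required.
#include <cstdio>
#include <vector>

#pragma omp requires unified_shared_memory

int main() {
  std::vector<double> v(1024, 1.0);
  double *p = v.data();
#pragma omp target teams distribute parallel for
  for (int i = 0; i < 1024; ++i)
    p[i] *= 2.0;
  std::printf("v[0] = %f\n", p[0]);
  return 0;
}
```

Built with something like `clang++ -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda -Xopenmp-target -march=sm_60 example.cpp` (flags shown for illustration only).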
Aakanksha Patil 464e4dc50f [AMDGPU] Add gfx1034 target
Differential Revision: https://reviews.llvm.org/D102306
2021-05-13 14:25:18 -04:00
Giorgis Georgakoudis a2dbfb6b72 [OpenMP] Simplify offloading parallel call codegen
This revision simplifies Clang codegen for parallel regions in OpenMP GPU target offloading, with corresponding changes in libomptarget: SPMD and non-SPMD parallel calls are unified under a single `kmpc_parallel_51` runtime entry point for parallel regions (which will be commonized between target and host-side parallel regions), and data sharing is internalized to the runtime. Tests have been auto-generated using `update_cc_test_checks.py`. The revision also contains changes to OpenMPOpt for remark creation on target offloading regions.

Reviewed By: jdoerfert, Meinersbur

Differential Revision: https://reviews.llvm.org/D95976
2021-04-21 18:46:07 -07:00
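
For reference, a rough sketch of what a single entry point serving both modes might look like; the exact name spelling and the parameter list below are assumptions following the usual `__kmpc_` convention, and the authoritative interface is the one added in D95976:

```cpp
#include <cstdint>

// Hypothetical sketch of a unified device runtime entry point that serves
// both SPMD and non-SPMD parallel regions: the compiler passes the outlined
// parallel body, its wrapper, and the shared arguments, and the runtime
// decides how to launch it.
extern "C" void __kmpc_parallel_51(void *ident, std::int32_t global_tid,
                                   std::int32_t if_expr,
                                   std::int32_t num_threads, int proc_bind,
                                   void *outlined_fn, void *wrapper_fn,
                                   void **args, std::int64_t nargs);
```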
Johannes Doerfert 03cc8a1ba0 [OpenMP][NFC] Move the `noinline` to the parallel entry point
The `noinline` for non-SPMD parallel functions is probably not necessary,
but as long as we use it we should put it on the outermost parallel
function, which is the wrapper, not the actual outlined function.

Resolves PR49752

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D99506
2021-03-30 01:12:45 -05:00
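
In other words, the attribute moves from the outlined body to the wrapper that serves as the parallel entry point. A minimal sketch with the LLVM C++ API (the helper name is illustrative, not from the patch):

```cpp
#include "llvm/IR/Attributes.h"
#include "llvm/IR/Function.h"

// Illustrative only: mark the outermost parallel entry point (the wrapper)
// as noinline and leave the outlined body inlinable.
static void markParallelEntryPoint(llvm::Function *WrapperFn,
                                   llvm::Function *OutlinedFn) {
  WrapperFn->addFnAttr(llvm::Attribute::NoInline);
  OutlinedFn->removeFnAttr(llvm::Attribute::NoInline);
}
```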
Nikita Popov 42eb658f65 [OpaquePtrs] Remove some uses of type-less CreateGEP() (NFC)
This removes some (but not all) uses of type-less CreateGEP()
and CreateInBoundsGEP() APIs, which are incompatible with opaque
pointers.

There are a still a number of tricky uses left, as well as many
more variation APIs for CreateGEP.
2021-03-12 21:01:16 +01:00
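
The kind of migration involved looks roughly like this (an illustrative sketch, not code from the commit):

```cpp
#include "llvm/IR/IRBuilder.h"

// Illustrative: the type-less overload infers the element type from the
// pointer's pointee type, which opaque pointers no longer carry; the typed
// overload takes the element type explicitly.
llvm::Value *indexInto(llvm::IRBuilder<> &Builder, llvm::Type *ElemTy,
                       llvm::Value *BasePtr, llvm::Value *Idx) {
  // Old, incompatible with opaque pointers:
  //   return Builder.CreateInBoundsGEP(BasePtr, Idx);
  return Builder.CreateInBoundsGEP(ElemTy, BasePtr, Idx);
}
```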
Stanislav Mekhanoshin a8d9d50762 [AMDGPU] gfx90a support
Differential Revision: https://reviews.llvm.org/D96906
2021-02-17 16:01:32 -08:00
Artem Belevich 2aa01ccec3 [CUDA, NVPTX] Allow targeting sm_86 GPUs.
The patch only plumbs through the option necessary for targeting sm_86 GPUs w/o
adding any new functionality.

Differential Revision: https://reviews.llvm.org/D95974
2021-02-09 11:01:10 -08:00
Johannes Doerfert bd756286d2 [OpenMP][FIX] Enforce a function boundary for a new data environment
Whenever we enter a new OpenMP data environment we want to enter a
function to simplify reasoning. Later we probably want to remove the
entire specialization wrt. the if clause and pass the result to the
runtime, for now this should fix PR48686.

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D94315
2021-01-25 22:43:37 -06:00
Thorsten Schütt 2fd11e0b1e Revert "[NFC, Refactor] Modernize StorageClass from Specifiers.h to a scoped enum (II)"
This reverts commit efc82c4ad2.
2021-01-04 23:17:45 +01:00
Thorsten Schütt efc82c4ad2 [NFC, Refactor] Modernize StorageClass from Specifiers.h to a scoped enum (II)
Reviewed By: aaron.ballman

Differential Revision: https://reviews.llvm.org/D93765
2021-01-04 22:58:26 +01:00
Tim Renouf 89d41f3a2b [AMDGPU] Add gfx1033 target
Differential Revision: https://reviews.llvm.org/D90447

Change-Id: If2650fc7f31bbdd49c76e74a9ca8e3734d769761
2020-11-03 16:27:48 +00:00
Tim Renouf ee3e642627 [AMDGPU] Add gfx90c target
This differentiates the Ryzen 4000/4300/4500/4700 series APUs that were
previously included in gfx909.

Differential Revision: https://reviews.llvm.org/D90419

Change-Id: Ia901a7157eb2f73ccd9f25dbacec38427312377d
2020-11-03 16:27:43 +00:00
JonChesterfield 5d02ca49a2 [libomptarget][nvptx] Undef, weak shared variables
Shared variables on nvptx, and LDS on amdgcn, are uninitialized at
the start of kernel execution. Therefore create the variables with
undef instead of zeros, motivated in part by the amdgcn back end
rejecting LDS+initializer.

Common linkage implies zero initialization, which seems incompatible with
shared memory. Thus change these variables to weak linkage, following the
direction of https://reviews.llvm.org/rG7b3eabdcd215

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D90248
2020-10-28 14:25:36 +00:00
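
A sketch of what emitting such a variable could look like with the LLVM C++ API (the helper name is illustrative; address space 3 is the shared/LDS address space, and the exact linkage choice here is an assumption):

```cpp
#include "llvm/ADT/StringRef.h"
#include "llvm/IR/Constants.h"
#include "llvm/IR/GlobalVariable.h"
#include "llvm/IR/Module.h"

// Illustrative: emit a device "shared" variable with weak linkage and an
// undef initializer instead of common linkage with a zero initializer.
llvm::GlobalVariable *emitSharedVariable(llvm::Module &M, llvm::Type *Ty,
                                         llvm::StringRef Name) {
  return new llvm::GlobalVariable(
      M, Ty, /*isConstant=*/false, llvm::GlobalValue::WeakAnyLinkage,
      llvm::UndefValue::get(Ty), Name, /*InsertBefore=*/nullptr,
      llvm::GlobalValue::NotThreadLocal, /*AddressSpace=*/3);
}
```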
Stanislav Mekhanoshin d1beb95d12 [AMDGPU] gfx1032 target
Differential Revision: https://reviews.llvm.org/D89487
2020-10-15 12:41:18 -07:00
Tim Renouf 666ef0db20 [AMDGPU] Add gfx602, gfx705, gfx805 targets
At AMD, in an internal audit of our code, we found some corner cases
where we were not quite differentiating targets enough for some old
hardware. This commit is part of fixing that by adding three new
targets:

* The "Oland" and "Hainan" variants of gfx601 are now split out into
  gfx602. LLPC (in the GPUOpen driver) and other front-ends could use
  that to avoid using the shaderZExport workaround on gfx602.

* One variant of gfx703 is now split out into gfx705. LLPC and other
  front-ends could use that to avoid using the
  shaderSpiCsRegAllocFragmentation workaround on gfx705.

* The "TongaPro" variant of gfx802 is now split out into gfx805.
  TongaPro has a faster 64-bit shift than its former friends in gfx802,
  and a subtarget feature could be set up for that to take advantage of
  it. This commit does not make that change; it just adds the target.

V2: Add clang changes. Put TargetParser list in order.
V3: AMDGCNGPUs table in TargetParser.cpp needs to be in GPUKind order,
    so fix the GPUKind order.

Differential Revision: https://reviews.llvm.org/D88916

Change-Id: Ia901a7157eb2f73ccd9f25dbacec38427312377d
2020-10-10 17:22:22 +01:00
Joseph Huber 3cc1f1fc1d [OpenMP] Replace OpenMP RTL Functions With OMPIRBuilder and OMPKinds.def
Summary:
Replace the OpenMP Runtime Library functions used in CGOpenMPRuntimeGPU
for OpenMP device code generation with ones in OMPKinds.def and use
OMPIRBuilder for generating runtime calls. This allows us to
consolidate more OpenMP code generation into the OMPIRBuilder. Future
additions to the GPU runtime functions should now go in OMPKinds.def.

Reviewers: jdoerfert

Subscribers: aaron.ballman cfe-commits guansong llvm-commits sstefan1 yaxunl

Tags: #OpenMP #LLVM #clang

Differential Revision: https://reviews.llvm.org/D88430
2020-10-08 14:00:22 -04:00
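
The pattern, roughly (an illustrative sketch rather than code from the patch): runtime functions are described once in OMPKinds.def and fetched through the OMPIRBuilder instead of being declared by hand in the CodeGen class:

```cpp
#include "llvm/Frontend/OpenMP/OMPConstants.h"
#include "llvm/Frontend/OpenMP/OMPIRBuilder.h"

// Illustrative: look up a runtime function by its OMPKinds.def enumerator
// instead of declaring its signature locally in CGOpenMPRuntimeGPU.
llvm::FunctionCallee getGlobalThreadNumFn(llvm::OpenMPIRBuilder &OMPBuilder,
                                          llvm::Module &M) {
  return OMPBuilder.getOrCreateRuntimeFunction(
      M, llvm::omp::RuntimeFunction::OMPRTL___kmpc_global_thread_num);
}
```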
Pushpinder Singh 3a12ff0dac [OpenMP][RTL] Remove dead code
RequiresDataSharing was always 0, resulting in dead code in the device runtime library.

Reviewed By: jdoerfert, JonChesterfield

Differential Revision: https://reviews.llvm.org/D88829
2020-10-06 05:43:47 -04:00
Yaxun (Sam) Liu cbd420c5ed [CUDA][HIP] Fix bound arch for offload action for fat binary
Currently the CUDA/HIP toolchain uses "unknown" as the bound arch
for the offload action for fat binaries. This causes -mcpu or -march
options with "unknown" to be added in HIPToolChain::TranslateArgs or
CUDAToolChain::TranslateArgs.

This causes an issue for https://reviews.llvm.org/D88377, since the
HIP toolchain needs to check -mcpu in HIPToolChain::TranslateArgs.

The bound arch of the offload action for fat binaries is not really
used, so set it to CudaArch::UNUSED.

Differential Revision: https://reviews.llvm.org/D88524
2020-10-02 19:05:51 -04:00
Joseph Huber 1b60f63e4f Revert "[OpenMP] Replace OpenMP RTL Functions With OMPIRBuilder and OMPKinds.def"
Tests are failing on Arm because they automatically populate
architectures with incompatible pointer widths. Reverting until the tests
are updated. Failing tests:

OpenMP/distribute_parallel_for_num_threads_codegen.cpp
OpenMP/distribute_parallel_for_if_codegen.cpp
OpenMP/distribute_parallel_for_simd_if_codegen.cpp
OpenMP/distribute_parallel_for_simd_num_threads_codegen.cpp
OpenMP/target_teams_distribute_parallel_for_if_codegen.cpp
OpenMP/target_teams_distribute_parallel_for_simd_if_codegen.cpp
OpenMP/teams_distribute_parallel_for_if_codegen.cpp
OpenMP/teams_distribute_parallel_for_simd_if_codegen.cpp

This reverts commit 90eaedda9b.
2020-09-30 15:12:21 -04:00
Joseph Huber 90eaedda9b [OpenMP] Replace OpenMP RTL Functions With OMPIRBuilder and OMPKinds.def
Summary:
Replace the OpenMP Runtime Library functions used in CGOpenMPRuntimeGPU
for OpenMP device code generation with ones in OMPKinds.def and use
OMPIRBuilder for generating runtime calls. This allows us to consolidate
more OpenMP code generation into the OMPIRBuilder. This patch also
makes it an error to specify target architectures with conflicting pointer
sizes.

Reviewers: jdoerfert

Subscribers: aaron.ballman cfe-commits guansong llvm-commits sstefan1 yaxunl

Tags: #OpenMP #Clang #LLVM

Differential Revision: https://reviews.llvm.org/D88430
2020-09-30 14:00:01 -04:00
Johannes Doerfert 95a25e4c32 [OpenMP][FIX] Do not use TBAA in type punning reduction GPU code PR46156
When we implement OpenMP GPU reductions we use type punning a lot during
the shuffle and reduce operations. This is not always compatible with
language rules on aliasing. So far we generated TBAA metadata, which later
allowed some of the reduction code to be removed because accesses and
initialization were "known to not alias". With this patch we avoid TBAA in
this step, hopefully for all accesses that need it.

Verified on the reproducer of PR46156 and QMCPack.

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D86037
2020-08-16 14:38:31 -05:00
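
A simplified illustration of the kind of punning involved (CPU-side C++ rather than the generated device code): a reduction element travels through the shuffle as differently typed halves, so strict-aliasing metadata on those accesses can mislead the optimizer:

```cpp
#include <cstdint>
#include <cstring>

// Simplified stand-in for the shuffle step of a GPU reduction: a double is
// moved as two 32-bit halves and reassembled. If the loads and stores carry
// TBAA saying "i32 never aliases double", the optimizer may delete or
// reorder them; emitting the accesses without TBAA sidesteps that.
static double shuffleAsHalves(double value) {
  std::int32_t halves[2];
  std::memcpy(halves, &value, sizeof value);    // view the double as 2 x i32
  // ... each half would be exchanged across lanes here ...
  double result;
  std::memcpy(&result, halves, sizeof result);  // reassemble the double
  return result;
}
```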
Craig Topper 5c1fe4e20f [Target] Cache the command line derived feature map in TargetOptions.
We can use this to remove some calls to initFeatureMap from Sema
and CodeGen when a function doesn't have a target attribute.

This reduces compile time of the linux kernel where this map
is needed to diagnose some inline assembly constraints based
on whether sse, avx, or avx512 is enabled.

Differential Revision: https://reviews.llvm.org/D85807
2020-08-12 12:37:23 -07:00
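
A hypothetical sketch of the caching idea (class and method names invented for illustration; the actual change stores the map in clang's TargetOptions):

```cpp
#include "llvm/ADT/StringMap.h"
#include <functional>

// Hypothetical: compute the command-line-derived feature map once and reuse
// it whenever a function has no target attribute, instead of re-running the
// relatively expensive initFeatureMap logic on every query.
class CommandLineFeatureCache {
  llvm::StringMap<bool> Features;
  bool Initialized = false;

public:
  const llvm::StringMap<bool> &
  get(const std::function<void(llvm::StringMap<bool> &)> &Init) {
    if (!Initialized) {
      Init(Features); // e.g. forwards to TargetInfo::initFeatureMap
      Initialized = true;
    }
    return Features;
  }
};
```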
Stanislav Mekhanoshin 105608a4c2 [AMDGPU] Added missing gfx1031 cases to CGOpenMPRuntimeGPU.cpp
2020-08-05 12:39:03 -07:00
Saiyedul Islam 160ff83765 [OpenMP][AMDGCN] Support OpenMP offloading for AMDGCN architecture - Part 3
Provides AMDGCN- and NVPTX-specific specializations of the getGPUWarpSize,
getGPUThreadID, and getGPUNumThreads methods. Adds tests for AMDGCN
codegen for these methods in generic and simd modes. Also changes the
precondition in InitTempAlloca to be slightly more permissive. Useful for
AMDGCN OpenMP codegen where allocas are created with a cast to an
address space.

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D84260
2020-08-03 05:38:39 +00:00
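
The specialization pattern, roughly (a simplified sketch: the GPU and NVPTX class names follow the commits in this log, while the AMDGCN class name and the abridged signatures are assumptions; the real classes carry many more members):

```cpp
#include "llvm/IR/Value.h"

namespace clang { namespace CodeGen { class CodeGenFunction; } }
using clang::CodeGen::CodeGenFunction;

// Simplified sketch: target-neutral GPU OpenMP codegen asks virtual hooks for
// the warp size, thread id, and thread count, and each GPU target answers
// with its own intrinsic or builtin.
class CGOpenMPRuntimeGPU /* : public CGOpenMPRuntime */ {
public:
  virtual ~CGOpenMPRuntimeGPU() = default;
  virtual llvm::Value *getGPUWarpSize(CodeGenFunction &CGF) = 0;
  virtual llvm::Value *getGPUThreadID(CodeGenFunction &CGF) = 0;
  virtual llvm::Value *getGPUNumThreads(CodeGenFunction &CGF) = 0;
};

class CGOpenMPRuntimeNVPTX : public CGOpenMPRuntimeGPU {
public:
  llvm::Value *getGPUWarpSize(CodeGenFunction &CGF) override;   // PTX sregs
  llvm::Value *getGPUThreadID(CodeGenFunction &CGF) override;
  llvm::Value *getGPUNumThreads(CodeGenFunction &CGF) override;
};

class CGOpenMPRuntimeAMDGCN : public CGOpenMPRuntimeGPU {
public:
  llvm::Value *getGPUWarpSize(CodeGenFunction &CGF) override;   // wavefront size
  llvm::Value *getGPUThreadID(CodeGenFunction &CGF) override;
  llvm::Value *getGPUNumThreads(CodeGenFunction &CGF) override;
};
```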
Saiyedul Islam fc7d2908ab [OpenMP] Use common interface to access GPU Grid Values
Use common interface for accessing target specific GPU grid values in NVPTX
OpenMP codegen as proposed in https://reviews.llvm.org/D80917

Originally authored by Greg Rodgers (@gregrodgers).

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D83492
2020-07-21 05:25:46 +00:00
Saiyedul Islam c7562e77b3 [OpenMP][NFC] Generalize CGOpenMPRuntimeNVPTX as CGOpenMPRuntimeGPU
Refactors CGOpenMPRuntimeNVPTX as CGOpenMPRuntimeGPU to make it a
generalization for OpenMP GPU Codegen. Target specific specialized
methods for NVPTX are defined in class CGOpenMPRuntimeNVPTX. This
paves the way for a clean and maintainable extension to more GPU
targets for OpenMP Codegen.

For original author (git blame) list of CGOpenMPRuntimeGPU code,
look in history of CGOpenMPRuntimeNVPTX.cpp and .h, after this commit.

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D83723
2020-07-17 14:38:04 +00:00