llvm-project/clang/test/CodeGenCUDA
Matt Arsenault 30eeb742f1 clang: Use byref for aggregate kernel arguments
Add address space to indirect abi info and use it for kernels.

Previously, indirect arguments assumed assumed a stack passed object
in the alloca address space using byval. A stack pointer is unsuitable
for kernel arguments, which are passed in a separate, constant buffer
with a different address space.

Start using the new byref for aggregate kernel arguments. Previously
these were emitted as raw struct arguments, and turned into loads in
the backend. These will lower identically, although with byref you now
have the option of applying an explicit alignment. In the future, a
reasonable implementation would use byref for all kernel arguments
(this would be a practical problem at the moment due to losing things
like noalias on pointer arguments).

This is mostly to avoid fighting the optimizer's treatment of
aggregate load/store. SROA and instcombine both turn aggregate loads
and stores into a long sequence of element loads and stores, rather
than the optimizable memcpy I would expect in this situation. Now an
explicit memcpy will be introduced up-front which is better understood
and helps eliminate the alloca in more situations.

This skips using byref in the case where HIP kernel pointer arguments
in structs are promoted to global pointers. At minimum an additional
patch is needed to allow coercion with indirect arguments. This also
skips using it for OpenCL due to the current workaround used to
support kernels calling kernels. Distinct function bodies would need
to be generated up front instead of emitting an illegal call.
2020-08-06 15:52:26 -04:00
..
Inputs [HIP] Support new kernel launching API 2019-09-24 19:16:40 +00:00
address-spaces.cu
alias.cu
amdgpu-hip-implicit-kernarg.cu [AMDGPU] Increased the number of implicit argument bytes for both OpenCL and HIP (CLANG). 2019-07-10 15:10:08 +00:00
amdgpu-kernel-arg-pointer-type.cu [hip] Refine `clang/test/CodeGenCUDA/amdgpu-kernel-arg-pointer-type.cu` 2020-06-25 23:57:08 -04:00
amdgpu-kernel-attrs.cu [HIP] Change default --gpu-max-threads-per-block value to 1024 2020-06-03 11:09:22 -04:00
amdgpu-visibility.cu [HIP] Fix visibility for 'extern' device variables. 2019-11-05 14:19:32 -05:00
amdgpu-workgroup-size.cu [AMDGPU] Add __builtin_amdgcn_workgroup_size_x/y/z 2020-03-28 01:03:20 -04:00
builtins-amdgcn.cu Fix how cc1 command line options are mapped into FP options. 2020-06-01 22:00:30 -04:00
constexpr-variables.cu [CUDA][HIP] Support accessing static device variable in host code for -fno-gpu-rdc 2020-08-05 07:57:38 -04:00
convergent.cu
cuda-builtin-vars.cu
debug-info-address-class.cu [DEBUG_INFO][NVPTX] Generate correct data about variable address class. 2019-02-05 19:45:57 +00:00
debug-info-template.cu [CUDA][HIP][DebugInfo] Skip reference device function 2019-03-06 21:16:27 +00:00
deferred-diag.cu Speed up deferred diagnostic emitter 2020-04-06 13:07:43 -04:00
dependent-libs.cu Fix failure of lit test dependent-libs.cu 2019-05-29 01:34:44 +00:00
device-init-fun.cu [HIP] Add option -fgpu-allow-device-init 2019-10-22 16:06:20 -04:00
device-stub.cu [cuda][hip] Fix `RegisterVar` function prototype. 2020-04-03 12:57:09 -04:00
device-var-init.cu [CUDA][HIP] Skip setting `externally_initialized` for static device variables. 2019-05-29 17:23:27 +00:00
device-vtable.cu
filter-decl.cu
flush-denormals.cu clang/AMDGPU: Stop setting old denormal subtarget features 2020-04-02 17:17:12 -04:00
fp-contract.cu
function-overload.cu
kernel-amdgcn.cu [HIP] Change default --gpu-max-threads-per-block value to 1024 2020-06-03 11:09:22 -04:00
kernel-args-alignment.cu LLVM IR: Generate new-style byval-with-Type from Clang 2019-06-05 21:12:14 +00:00
kernel-args.cu clang: Use byref for aggregate kernel arguments 2020-08-06 15:52:26 -04:00
kernel-call.cu [HIP] Support new kernel launching API 2019-09-24 19:16:40 +00:00
kernel-dbg-info.cu Fix debug info for NoDebug attr 2020-05-21 09:02:56 -04:00
kernel-stub-name.cu [HIP] Fix device stub name 2020-03-09 16:40:05 -04:00
lambda.cu [CUDA][HIP] Let lambda be host device by default 2020-07-08 13:10:26 -04:00
launch-bounds.cu
library-builtin.cu Fix how cc1 command line options are mapped into FP options. 2020-06-01 22:00:30 -04:00
link-device-bitcode.cu
llvm-used.cu
ms-linker-options.cu [CUDA][HIP] Disable emitting llvm.linker.options in device compilation 2019-11-04 23:21:39 -05:00
norecurse.cu [OpenCL][CUDA][HIP][SYCL] Add norecurse 2020-02-16 20:41:00 -05:00
nothrow.cu
openmp-target.cu [CUDA][HIP] Fix host/device check with -fopenmp 2019-10-09 23:54:10 +00:00
printf-aggregate.cu
printf.cu
propagate-metadata.cu Assume ieee behavior without denormal-fp-math attribute 2020-03-07 12:10:56 -05:00
ptx-kernels.cu
static-device-var-no-rdc.cu [CUDA][HIP] Support accessing static device variable in host code for -fno-gpu-rdc 2020-08-05 07:57:38 -04:00
surface.cu [cuda][hip] Add CUDA builtin surface/texture reference support. 2020-03-27 17:18:49 -04:00
texture.cu [cuda][hip] Add CUDA builtin surface/texture reference support. 2020-03-27 17:18:49 -04:00
types.cu Do not copy long double and 128-bit fp format from aux target for AMDGPU 2019-01-31 21:57:51 +00:00
unnamed-types.cu [HIP] Fix device stub name 2020-03-09 16:40:05 -04:00
usual-deallocators.cu IR: print value numbers for unnamed function arguments 2019-08-03 14:28:34 +00:00