forked from OSchip/llvm-project
30eeb742f1
Add address space to indirect ABI info and use it for kernels. Previously, indirect arguments assumed a stack-passed object in the alloca address space using byval. A stack pointer is unsuitable for kernel arguments, which are passed in a separate, constant buffer with a different address space. Start using the new byref for aggregate kernel arguments. Previously these were emitted as raw struct arguments and turned into loads in the backend. These will lower identically, although with byref you now have the option of applying an explicit alignment. In the future, a reasonable implementation would use byref for all kernel arguments (this would be a practical problem at the moment due to losing things like noalias on pointer arguments). This is mostly to avoid fighting the optimizer's treatment of aggregate load/store. SROA and instcombine both turn aggregate loads and stores into a long sequence of element loads and stores, rather than the optimizable memcpy I would expect in this situation. Now an explicit memcpy will be introduced up front, which is better understood and helps eliminate the alloca in more situations. This skips using byref in the case where HIP kernel pointer arguments in structs are promoted to global pointers. At minimum, an additional patch is needed to allow coercion with indirect arguments. This also skips using it for OpenCL due to the current workaround used to support kernels calling kernels. Distinct function bodies would need to be generated up front instead of emitting an illegal call. |
||
---|---|---|
Inputs | ||
address-spaces.cu | ||
alias.cu | ||
amdgpu-hip-implicit-kernarg.cu | ||
amdgpu-kernel-arg-pointer-type.cu | ||
amdgpu-kernel-attrs.cu | ||
amdgpu-visibility.cu | ||
amdgpu-workgroup-size.cu | ||
builtins-amdgcn.cu | ||
constexpr-variables.cu | ||
convergent.cu | ||
cuda-builtin-vars.cu | ||
debug-info-address-class.cu | ||
debug-info-template.cu | ||
deferred-diag.cu | ||
dependent-libs.cu | ||
device-init-fun.cu | ||
device-stub.cu | ||
device-var-init.cu | ||
device-vtable.cu | ||
filter-decl.cu | ||
flush-denormals.cu | ||
fp-contract.cu | ||
function-overload.cu | ||
kernel-amdgcn.cu | ||
kernel-args-alignment.cu | ||
kernel-args.cu | ||
kernel-call.cu | ||
kernel-dbg-info.cu | ||
kernel-stub-name.cu | ||
lambda.cu | ||
launch-bounds.cu | ||
library-builtin.cu | ||
link-device-bitcode.cu | ||
llvm-used.cu | ||
ms-linker-options.cu | ||
norecurse.cu | ||
nothrow.cu | ||
openmp-target.cu | ||
printf-aggregate.cu | ||
printf.cu | ||
propagate-metadata.cu | ||
ptx-kernels.cu | ||
static-device-var-no-rdc.cu | ||
surface.cu | ||
texture.cu | ||
types.cu | ||
unnamed-types.cu | ||
usual-deallocators.cu |