Added --gpu-bundle-output to control bundling/unbundling output of HIP device compilation.
By default preprocessor expansion, llvm bitcode and assembly are unbundled, code objects are
bundled.
Reviewed by: Artem Belevich, Jan Svoboda
Differential Revision: https://reviews.llvm.org/D101630
Currently CUDA/HIP toolchain uses "unknown" as bound arch
for offload action for fat binary. This causes -mcpu or -march
with "unknown" added in HIPToolChain::TranslateArgs or
CUDAToolChain::TranslateArgs.
This causes issue for https://reviews.llvm.org/D88377 since
HIP toolchain needs to check -mcpu in HIPToolChain::TranslateArgs.
The bound arch of offload action for fat binary is not really
used, therefore set it to CudaArch::UNUSED.
Differential Revision: https://reviews.llvm.org/D88524
This patch is a follow up on https://reviews.llvm.org/D78759.
Extract the HIP Linker script from generic GNU linker,
and move it into HIP ToolChain. Update OffloadActionBuilder
Link actions feature to apply device linking and host linking
actions separately. Using MC Directives, embed the device images
and define symbols.
Reviewers: JonChesterfield, yaxunl
Subscribers: tra, echristo, jdoerfert, msearles, scchan
Differential Revision: https://reviews.llvm.org/D81963
This patch is a follow up on https://reviews.llvm.org/D81627.
In addition to default -fno-gpu-rdc case, this patches let
HIP toolchain not use llvm-link/opt/llc to link device code
for -fgpu-rdc case. Instead, uses standard lto.
This will eliminate some redundant optimizations and speed
up the compilation/linking.
Differential Revision: https://reviews.llvm.org/D81861
Currently HIP toolchain calls clang to emit bitcode then calls opt/llc for device compilation for the default -fno-gpu-rdc
case, which is unnecessary since clang is able to compile a single source file to ISA.
This patch fixes the HIP action builder and toolchain so that the default -fno-gpu-rdc can be done like a canonical
toolchain, i.e. one clang -cc1 invocation to compile source code to ISA.
This can avoid unnecessary processes to speed up the compilation, and avoid redundant LLVM passes which are
performed in clang -cc1 and opt.
Differential Revision: https://reviews.llvm.org/D81627