This patch renames -f{no-}cuda-rdc to -f{no-}gpu-rdc and keeps the original
options as aliases. When -fgpu-rdc is off,
clang will assume the device code in each translation unit does not call
external functions except those in the device library, therefore it is possible
to compile the device code in each translation unit to self-contained kernels
and embed them in the host object, so that the host object behaves like
usual host object which can be linked by lld.
The benefits of this feature is: 1. allow users to create static libraries which
can be linked by host linker; 2. amortized device code linking time.
This patch modifies HIP action builder to insert actions for linking device
code and generating HIP fatbin, and pass HIP fatbin to host backend action.
It extracts code for constructing command for generating HIP fatbin as
a function so that it can be reused by early finalization. It also modifies
codegen of HIP host constructor functions to embed the device fatbin
when it is available.
Differential Revision: https://reviews.llvm.org/D52377
llvm-svn: 343611
As a first step, pass '-c/--compile-only' to ptxas so that it
doesn't complain about references to external function. This
will successfully generate object files, but they won't work
at runtime because the registration routines need to adapted.
Differential Revision: https://reviews.llvm.org/D42921
llvm-svn: 324878
Summary: When compiling code being offloaded by OpenMP to an NVIDIA GPU, pass the -v to PTXAS if it was passed to the CLANG driver.
Reviewers: arpith-jacob, caomhin, carlo.bertolli, ABataev, jlebar, hfinkel, tstellar
Reviewed By: jlebar
Subscribers: Hahnfeld, rengolin, cfe-commits
Differential Revision: https://reviews.llvm.org/D29644
llvm-svn: 310295
This is recommit of r302775, reverted in r302777 due to a fail in
clang-tidy. Original mesage is below.
Now if clang driver is given wrong arguments, in some cases it
continues execution and returns zero code. This change fixes this
behavior.
The fix revealed some errors in clang test set.
File test/Driver/gfortran.f90 added in r118203 checks forwarding
gfortran flags to GCC. Now driver reports error on this file, because
the option -working-directory implemented in clang differs from the
option with the same name implemented in gfortran, in clang the option
requires argument, in gfortran does not.
In the file test/Driver/arm-darwin-builtin.c clang is called with
options -fbuiltin-strcat and -fbuiltin-strcpy. These option were removed
in r191435 and now clang reports error on this test.
File arm-default-build-attributes.s uses option -verify, which is not
supported by driver, it is cc1 option.
Similarly, the file split-debug.h uses options -fmodules-embed-all-files
and -fmodule-format=obj, which are not supported by driver.
Other revealed errors are mainly mistypes.
Differential Revision: https://reviews.llvm.org/D33013
llvm-svn: 303756
Now if clang driver is given wrong arguments, in some cases it
continues execution and returns zero code. This change fixes this
behavior.
The fix revealed some errors in clang test set.
File test/Driver/gfortran.f90 added in r118203 checks forwarding
gfortran flags to GCC. Now driver reports error on this file, because
the option -working-directory implemented in clang differs from the
option with the same name implemented in gfortran, in clang the option
requires argument, in gfortran does not.
In the file test/Driver/arm-darwin-builtin.c clang is called with
options -fbuiltin-strcat and -fbuiltin-strcpy. These option were removed
in r191435 and now clang reports error on this test.
File arm-default-build-attributes.s uses option -verify, which is not
supported by driver, it is cc1 option.
Similarly, the file split-debug.h uses options -fmodules-embed-all-files
and -fmodule-format=obj, which are not supported by driver.
Other revealed errors are mainly mistypes.
Differential Revision: https://reviews.llvm.org/D33013
llvm-svn: 302775
Summary:
Compiling CUDA device code requires us to know the host toolchain,
because CUDA device-side compiles pull in e.g. host headers.
When we only supported Linux compilation, this worked because
CudaToolChain, which is responsible for device-side CUDA compilation,
inherited from the Linux toolchain. But in order to support MacOS,
CudaToolChain needs to take a HostToolChain pointer.
Because a CUDA toolchain now requires a host TC, we no longer will
create a CUDA toolchain from Driver::getToolChain -- you have to go
through CreateOffloadingDeviceToolChains. I am *pretty* sure this is
correct, and that previously any attempt to create a CUDA toolchain
through getToolChain() would eventually have resulted in us throwing
"error: unsupported use of NVPTX for host compilation".
In any case hacking getToolChain to create a CUDA+host toolchain would
be wrong, because a Driver can be reused for multiple compilations,
potentially with different host TCs, and getToolChain will cache the
result, causing us to potentially use a stale host TC.
So that's the main change in this patch.
In addition, we have to pull CudaInstallationDetector out of Generic_GCC
and into a top-level class. It's now used by the Generic_GCC and MachO
toolchains.
Reviewers: tra
Subscribers: rryan, hfinkel, sfantao
Differential Revision: https://reviews.llvm.org/D26774
llvm-svn: 287285
ptxas optimizations are disabled if we need to generate debug info
as ptxas does not accept '-g' otherwise.
Differential Revision: http://reviews.llvm.org/D17111
llvm-svn: 261018
Summary:
Previously we'd crash the driver if you passed -O0. Now we try to
handle all of clang's various optimization flags in a sane way.
Reviewers: tra
Subscribers: cfe-commits, echristo, jhen
Differential Revision: http://reviews.llvm.org/D16307
llvm-svn: 258174
Summary:
Previously we compiled CUDA device code to PTX assembly and embedded
that asm as text in our host binary. Now we compile to PTX assembly and
then invoke ptxas to assemble the PTX into a cubin file. We gather the
ptx and cubin files for each of our --cuda-gpu-archs and combine them
using fatbinary, and then embed that into the host binary.
Adds two new command-line flags, -Xcuda_ptxas and -Xcuda_fatbinary,
which pass args down to the external tools.
Reviewers: tra, echristo
Subscribers: cfe-commits, jhen
Differential Revision: http://reviews.llvm.org/D16082
llvm-svn: 257809