__global__ functions are a special case in CUDA.
Even when the symbol would normally not be externally
visible according to C++ rules, they still must be visible
in CUDA GPU object so host-side stub can launch them.
Differential Revision: http://reviews.llvm.org/D19748
llvm-svn: 268299
Summary:
Setting this flag causes all functions are annotated with the
"nvvm-f32ftz" = "true" attribute.
In addition, we annotate the module with "nvvm-reflect-ftz" set
to 0 or 1, depending on whether -cuda-flush-denormals-to-zero is set.
This is read by the NVVMReflect pass.
Reviewers: tra, rnk
Subscribers: cfe-commits
Differential Revision: http://reviews.llvm.org/D18671
llvm-svn: 265435
We already have this flag in most of the file, but we need it everywhere
else, to disable the NVVMReflect pass, which we're explicitly checking
doesn't run here. (Upcoming changes to llvm will cause it to be run.)
llvm-svn: 264969
Summary:
* -fcuda-target-overloads
Previously unconditionally set to true by the driver. Necessary for
correct functioning of the compiler -- our CUDA headers wrapper won't
compile without this.
* -fcuda-disable-target-call-checks
Previously unconditionally set to true by the driver. Necessary to
compile almost any external CUDA code -- almost all libraries assume
that host+device code can call host or device functions.
* -fcuda-allow-host-calls-from-host-device
No effect when target overloading is enabled.
Reviewers: tra
Subscribers: rsmith, cfe-commits
Differential Revision: http://reviews.llvm.org/D18416
llvm-svn: 264739
Summary:
Previously we were using the codegen test to ensure that we choose the
right overload. But we can do this within sema, with a bit of
cleverness.
I left the constructor/destructor checks in CodeGen, because these
overloads (particularly on the destructors) are hard to check in Sema.
Reviewers: tra
Subscribers: cfe-commits
Differential Revision: http://reviews.llvm.org/D18386
llvm-svn: 264207
* Run cc with -triple x86_64-linux-gnu to make symbol mangling predictable.
* Use temporary file as a fake GPU input so its content
does not interfere with pattern matching.
llvm-svn: 262516
... and register them with CUDA runtime.
This is needed for commonly used cudaMemcpy*() APIs that use address of
host-side shadow to access their counterparts on device side.
Fixes PR26340
Differential Revision: http://reviews.llvm.org/D17779
llvm-svn: 262498
Summary:
This is important for e.g. the following case:
void sync() { __syncthreads(); }
void foo() {
do_something();
sync();
do_something_else():
}
Without this change, if the optimizer does not inline sync() (which it
won't because __syncthreads is also marked as noduplicate, for now
anyway), it is free to perform optimizations on sync() that it would not
be able to perform on __syncthreads(), because sync() is not marked as
convergent.
Similarly, we need a notion of convergent calls, since in the case when
we can't statically determine a call's target(s), we need to know
whether it's safe to perform optimizations around the call.
This change is conservative; the optimizer will remove these attrs where
it can, see r260318, r260319.
Reviewers: majnemer
Subscribers: cfe-commits, jhen, echristo, tra
Differential Revision: http://reviews.llvm.org/D17056
llvm-svn: 261779
This is an artefact of split-mode CUDA compilation that we need to
mimic. HD functions are sometimes allowed to call H or D functions. Due
to split compilation mode device-side compilation will not see host-only
function and thus they will not be considered at all. For clang both H
and D variants will become function overloads visible to
compiler. Normally target attribute is considered only if C++ rules can
not determine which function is better. However in this case we need to
ignore functions that would not be present during current compilation
phase before we apply normal overload resolution rules.
Changes:
* introduced another level of call preference to better describe
possible call combinations.
* removed WrongSide functions from consideration if the set contains
SameSide function.
* disabled H->D, D->H and G->H calls. These combinations are
not allowed by CUDA and we were reluctantly allowing them to work
around device-side calls to math functions in std namespace.
We no longer need it after r258880.
Differential Revision: http://reviews.llvm.org/D16870
llvm-svn: 260697
Summary:
We can't do the right thing, since there's no right thing to do, but at
least we can not crash the compiler.
Reviewers: majnemer, rnk
Subscribers: cfe-commits, jhen, tra
Differential Revision: http://reviews.llvm.org/D17103
llvm-svn: 260479
In general CUDA does not allow dynamic initialization of
global device-side variables. One exception is that CUDA allows
records with empty constructors as described in section E2.2.1 of
CUDA 7.5 Programming guide.
This patch applies initializer checks for all device-side variables.
Empty constructors are accepted, but no code is generated for them.
Differential Revision: http://reviews.llvm.org/D15305
llvm-svn: 259592
Summary:
This is necessary to prevent llvm from generating stacksave intrinsics
around this alloca. NVVM doesn't have a stack, and we don't handle said
intrinsics.
Reviewers: rnk, echristo
Subscribers: cfe-commits, jhen, tra
Differential Revision: http://reviews.llvm.org/D16664
llvm-svn: 259122
Summary:
These aliases are done to support inline asm, but there's nothing we can
do: NVPTX doesn't support aliases.
Reviewers: tra
Subscribers: cfe-commits, jhen, echristo
Differential Revision: http://reviews.llvm.org/D16501
llvm-svn: 258734
C++ emits vtables for classes that have key function present in the
current TU. While we compile CUDA the fact that key function was found
in this TU does not mean that we are going to generate code for it. E.g.
vtable for a class with host-only methods should not (and can not) be
generated on device side, because we'll never generate code for them
during device-side compilation.
This patch adds an extra CUDA-specific check during key method computation
and filters out potential key methods that are not suitable for this side
of CUDA compilation.
When we codegen vtable, entries for unsuitable methods are set to null.
Differential Revision: http://reviews.llvm.org/D15309
llvm-svn: 255911
Linking options for particular file depend on the option that specifies the file.
Currently there are two:
* -mlink-bitcode-file links in complete content of the specified file.
* -mlink-cuda-bitcode links in only the symbols needed by current TU.
Linked symbols are internalized. This bitcode linking mode is used to
link device-specific bitcode provided by CUDA.
Files are linked in order they are specified on command line.
-mlink-cuda-bitcode replaces -fcuda-uses-libdevice flag.
Differential Revision: http://reviews.llvm.org/D13913
llvm-svn: 251427
Adjust __global__ functions with DiscardableODR linkage to use
StrongODR linkage instead, so they are visible externally.
Differential Revision: http://reviews.llvm.org/D13067
llvm-svn: 248400
The patch makes it possible to parse CUDA files that contain host/device
functions with identical signatures, but different attributes without
having to physically split source into host-only and device-only parts.
This change is needed in order to parse CUDA header files that have
a lot of name clashes with standard include files.
Gory details are in design doc here: https://goo.gl/EXnymm
Feel free to leave comments there or in this review thread.
This feature is controlled with CC1 option -fcuda-target-overloads
and is disabled by default.
Differential Revision: http://reviews.llvm.org/D12453
llvm-svn: 248295
This makes sure that we emit kernels that were instantiated from the
host code and which would never be explicitly referenced by anything
else on device side.
Differential Revision: http://reviews.llvm.org/D11666
llvm-svn: 248293
Summary:
According to CUDA documentation, global variables declared with __device__,
__constant__ can be initialized from host code, so mark them as
externally initialized. Because __shared__ variables cannot have an
initialization as part of their declaration and since the value maybe kept
across different kernel invocation, the value of __shared__ is effectively
undefined instead of zero initialized.
Wrongly using zero initializer may cause illegitimate optimization, e.g.
removing unused __constant__ variable because it's not updated in the device
code and the value is initialized with zero.
Test Plan: test/CodeGenCUDA/address-spaces.cu
Patch by Xuetian Weng
Reviewers: jholewinski, eliben, tra, jingyue
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D12241
llvm-svn: 245786
This allows emitting kernels that were instantiated from the host code
and which would never be explicitly referenced otherwise.
Differential Revision: http://reviews.llvm.org/D11666
llvm-svn: 244501
- added -fcuda-include-gpubinary option to incorporate results of
device-side compilation into host-side one.
- generate code to register GPU binaries and associated kernels
with CUDA runtime and clean-up on exit.
- added test case for init/deinit code generation.
Differential Revision: http://reviews.llvm.org/D9507
llvm-svn: 236765
Currently clang emits file-scope asm during *both* host and device
compilation modes which is usually a wrong thing to do.
There's no way to attach any attribute to an __asm statement, so
there's no way to differentiate between host-side and device-side
file-scope asm. This patch makes clang to match nvcc behavior and
emit file-scope-asm only during host-side compilation.
Differential Revision: http://reviews.llvm.org/D9270
llvm-svn: 235905
- Changed CUDALaunchBounds arguments from integers to Expr* so they can
be saved in AST for instantiation.
- Added support for template instantiation of launch_bounds attrubute.
- Moved evaluation of launch_bounds arguments to NVPTXTargetCodeGenInfo::
SetTargetAttributes() where it can be done after template instantiation.
- Added a warning on negative launch_bounds arguments.
- Amended test cases.
Differential Revision: http://reviews.llvm.org/D8985
llvm-svn: 235452
Added cuda_builtin_vars.h which implements built-in CUDA variables
using __declattr(property).
Fields of built-in variables (except for warpSize) are implemented
using __declattr(property) which replaces read/write of a member field
with a call to a getter/setter member function, in this case with
appropriate NVPTX builtin.
Added a test case to check diagnostics on attempt to construct or
improperly access a built-in variable.
Differential Revision: http://reviews.llvm.org/D9064
llvm-svn: 235448
Added cuda_builtin_vars.h which implements built-in CUDA variables
using __declattr(property).
Fields of built-in variables (except for warpSize) are implemented
using __declattr(property) which replaces read/write of a member field
with a call to a getter/setter member function, in this case with
appropriate NVPTX builtin.
Added a test case to check diagnostics on attempt to construct or
improperly access a built-in variable.
Differential Revision: http://reviews.llvm.org/D9064
llvm-svn: 235398
Summary:
Due to CUDA's implicit address space casting, the type of a static local
variable may be more specific (i.e. with address space qualifiers) than
the type expected by the constructor. Emit an addrspacecast in that
case.
Test Plan: Clang used to crash on the added test.
Reviewers: nlewycky, pcc, eliben, rsmith
Reviewed By: eliben, rsmith
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D8575
llvm-svn: 233208
Summary:
This is especially important for targets that use multiple address spaces,
and commonly place global variables in address spaces other than zero.
Fixes PR22383
Test Plan: New test case added: llvm-used.cu
Reviewers: jingyue
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D7345
llvm-svn: 227861
This makes the C++ ABI depend entirely on the target: MS ABI for -win32 triples,
Itanium otherwise. It's no longer possible to do weird combinations.
To be able to run a test with a specific ABI without constraining it to a
specific triple, new substitutions are added to lit: %itanium_abi_triple and
%ms_abi_triple can be used to get the current target triple adjusted to the
desired ABI. For example, if the test suite is running with the i686-pc-win32
target, %itanium_abi_triple will expand to i686-pc-mingw32.
Differential Revision: http://llvm-reviews.chandlerc.com/D2545
llvm-svn: 199250
In preparation for making the Win32 triple imply MS ABI mode,
make all tests pass in this mode, or make them use the Itanium
mode explicitly.
Differential Revision: http://llvm-reviews.chandlerc.com/D2401
llvm-svn: 199130
variables without a storage class within a function, to implement
CUDA B.2.5: "__shared__ and __constant__ variables have implied static
storage [duration]."
llvm-svn: 162788
Because in CUDA types do not have associated address spaces,
globals are declared in their "native" address space, and accessed
by bitcasting the pointer to address space 0. This relies on address
space 0 being a unified address space.
llvm-svn: 157167