Currently, when there is a global register variable in a program that
is bound to an invalid register, clang/llvm prints an error message that
is not very user-friendly.
This commit improves the diagnostic and moves the check that used to be
in the backend to Sema. In addition, it makes changes to error out if
the size of the register doesn't match the declared variable size.
e.g., volatile register int B asm ("rbp");
rdar://problem/23084219
Differential Revision: http://reviews.llvm.org/D13834
llvm-svn: 253405
Clang needs to know target triple for both sides of compilation so that
preprocessor macros and target builtins from both sides are available.
This change augments Compilation class to carry information about
toolchains used during different CUDA compilation passes and refactors
BuildActions to use it when it constructs CUDA jobs.
Removed DeviceTriple from CudaHostAction/CudaDeviceAction as it's no
longer needed.
Differential Revision: http://reviews.llvm.org/D13144
llvm-svn: 253385
* adds -aux-triple option to specify target triple
* propagates aux target info to AST context and Preprocessor
* pulls in target specific preprocessor macros.
* pulls in target-specific builtins from aux target.
* sets appropriate host or device attribute on builtins.
Differential Revision: http://reviews.llvm.org/D12917
llvm-svn: 248299
The changes are part of attribute-based CUDA function overloading (D12453)
and as such are only enabled when it's in effect (-fcuda-target-overloads).
Differential Revision: http://reviews.llvm.org/D12122
llvm-svn: 248296
The patch makes it possible to parse CUDA files that contain host/device
functions with identical signatures, but different attributes without
having to physically split source into host-only and device-only parts.
This change is needed in order to parse CUDA header files that have
a lot of name clashes with standard include files.
Gory details are in design doc here: https://goo.gl/EXnymm
Feel free to leave comments there or in this review thread.
This feature is controlled with CC1 option -fcuda-target-overloads
and is disabled by default.
Differential Revision: http://reviews.llvm.org/D12453
llvm-svn: 248295
The main purpose is to avoid errors and warnings while parsing CUDA
header files. The attributes are currently unused otherwise.
Differential version: http://reviews.llvm.org/D11690
llvm-svn: 244497
During device-side CUDA compilation clang currently complains about
all TLS variables, regardless of whether they are __host__ or
__device__.
This patch suppresses "TLS unsupported" errors for host variables
during device compilation and for device variables during host
compilation.
Differential Revision: http://reviews.llvm.org/D9269
llvm-svn: 235907
- Changed CUDALaunchBounds arguments from integers to Expr* so they can
be saved in AST for instantiation.
- Added support for template instantiation of launch_bounds attrubute.
- Moved evaluation of launch_bounds arguments to NVPTXTargetCodeGenInfo::
SetTargetAttributes() where it can be done after template instantiation.
- Added a warning on negative launch_bounds arguments.
- Amended test cases.
Differential Revision: http://reviews.llvm.org/D8985
llvm-svn: 235452
Added cuda_builtin_vars.h which implements built-in CUDA variables
using __declattr(property).
Fields of built-in variables (except for warpSize) are implemented
using __declattr(property) which replaces read/write of a member field
with a call to a getter/setter member function, in this case with
appropriate NVPTX builtin.
Added a test case to check diagnostics on attempt to construct or
improperly access a built-in variable.
Differential Revision: http://reviews.llvm.org/D9064
llvm-svn: 235448
Added cuda_builtin_vars.h which implements built-in CUDA variables
using __declattr(property).
Fields of built-in variables (except for warpSize) are implemented
using __declattr(property) which replaces read/write of a member field
with a call to a getter/setter member function, in this case with
appropriate NVPTX builtin.
Added a test case to check diagnostics on attempt to construct or
improperly access a built-in variable.
Differential Revision: http://reviews.llvm.org/D9064
llvm-svn: 235398
For CUDA source, Sema checks that the targets of call expressions make sense
(e.g. a host function can't call a device function).
Adding a flag that lets us skip this check. Motivation: for source-to-source
translation tools that have to accept code that's not strictly kosher CUDA but
is still accepted by nvcc. The source-to-source translation tool can then fix
the code and leave calls that are semantically valid for the actual compilation
stage.
Differential Revision: http://reviews.llvm.org/D9036
llvm-svn: 235049
In SemaCUDA all implicit functions were considered host device, this led to
errors such as the following code snippet failing to compile:
struct Copyable {
const Copyable& operator=(const Copyable& x) { return *this; }
};
struct Simple {
Copyable b;
};
void foo() {
Simple a, b;
a = b;
}
Above the implicit copy assignment operator was inferred as host device but
there was only a host assignment copy defined which is an error in device
compilation mode.
Differential Revision: http://reviews.llvm.org/D6565
llvm-svn: 224358
Placing the attribute after the kernel keyword would incorrectly
reject the attribute, so use the smae workaround that other
kernel only attributes use.
Also add a FIXME because there are two different phrasings now
for the same error, althoug amdgpu_num_[sv]gpr uses a consistent one.
llvm-svn: 223490
Summary:
Allow CUDA host device functions with two code paths using __CUDA_ARCH__
to differentiate between code path being compiled.
For example:
__host__ __device__ void host_device_function(void) {
#ifdef __CUDA_ARCH__
device_only_function();
#else
host_only_function();
#endif
}
Patch by Jacques Pienaar.
Reviewed By: rnk
Differential Revision: http://reviews.llvm.org/D6457
llvm-svn: 223271
r218624 implemented target inference for implicit special members. However,
other entities can be implicit - for example intrinsics. These can not have
inference running on them, so they should be marked host device as before. This
is the safest and most flexible setting, since by construction these functions
don't invoke anything, and we'd like them to be invokable from both host and
device code. LLVM's intrinsics definitions (where these intrinsics come from in
the case of CUDA/NVPTX) have no notion of target, so both host and device
intrinsics can be supported this way.
llvm-svn: 218688
As PR20495 demonstrates, Clang currenlty infers the CUDA target (host/device,
etc) for implicit members (constructors, etc.) incorrectly. This causes errors
and even assertions in Clang when compiling code (assertions in C++11 mode where
implicit move constructors are added into the mix).
Fix the problem by inferring the target from the methods the implicit member
should call (depending on its base classes and fields).
llvm-svn: 218624
Updating the diagnostics in the launch_bounds test since they have been improved in that case. Adding a test for nonnull since it has little test coverage, but has truly variadic arguments.
llvm-svn: 214407