There are HIP applications e.g. Tensorflow 1.3 using amdgpu kernel attributes, however
currently they are only allowed on OpenCL kernel functions.
This patch will allow amdgpu kernel attributes to be applied to CUDA/HIP __global__
functions.
Differential Revision: https://reviews.llvm.org/D47958
llvm-svn: 334561
We were already performing checks on non-template variables,
but the checks on templated ones were missing.
Differential Revision: https://reviews.llvm.org/D45231
llvm-svn: 334143
Summary:
Previously this triggered a -Wundefined-internal warning. But it's not
an undefined variable -- any variable of this form is a pointer to the
base of GPU core's shared memory.
Reviewers: tra
Subscribers: sanjoy, rsmith
Differential Revision: https://reviews.llvm.org/D46782
llvm-svn: 332621
Found via codespell -q 3 -I ../clang-whitelist.txt
Where whitelist consists of:
archtype
cas
classs
checkk
compres
definit
frome
iff
inteval
ith
lod
methode
nd
optin
ot
pres
statics
te
thru
Patch by luzpaz! (This is a subset of D44188 that applies cleanly with a few
files that have dubious fixes reverted.)
Differential revision: https://reviews.llvm.org/D44188
llvm-svn: 329399
We were already performing checks on non-template variables,
but the checks on templated ones were missing.
Differential Revision: https://reviews.llvm.org/D45231
llvm-svn: 329127
The diagnostic system for Clang can already handle many AST nodes. Instead
of converting them to strings first, just hand the AST node directly to
the diagnostic system and let it handle the output. Minor changes in some
diagnostic output.
llvm-svn: 328688
Launching a kernel from the host code does not generate code for the
kernel itself. This fixes an issue with clang erroneously reporting
an error for a HD->D call from within the kernel.
Differential Revision: https://reviews.llvm.org/D44837
llvm-svn: 328362
According to the CUDA Programming Guide this is prohibited in
whole program compilation mode. This makes sense because external
references cannot be satisfied in that mode anyway. However,
such variables are allowed in separate compilation mode which
is a valid use case.
Differential Revision: https://reviews.llvm.org/D42923
llvm-svn: 325136
This fixes erroneously reported CUDA compilation errors
in host-side code during device-side compilation.
I've also restricted OpenMP-specific checks to trigger only
if we're compiling with OpenMP enabled.
Differential Revision: https://reviews.llvm.org/D40275
llvm-svn: 319201
This also clarifies some terminology used by the diagnostic (methods -> Objective-C methods, fields -> non-static data members, etc).
Many of the tests needed to be updated in multiple places for the diagnostic wording tweaks. The first instance of the diagnostic for that attribute is fully specified and subsequent instances cut off the complete list (to make it easier if additional subjects are added in the future for the attribute).
llvm-svn: 319002
Summary:
That is, instead of "1 error generated", we now say "1 error generated
when compiling for sm_35".
This (partially) solves a usability foogtun wherein e.g. users call a
function that's only defined on sm_60 when compiling for sm_35, and they
get an unhelpful error message.
Reviewers: tra
Subscribers: sanjoy, cfe-commits
Differential Revision: https://reviews.llvm.org/D37548
llvm-svn: 312736
the class becoming complete and its inline methods being parsed.
This replaces the hack of using the "late parsed template" flag to track member
functions with bodies we've not parsed yet; instead we now use the "will have
body" flag, which carries the desired implication that the function declaration
*is* a definition, and that we've just not parsed its body yet.
llvm-svn: 310776
Summary:
When compiling device code, we may still see host code with explicit
calling conventions. NVPTX needs to claim that it supports these CCs,
so that (a) we don't raise noisy warnings, and (b) we don't break
existing code which relies on the existence of these CCs when
specializing templates. (If a CC doesn't exist, clang ignores it, so
two template specializations which are different only insofar as one
specifies a CC are considered identical and therefore are an error if
that CC is not supported.)
Reviewers: tra
Subscribers: cfe-commits
Differential Revision: https://reviews.llvm.org/D28323
llvm-svn: 291136
Summary: CUDA attributes are spelled __declspec(__foo__) on Windows.
Reviewers: tra
Subscribers: cfe-commits, rnk
Differential Revision: https://reviews.llvm.org/D28321
llvm-svn: 291134
Some functions and templates are treated as __host__ __device__ even
when they don't have explicitly specified target attributes.
What's worse, this treatment may change depending on command line
options (-fno-cuda-host-device-constexpr) or
#pragma clang force_cuda_host_device.
Combined with strict checking for matching function target that comes
with D25809(r288962), it makes it hard to write code which would
explicitly instantiate or specialize some functions regardless of
pragmas or command line options in effect.
This patch changes the way we match target attributes of base template
vs attributes used in explicit instantiation or specialization so that
only explicitly specified attributes are considered. This makes base
template selection behave consistently regardless of pragma of command
line options that may affect CUDA target.
Differential Revision: https://reviews.llvm.org/D25845
llvm-svn: 289091
* __host__ __device__ functions are no longer considered to be
redeclarations of __host__ or __device__ functions. This prevents
unintentional merging of target attributes across them.
* Function target attributes are not considered (and must match) during
explicit instantiation and specialization of function templates.
Differential Revision: https://reviews.llvm.org/D25809
llvm-svn: 288962
Summary:
Previously we'd look at the GVALinkage of whatever FunctionDecl you
happened to be calling.
This is not right. In the absence of the gnu_inline attribute, to be
handled separately, the function definition determines the function's
linkage. So we need to wait until we get a def before we can know
whether something is known-emitted.
Reviewers: tra
Subscribers: cfe-commits, rsmith
Differential Revision: https://reviews.llvm.org/D26268
llvm-svn: 286313
Summary:
In CUDA compilation, we call isInlineDefinitionExternallyVisible (via
getGVALinkageForFunction) on functions while parsing their definitions.
At the point in time when we call getGVALinkageForFunction, we haven't
yet added the body to the function, so we trip this assert. But as far
as I can tell, this is harmless.
To work around this, we add a new flag to FunctionDecl, "WillHaveBody".
There was other code that was working around the existing assert with a
really awful hack -- this change lets us get rid of that hack.
Reviewers: rsmith, tra
Subscribers: aemerson, cfe-commits
Differential Revision: https://reviews.llvm.org/D25640
llvm-svn: 285410
Summary:
Previously, when you did something not allowed in a host+device function
and then caused it to be codegen'ed, we would print out an error telling
you that you did something bad, but we wouldn't tell you how we decided
that the function needed to be codegen'ed.
This change causes us to print out a callstack when emitting deferred
errors. This is immensely helpful when debugging highly-templated code,
where it's often unclear how a function became known-emitted.
We only print the callstack once per function, after we print the all
deferred errors.
This patch also switches all of our hashtables to using canonical
FunctionDecls instead of regular FunctionDecls. This prevents a number
of bugs, some of which are caught by tests added here, in which we
assume that two FDs for the same function have the same pointer value.
Reviewers: rnk
Subscribers: cfe-commits, tra
Differential Revision: https://reviews.llvm.org/D25704
llvm-svn: 284647
Summary:
This fixes two related bugs:
1) Previously, if you had a non-wrong side call at some source code
location L, we wouldn't emit errors for wrong-side calls that appeared
at L.
2) We'd only emit one wrong-side error per source code location, when we
actually want to emit it twice if we hit this line more than once due to
e.g. template instantiation.
Reviewers: tra
Subscribers: rnk, cfe-commits
Differential Revision: https://reviews.llvm.org/D25702
llvm-svn: 284643
Summary:
Previously we had to split out a lot of our tests into a test that
checked only immediate errors and a test that checked only deferred
errors. This was because, if you emitted any immediate errors, we
wouldn't run codegen, where the deferred errors were emitted.
We've fixed this, and now emit deferred errors during sema. This lets
us merge a bunch of tests, and lets us convert some other tests to
-fsyntax-only.
Reviewers: tra
Subscribers: cfe-commits
Differential Revision: https://reviews.llvm.org/D25755
llvm-svn: 284553
Previously: When compiling for host, our constructed call graph went
*through* kernel calls. This meant that if we had
host calls kernel calls HD
we would incorrectly mark the HD function as known-emitted on the host
side, and thus perform host-side checks on it.
Fixing this exposed another issue, wherein when marking a function as
known-emitted, we also need to traverse the callgraph of its template,
because non-dependent calls are attached to a function's template, not
its instantiation.
llvm-svn: 284355
Summary:
Emitting deferred diagnostics during codegen was a hack. It did work,
but usability was poor, both for us as compiler devs and for users. We
don't codegen if there are any sema errors, so for users this meant that
they wouldn't see deferred errors if there were any non-deferred errors.
For devs, this meant that we had to carefully split up our tests so that
when we tested deferred errors, we didn't emit any non-deferred errors.
This change moves checking for deferred errors into Sema. See the big
comment in SemaCUDA.cpp for an overview of the idea.
This checking adds overhead to compilation, because we have to maintain
a partial call graph. As a result, this change makes deferred errors a
CUDA-only concept (whereas before they were a general concept). If
anyone else wants to use this framework for something other than CUDA,
we can generalize at that time.
This patch makes the minimal set of test changes -- after this lands,
I'll go back through and do a cleanup of the tests that we no longer
have to split up.
Reviewers: rnk
Subscribers: cfe-commits, rsmith, tra
Differential Revision: https://reviews.llvm.org/D25541
llvm-svn: 284158
Summary:
Together these let you easily create diagnostics that
- are never emitted for host code
- are always emitted for __device__ and __global__ functions, and
- are emitted for __host__ __device__ functions iff these functions are
codegen'ed.
At the moment there are only three diagnostics that need this treatment,
but I have more to add, and it's not sustainable to write code for emitting
every such diagnostic twice, and from a special wrapper in SemaCUDA.cpp.
While we're at it, don't emit the function name in
err_cuda_device_exceptions: It's not necessary to print it, and making
this work in the new framework in the face of a null value for
dyn_cast<FunctionDecl>(CurContext) isn't worth the effort.
Reviewers: rnk
Subscribers: cfe-commits, tra
Differential Revision: https://reviews.llvm.org/D25139
llvm-svn: 284143
Previously, this was an immediate, don't pass go, don't collect $200
error. But this precludes us from writing code like
__host__ __device__ void launch_kernel() {
kernel<<<...>>>();
}
Such code isn't wrong, following our notions of right and wrong in CUDA,
unless it's codegen'ed.
llvm-svn: 283963
match other CUDA preference orders, per discussion with jlebar. We now model
this in an attempt to match overload resolution as closely as possible:
- First, we throw out all non-callable (due to CUDA host/device mismatch)
operator delete functions.
- Then we apply sizedness / alignedness preferences based on whether the type
is overaligned and whether the deallocation function is a member.
- Finally, we use the CUDA callability preference as a tiebreaker.
llvm-svn: 283830
Summary: This matches the idiom we use for our other CUDA wrapper headers.
Reviewers: tra
Subscribers: beanz, mgorny, cfe-commits
Differential Revision: https://reviews.llvm.org/D24978
llvm-svn: 283679
Summary:
Move CheckCUDACall from ActOnCallExpr and BuildDeclRefExpr to
DiagnoseUseOfDecl. This lets us catch some edge cases we were missing,
specifically around class operators.
This necessitates a few other changes:
- Avoid emitting duplicate deferred diags in CheckCUDACall.
Previously we'd carefully placed our call to CheckCUDACall such that
it would only ever run once for a particular callsite. But now this
isn't the case.
- Emit deferred diagnostics from a template
specialization/instantiation's primary template, in addition to from
the specialization/instantiation itself. DiagnoseUseOfDecl ends up
putting the deferred diagnostics on the template, rather than the
specialization, so we need to check both.
Reviewers: rsmith
Subscribers: cfe-commits, tra
Differential Revision: https://reviews.llvm.org/D24573
llvm-svn: 283637
Summary:
We'd attempted to allow this, but turns out we were doing a very bad
job. :)
Making this work properly would be a giant change in clang. For
example, we'd need to make CXXRecordDecl::getDestructor()
context-sensitive, because the destructor you end up with depends on
where you're calling it from.
For now (and hopefully for ever), just disallow overloading of
destructors in CUDA.
Reviewers: rsmith
Subscribers: cfe-commits, tra
Differential Revision: https://reviews.llvm.org/D24571
llvm-svn: 283120
Also add a test that we disallow
__constant__ __shared__ int x;
because it's possible to break this without breaking
__shared__ __constant__ int x;
Reviewers: rnk
Subscribers: cfe-commits, tra
Differential Revision: https://reviews.llvm.org/D25125
llvm-svn: 282985
__attribute__((amdgpu_flat_work_group_size(<min>, <max>))) - request minimum and maximum flat work group size
__attribute__((amdgpu_waves_per_eu(<min>[, <max>]))) - request minimum and/or maximum waves per execution unit
Differential Revision: https://reviews.llvm.org/D24513
llvm-svn: 282371
Summary: This functionality is used by Thrust.
Reviewers: tra
Subscribers: cfe-commits
Differential Revision: https://reviews.llvm.org/D24581
llvm-svn: 281543
CUDA target attributes are used for function overloading and must not be merged.
This fixes a bug where attributes were inherited during function template
specialization in CUDA and made it impossible for specialized function
to provide its own target attributes.
Differential Revision: https://reviews.llvm.org/D24522
llvm-svn: 281406
Summary:
Some function calls in CUDA are allowed to appear in
semantically-correct programs but are an error if they're ever
codegen'ed. Specifically, a host+device function may call a host
function, but it's an error if such a function is ever codegen'ed in
device mode (and vice versa).
Previously, clang made no attempt to catch these errors. For the most
part, they would be caught by ptxas, and reported as "call to unknown
function 'foo'".
Now we catch these errors and report them the same as we report other
illegal calls (e.g. a call from a host function to a device function).
This has a small change in error-message behavior for calls that were
previously disallowed (e.g. calls from a host to a device function).
Previously, we'd catch disallowed calls fairly early, before doing
additional semantic checking e.g. of the call's arguments. Now we catch
these illegal calls at the very end of our semantic checks, so we'll
only emit a "illegal CUDA call" error if the call is otherwise
well-formed.
Reviewers: tra, rnk
Subscribers: cfe-commits
Differential Revision: https://reviews.llvm.org/D23242
llvm-svn: 278759
Summary:
When we emit err_ref_bad_target, we should emit a "'method' declared
here" note. We already do so in most places, just not in
BuildCallToMemberFunction.
Reviewers: tra
Subscribers: cfe-commits
Differential Revision: https://reviews.llvm.org/D23240
llvm-svn: 278195
Summary:
Specifically, this patch adds testcases for all three calls to
EraseUnwantedCUDAMatches. The addr-of-overloaded-fn test I accidentally
neutered in r264207, which moved much of
CodeGenCUDA/function-overload.cu into SemaCUDA/function-overload.cu.
The coverage from overloaded-delete test is new.
Reviewers: tra
Subscribers: cfe-commits
Differential Revision: http://reviews.llvm.org/D21913
llvm-svn: 275232
Summary:
You can overload a destructor in CUDA, and SemaOverload needs to be
tweaked not to crash when it sees an explicit call to an overloaded
destructor.
Reviewers: rsmith
Subscribers: cfe-commits, tra
Differential Revision: http://reviews.llvm.org/D21912
llvm-svn: 275231
Some people have weird CI systems that run each test subdirectory
independently without access to other parallel trees.
Unfortunately, this means we have to suffer some duplication until Art
can sort out how to share these types.
llvm-svn: 270164
According to Cuda Programming guide (v7.5, E2.3.1):
> __device__, __constant__ and __shared__ variables defined in namespace
> scope, that are of class type, cannot have a non-empty constructor or a
> non-empty destructor.
Clang already deals with device-side constructors (see D15305).
This patch enforces similar rules for destructors.
Differential Revision: http://reviews.llvm.org/D20140
llvm-svn: 270108
Codegen tests for device-side variable initialization are subset of test
cases used to verify Sema's part of the job.
Including CodeGenCUDA/device-var-init.cu from SemaCUDA makes it easier to
keep both sides in sync.
Differential Revision: http://reviews.llvm.org/D20139
llvm-svn: 270107
a base class via a using-declaration. If a class has a using-declaration
declaring either a constructor or an assignment operator, eagerly declare its
special members in case they need to displace a shadow declaration from a
using-declaration.
llvm-svn: 269398
With this patch, by a constexpr function is implicitly host+device
unless:
a) it's a variadic function (variadic functions are not allowed on the
device side), or
b) it's preceeded by a __device__ overload in a system header.
The restriction on overloading __host__ __device__ functions on the
basis of their CUDA attributes remains in place, but we use (b) to allow
us to define __device__ overloads for constexpr functions in cmath,
which would otherwise be __host__ __device__ and thus not overloadable.
You can disable this behavior with -fno-cuda-host-device-constexpr.
Reviewers: tra, rnk, rsmith
Subscribers: cfe-commits
Differential Revision: http://reviews.llvm.org/D18380
llvm-svn: 264964
Summary:
* -fcuda-target-overloads
Previously unconditionally set to true by the driver. Necessary for
correct functioning of the compiler -- our CUDA headers wrapper won't
compile without this.
* -fcuda-disable-target-call-checks
Previously unconditionally set to true by the driver. Necessary to
compile almost any external CUDA code -- almost all libraries assume
that host+device code can call host or device functions.
* -fcuda-allow-host-calls-from-host-device
No effect when target overloading is enabled.
Reviewers: tra
Subscribers: rsmith, cfe-commits
Differential Revision: http://reviews.llvm.org/D18416
llvm-svn: 264739
Summary:
Previously we were using the codegen test to ensure that we choose the
right overload. But we can do this within sema, with a bit of
cleverness.
I left the constructor/destructor checks in CodeGen, because these
overloads (particularly on the destructors) are hard to check in Sema.
Reviewers: tra
Subscribers: cfe-commits
Differential Revision: http://reviews.llvm.org/D18386
llvm-svn: 264207
Summary:
Principally, don't hardcode the line numbers of various notes. This
lets us make changes to the test without recomputing linenos everywhere.
Instead, just tell -verify that we may get 0 or more notes pointing to
the relevant function definitions. Checking that we get exactly the
right note isn't so important (and anyway is checked elsewhere).
Reviewers: tra
Subscribers: cfe-commits
Differential Revision: http://reviews.llvm.org/D18385
llvm-svn: 264206
__global__ functions are present on both host and device side,
so providing __host__ or __device__ overloads is not going to
do anything useful.
llvm-svn: 261778
This is an artefact of split-mode CUDA compilation that we need to
mimic. HD functions are sometimes allowed to call H or D functions. Due
to split compilation mode device-side compilation will not see host-only
function and thus they will not be considered at all. For clang both H
and D variants will become function overloads visible to
compiler. Normally target attribute is considered only if C++ rules can
not determine which function is better. However in this case we need to
ignore functions that would not be present during current compilation
phase before we apply normal overload resolution rules.
Changes:
* introduced another level of call preference to better describe
possible call combinations.
* removed WrongSide functions from consideration if the set contains
SameSide function.
* disabled H->D, D->H and G->H calls. These combinations are
not allowed by CUDA and we were reluctantly allowing them to work
around device-side calls to math functions in std namespace.
We no longer need it after r258880.
Differential Revision: http://reviews.llvm.org/D16870
llvm-svn: 260697
Summary:
Turns out the variadic function checking added in r258643 was too strict
for some existing users; give them an escape valve. When
-fcuda-allow-variadic-functions is passed, the front-end makes no
attempt to disallow C-style variadic functions. Calls to va_arg are
still not allowed.
Reviewers: tra
Subscribers: cfe-commits, jhen, echristo, bkramer
Differential Revision: http://reviews.llvm.org/D16559
llvm-svn: 258822
Summary:
Warn for NVCC compatibility if you declare a static member function or
inline function as __global__.
Reviewers: tra
Subscribers: jhen, echristo, cfe-commits
Differential Revision: http://reviews.llvm.org/D16261
llvm-svn: 258263
Value, type, and instantiation dependence were not being handled
correctly for CUDAKernelCallExpr AST nodes. As a result, if an
undeclared identifier was used in the triple-angle-bracket kernel call
configuration, there would be no error during parsing, and there would
be a crash during code gen. This patch makes sure that an error will be
issued during parsing in this case, just as there would be for any other
use of an undeclared identifier in C++.
Patch by Jason Henline.
Reviewers: jlebar, rsmith
Differential Revision: http://reviews.llvm.org/D15858
llvm-svn: 257839
Summary: Thanks to jhen for helping me figure this out.
Reviewers: tra, echristo
Subscribers: jhen
Differential Revision: http://reviews.llvm.org/D16129
llvm-svn: 257554
Currently, when there is a global register variable in a program that
is bound to an invalid register, clang/llvm prints an error message that
is not very user-friendly.
This commit improves the diagnostic and moves the check that used to be
in the backend to Sema. In addition, it makes changes to error out if
the size of the register doesn't match the declared variable size.
e.g., volatile register int B asm ("rbp");
rdar://problem/23084219
Differential Revision: http://reviews.llvm.org/D13834
llvm-svn: 253405
Clang needs to know target triple for both sides of compilation so that
preprocessor macros and target builtins from both sides are available.
This change augments Compilation class to carry information about
toolchains used during different CUDA compilation passes and refactors
BuildActions to use it when it constructs CUDA jobs.
Removed DeviceTriple from CudaHostAction/CudaDeviceAction as it's no
longer needed.
Differential Revision: http://reviews.llvm.org/D13144
llvm-svn: 253385
* adds -aux-triple option to specify target triple
* propagates aux target info to AST context and Preprocessor
* pulls in target specific preprocessor macros.
* pulls in target-specific builtins from aux target.
* sets appropriate host or device attribute on builtins.
Differential Revision: http://reviews.llvm.org/D12917
llvm-svn: 248299
The changes are part of attribute-based CUDA function overloading (D12453)
and as such are only enabled when it's in effect (-fcuda-target-overloads).
Differential Revision: http://reviews.llvm.org/D12122
llvm-svn: 248296
The patch makes it possible to parse CUDA files that contain host/device
functions with identical signatures, but different attributes without
having to physically split source into host-only and device-only parts.
This change is needed in order to parse CUDA header files that have
a lot of name clashes with standard include files.
Gory details are in design doc here: https://goo.gl/EXnymm
Feel free to leave comments there or in this review thread.
This feature is controlled with CC1 option -fcuda-target-overloads
and is disabled by default.
Differential Revision: http://reviews.llvm.org/D12453
llvm-svn: 248295
The main purpose is to avoid errors and warnings while parsing CUDA
header files. The attributes are currently unused otherwise.
Differential version: http://reviews.llvm.org/D11690
llvm-svn: 244497
During device-side CUDA compilation clang currently complains about
all TLS variables, regardless of whether they are __host__ or
__device__.
This patch suppresses "TLS unsupported" errors for host variables
during device compilation and for device variables during host
compilation.
Differential Revision: http://reviews.llvm.org/D9269
llvm-svn: 235907
- Changed CUDALaunchBounds arguments from integers to Expr* so they can
be saved in AST for instantiation.
- Added support for template instantiation of launch_bounds attrubute.
- Moved evaluation of launch_bounds arguments to NVPTXTargetCodeGenInfo::
SetTargetAttributes() where it can be done after template instantiation.
- Added a warning on negative launch_bounds arguments.
- Amended test cases.
Differential Revision: http://reviews.llvm.org/D8985
llvm-svn: 235452
Added cuda_builtin_vars.h which implements built-in CUDA variables
using __declattr(property).
Fields of built-in variables (except for warpSize) are implemented
using __declattr(property) which replaces read/write of a member field
with a call to a getter/setter member function, in this case with
appropriate NVPTX builtin.
Added a test case to check diagnostics on attempt to construct or
improperly access a built-in variable.
Differential Revision: http://reviews.llvm.org/D9064
llvm-svn: 235448
Added cuda_builtin_vars.h which implements built-in CUDA variables
using __declattr(property).
Fields of built-in variables (except for warpSize) are implemented
using __declattr(property) which replaces read/write of a member field
with a call to a getter/setter member function, in this case with
appropriate NVPTX builtin.
Added a test case to check diagnostics on attempt to construct or
improperly access a built-in variable.
Differential Revision: http://reviews.llvm.org/D9064
llvm-svn: 235398
For CUDA source, Sema checks that the targets of call expressions make sense
(e.g. a host function can't call a device function).
Adding a flag that lets us skip this check. Motivation: for source-to-source
translation tools that have to accept code that's not strictly kosher CUDA but
is still accepted by nvcc. The source-to-source translation tool can then fix
the code and leave calls that are semantically valid for the actual compilation
stage.
Differential Revision: http://reviews.llvm.org/D9036
llvm-svn: 235049
In SemaCUDA all implicit functions were considered host device, this led to
errors such as the following code snippet failing to compile:
struct Copyable {
const Copyable& operator=(const Copyable& x) { return *this; }
};
struct Simple {
Copyable b;
};
void foo() {
Simple a, b;
a = b;
}
Above the implicit copy assignment operator was inferred as host device but
there was only a host assignment copy defined which is an error in device
compilation mode.
Differential Revision: http://reviews.llvm.org/D6565
llvm-svn: 224358
Placing the attribute after the kernel keyword would incorrectly
reject the attribute, so use the smae workaround that other
kernel only attributes use.
Also add a FIXME because there are two different phrasings now
for the same error, althoug amdgpu_num_[sv]gpr uses a consistent one.
llvm-svn: 223490
Summary:
Allow CUDA host device functions with two code paths using __CUDA_ARCH__
to differentiate between code path being compiled.
For example:
__host__ __device__ void host_device_function(void) {
#ifdef __CUDA_ARCH__
device_only_function();
#else
host_only_function();
#endif
}
Patch by Jacques Pienaar.
Reviewed By: rnk
Differential Revision: http://reviews.llvm.org/D6457
llvm-svn: 223271
r218624 implemented target inference for implicit special members. However,
other entities can be implicit - for example intrinsics. These can not have
inference running on them, so they should be marked host device as before. This
is the safest and most flexible setting, since by construction these functions
don't invoke anything, and we'd like them to be invokable from both host and
device code. LLVM's intrinsics definitions (where these intrinsics come from in
the case of CUDA/NVPTX) have no notion of target, so both host and device
intrinsics can be supported this way.
llvm-svn: 218688
As PR20495 demonstrates, Clang currenlty infers the CUDA target (host/device,
etc) for implicit members (constructors, etc.) incorrectly. This causes errors
and even assertions in Clang when compiling code (assertions in C++11 mode where
implicit move constructors are added into the mix).
Fix the problem by inferring the target from the methods the implicit member
should call (depending on its base classes and fields).
llvm-svn: 218624
Updating the diagnostics in the launch_bounds test since they have been improved in that case. Adding a test for nonnull since it has little test coverage, but has truly variadic arguments.
llvm-svn: 214407