Commit Graph

182 Commits

Author SHA1 Message Date
Yaxun Liu c3dfe9082b [HIP] Support attribute hip_pinned_shadow
This patch introduces support of hip_pinned_shadow variable for HIP.

A hip_pinned_shadow variable is a global variable with attribute hip_pinned_shadow.
It has external linkage on device side and has no initializer. It has internal
linkage on host side and has initializer or static constructor. It can be accessed
in both device code and host code.

This allows HIP runtime to implement support of HIP texture reference.

Differential Revision: https://reviews.llvm.org/D62738

llvm-svn: 364381
2019-06-26 03:47:37 +00:00
Erich Keane 505427cb2f Permit redeclarations of a builtin to specify calling convention.
After https://reviews.llvm.org/rL355317 we noticed that quite a decent
amount of code redeclares builtins (memcpy in particular, I believe
reduced from an MSVC header) with a calling convention specified.
This gets particularly troublesome when the user specifies a new
'default' calling convention on the command line.

When looking to add a diagnostic for this case, it was noticed that we
had 3 other diagnostics that differed only slightly.  This patch ALSO
unifies those under a 'select'.  Unfortunately, the order of words in
ONE of these diagnostics was reversed ("'thiscall' calling convention"
vs "calling convention 'thiscall'"), so this patch also standardizes on
the former.

Differential Revision: https://reviews.llvm.org/D59560

Change-Id: I79f99fe7c2301640755ffdd774b46eb44526bb22
llvm-svn: 356663
2019-03-21 13:30:56 +00:00
Yaxun Liu c5be267003 [CUDA][HIP][Sema] Fix template kernel with function as template parameter
If a kernel template has a function as its template parameter, a device function should be
allowed as template argument since a kernel can call a device function. However,
currently if the kernel template is instantiated in a host function, clang will emit an error
message saying the device function is an invalid candidate for the template parameter.

This happens because clang checks the reference to the device function during parsing
the template arguments. At this point, the template is not instantiated yet. Clang incorrectly
assumes the device function is called by the host function and emits the error message.

This patch fixes the issue by disabling checking of device function during parsing template
arguments and deferring the check to the instantion of the template. At that point, the
template decl is already available, therefore the check can be done against the instantiated
function template decl.

Differential Revision: https://reviews.llvm.org/D56411

llvm-svn: 355421
2019-03-05 18:19:35 +00:00
Yaxun Liu fa49c3a888 [CUDA][HIP] Check calling convention based on function target
MSVC header files using vectorcall to differentiate overloaded functions, which
causes failure for AMDGPU target. This is because clang does not check function
calling convention based on function target.

This patch checks calling convention using the proper target info.

Differential Revision: https://reviews.llvm.org/D57716

llvm-svn: 354929
2019-02-26 22:24:49 +00:00
Alexey Bataev 305b6b9647 [OPENMP][CUDA]Do not emit warnings for variables in late-reported asm
statements.

If the assembler instruction is not generated and the delayed diagnostic
is emitted, we may end up with extra warning message for variables used
in the asm statement. Since the asm statement is not built, the
variables may be left non-referenced and it may produce a warning about
a use of the non-initialized variables.

llvm-svn: 354928
2019-02-26 21:51:16 +00:00
Michael Liao 7557afa000 [AMDGPU] Allow using integral non-type template parameters
Summary:
- Allow using integral non-type template parameters in the following
  attributes

  __attribute__((amdgpu_flat_work_group_size(<min>, <max>)))
  __attribute__((amdgpu_waves_per_eu(<min>[, <max>])))

Reviewers: kzhuravl, yaxunl

Subscribers: jvesely, wdng, nhaehnle, dstuttard, tpr, t-tye, jdoerfert, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D58623

llvm-svn: 354909
2019-02-26 18:49:36 +00:00
Alexey Bataev e69f94e022 [OPENMP] Delayed diagnostics for VLA support.
Generalized processing of the deferred diagnostics for OpenMP/CUDA code.

llvm-svn: 354690
2019-02-22 20:36:10 +00:00
Alexey Bataev bbd5c55c66 Revert "[OPENMP] Delayed diagnostics for VLA support."
This reverts commit r354679 to fix the problem with the Windows
buildbots

llvm-svn: 354680
2019-02-22 17:16:50 +00:00
Alexey Bataev b09bcf8efd [OPENMP] Delayed diagnostics for VLA support.
Generalized processing of the deferred diagnostics for OpenMP/CUDA code.

llvm-svn: 354679
2019-02-22 16:49:13 +00:00
Alexey Bataev 3167b3035e [CUDA]Delayed diagnostics for the asm instructions.
Adapted targetDiag for the CUDA and used for the delayed diagnostics in
asm constructs. Works for both host and device compilation sides.

Differential Revision: https://reviews.llvm.org/D58463

llvm-svn: 354671
2019-02-22 14:42:48 +00:00
Alexey Bataev 12a21e4b69 Revert "[CUDA]Delayed diagnostics for the asm instructions."
This reverts commit r354593 to fix the problem with the crash on
windows.

llvm-svn: 354596
2019-02-21 16:40:21 +00:00
Alexey Bataev 16d3e1a4d2 [CUDA]Delayed diagnostics for the asm instructions.
Summary:
Adapted targetDiag for the CUDA and used for the delayed diagnostics in
asm constructs. Works for both host and device compilation sides.

Reviewers: tra, jlebar

Subscribers: jdoerfert, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D58463

llvm-svn: 354593
2019-02-21 15:51:30 +00:00
Artem Belevich c62214da3d [CUDA] add support for the new kernel launch API in CUDA-9.2+.
Instead of calling CUDA runtime to arrange function arguments,
the new API constructs arguments in a local array and the kernels
are launched with __cudaLaunchKernel().

The old API has been deprecated and is expected to go away
in the next CUDA release.

Differential Revision: https://reviews.llvm.org/D57488

llvm-svn: 352799
2019-01-31 21:34:03 +00:00
Yaxun Liu 95f2ca541f [HIP] Fix size_t for MSVC environment
In 64 bit MSVC environment size_t is defined as unsigned long long.
In single source language like HIP, data layout should be consistent
in device and host compilation, therefore copy data layout controlling
fields from Aux target for AMDGPU target.

Differential Revision: https://reviews.llvm.org/D56318

llvm-svn: 352620
2019-01-30 12:26:54 +00:00
Yaxun Liu d442500f5d [CUDA][HIP] Do not diagnose use of _Float16
r352221 caused regressions in CUDA/HIP since device function may use _Float16 whereas host does not support it.
In this case host compilation should not diagnose usage of _Float16 in device functions or variables.

For now just do not diagnose _Float16 for CUDA/HIP. In the future we should have more precise check.

Differential Revision: https://reviews.llvm.org/D57369

llvm-svn: 352488
2019-01-29 13:20:23 +00:00
Yaxun Liu a461174cfd [CUDA][HIP] Fix ShouldDeleteSpecialMember for inherited constructors
ShouldDeleteSpecialMember is called upon inherited constructors.
It calls inferCUDATargetForImplicitSpecialMember.

Normally the special member enum passed to ShouldDeleteSpecialMember
matches the constructor. However this is not true when inherited
constructor is passed, where DefaultConstructor is passed to treat
the inherited constructor as DefaultConstructor. However
inferCUDATargetForImplicitSpecialMember expects the special
member enum argument to match the constructor, which results
in assertion when this expection is not satisfied.

This patch checks whether the constructor is inherited. If true it will
get the real special member enum for the constructor and pass it
to inferCUDATargetForImplicitSpecialMember.

Differential Revision: https://reviews.llvm.org/D51809

llvm-svn: 344057
2018-10-09 15:53:14 +00:00
Yaxun Liu 9767089d00 [HIP] Support early finalization of device code for -fno-gpu-rdc
This patch renames -f{no-}cuda-rdc to -f{no-}gpu-rdc and keeps the original
options as aliases. When -fgpu-rdc is off,
clang will assume the device code in each translation unit does not call
external functions except those in the device library, therefore it is possible
to compile the device code in each translation unit to self-contained kernels
and embed them in the host object, so that the host object behaves like
usual host object which can be linked by lld.

The benefits of this feature is: 1. allow users to create static libraries which
can be linked by host linker; 2. amortized device code linking time.

This patch modifies HIP action builder to insert actions for linking device
code and generating HIP fatbin, and pass HIP fatbin to host backend action.
It extracts code for constructing command for generating HIP fatbin as
a function so that it can be reused by early finalization. It also modifies
codegen of HIP host constructor functions to embed the device fatbin
when it is available.

Differential Revision: https://reviews.llvm.org/D52377

llvm-svn: 343611
2018-10-02 17:48:54 +00:00
Richard Smith 9b2c5e7c44 [cxx2a] P0641R2: (Some) type mismatches on defaulted functions only
render the function deleted instead of rendering the program ill-formed.

This change also adds an enabled-by-default warning for the case where
an explicitly-defaulted special member function of a non-template class
is implicitly deleted by the type checking rules. (This fires either due
to this language change or due to pre-C++20 reasons for the member being
implicitly deleted). I've tested this on a large codebase and found only
bugs (where the program means something that's clearly different from
what the programmer intended), so this is enabled by default, but we
should revisit this if there are problems with this being enabled by
default.

llvm-svn: 343285
2018-09-28 01:16:43 +00:00
Artem Belevich 78929efb4d [CUDA] Ignore uncallable functions when we check for usual deallocators.
Previously clang considered function variants from both sides of
compilation and that resulted in picking up wrong deallocation function.

Differential Revision: https://reviews.llvm.org/D51808

llvm-svn: 342749
2018-09-21 17:29:33 +00:00
Artem Belevich fe5b1ac142 Revert the tests that should've been reverted in rL341115
llvm-svn: 341118
2018-08-30 20:53:15 +00:00
Jonas Hahnfeld 931939bf92 [CUDA/OpenMP] Define only some host macros during device compilation
When compiling CUDA or OpenMP device code Clang parses header files
that expect certain predefined macros from the host architecture. To
make this work the compiler passes the host triple via the -aux-triple
argument and (until now) pulls in all macros for that "auxiliary triple"
unconditionally.

However this results in defines like __SSE_MATH__ that will trigger
inline assembly making use of the "advertised" target features. See
the discussion of D47849 and PR38464 for a detailed explanation of
the encountered problems.

Instead of blacklisting "known bad" examples this patch starts adding
defines that are needed for certain headers like bits/wordsize.h and
bits/mathinline.h.
The disadvantage of this approach is that it decouples the definitions
from their target toolchain. However in my opinion it's more important
to keep definitions for one header close together. For one this will
include a clear documentation why these particular defines are needed.
Furthermore it simplifies maintenance because adding defines for a new
header or support for a new aux-triple only needs to touch one piece
of code.

Differential Revision: https://reviews.llvm.org/D50845

llvm-svn: 340681
2018-08-25 13:42:40 +00:00
Yaxun Liu a4005e13f7 [CUDA][HIP] Allow function-scope static const variable
CUDA 8.0 E.3.9.4 says: Within the body of a __device__ or __global__
function, only __shared__ variables or variables without any device
memory qualifiers may be declared with static storage class.

It is unclear how a function-scope non-const static variable
without device memory qualifier is implemented, therefore only static
const variable without device memory qualifier is allowed, which
can be emitted as a global variable in constant address space.

Currently clang only allows function-scope static variable with
__shared__ qualifier.

This patch also allows function-scope static const variable without
device memory qualifier and emits it as a global variable in constant
address space.

Differential Revision: https://reviews.llvm.org/D49931

llvm-svn: 338188
2018-07-28 03:05:25 +00:00
Yaxun Liu aa24601f98 [CUDA][HIP] Allow CUDA __global__ functions to have amdgpu kernel attributes
There are HIP applications e.g. Tensorflow 1.3 using amdgpu kernel attributes, however
currently they are only allowed on OpenCL kernel functions.

This patch will allow amdgpu kernel attributes to be applied to CUDA/HIP __global__
functions.

Differential Revision: https://reviews.llvm.org/D47958

llvm-svn: 334561
2018-06-12 23:58:59 +00:00
Artem Belevich e9fa53a09b [CUDA] Check initializers of instantiated template variables.
We were already performing checks on non-template variables,
but the checks on templated ones were missing.

Differential Revision: https://reviews.llvm.org/D45231

llvm-svn: 334143
2018-06-06 22:37:25 +00:00
Justin Lebar 5489f85fda [CUDA] Allow "extern __shared__ Foo foo[]" within anon. namespaces.
Summary:
Previously this triggered a -Wundefined-internal warning.  But it's not
an undefined variable -- any variable of this form is a pointer to the
base of GPU core's shared memory.

Reviewers: tra

Subscribers: sanjoy, rsmith

Differential Revision: https://reviews.llvm.org/D46782

llvm-svn: 332621
2018-05-17 16:15:07 +00:00
Alexander Kornienko 2a8c18d991 Fix typos in clang
Found via codespell -q 3 -I ../clang-whitelist.txt
Where whitelist consists of:

  archtype
  cas
  classs
  checkk
  compres
  definit
  frome
  iff
  inteval
  ith
  lod
  methode
  nd
  optin
  ot
  pres
  statics
  te
  thru

Patch by luzpaz! (This is a subset of D44188 that applies cleanly with a few
files that have dubious fixes reverted.)

Differential revision: https://reviews.llvm.org/D44188

llvm-svn: 329399
2018-04-06 15:14:32 +00:00
Artem Belevich 67d22c8a84 Revert "[CUDA] Check initializers of instantiated template variables."
This (temporarily) reverts commit r329127 due to the problems
it exposed in TensorFlow.

llvm-svn: 329229
2018-04-04 20:48:42 +00:00
Artem Belevich d9189d1e76 [CUDA] Check initializers of instantiated template variables.
We were already performing checks on non-template variables,
but the checks on templated ones were missing.

Differential Revision: https://reviews.llvm.org/D45231

llvm-svn: 329127
2018-04-03 22:41:06 +00:00
Richard Trieu b402580616 Fix some handling of AST nodes with diagnostics.
The diagnostic system for Clang can already handle many AST nodes.  Instead
of converting them to strings first, just hand the AST node directly to
the diagnostic system and let it handle the output.  Minor changes in some
diagnostic output.

llvm-svn: 328688
2018-03-28 04:16:13 +00:00
Artem Belevich e2ae8b5510 [CUDA] Fixed false error reporting in case of calling H->G->HD->D.
Launching a kernel from the host code does not generate code for the
kernel itself. This fixes an issue with clang erroneously reporting
an error for a HD->D call from within the kernel.

Differential Revision: https://reviews.llvm.org/D44837

llvm-svn: 328362
2018-03-23 19:49:03 +00:00
Jonas Hahnfeld ee47d8cb96 [CUDA] Allow external variables in separate compilation
According to the CUDA Programming Guide this is prohibited in
whole program compilation mode. This makes sense because external
references cannot be satisfied in that mode anyway. However,
such variables are allowed in separate compilation mode which
is a valid use case.

Differential Revision: https://reviews.llvm.org/D42923

llvm-svn: 325136
2018-02-14 16:04:03 +00:00
Artem Belevich 50e6e54587 [CUDA] Report "unsupported VLA" errors only on device side.
This fixes erroneously reported CUDA compilation errors
in host-side code during device-side compilation.

I've also restricted OpenMP-specific checks to trigger only
if we're compiling with OpenMP enabled.

Differential Revision: https://reviews.llvm.org/D40275

llvm-svn: 319201
2017-11-28 18:51:42 +00:00
Aaron Ballman adf66b6174 Determine the attribute subject for diagnostics based on declarative information in DeclNodes.td. This greatly reduces the number of enumerated values used for more complex diagnostics; these are now only required when the "attribute only applies to" diagnostic needs to be generated manually as part of semantic processing.
This also clarifies some terminology used by the diagnostic (methods -> Objective-C methods, fields -> non-static data members, etc).

Many of the tests needed to be updated in multiple places for the diagnostic wording tweaks. The first instance of the diagnostic for that attribute is fully specified and subsequent instances cut off the complete list (to make it easier if additional subjects are added in the future for the attribute).

llvm-svn: 319002
2017-11-26 20:01:12 +00:00
Justin Lebar 78137ec868 [CUDA] When compilation fails, print the compilation mode.
Summary:
That is, instead of "1 error generated", we now say "1 error generated
when compiling for sm_35".

This (partially) solves a usability foogtun wherein e.g. users call a
function that's only defined on sm_60 when compiling for sm_35, and they
get an unhelpful error message.

Reviewers: tra

Subscribers: sanjoy, cfe-commits

Differential Revision: https://reviews.llvm.org/D37548

llvm-svn: 312736
2017-09-07 18:37:16 +00:00
Richard Smith 6c716116df PR34163: Don't cache an incorrect key function for a class if queried between
the class becoming complete and its inline methods being parsed.

This replaces the hack of using the "late parsed template" flag to track member
functions with bodies we've not parsed yet; instead we now use the "will have
body" flag, which carries the desired implication that the function declaration
*is* a definition, and that we've just not parsed its body yet.

llvm-svn: 310776
2017-08-12 01:46:03 +00:00
Justin Lebar 86c4e63ff9 [CUDA] Let NVPTX inherit the host's calling conventions.
Summary:
When compiling device code, we may still see host code with explicit
calling conventions.  NVPTX needs to claim that it supports these CCs,
so that (a) we don't raise noisy warnings, and (b) we don't break
existing code which relies on the existence of these CCs when
specializing templates.  (If a CC doesn't exist, clang ignores it, so
two template specializations which are different only insofar as one
specifies a CC are considered identical and therefore are an error if
that CC is not supported.)

Reviewers: tra

Subscribers: cfe-commits

Differential Revision: https://reviews.llvm.org/D28323

llvm-svn: 291136
2017-01-05 16:53:38 +00:00
Justin Lebar 0203f2c26e [CUDA] Add __declspec spellings for CUDA attributes.
Summary: CUDA attributes are spelled __declspec(__foo__) on Windows.

Reviewers: tra

Subscribers: cfe-commits, rnk

Differential Revision: https://reviews.llvm.org/D28321

llvm-svn: 291134
2017-01-05 16:53:04 +00:00
Artem Belevich 64135c35f7 [CUDA] Ignore implicit target attributes during function template instantiation.
Some functions and templates are treated as __host__ __device__ even
when they don't have explicitly specified target attributes.
What's worse, this treatment may change depending on command line
options (-fno-cuda-host-device-constexpr) or
#pragma clang force_cuda_host_device.

Combined with strict checking for matching function target that comes
with D25809(r288962), it makes it hard to write code which would
explicitly instantiate or specialize some functions regardless of
pragmas or command line options in effect.

This patch changes the way we match target attributes of base template
vs attributes used in explicit instantiation or specialization so that
only explicitly specified attributes are considered. This makes base
template selection behave consistently regardless of pragma of command
line options that may affect CUDA target.

Differential Revision: https://reviews.llvm.org/D25845

llvm-svn: 289091
2016-12-08 19:38:13 +00:00
Artem Belevich 13e9b4d768 [CUDA] Improve target attribute checking for function templates.
* __host__ __device__ functions are no longer considered to be
  redeclarations of __host__ or __device__ functions. This prevents
  unintentional merging of target attributes across them.
* Function target attributes are not considered (and must match) during
  explicit instantiation and specialization of function templates.

Differential Revision: https://reviews.llvm.org/D25809

llvm-svn: 288962
2016-12-07 19:27:16 +00:00
Justin Lebar 2d56c26504 [CUDA] Use only the GVALinkage on function definitions.
Summary:
Previously we'd look at the GVALinkage of whatever FunctionDecl you
happened to be calling.

This is not right.  In the absence of the gnu_inline attribute, to be
handled separately, the function definition determines the function's
linkage.  So we need to wait until we get a def before we can know
whether something is known-emitted.

Reviewers: tra

Subscribers: cfe-commits, rsmith

Differential Revision: https://reviews.llvm.org/D26268

llvm-svn: 286313
2016-11-08 23:45:51 +00:00
Justin Lebar 2b42ccc78b [CUDA] [AST] Allow isInlineDefinitionExternallyVisible to be called on functions without bodies.
Summary:
In CUDA compilation, we call isInlineDefinitionExternallyVisible (via
getGVALinkageForFunction) on functions while parsing their definitions.

At the point in time when we call getGVALinkageForFunction, we haven't
yet added the body to the function, so we trip this assert.  But as far
as I can tell, this is harmless.

To work around this, we add a new flag to FunctionDecl, "WillHaveBody".

There was other code that was working around the existing assert with a
really awful hack -- this change lets us get rid of that hack.

Reviewers: rsmith, tra

Subscribers: aemerson, cfe-commits

Differential Revision: https://reviews.llvm.org/D25640

llvm-svn: 285410
2016-10-28 16:26:26 +00:00
Justin Lebar 4d38a5cf74 [CUDA] Simplify some repeated diagnostic expectations in CUDA tests.
Instead of repeating the diagnostic, use "expected-note N".

Test-only change.

llvm-svn: 284882
2016-10-21 20:50:47 +00:00
Artem Belevich 07db5cf6c8 Declare H and H new/delete.
llvm-svn: 284879
2016-10-21 20:34:05 +00:00
Justin Lebar 6c86e9160d [CUDA] When we emit an error that might have been deferred, also print a callstack.
Summary:
Previously, when you did something not allowed in a host+device function
and then caused it to be codegen'ed, we would print out an error telling
you that you did something bad, but we wouldn't tell you how we decided
that the function needed to be codegen'ed.

This change causes us to print out a callstack when emitting deferred
errors.  This is immensely helpful when debugging highly-templated code,
where it's often unclear how a function became known-emitted.

We only print the callstack once per function, after we print the all
deferred errors.

This patch also switches all of our hashtables to using canonical
FunctionDecls instead of regular FunctionDecls.  This prevents a number
of bugs, some of which are caught by tests added here, in which we
assume that two FDs for the same function have the same pointer value.

Reviewers: rnk

Subscribers: cfe-commits, tra

Differential Revision: https://reviews.llvm.org/D25704

llvm-svn: 284647
2016-10-19 21:15:01 +00:00
Justin Lebar 9730ae943f [CUDA] Emit errors for wrong-side calls made on the same line as non-wrong-side calls.
Summary:
This fixes two related bugs:

1) Previously, if you had a non-wrong side call at some source code
location L, we wouldn't emit errors for wrong-side calls that appeared
at L.

2) We'd only emit one wrong-side error per source code location, when we
actually want to emit it twice if we hit this line more than once due to
e.g. template instantiation.

Reviewers: tra

Subscribers: rnk, cfe-commits

Differential Revision: https://reviews.llvm.org/D25702

llvm-svn: 284643
2016-10-19 21:03:38 +00:00
Justin Lebar d3fd70dedd [CUDA] Rework tests now that we emit deferred diagnostics during sema. Test-only change.
Summary:
Previously we had to split out a lot of our tests into a test that
checked only immediate errors and a test that checked only deferred
errors.  This was because, if you emitted any immediate errors, we
wouldn't run codegen, where the deferred errors were emitted.

We've fixed this, and now emit deferred errors during sema.  This lets
us merge a bunch of tests, and lets us convert some other tests to
-fsyntax-only.

Reviewers: tra

Subscribers: cfe-commits

Differential Revision: https://reviews.llvm.org/D25755

llvm-svn: 284553
2016-10-19 00:06:49 +00:00
Justin Lebar d692dfb65e [CUDA] Fix false-positive in known-emitted handling.
Previously: When compiling for host, our constructed call graph went
*through* kernel calls.  This meant that if we had

  host calls kernel calls HD

we would incorrectly mark the HD function as known-emitted on the host
side, and thus perform host-side checks on it.

Fixing this exposed another issue, wherein when marking a function as
known-emitted, we also need to traverse the callgraph of its template,
because non-dependent calls are attached to a function's template, not
its instantiation.

llvm-svn: 284355
2016-10-17 02:25:55 +00:00
Justin Lebar 23d954241b [CUDA] Emit deferred diagnostics during Sema rather than during codegen.
Summary:
Emitting deferred diagnostics during codegen was a hack.  It did work,
but usability was poor, both for us as compiler devs and for users.  We
don't codegen if there are any sema errors, so for users this meant that
they wouldn't see deferred errors if there were any non-deferred errors.
For devs, this meant that we had to carefully split up our tests so that
when we tested deferred errors, we didn't emit any non-deferred errors.

This change moves checking for deferred errors into Sema.  See the big
comment in SemaCUDA.cpp for an overview of the idea.

This checking adds overhead to compilation, because we have to maintain
a partial call graph.  As a result, this change makes deferred errors a
CUDA-only concept (whereas before they were a general concept).  If
anyone else wants to use this framework for something other than CUDA,
we can generalize at that time.

This patch makes the minimal set of test changes -- after this lands,
I'll go back through and do a cleanup of the tests that we no longer
have to split up.

Reviewers: rnk

Subscribers: cfe-commits, rsmith, tra

Differential Revision: https://reviews.llvm.org/D25541

llvm-svn: 284158
2016-10-13 20:52:12 +00:00
Justin Lebar 44f547aa3f [CUDA] Allow static variables in __host__ __device__ functions, so long as they're never codegen'ed for device.
Reviewers: tra, rnk

Subscribers: cfe-commits

Differential Revision: https://reviews.llvm.org/D25150

llvm-svn: 284145
2016-10-13 18:45:17 +00:00
Justin Lebar aa370bd0d7 [CUDA] Disallow __shared__ variables in host functions.
Reviewers: tra, rnk

Subscribers: cfe-commits

Differential Revision: https://reviews.llvm.org/D25143

llvm-svn: 284144
2016-10-13 18:45:13 +00:00
Justin Lebar 179bdce72a [CUDA] Add Sema::CUDADiagBuilder and Sema::CUDADiagIf{Device,Host}Code().
Summary:
Together these let you easily create diagnostics that

 - are never emitted for host code
 - are always emitted for __device__ and __global__ functions, and
 - are emitted for __host__ __device__ functions iff these functions are
   codegen'ed.

At the moment there are only three diagnostics that need this treatment,
but I have more to add, and it's not sustainable to write code for emitting
every such diagnostic twice, and from a special wrapper in SemaCUDA.cpp.

While we're at it, don't emit the function name in
err_cuda_device_exceptions: It's not necessary to print it, and making
this work in the new framework in the face of a null value for
dyn_cast<FunctionDecl>(CurContext) isn't worth the effort.

Reviewers: rnk

Subscribers: cfe-commits, tra

Differential Revision: https://reviews.llvm.org/D25139

llvm-svn: 284143
2016-10-13 18:45:08 +00:00
Artem Belevich 89997ecd8f Added REQUIRED triples to the test that fails on some ARM buildbots.
llvm-svn: 283964
2016-10-12 02:08:08 +00:00
Justin Lebar 0254c46300 [CUDA] Make touching a kernel from a __host__ __device__ function a deferred error.
Previously, this was an immediate, don't pass go, don't collect $200
error.  But this precludes us from writing code like

  __host__ __device__ void launch_kernel() {
    kernel<<<...>>>();
  }

Such code isn't wrong, following our notions of right and wrong in CUDA,
unless it's codegen'ed.

llvm-svn: 283963
2016-10-12 01:30:08 +00:00
Richard Smith f75dcbef20 Aligned allocation versus CUDA: make deallocation function preference order
match other CUDA preference orders, per discussion with jlebar. We now model
this in an attempt to match overload resolution as closely as possible:

- First, we throw out all non-callable (due to CUDA host/device mismatch)
  operator delete functions.
- Then we apply sizedness / alignedness preferences based on whether the type
  is overaligned and whether the deallocation function is a member.
- Finally, we use the CUDA callability preference as a tiebreaker.

llvm-svn: 283830
2016-10-11 00:21:10 +00:00
Justin Lebar 2dfbe9a3b4 [CUDA] Rename cuda_builtin_vars.h to __clang_cuda_builtin_vars.h.
Summary: This matches the idiom we use for our other CUDA wrapper headers.

Reviewers: tra

Subscribers: beanz, mgorny, cfe-commits

Differential Revision: https://reviews.llvm.org/D24978

llvm-svn: 283679
2016-10-08 22:16:08 +00:00
Justin Lebar 9fdb46e71c [CUDA] Do a better job at detecting wrong-side calls.
Summary:
Move CheckCUDACall from ActOnCallExpr and BuildDeclRefExpr to
DiagnoseUseOfDecl.  This lets us catch some edge cases we were missing,
specifically around class operators.

This necessitates a few other changes:

 - Avoid emitting duplicate deferred diags in CheckCUDACall.

   Previously we'd carefully placed our call to CheckCUDACall such that
   it would only ever run once for a particular callsite.  But now this
   isn't the case.

 - Emit deferred diagnostics from a template
   specialization/instantiation's primary template, in addition to from
   the specialization/instantiation itself.  DiagnoseUseOfDecl ends up
   putting the deferred diagnostics on the template, rather than the
   specialization, so we need to check both.

Reviewers: rsmith

Subscribers: cfe-commits, tra

Differential Revision: https://reviews.llvm.org/D24573

llvm-svn: 283637
2016-10-08 01:07:11 +00:00
Justin Lebar e060feb7b1 [CUDA] Disallow overloading destructors.
Summary:
We'd attempted to allow this, but turns out we were doing a very bad
job.  :)

Making this work properly would be a giant change in clang.  For
example, we'd need to make CXXRecordDecl::getDestructor()
context-sensitive, because the destructor you end up with depends on
where you're calling it from.

For now (and hopefully for ever), just disallow overloading of
destructors in CUDA.

Reviewers: rsmith

Subscribers: cfe-commits, tra

Differential Revision: https://reviews.llvm.org/D24571

llvm-svn: 283120
2016-10-03 16:48:23 +00:00
Justin Lebar 281ce2af17 [CUDA] Allow extern __shared__ on empty-length arrays.
"extern __shared__ int x[]" is OK.

llvm-svn: 283068
2016-10-02 15:24:50 +00:00
Justin Lebar e71b2fa4c9 [CUDA] Disallow __constant__ local variables.
Reviewers: tra, rnk

Subscribers: cfe-commits

Differential Revision: https://reviews.llvm.org/D25129

llvm-svn: 282986
2016-09-30 23:57:34 +00:00
Justin Lebar 1041101953 [CUDA] Disallow 'extern __shared__' variables.
Also add a test that we disallow

  __constant__ __shared__ int x;

because it's possible to break this without breaking

  __shared__ __constant__ int x;

Reviewers: rnk

Subscribers: cfe-commits, tra

Differential Revision: https://reviews.llvm.org/D25125

llvm-svn: 282985
2016-09-30 23:57:30 +00:00
Justin Lebar 20614e0d9c [CUDA] Fix implicit-device-lambda.cu after r282911.
This commit added a warning that we're (correctly) hitting in this test.
Just ignore it.

llvm-svn: 282927
2016-09-30 20:17:37 +00:00
Justin Lebar 7ca116cacf [CUDA] Make lambdas inherit __host__ and __device__ attributes from the scope in which they're created.
Summary: NVCC compat.  Fixes bug 30567.

Reviewers: tra

Subscribers: cfe-commits, rnk

Differential Revision: https://reviews.llvm.org/D25105

llvm-svn: 282880
2016-09-30 17:14:53 +00:00
Justin Lebar b17840de33 [CUDA] Disallow variable-length arrays in CUDA device code.
Reviewers: tra

Subscribers: cfe-commits, jhen

Differential Revision: https://reviews.llvm.org/D25050

llvm-svn: 282647
2016-09-28 22:45:58 +00:00
Justin Lebar 2a8db34044 [CUDA] Disallow exceptions in device code.
Reviewers: tra

Subscribers: cfe-commits, jhen

Differential Revision: https://reviews.llvm.org/D25036

llvm-svn: 282646
2016-09-28 22:45:54 +00:00
Konstantin Zhuravlyov 5b48d725a0 [AMDGPU] Expose flat work group size, register and wave control attributes
__attribute__((amdgpu_flat_work_group_size(<min>, <max>))) - request minimum and maximum flat work group size
__attribute__((amdgpu_waves_per_eu(<min>[, <max>]))) - request minimum and/or maximum waves per execution unit

Differential Revision: https://reviews.llvm.org/D24513

llvm-svn: 282371
2016-09-26 01:02:57 +00:00
Justin Lebar 5fd18d17e5 [CUDA] Add test checking our ability to take a function pointer to a __global__ function on the host side.
Summary: This functionality is used by Thrust.

Reviewers: tra

Subscribers: cfe-commits

Differential Revision: https://reviews.llvm.org/D24581

llvm-svn: 281543
2016-09-14 21:50:11 +00:00
Artem Belevich bed18e9cc4 [CUDA] Do not merge CUDA target attributes.
CUDA target attributes are used for function overloading and must not be merged.

This fixes a bug where attributes were inherited during function template
specialization in CUDA and made it impossible for specialized function
to provide its own target attributes.

Differential Revision: https://reviews.llvm.org/D24522

llvm-svn: 281406
2016-09-13 22:16:30 +00:00
Justin Lebar 26bb31123a [CUDA] Fix "declared here" note on deferred wrong-side errors.
Previously we weren't deferring these "declared here" notes, which is
obviously wrong.

llvm-svn: 278767
2016-08-16 00:48:21 +00:00
Justin Lebar 18e2d82297 [CUDA] Raise an error if a wrong-side call is codegen'ed.
Summary:
Some function calls in CUDA are allowed to appear in
semantically-correct programs but are an error if they're ever
codegen'ed.  Specifically, a host+device function may call a host
function, but it's an error if such a function is ever codegen'ed in
device mode (and vice versa).

Previously, clang made no attempt to catch these errors.  For the most
part, they would be caught by ptxas, and reported as "call to unknown
function 'foo'".

Now we catch these errors and report them the same as we report other
illegal calls (e.g. a call from a host function to a device function).

This has a small change in error-message behavior for calls that were
previously disallowed (e.g. calls from a host to a device function).
Previously, we'd catch disallowed calls fairly early, before doing
additional semantic checking e.g. of the call's arguments.  Now we catch
these illegal calls at the very end of our semantic checks, so we'll
only emit a "illegal CUDA call" error if the call is otherwise
well-formed.

Reviewers: tra, rnk

Subscribers: cfe-commits

Differential Revision: https://reviews.llvm.org/D23242

llvm-svn: 278759
2016-08-15 23:00:49 +00:00
Justin Lebar c989c3e784 [CUDA] Reject calls to __device__ functions from host variable global initializers.
Reviewers: tra

Subscribers: cfe-commits

Differential Revision: https://reviews.llvm.org/D23335

llvm-svn: 278196
2016-08-10 01:09:21 +00:00
Justin Lebar 7d078bddbd [CUDA] Print a "previous-decl" note when calling an illegal member fn.
Summary:
When we emit err_ref_bad_target, we should emit a "'method' declared
here" note.  We already do so in most places, just not in
BuildCallToMemberFunction.

Reviewers: tra

Subscribers: cfe-commits

Differential Revision: https://reviews.llvm.org/D23240

llvm-svn: 278195
2016-08-10 01:09:18 +00:00
Justin Lebar 4c2c6fd1c4 [CUDA] Add additional testcases for EraseUnwantedCUDAMatches.
Summary:
Specifically, this patch adds testcases for all three calls to
EraseUnwantedCUDAMatches.  The addr-of-overloaded-fn test I accidentally
neutered in r264207, which moved much of
CodeGenCUDA/function-overload.cu into SemaCUDA/function-overload.cu.
The coverage from overloaded-delete test is new.

Reviewers: tra

Subscribers: cfe-commits

Differential Revision: http://reviews.llvm.org/D21913

llvm-svn: 275232
2016-07-12 23:23:12 +00:00
Justin Lebar d35f706cc2 [CUDA] Don't assume that destructors can't be overloaded.
Summary:
You can overload a destructor in CUDA, and SemaOverload needs to be
tweaked not to crash when it sees an explicit call to an overloaded
destructor.

Reviewers: rsmith

Subscribers: cfe-commits, tra

Differential Revision: http://reviews.llvm.org/D21912

llvm-svn: 275231
2016-07-12 23:23:01 +00:00
Justin Bogner 2d5de7e568 NVPTX: Use the nvvm builtins to read SRegs rather than the legacy ptx ones
The ptx spellings were removed from LLVM in r274769.

llvm-svn: 274770
2016-07-07 16:41:08 +00:00
Artem Belevich bcec9dac14 [CUDA] Add implicit conversion of __launch_bounds__ arguments to rvalue.
Fixes clang crash reported in PR27778.

Differential Revision: http://reviews.llvm.org/D20985

llvm-svn: 271951
2016-06-06 22:54:57 +00:00
Reid Kleckner a769fd50ba Avoid depending on test inputes that aren't in Inputs
Some people have weird CI systems that run each test subdirectory
independently without access to other parallel trees.

Unfortunately, this means we have to suffer some duplication until Art
can sort out how to share these types.

llvm-svn: 270164
2016-05-20 00:38:25 +00:00
Artem Belevich 3650bbeebc [CUDA] Do not allow non-empty destructors for global device-side variables.
According to Cuda Programming guide (v7.5, E2.3.1):
> __device__, __constant__ and __shared__ variables defined in namespace
> scope, that are of class type, cannot have a non-empty constructor or a
> non-empty destructor.

Clang already deals with device-side constructors (see D15305).
This patch enforces similar rules for destructors.

Differential Revision: http://reviews.llvm.org/D20140

llvm-svn: 270108
2016-05-19 20:13:53 +00:00
Artem Belevich 85b6f63f42 [CUDA] Split device-var-init.cu tests into separate Sema and CodeGen parts.
Codegen tests for device-side variable initialization are subset of test
cases used to verify Sema's part of the job.
Including CodeGenCUDA/device-var-init.cu from SemaCUDA makes it easier to
keep both sides in sync.

Differential Revision: http://reviews.llvm.org/D20139

llvm-svn: 270107
2016-05-19 20:13:39 +00:00
Richard Smith 12e7931d0b Add support for derived class special members hiding functions brought in from
a base class via a using-declaration. If a class has a using-declaration
declaring either a constructor or an assignment operator, eagerly declare its
special members in case they need to displace a shadow declaration from a
using-declaration.

llvm-svn: 269398
2016-05-13 06:47:56 +00:00
Justin Lebar ba122ab42f [CUDA] Make unattributed constexpr functions implicitly host+device.
With this patch, by a constexpr function is implicitly host+device
unless:

 a) it's a variadic function (variadic functions are not allowed on the
    device side), or
 b) it's preceeded by a __device__ overload in a system header.

The restriction on overloading __host__ __device__ functions on the
basis of their CUDA attributes remains in place, but we use (b) to allow
us to define __device__ overloads for constexpr functions in cmath,
which would otherwise be __host__ __device__ and thus not overloadable.

You can disable this behavior with -fno-cuda-host-device-constexpr.

Reviewers: tra, rnk, rsmith

Subscribers: cfe-commits

Differential Revision: http://reviews.llvm.org/D18380

llvm-svn: 264964
2016-03-30 23:30:21 +00:00
Justin Lebar 25c4a81e79 [CUDA] Remove three obsolete CUDA cc1 flags.
Summary:
* -fcuda-target-overloads

  Previously unconditionally set to true by the driver.  Necessary for
  correct functioning of the compiler -- our CUDA headers wrapper won't
  compile without this.

* -fcuda-disable-target-call-checks

  Previously unconditionally set to true by the driver.  Necessary to
  compile almost any external CUDA code -- almost all libraries assume
  that host+device code can call host or device functions.

* -fcuda-allow-host-calls-from-host-device

  No effect when target overloading is enabled.

Reviewers: tra

Subscribers: rsmith, cfe-commits

Differential Revision: http://reviews.llvm.org/D18416

llvm-svn: 264739
2016-03-29 16:24:16 +00:00
Justin Lebar e5eed04d52 [CUDA] Merge most of CodeGenCUDA/function-overload.cu into SemaCUDA/function-overload.cu.
Summary:
Previously we were using the codegen test to ensure that we choose the
right overload.  But we can do this within sema, with a bit of
cleverness.

I left the constructor/destructor checks in CodeGen, because these
overloads (particularly on the destructors) are hard to check in Sema.

Reviewers: tra

Subscribers: cfe-commits

Differential Revision: http://reviews.llvm.org/D18386

llvm-svn: 264207
2016-03-23 22:42:30 +00:00
Justin Lebar e82caa3055 [CUDA] Simplify SemaCUDA/function-overload.cu test.
Summary:
Principally, don't hardcode the line numbers of various notes.  This
lets us make changes to the test without recomputing linenos everywhere.

Instead, just tell -verify that we may get 0 or more notes pointing to
the relevant function definitions.  Checking that we get exactly the
right note isn't so important (and anyway is checked elsewhere).

Reviewers: tra

Subscribers: cfe-commits

Differential Revision: http://reviews.llvm.org/D18385

llvm-svn: 264206
2016-03-23 22:42:28 +00:00
Justin Lebar d33adadb0e [CUDA] Don't allow templated variadic functions.
Reviewers: tra

Subscribers: cfe-commits

Differential Revision: http://reviews.llvm.org/D18373

llvm-svn: 264106
2016-03-22 22:06:19 +00:00
Artem Belevich 1ef9b59284 [CUDA] do not allow attribute-based overloading for __global__ functions.
__global__ functions are present on both host and device side,
so providing __host__ or __device__ overloads is not going to
do anything useful.

llvm-svn: 261778
2016-02-24 21:54:45 +00:00
Justin Lebar 048a4c1ca9 [CUDA] Don't specify exact line numbers in cuda-builtin-vars.cu.
This makes the test less fragile to changes to cuda_builtin_vars.h.

Test-only change.

llvm-svn: 261775
2016-02-24 21:49:30 +00:00
Artem Belevich 186091094a [CUDA] Tweak attribute-based overload resolution to match nvcc behavior.
This is an artefact of split-mode CUDA compilation that we need to
mimic. HD functions are sometimes allowed to call H or D functions. Due
to split compilation mode device-side compilation will not see host-only
function and thus they will not be considered at all. For clang both H
and D variants will become function overloads visible to
compiler. Normally target attribute is considered only if C++ rules can
not determine which function is better. However in this case we need to
ignore functions that would not be present during current compilation
phase before we apply normal overload resolution rules.

Changes:
* introduced another level of call preference to better describe
  possible call combinations.
* removed WrongSide functions from consideration if the set contains
  SameSide function.
* disabled H->D, D->H and G->H calls. These combinations are
  not allowed by CUDA and we were reluctantly allowing them to work
  around device-side calls to math functions in std namespace.
  We no longer need it after r258880.

Differential Revision: http://reviews.llvm.org/D16870

llvm-svn: 260697
2016-02-12 18:29:18 +00:00
Justin Lebar 1eac5948db [CUDA] Add -fcuda-allow-variadic-functions.
Summary:
Turns out the variadic function checking added in r258643 was too strict
for some existing users; give them an escape valve.  When
-fcuda-allow-variadic-functions is passed, the front-end makes no
attempt to disallow C-style variadic functions.  Calls to va_arg are
still not allowed.

Reviewers: tra

Subscribers: cfe-commits, jhen, echristo, bkramer

Differential Revision: http://reviews.llvm.org/D16559

llvm-svn: 258822
2016-01-26 17:47:20 +00:00
Justin Lebar e48cd6c530 [CUDA] Disallow variadic functions other than printf in device code.
Reviewers: tra

Subscribers: cfe-commits, echristo, jhen

Differential Revision: http://reviews.llvm.org/D16484

llvm-svn: 258643
2016-01-23 21:28:17 +00:00
Justin Lebar a8f0254bc1 [CUDA] Reject the alias attribute in CUDA device code.
Summary: CUDA (well, strictly speaking, NVPTX) doesn't support aliases.

Reviewers: echristo

Subscribers: cfe-commits, jhen, tra

Differential Revision: http://reviews.llvm.org/D16502

llvm-svn: 258641
2016-01-23 21:28:10 +00:00
Justin Lebar 6644e366b0 [CUDA] Bail, rather than crash, on va_arg in device code.
Reviewers: tra

Subscribers: echristo, jhen, cfe-commits

Differential Revision: http://reviews.llvm.org/D16331

llvm-svn: 258264
2016-01-20 00:27:00 +00:00
Justin Lebar c66a10652a [CUDA] Only allow __global__ on free functions and static member functions.
Summary:
Warn for NVCC compatibility if you declare a static member function or
inline function as __global__.

Reviewers: tra

Subscribers: jhen, echristo, cfe-commits

Differential Revision: http://reviews.llvm.org/D16261

llvm-svn: 258263
2016-01-20 00:26:57 +00:00
Justin Lebar f8bdacbadc [CUDA] Warn undeclared identifiers in CUDA kernel calls
Value, type, and instantiation dependence were not being handled
correctly for CUDAKernelCallExpr AST nodes. As a result, if an
undeclared identifier was used in the triple-angle-bracket kernel call
configuration, there would be no error during parsing, and there would
be a crash during code gen. This patch makes sure that an error will be
issued during parsing in this case, just as there would be for any other
use of an undeclared identifier in C++.

Patch by Jason Henline.

Reviewers: jlebar, rsmith

Differential Revision: http://reviews.llvm.org/D15858

llvm-svn: 257839
2016-01-14 23:31:30 +00:00
Justin Lebar 3eaaf86397 [CUDA] Report an error if code tries to mix incompatible CUDA attributes.
Summary: Thanks to jhen for helping me figure this out.

Reviewers: tra, echristo

Subscribers: jhen

Differential Revision: http://reviews.llvm.org/D16129

llvm-svn: 257554
2016-01-13 01:07:35 +00:00
Akira Hatanaka 8c26ea663d Produce a better diagnostic for global register variables.
Currently, when there is a global register variable in a program that
is bound to an invalid register, clang/llvm prints an error message that
is not very user-friendly.

This commit improves the diagnostic and moves the check that used to be
in the backend to Sema. In addition, it makes changes to error out if
the size of the register doesn't match the declared variable size.

e.g., volatile register int B asm ("rbp");

rdar://problem/23084219

Differential Revision: http://reviews.llvm.org/D13834

llvm-svn: 253405
2015-11-18 00:15:28 +00:00
Artem Belevich 5e2a3ecd48 [CUDA] use -aux-triple to pass target triple of opposite side of compilation
Clang needs to know target triple for both sides of compilation so that
preprocessor macros and target builtins from both sides are available.

This change augments Compilation class to carry information about
toolchains used during different CUDA compilation passes and refactors
BuildActions to use it when it constructs CUDA jobs.

Removed DeviceTriple from CudaHostAction/CudaDeviceAction as it's no
longer needed.

Differential Revision: http://reviews.llvm.org/D13144

llvm-svn: 253385
2015-11-17 22:28:40 +00:00
Artem Belevich b5bc923af4 [CUDA] Allow parsing of host and device code simultaneously.
* adds -aux-triple option to specify target triple
 * propagates aux target info to AST context and Preprocessor
 * pulls in target specific preprocessor macros.
 * pulls in target-specific builtins from aux target.
 * sets appropriate host or device attribute on builtins.

Differential Revision: http://reviews.llvm.org/D12917

llvm-svn: 248299
2015-09-22 17:23:22 +00:00
Artem Belevich 9674a64cd9 [CUDA] Add appropriate host/device attribute to builtins.
The changes are part of attribute-based CUDA function overloading (D12453)
and as such are only enabled when it's in effect (-fcuda-target-overloads).

Differential Revision: http://reviews.llvm.org/D12122

llvm-svn: 248296
2015-09-22 17:23:05 +00:00
Artem Belevich 94a55e8169 [CUDA] Allow function overloads in CUDA based on host/device attributes.
The patch makes it possible to parse CUDA files that contain host/device
functions with identical signatures, but different attributes without
having to physically split source into host-only and device-only parts.

This change is needed in order to parse CUDA header files that have
a lot of name clashes with standard include files.

Gory details are in design doc here: https://goo.gl/EXnymm
Feel free to leave comments there or in this review thread.

This feature is controlled with CC1 option -fcuda-target-overloads
and is disabled by default.

Differential Revision: http://reviews.llvm.org/D12453

llvm-svn: 248295
2015-09-22 17:22:59 +00:00
Artem Belevich 5ef02c2db7 [CUDA] Check register names on appropriate side of cuda compilation only.
Differential Revision: http://reviews.llvm.org/D11950

llvm-svn: 246193
2015-08-27 19:54:21 +00:00