Summary:
- Even though the bindless surface/texture interfaces are promoted,
there is still code using surface/texture references. For example,
[PR#26400](https://bugs.llvm.org/show_bug.cgi?id=26400) reports a
compilation failure for code using `tex2D` with texture references. For
better compatibility, this patch proposes support for surface/texture
references.
- Given the absence of documentation and the use of magic headers, it is
believed that `nvcc` uses builtins for texture support. From the limited
NVVM documentation[^nvvm] and the NVPTX backend's texture/surface-related
tests[^test], it is believed that surface/texture references are
supported by replacing their reference types, which are annotated with
`device_builtin_surface_type`/`device_builtin_texture_type`, with the
corresponding handle-like object types, `cudaSurfaceObject_t` or
`cudaTextureObject_t`, in the device-side compilation. On the host
side, the global handle variables are registered and will be
established and updated later when the corresponding binding/unbinding
APIs are called[^bind] (see the sketch after this list). Surface/texture
references are thus much like device global variables, but are represented
with different types on the host and device sides.
- In this patch, the following changes are proposed to support that
behavior:
+ Refine the `device_builtin_surface_type` and
`device_builtin_texture_type` attributes to apply to type decls
only, so that whether a variable is of the surface/texture
reference type can be checked.
+ Add hooks in code generation to replace those reference types with
the corresponding object types, as well as all accesses to them. In
particular, `nvvm.texsurf.handle.internal` should be used to load
object handles from global reference variables[^texsurf], along with
the metadata annotations.
+ Generate host-side registration with proper template argument
parsing.
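Below is a minimal sketch (hypothetical kernel and host code, not taken from the patch) of the legacy texture-reference style this support targets: the reference is a global of a builtin type read via `tex2D` on the device, and the host binds memory to it before the launch.

```cuda
#include <cuda_runtime.h>

// Texture reference: a global whose type is annotated with
// device_builtin_texture_type in the CUDA headers.
texture<float, 2, cudaReadModeElementType> texRef;

__global__ void copyFromTex(float *out, int w, int h) {
  int x = blockIdx.x * blockDim.x + threadIdx.x;
  int y = blockIdx.y * blockDim.y + threadIdx.y;
  if (x < w && y < h)
    out[y * w + x] = tex2D(texRef, x + 0.5f, y + 0.5f);  // read through the reference
}

void hostLaunch(cudaArray_t array, float *out, int w, int h) {
  // Host side: the registered reference is bound/unbound with the legacy
  // reference API; device code sees it as a handle after the lowering above.
  cudaBindTextureToArray(texRef, array);
  dim3 block(16, 16), grid((w + 15) / 16, (h + 15) / 16);
  copyFromTex<<<grid, block>>>(out, w, h);
  cudaUnbindTexture(texRef);
}
```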
---
[^nvvm]: https://docs.nvidia.com/cuda/pdf/NVVM_IR_Specification.pdf
[^test]: https://raw.githubusercontent.com/llvm/llvm-project/master/llvm/test/CodeGen/NVPTX/tex-read-cuda.ll
[^bind]: See section 3.2.11.1.2 `Texture reference API` in the [CUDA C Programming Guide](https://docs.nvidia.com/cuda/pdf/CUDA_C_Programming_Guide.pdf).
[^texsurf]: According to NVVM IR, `nvvm.texsurf.handle` should be used. However, the current backend does not support it yet. We may revise that later.
Reviewers: tra, rjmccall, yaxunl, a.sidorin
Subscribers: cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D76365
This patch removes the explicit call graph for CUDA/HIP/OpenMP deferred
diagnostics generated during parsing, since it is error-prone due to
incomplete information about function declarations during parsing. Instead,
this patch does a post-parsing AST traversal and emits deferred diagnostics
based on the use graph implicitly generated during the traversal.
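A hedged CUDA illustration (made-up names) of why the use graph matters: whether the wrong-side call below is diagnosed depends on whether `hd()` is actually emitted for the device, which is only known once the uses have been traversed.

```cuda
__host__ int host_only();

__host__ __device__ int hd() {
  return host_only();  // wrong-side call; diagnosed only if hd() is emitted for device
}

__global__ void kernel(int *out) {
  *out = hd();         // this use forces device-side emission of hd()
}
```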
Differential Revision: https://reviews.llvm.org/D70172
The norecurse function attribute indicates that the function is not called
recursively, directly or indirectly.
Add norecurse to OpenCL functions, SYCL functions in device compilation,
and CUDA/HIP kernels.
Although there is an LLVM pass that adds norecurse to functions, it only works
for whole-program compilation. Having the frontend add norecurse can also make
that pass run faster, since functions already marked norecurse do not need to
be checked again.
Differential Revision: https://reviews.llvm.org/D73651
A CUDA/HIP program may be compiled with -fopenmp. In this case, -fopenmp is only passed to the host compilation
to take advantage of multi-threaded computation.
CUDA/HIP and OpenMP both use Sema::DeviceCallGraph to store functions to be analyzed and remove them
once they decide a function is sure to be emitted. CUDA/HIP and OpenMP have different functions for determining
whether a function is sure to be emitted.
To check host/device correctness for CUDA/HIP when -fopenmp is enabled, a unified logic is needed for determining
whether a function is to be emitted. That logic needs to be aware of both the CUDA and the OpenMP rules.
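A hedged sketch of the mixed-mode use case (illustrative names): -fopenmp applies only to the host compilation, so the host loop is parallelized with OpenMP while each thread performs ordinary CUDA/HIP kernel launches.

```cuda
__global__ void scale(float *d, float s, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) d[i] *= s;
}

void scaleAll(float **bufs, int nbufs, int n) {
  // Host-only OpenMP parallelism wrapped around device kernel launches.
#pragma omp parallel for
  for (int b = 0; b < nbufs; ++b)
    scale<<<(n + 255) / 256, 256>>>(bufs[b], 2.0f, n);
}
```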
Differential Revision: https://reviews.llvm.org/D67837
llvm-svn: 374263
This matches how GCC handles it, see e.g. https://gcc.godbolt.org/z/HPplnl.
GCC documents the gnu_inline attribute with "In C++, this attribute does
not depend on extern in any way, but it still requires the inline keyword
to enable its special behavior."
The previous behaviour of gnu_inline in C++, without the extern
keyword, can be traced back to the original commit that added
support for gnu_inline, SVN r69045.
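A minimal sketch of the documented behavior this now matches: in C++, gnu_inline takes effect with just the inline keyword and no extern.

```cuda
__attribute__((gnu_inline)) inline int twice(int x) {
  // GNU inline semantics: the definition is available for inlining, and no
  // out-of-line copy is emitted for this translation unit.
  return 2 * x;
}
```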
Differential Revision: https://reviews.llvm.org/D67414
llvm-svn: 373078
Summary:
- Even though only `void` is still accepted as the deduced return type,
enabling deduction/instantiation on the return type allows more
consistent coding.
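A hedged example (hypothetical kernel) of what this permits: a kernel with a deduced return type, as long as it deduces to `void`.

```cuda
template <typename T>
__global__ auto fill(T *p, T v, int n) {  // return type deduces to void
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) p[i] = v;
}
```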
Reviewers: tra, jlebar
Subscribers: cfe-commits, yaxunl
Tags: #clang
Differential Revision: https://reviews.llvm.org/D68031
llvm-svn: 372898
Clang does not respect the explicit device/host attributes of defaulted special members.
Clang also does not respect the hostness of special members determined by their
first declarations.
In addition, clang adds duplicate implicit device or host attributes in certain cases.
This patch fixes that.
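A hedged reduction (illustrative type) of the first case: the explicit attributes on the defaulted special members should be honored rather than overridden by inferred ones.

```cuda
struct Vec {
  float x, y, z;
  __host__ __device__ Vec() = default;             // explicit attributes on a
  __host__ __device__ Vec(const Vec &) = default;  // defaulted special member
};

__global__ void k(Vec *out) { *out = Vec(); }      // needs both on the device
```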
Differential Revision: https://reviews.llvm.org/D67509
llvm-svn: 372394
- r372318 causes a violation of `use-of-uninitialized-value` detected by
MemorySanitizer. Once the `Viable` field is set to false, `FailureKind`
needs to be set as well, since it is checked during destruction when
`Viable` is not true.
- Revert the part trying to skip `std::vector` erasing.
llvm-svn: 372356
Summary:
- Consider only viable candidates when checking SameSide candidates.
- Replace erasing with clearing the viable flag to reduce data
moving/copying.
- Add one and revise another, as the diagnostic messages are more
relevant compared to the previous ones.
Reviewers: tra
Subscribers: cfe-commits, yaxunl
Tags: #clang
Differential Revision: https://reviews.llvm.org/D67730
llvm-svn: 372318
As discussed in D64780 the wording of this warning message is being
changed to say 'is not supported' instead of 'ignored', and the
diag ID itself is being changed to warn_cconv_not_supported.
llvm-svn: 366368
This patch introduces support for the hip_pinned_shadow variable for HIP.
A hip_pinned_shadow variable is a global variable with the attribute hip_pinned_shadow.
It has external linkage on the device side and has no initializer. It has internal
linkage on the host side and has an initializer or static constructor. It can be accessed
in both device code and host code.
This allows the HIP runtime to implement support for HIP texture references.
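A hedged sketch (illustrative type and names, not the real HIP headers) of a hip_pinned_shadow variable: a single global carrying the attribute that both sides can reference, with the linkage and initialization differences above handled by the compiler and runtime.

```cuda
struct textureReference { int filterMode; /* ... */ };

__attribute__((hip_pinned_shadow)) textureReference tex;

__device__ int deviceRead() { return tex.filterMode; }  // device-side access
int hostRead() { return tex.filterMode; }               // host-side access
```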
Differential Revision: https://reviews.llvm.org/D62738
llvm-svn: 364381
After https://reviews.llvm.org/rL355317 we noticed that quite a decent
amount of code redeclares builtins (memcpy in particular, I believe
reduced from an MSVC header) with a calling convention specified.
This gets particularly troublesome when the user specifies a new
'default' calling convention on the command line.
When looking to add a diagnostic for this case, it was noticed that we
had 3 other diagnostics that differed only slightly. This patch ALSO
unifies those under a 'select'. Unfortunately, the order of words in
ONE of these diagnostics was reversed ("'thiscall' calling convention"
vs "calling convention 'thiscall'"), so this patch also standardizes on
the former.
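A hedged reduction of the redeclaration pattern described (roughly what an MSVC header might expand to): a builtin redeclared with an explicit calling convention, which now receives the unified diagnostic when the default calling convention is changed on the command line.

```cuda
#include <stddef.h>

// Redeclaring the memcpy builtin with an explicit calling convention.
extern "C" __attribute__((cdecl)) void *memcpy(void *dst, const void *src, size_t n);
```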
Differential Revision: https://reviews.llvm.org/D59560
Change-Id: I79f99fe7c2301640755ffdd774b46eb44526bb22
llvm-svn: 356663
If a kernel template has a function as its template parameter, a device function should be
allowed as the template argument, since a kernel can call a device function. However,
currently, if the kernel template is instantiated in a host function, clang will emit an error
message saying the device function is an invalid candidate for the template parameter.
This happens because clang checks the reference to the device function while parsing
the template arguments. At that point, the template is not instantiated yet. Clang incorrectly
assumes the device function is called by the host function and emits the error message.
This patch fixes the issue by disabling the check of device functions while parsing template
arguments and deferring the check to the instantiation of the template. At that point, the
template decl is already available, therefore the check can be done against the instantiated
function template decl.
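A hedged reduction of the pattern being fixed (made-up names): a kernel template taking a function pointer as a non-type template parameter, instantiated from host code with a __device__ function.

```cuda
__device__ float square(float x) { return x * x; }

template <float (*F)(float)>
__global__ void apply(float *p, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) p[i] = F(p[i]);
}

void hostCaller(float *p, int n) {
  // Previously rejected: the device function was checked while parsing the
  // template argument list, as if the host function called it directly.
  apply<square><<<(n + 255) / 256, 256>>>(p, n);
}
```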
Differential Revision: https://reviews.llvm.org/D56411
llvm-svn: 355421
MSVC header files use vectorcall to differentiate overloaded functions, which
causes a failure for the AMDGPU target. This is because clang does not check the function
calling convention based on the function's target.
This patch checks the calling convention using the proper target info.
Differential Revision: https://reviews.llvm.org/D57716
llvm-svn: 354929
statements.
If the assembler instruction is not generated and the delayed diagnostic
is emitted, we may end up with an extra warning message for variables used
in the asm statement. Since the asm statement is not built, the
variables may be left unreferenced, and that may produce a warning about
the use of uninitialized variables.
llvm-svn: 354928
Adapted targetDiag for CUDA and used it for the delayed diagnostics in
asm constructs. This works for both host- and device-side compilation.
Differential Revision: https://reviews.llvm.org/D58463
llvm-svn: 354671
Summary:
Adapted targetDiag for CUDA and used it for the delayed diagnostics in
asm constructs. This works for both host- and device-side compilation.
Reviewers: tra, jlebar
Subscribers: jdoerfert, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D58463
llvm-svn: 354593
Instead of calling CUDA runtime to arrange function arguments,
the new API constructs arguments in a local array and the kernels
are launched with __cudaLaunchKernel().
The old API has been deprecated and is expected to go away
in the next CUDA release.
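A hedged sketch (not the exact stub clang emits) of the shape of the new launch path: kernel arguments are gathered into a local array of pointers and handed to the runtime in a single launch call.

```cuda
#include <cuda_runtime.h>

__global__ void scale(float *p, int n);

cudaError_t launchScale(float *p, int n, dim3 grid, dim3 block) {
  void *args[] = {&p, &n};  // argument addresses collected locally
  return cudaLaunchKernel((const void *)scale, grid, block, args,
                          /*sharedMem=*/0, /*stream=*/nullptr);
}
```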
Differential Revision: https://reviews.llvm.org/D57488
llvm-svn: 352799
In a 64-bit MSVC environment, size_t is defined as unsigned long long.
In a single-source language like HIP, the data layout should be consistent
between device and host compilation; therefore, copy the data-layout-controlling
fields from the aux target for the AMDGPU target.
Differential Revision: https://reviews.llvm.org/D56318
llvm-svn: 352620
r352221 caused regressions in CUDA/HIP since device functions may use _Float16 whereas the host does not support it.
In this case, host compilation should not diagnose usage of _Float16 in device functions or variables.
For now, just do not diagnose _Float16 for CUDA/HIP. In the future we should have a more precise check.
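A hedged example of the code this unblocks: _Float16 used in a __device__ function should not be diagnosed during the host-side compilation even if the host target lacks _Float16 support.

```cuda
__device__ _Float16 halve(_Float16 x) {
  return x / (_Float16)2;  // _Float16 arithmetic confined to device code
}
```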
Differential Revision: https://reviews.llvm.org/D57369
llvm-svn: 352488
ShouldDeleteSpecialMember is called on inherited constructors.
It calls inferCUDATargetForImplicitSpecialMember.
Normally the special member enum passed to ShouldDeleteSpecialMember
matches the constructor. However, this is not true when an inherited
constructor is passed, where DefaultConstructor is passed to treat
the inherited constructor as a default constructor. Since
inferCUDATargetForImplicitSpecialMember expects the special
member enum argument to match the constructor, this results
in an assertion failure when that expectation is not satisfied.
This patch checks whether the constructor is inherited. If it is, it
gets the real special member enum for the constructor and passes it
to inferCUDATargetForImplicitSpecialMember.
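A hedged reduction (made-up types) of the failing pattern: a class that inherits its base's constructors in CUDA code, which previously tripped the assertion described above.

```cuda
struct Base {
  __host__ __device__ Base(int) {}
};

struct Derived : Base {
  using Base::Base;  // inherited constructor
};

__global__ void k() {
  Derived d(42);  // uses the inherited constructor on the device
  (void)d;
}
```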
Differential Revision: https://reviews.llvm.org/D51809
llvm-svn: 344057
This patch renames -f{no-}cuda-rdc to -f{no-}gpu-rdc and keeps the original
options as aliases. When -fgpu-rdc is off,
clang will assume the device code in each translation unit does not call
external functions except those in the device library; therefore it is possible
to compile the device code in each translation unit into self-contained kernels
and embed them in the host object, so that the host object behaves like a
usual host object which can be linked by lld.
The benefits of this feature are: 1. it allows users to create static libraries which
can be linked by the host linker; 2. it amortizes device-code linking time.
This patch modifies the HIP action builder to insert actions for linking device
code and generating the HIP fatbin, and passes the HIP fatbin to the host backend action.
It extracts the code for constructing the command that generates the HIP fatbin
into a function so that it can be reused by early finalization. It also modifies
codegen of the HIP host constructor functions to embed the device fatbin
when it is available.
Differential Revision: https://reviews.llvm.org/D52377
llvm-svn: 343611
render the function deleted instead of rendering the program ill-formed.
This change also adds an enabled-by-default warning for the case where
an explicitly-defaulted special member function of a non-template class
is implicitly deleted by the type checking rules. (This fires either due
to this language change or due to pre-C++20 reasons for the member being
implicitly deleted). I've tested this on a large codebase and found only
bugs (where the program means something that's clearly different from
what the programmer intended), so this is enabled by default, but we
should revisit this if there are problems with this being enabled by
default.
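A hedged example of what the new on-by-default warning flags: an explicitly-defaulted special member of a non-template class that the type-checking rules implicitly delete.

```cuda
struct NoCopy {
  NoCopy() = default;
  NoCopy(const NoCopy &) = delete;
};

struct S {
  NoCopy member;
  S(const S &) = default;  // implicitly deleted: the member is not copyable
};
```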
llvm-svn: 343285
Previously clang considered function variants from both sides of the
compilation, and that resulted in picking up the wrong deallocation function.
Differential Revision: https://reviews.llvm.org/D51808
llvm-svn: 342749
When compiling CUDA or OpenMP device code Clang parses header files
that expect certain predefined macros from the host architecture. To
make this work the compiler passes the host triple via the -aux-triple
argument and (until now) pulls in all macros for that "auxiliary triple"
unconditionally.
However this results in defines like __SSE_MATH__ that will trigger
inline assembly making use of the "advertised" target features. See
the discussion of D47849 and PR38464 for a detailed explanation of
the encountered problems.
Instead of blacklisting "known bad" examples this patch starts adding
defines that are needed for certain headers like bits/wordsize.h and
bits/mathinline.h.
The disadvantage of this approach is that it decouples the definitions
from their target toolchain. However, in my opinion it's more important
to keep the definitions for one header close together. For one, this will
include clear documentation of why these particular defines are needed.
Furthermore, it simplifies maintenance because adding defines for a new
header or support for a new aux-triple only needs to touch one piece
of code.
Differential Revision: https://reviews.llvm.org/D50845
llvm-svn: 340681
CUDA 8.0 E.3.9.4 says: Within the body of a __device__ or __global__
function, only __shared__ variables or variables without any device
memory qualifiers may be declared with static storage class.
It is unclear how a function-scope non-const static variable
without a device memory qualifier would be implemented; therefore only static
const variables without a device memory qualifier are allowed, which
can be emitted as global variables in the constant address space.
Currently clang only allows function-scope static variables with the
__shared__ qualifier.
This patch also allows function-scope static const variables without a
device memory qualifier and emits them as global variables in the constant
address space.
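A hedged illustration (made-up kernel) of what becomes accepted: a function-scope static const variable without a device memory qualifier, alongside the already-allowed __shared__ case.

```cuda
__device__ int lookup(int i) {
  // Now accepted: emitted as a global in the constant address space.
  static const int table[4] = {1, 2, 4, 8};
  return table[i & 3];
}

__global__ void k(int *out) {
  __shared__ int cache[32];  // function-scope __shared__: already allowed
  cache[threadIdx.x & 31] = lookup(threadIdx.x);
  __syncthreads();
  out[threadIdx.x] = cache[threadIdx.x & 31];
}
```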
Differential Revision: https://reviews.llvm.org/D49931
llvm-svn: 338188
There are HIP applications, e.g. Tensorflow 1.3, using amdgpu kernel attributes; however,
currently they are only allowed on OpenCL kernel functions.
This patch allows amdgpu kernel attributes to be applied to CUDA/HIP __global__
functions.
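A hedged example (attribute chosen from the existing OpenCL support) of what becomes accepted: an amdgpu kernel attribute applied to a HIP __global__ function.

```cuda
__attribute__((amdgpu_flat_work_group_size(1, 256)))
__global__ void scaleKernel(float *p) {
  p[threadIdx.x] *= 2.0f;
}
```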
Differential Revision: https://reviews.llvm.org/D47958
llvm-svn: 334561
We were already performing checks on non-template variables,
but the checks on templated ones were missing.
Differential Revision: https://reviews.llvm.org/D45231
llvm-svn: 334143
Summary:
Previously this triggered a -Wundefined-internal warning. But it's not
an undefined variable -- any variable of this form is a pointer to the
base of the GPU core's shared memory.
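A hedged guess at the shape involved, assuming the dynamic shared-memory declaration form with internal linkage (the exact form is not spelled out above):

```cuda
namespace {
// Not an undefined variable: this names the base of the block's shared memory.
extern __shared__ float smem[];
}

__global__ void k(float *out) { out[threadIdx.x] = smem[threadIdx.x]; }
```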
Reviewers: tra
Subscribers: sanjoy, rsmith
Differential Revision: https://reviews.llvm.org/D46782
llvm-svn: 332621
Found via codespell -q 3 -I ../clang-whitelist.txt
Where whitelist consists of:
archtype
cas
classs
checkk
compres
definit
frome
iff
inteval
ith
lod
methode
nd
optin
ot
pres
statics
te
thru
Patch by luzpaz! (This is a subset of D44188 that applies cleanly with a few
files that have dubious fixes reverted.)
Differential revision: https://reviews.llvm.org/D44188
llvm-svn: 329399
We were already performing checks on non-template variables,
but the checks on templated ones were missing.
Differential Revision: https://reviews.llvm.org/D45231
llvm-svn: 329127
The diagnostic system for Clang can already handle many AST nodes. Instead
of converting them to strings first, just hand the AST node directly to
the diagnostic system and let it handle the output. Minor changes in some
diagnostic output.
llvm-svn: 328688
Launching a kernel from the host code does not generate code for the
kernel itself. This fixes an issue with clang erroneously reporting
an error for an HD->D call from within the kernel.
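A hedged reduction of the scenario (made-up names): the host-side kernel launch must not trigger device-side checking of the HD->D call reachable from the kernel.

```cuda
__device__ int devOnly();

__host__ __device__ int hd() { return devOnly(); }  // HD -> D call

__global__ void kern(int *out) { *out = hd(); }

void hostLaunch(int *out) {
  kern<<<1, 1>>>(out);  // launching from host does not codegen the kernel body
}
```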
Differential Revision: https://reviews.llvm.org/D44837
llvm-svn: 328362
According to the CUDA Programming Guide this is prohibited in
whole program compilation mode. This makes sense because external
references cannot be satisfied in that mode anyway. However,
such variables are allowed in separate compilation mode which
is a valid use case.
Differential Revision: https://reviews.llvm.org/D42923
llvm-svn: 325136
This fixes erroneously reported CUDA compilation errors
in host-side code during device-side compilation.
I've also restricted OpenMP-specific checks to trigger only
if we're compiling with OpenMP enabled.
Differential Revision: https://reviews.llvm.org/D40275
llvm-svn: 319201
This also clarifies some terminology used by the diagnostic (methods -> Objective-C methods, fields -> non-static data members, etc).
Many of the tests needed to be updated in multiple places for the diagnostic wording tweaks. The first instance of the diagnostic for that attribute is fully specified and subsequent instances cut off the complete list (to make it easier if additional subjects are added in the future for the attribute).
llvm-svn: 319002
Summary:
That is, instead of "1 error generated", we now say "1 error generated
when compiling for sm_35".
This (partially) solves a usability footgun wherein e.g. users call a
function that's only defined on sm_60 when compiling for sm_35, and they
get an unhelpful error message.
Reviewers: tra
Subscribers: sanjoy, cfe-commits
Differential Revision: https://reviews.llvm.org/D37548
llvm-svn: 312736