For -fgpu-rdc mode, static device variables in different TUs may have the same name.
To support accessing file-scope static device variables in host code, we need to give them
distinct names and external linkage. This is done by postfixing each static device variable with
a distinct CUID (Compilation Unit ID) hash.
Since the static device variables then have distinct names across compilation units, we can give
them external linkage so that they can be looked up by the runtime.
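As a minimal sketch (hypothetical file and names) of what this enables: host
code can look up a file-scope static device variable through the runtime,
which requires the variable's symbol to be unique and externally visible:

  // a.cu
  #include <cuda_runtime.h>

  static __device__ int counter = 0;  // file-scope static device variable

  int readCounter() {
    int host_val = 0;
    // The runtime looks the symbol up by name, so in -fgpu-rdc mode the
    // compiler must give 'counter' a distinct, externally visible name.
    cudaMemcpyFromSymbol(&host_val, counter, sizeof(host_val));
    return host_val;
  }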
Reviewed by: Artem Belevich and Jon Chesterfield
Differential Revision: https://reviews.llvm.org/D85223
Defer constant checking of dependent initializers to template instantiation,
since it cannot be done for dependent values.
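A hypothetical illustration of the affected pattern: a device variable
template whose initializer depends on a template parameter cannot be checked
until instantiation:

  template <int X>
  __device__ int a = X + 1;  // dependent initializer; the constant-initializer
                             // check is deferred until e.g. a<2> is instantiated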
Reviewed by: Artem Belevich
Differential Revision: https://reviews.llvm.org/D95840
This patch implements codegen for the __managed__ variable attribute for HIP.
Diagnostics will be added later.
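For context, a minimal sketch (hypothetical names) of a managed variable,
which is accessible from both host and device without explicit copies:

  __managed__ int x = 0;  // lives in managed memory, visible to both sides

  __global__ void inc() { x++; }

  // After launching inc() and synchronizing, host code can read and write
  // 'x' directly, without an explicit memcpy.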
Differential Revision: https://reviews.llvm.org/D94814
Defaulted destructor was treated inconsistently, compared to other
compiler-generated functions.
When Sema::IdentifyCUDATarget() was called on a just-created dtor which didn't
have the implicit __host__ __device__ attributes applied yet, it would treat it as a
host function. That happened to (sometimes) hide the error when the dtor referred
to host-only functions.
Even when we had identified a defaulted dtor as an HD function, we still treated it
inconsistently during selection of usual deallocators: we did not allow it to refer
to wrong-side functions, while that is allowed for other HD functions.
This change brings handling of defaulted dtors in line with other HD functions.
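A minimal sketch (hypothetical type) of the now-consistent treatment:

  struct S {
    ~S() = default;  // implicitly __host__ __device__, like other
                     // compiler-generated special member functions
  };

  __global__ void k() { S s; }  // the defaulted dtor is usable in device code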
Differential Revision: https://reviews.llvm.org/D94732
`isCUDADeviceBuiltinSurfaceType()`/`isCUDADeviceBuiltinTextureType()` do not
work on dependent types as they rely on specific type attributes.
Differential Revision: https://reviews.llvm.org/D92893
This patch implements correct hostness-based overload resolution
in isBetterOverloadCandidate.
Based on hostness, if one candidate is emittable and the other
candidate is not, the emittable candidate is better.
If both candidates are emittable, or neither is emittable based on hostness,
other rules should be used to determine which is better. This is because
hostness-based overload resolution is mostly for determining the
viability of a function. If two functions are both viable, other factors
should take precedence in preference.
If the other rules cannot determine which is better, CUDA preference is
used again as a tiebreaker.
However, correct hostness-based overload resolution
requires overload resolution diagnostics to be deferred,
which is not on by default. The rationale is that deferring
overload resolution diagnostics may hide overload resolution
issues in header files.
An option -fgpu-exclude-wrong-side-overloads is added, which is off by
default.
When -fgpu-exclude-wrong-side-overloads is off, the original behavior is kept,
that is, wrong-side overloads are excluded only if there are same-side overloads.
This may result in incorrect overload resolution when there are no
same-side candidates, but is sufficient for most CUDA/HIP applications.
When -fgpu-exclude-wrong-side-overloads is on, deferred overload resolution
diagnostics are enabled along with correct hostness-based overload
resolution, i.e., wrong-side overloads are always excluded.
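A hedged sketch (hypothetical overloads) of the difference in device
compilation when there are no same-side candidates:

  __host__ int f(int);  // wrong-side candidate in device compilation

  __device__ int g() {
    // With the option off, f(int) is kept because no same-side overload
    // exists, and resolution may succeed incorrectly. With the option on,
    // f(int) is always excluded and the failure is reported through the
    // deferred diagnostics machinery.
    return f(1);
  }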
Differential Revision: https://reviews.llvm.org/D80450
This patch diagnoses invalid references to global host variables in device,
global, or host device functions.
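For illustration (hypothetical names), the kind of reference now diagnosed:

  int host_var;  // global host variable

  __device__ int bad() {
    return host_var;  // error: reference to a host variable from a device function
  }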
Differential Revision: https://reviews.llvm.org/D91281
callee in constant evaluation.
We previously made a deep copy of function parameters of class type when
passing them, resulting in the destructor for the parameter applying to
the original argument value, ignoring any modifications made in the
function body. This also meant that the 'this' pointer of the function
parameter could be observed changing between the caller and the callee.
This change completely reimplements how we model function parameters
during constant evaluation. We now model them roughly as if they were
variables living in the caller, albeit with an artificially reduced
scope that covers only the duration of the function call, instead of
modeling them as temporaries in the caller that we partially "reparent"
into the callee at the point of the call. This brings some minor
diagnostic improvements, as well as significantly reduced stack usage
during constant evaluation.
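A minimal sketch (C++20, hypothetical names) of the observable difference:
the parameter's destructor must see modifications made in the callee rather
than a stale copy owned by the caller:

  struct Tracker {
    int n = 0;
    constexpr ~Tracker() {}  // under the old model this ran on the caller's
                             // deep copy, never seeing the callee's writes
  };

  constexpr int f(Tracker t) {
    t.n = 42;    // modification made in the function body
    return t.n;
  }

  static_assert(f(Tracker{}) == 42);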
In CUDA/HIP a function may become an implicit host device function by
pragma or constexpr. A host device function is checked in both
host and device compilation. However, it may be emitted only
on the host or device side, therefore the diagnostics should be
deferred until it is known to be emitted.
Currently clang is only able to defer certain diagnostics. This causes
false alarms and limits the usefulness of host device functions.
This patch lets clang defer all overload resolution diagnostics for host
device functions.
An option -fgpu-defer-diag is added to control this behavior. By default
it is off.
This is NFC for other languages.
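As a hedged sketch (hypothetical functions): a constexpr function is an
implicit host device function, and its calls may resolve to host-only
functions without harm as long as it is never emitted for the device:

  __host__ int pick(int x);  // host-only

  constexpr int wrap(int x) {  // implicitly __host__ __device__
    return pick(x);            // with -fgpu-defer-diag, any overload resolution
  }                            // diagnostic here is deferred until wrap() is
                               // known to be emitted on the device side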
Differential Revision: https://reviews.llvm.org/D84364
When a device function calls a host function or vice versa, this is a wrong-sided
reference. Currently clang diagnoses it immediately. This differs from nvcc's
behavior, where it is diagnosed only if the function is actually emitted.
The current clang behavior causes false alarms for valid use cases.
This patch lets clang always defer diagnostics for wrong-sided
references.
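A minimal sketch (hypothetical names): the wrong-sided call below becomes an
error only if d() is actually emitted for the device:

  __host__ int h();

  inline __device__ int d() {
    return h();  // wrong-sided reference; deferred, so it is diagnosed only
                 // if d() ends up being emitted in device compilation
  }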
Differential Revision: https://reviews.llvm.org/D83893
This patch lets lambdas be host device by default and adds a diagnostic for
capturing a host variable by reference in a device lambda.
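For illustration (hypothetical code), the newly diagnosed capture:

  void host_fn() {
    int x = 0;
    auto l = [&x] __device__ () { return x; };  // diagnosed: a device lambda
                                                // captures the host variable
                                                // 'x' by reference
  }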
Differential Revision: https://reviews.llvm.org/D78655
This reverts commit 263390d4f5.
This can still cause bogus errors:
eigen3/Eigen/src/Core/CoreEvaluators.h:94:38: error: call to implicitly-deleted copy constructor of 'unary_evaluator<Eigen::Inverse<Eigen::Matrix<double, 4, 4, 0, 4, 4>>>'
thrust/system/detail/generic/for_each.h:49:3: error: implicit instantiation of undefined template
'thrust::detail::STATIC_ASSERTION_FAILURE<false>'
Recommit e03394c6a6 with a fix.
When an implicit HD function calls a function in device compilation,
and one candidate is an implicit HD function, the current resolution rule is:
- D wins over HD and H
- HD and H are equal
This caused a regression when there is an otherwise-worse D candidate.
This patch changes that so that D, HD and H are all equal.
The rationale is that we already know for host compilation there is
a valid candidate among the HD and H candidates that will not cause an error.
Allowing HD and H gives us a fallback candidate that will not cause an error.
If D wins, it must be a better match by the other criteria, and therefore
should also be a valid candidate that will not cause an error. In this way, we
can guarantee no regression.
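A hedged sketch (hypothetical overloads) of the regression being avoided:

  __device__ int f(double);  // D candidate, a worse match for f(1)
  __host__   int f(int);     // H candidate, the better match

  constexpr int g() {  // an implicit HD function
    return f(1);  // old rule: D beats H in device compilation, so the worse
                  // f(double) wins; new rule: D, HD and H are equal, so the
                  // better-matching f(int) wins on both sides
  }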
Differential Revision: https://reviews.llvm.org/D80450
constexpr variables are compile-time constants and implicitly const, therefore
they are safe to emit on both the device and host sides. Besides, in many cases
they are intended for both device and host, therefore it makes sense
to emit them on both sides if necessary.
In most cases constexpr variables are used as rvalues and the variables
themselves do not need to be emitted. However, if their address is taken,
they need to be emitted.
For C++14, clang is able to handle that, since clang emits them with
available_externally linkage together with the initializer.
For C++17, however, constexpr static data members of a class or class template
implicitly become inline variables. They are therefore definitions with
linkonce_odr or weak_odr linkage, and as such they cannot have
available_externally linkage.
This patch fixes that by adding an implicit constant attribute to
file-scope constexpr variables and constexpr static data members
in device compilation.
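A minimal sketch (hypothetical names) of the C++17 case this addresses:

  struct A {
    static constexpr int x = 42;  // C++17: implicitly an inline variable,
                                  // hence a linkonce_odr definition; now it
                                  // gets an implicit __constant__ attribute
                                  // in device compilation
  };

  __global__ void k(const int **p) {
    *p = &A::x;  // address taken: x must actually be emitted on the device
  }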
Differential Revision: https://reviews.llvm.org/D79237
Recommit c77a4078e0 with a fix.
https://reviews.llvm.org/D77954 caused regressions due to diagnostics in implicit
host device functions.
For now, the most feasible workaround seems to be to treat implicit and explicit host
device functions differently. Basically, in device compilation, keep the old behavior for
implicit host device functions, i.e. give host device candidates and wrong-sided candidates
equal preference; for explicit host device functions, favor host device candidates over
wrong-sided candidates.
The rationale is that explicit host device functions are blessed by the user as valid host
device functions, that is, they should not cause diagnostics in either host or device
compilation, and if diagnostics occur, the user is able to fix them. However, there is no
guarantee that an implicit host device function can be compiled in device compilation,
therefore we need to preserve its overload resolution in device compilation.
Differential Revision: https://reviews.llvm.org/D79526
A union ctor does not call the ctors of its data members, and a union dtor does not call
the dtors of its data members. A union also has no base classes.
Currently, when clang checks whether a union has an empty ctor/dtor, it checks the
ctors/dtors of its data members. This causes device-side global variables and shared
variables to be incorrectly diagnosed as having non-empty ctors/dtors.
This patch fixes that.
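A minimal sketch (hypothetical types) of the corrected analysis:

  struct S {
    int n;
    S() : n(1) {}    // non-empty ctor
    ~S() { n = 0; }  // non-empty dtor
  };

  union U {
    S s;
    U() {}   // empty: does not call S::S()
    ~U() {}  // empty: does not call S::~S()
  };

  __device__ U u;  // previously misdiagnosed as having non-empty ctor/dtor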
Differential Revision: https://reviews.llvm.org/D79367
https://reviews.llvm.org/D77954 caused a regression: ambiguity of operator new
at file scope.
This patch restores the previous behavior for comparisons without a caller.
This is a workaround. The real fix needs D71227.
https://reviews.llvm.org/D78970
Currently clang fails to compile the following CUDA program in device compilation:
  __host__ int foo(int x) {
    return 1;
  }

  template<class T>
  __device__ __host__ int foo(T x) {
    return 2;
  }

  __device__ __host__ int bar() {
    return foo(1);
  }

  __global__ void test(int *a) {
    *a = bar();
  }
This is because foo is resolved to the __host__ foo instead of the __device__ __host__ foo.
This seems to be a bug, since the __device__ __host__ foo is a viable callee whereas
clang is unable to choose it.
This patch fixes that.
Differential Revision: https://reviews.llvm.org/D77954
Summary:
- Use `device_builtin_surface` and `device_builtin_texture` for
  surface/texture reference support. So far, both the host and device
  use the same reference type, which could be revised later when the
  interface/implementation is stabilized.
Reviewers: yaxunl
Subscribers: cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D77583
Move function emitDeferredDiags from Sema to DeferredDiagsEmitter since it
is only used by DeferredDiagsEmitter.
Also skip visited functions to avoid exponential compile time.
Differential Revision: https://reviews.llvm.org/D77028
Summary:
- Even though the bindless surface/texture interfaces are promoted,
there are still code using surface/texture references. For example,
[PR#26400](https://bugs.llvm.org/show_bug.cgi?id=26400) reports the
compilation issue for code using `tex2D` with texture references. For
better compatibility, this patch proposes the support of
surface/texture references.
- Due to the absent documentation and magic headers, it's believed that
`nvcc` does use builtins for texture support. From the limited NVVM
documentation[^nvvm] and NVPTX backend texture/surface related
tests[^test], it's believed that surface/texture references are
supported by replacing their reference types, which are annotated with
`device_builtin_surface_type`/`device_builtin_texture_type`, with the
corresponding handle-like object types, `cudaSurfaceObject_t` or
`cudaTextureObject_t`, in the device-side compilation. On the host
side, those global handle variables are registered and will be
established and updated later when the corresponding binding/unbinding
APIs are called[^bind]. Surface/texture references thus behave much like
device global variables, but are represented with different types on the
host and device sides.
- In this patch, the following changes are proposed to support that
behavior:
+ Refine `device_builtin_surface_type` and
`device_builtin_texture_type` attributes to be applied on `Type`
decl only to check whether a variable is of the surface/texture
reference type.
+ Add hooks in code generation to replace those reference types with
the corresponding object types, as well as all accesses to them. In
particular, `nvvm.texsurf.handle.internal` should be used to load
object handles from global reference variables[^texsurf], as well as
metadata annotations; see the sketch after this list.
+ Generate host-side registration with proper template argument
parsing.
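A minimal sketch (assuming the classic CUDA texture reference API) of the
code this enables:

  texture<float, 2, cudaReadModeElementType> tex;  // reference type annotated
                                                   // with device_builtin_texture_type

  __global__ void k(float *out, float x, float y) {
    // In device compilation, 'tex' is lowered to a cudaTextureObject_t handle
    // loaded via nvvm.texsurf.handle.internal.
    *out = tex2D(tex, x, y);
  }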
---
[^nvvm]: https://docs.nvidia.com/cuda/pdf/NVVM_IR_Specification.pdf
[^test]: https://raw.githubusercontent.com/llvm/llvm-project/master/llvm/test/CodeGen/NVPTX/tex-read-cuda.ll
[^bind]: See section 3.2.11.1.2 `Texture reference API` in [CUDA C Programming Guide](https://docs.nvidia.com/cuda/pdf/CUDA_C_Programming_Guide.pdf).
[^texsurf]: According to NVVM IR, `nvvm.texsurf.handle` should be used. But, the current backend doesn't have that supported. We may revise that later.
Reviewers: tra, rjmccall, yaxunl, a.sidorin
Subscribers: cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D76365
This patch removes the explicit call graph for CUDA/HIP/OpenMP deferred
diagnostics generated during parsing, since it is error-prone due to
incomplete information about function declarations during parsing. Instead,
this patch does a post-parsing AST traversal and emits deferred diagnostics
based on the use graph implicitly generated during the traversal.
Differential Revision: https://reviews.llvm.org/D70172
The norecurse function attribute indicates that the function is not called
recursively, directly or indirectly.
Add norecurse to OpenCL functions and SYCL functions in device compilation,
and to CUDA/HIP kernels.
Although there is an LLVM pass that adds norecurse to functions, it only works
for whole-program compilation. Also, the frontend adding norecurse can make that
pass run faster, since functions already marked norecurse do not need to be
checked again.
Differential Revision: https://reviews.llvm.org/D73651
A CUDA/HIP program may be compiled with -fopenmp. In this case, -fopenmp is only passed to the host
compilation to take advantage of multi-threaded computation.
CUDA/HIP and OpenMP both use Sema::DeviceCallGraph to store functions to be analyzed, removing them
once they decide the function is sure to be emitted. CUDA/HIP and OpenMP have different functions for
determining whether a function is sure to be emitted.
To check host/device correctness for CUDA/HIP when -fopenmp is enabled, there needs to be a unified
logic to determine whether a function is to be emitted. The logic needs to be aware of both the CUDA
and OpenMP logic.
Differential Revision: https://reviews.llvm.org/D67837
llvm-svn: 374263
This matches how GCC handles it; see e.g. https://gcc.godbolt.org/z/HPplnl.
GCC documents the gnu_inline attribute with "In C++, this attribute does
not depend on extern in any way, but it still requires the inline keyword
to enable its special behavior."
The previous behaviour of gnu_inline in C++, without the extern
keyword, can be traced back to the original commit that added
support for gnu_inline, SVN r69045.
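A minimal sketch (hypothetical function) of the semantics in C++ after this
change: the inline keyword is required, but extern is not.

  // The body is available for inlining only; no out-of-line definition is
  // emitted, and non-inlined calls resolve to an external definition.
  inline __attribute__((gnu_inline)) int twice(int x) { return 2 * x; }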
Differential Revision: https://reviews.llvm.org/D67414
llvm-svn: 373078
Summary:
- Even though only `void` is still accepted as the deduced return type,
enabling deduction/instantiation on the return type allows more
consistent coding.
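Assuming this concerns kernel return type deduction, a minimal sketch:

  __global__ auto kernel(int *p) {  // deduced return type; must deduce to void
    *p = 42;
  }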
Reviewers: tra, jlebar
Subscribers: cfe-commits, yaxunl
Tags: #clang
Differential Revision: https://reviews.llvm.org/D68031
llvm-svn: 372898