This is not supposed to change anything; it merely allows us to reuse
the math functions separately from the device functions, e.g., to source
them at different times. This will be used by the OpenMP overlay.
This also adds two `return` keywords that were missing.
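For illustration, a consumer such as the OpenMP overlay can now pull in
just the math wrappers and source the device functions at a different
time (a minimal sketch; the exact include points are illustrative):

  #include <__clang_cuda_math.h>             // math functions only
  // ... later, when the device intrinsics are actually needed:
  #include <__clang_cuda_device_functions.h> // device functions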
Reviewed By: tra
Differential Revision: https://reviews.llvm.org/D77238
Summary:
- Even though the bindless surface/texture interfaces are promoted,
there is still code using surface/texture references. For example,
[PR#26400](https://bugs.llvm.org/show_bug.cgi?id=26400) reports a
compilation failure for code using `tex2D` with texture references. For
better compatibility, this patch proposes support for surface/texture
references.
- Given the absence of documentation and the magic headers involved,
it's believed that `nvcc` uses builtins for texture support. From the
limited NVVM documentation[^nvvm] and the NVPTX backend's
texture/surface related tests[^test], it's believed that
surface/texture references are supported by replacing their reference
types, which are annotated with
`device_builtin_surface_type`/`device_builtin_texture_type`, with the
corresponding handle-like object types, `cudaSurfaceObject_t` or
`cudaTextureObject_t`, in the device-side compilation. On the host
side, the global handle variables are registered and are established
and updated later when the corresponding binding/unbinding APIs are
called[^bind]. Surface/texture references thus behave much like device
global variables, but are represented with different types on the host
and device sides.
- In this patch, the following changes are proposed to support that
behavior:
+ Refine the `device_builtin_surface_type` and
`device_builtin_texture_type` attributes to apply to type decls only,
so they can be used to check whether a variable is of a surface/texture
reference type.
+ Add hooks in code generation to replace those reference types with
the corresponding object types, as well as all accesses to them. In
particular, `nvvm.texsurf.handle.internal` should be used to load
object handles from global reference variables[^texsurf], along with
metadata annotations.
+ Generate host-side registration with proper template argument
parsing.
---
[^nvvm]: https://docs.nvidia.com/cuda/pdf/NVVM_IR_Specification.pdf
[^test]: https://raw.githubusercontent.com/llvm/llvm-project/master/llvm/test/CodeGen/NVPTX/tex-read-cuda.ll
[^bind]: See section 3.2.11.1.2 `Texture reference API` in [CUDA C Programming Guide](https://docs.nvidia.com/cuda/pdf/CUDA_C_Programming_Guide.pdf).
[^texsurf]: According to NVVM IR, `nvvm.texsurf.handle` should be used. However, the current backend doesn't support it yet. We may revise that later.
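For reference, the kind of code this re-enables looks roughly like the
following sketch (legacy reference API; variable and kernel names are
illustrative):

  texture<float, 2, cudaReadModeElementType> tex; // device_builtin_texture_type

  __global__ void sample(float *out, float x, float y) {
    // In device-side compilation, 'tex' is rewritten to a
    // cudaTextureObject_t handle loaded via nvvm.texsurf.handle.internal.
    *out = tex2D(tex, x, y);
  }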
Reviewers: tra, rjmccall, yaxunl, a.sidorin
Subscribers: cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D76365
This makes clang somewhat forward-compatible with new CUDA releases
without having to patch it for every minor release that does not add
any new functions.
If an unknown version is found, clang issues a warning (can be disabled
with -Wno-cuda-unknown-version) and assumes that it has detected
the latest known version. CUDA releases are usually supersets
of older ones feature-wise, so it should be sufficient to keep
released clang versions working with minor CUDA updates without
having to upgrade clang, too.
Differential Revision: https://reviews.llvm.org/D73231
Summary:
These all had somewhat custom file headers with different text from the
ones I searched for previously, and so I missed them. Thanks to Hal and
Kristina and others who prompted me to fix this, and sorry it took so
long.
Reviewers: hfinkel
Subscribers: mcrosier, javed.absar, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D60406
llvm-svn: 357941
Instead of calling the CUDA runtime to arrange function arguments, the
new API constructs the arguments in a local array, and kernels are
launched with __cudaLaunchKernel().
The old API has been deprecated and is expected to go away
in the next CUDA release.
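Roughly, the generated host-side stub now behaves like the following
sketch (simplified; the stub name is illustrative, and the real code is
emitted directly as IR):

  // For: __global__ void kern(int a, float *b);
  void __stub_kern(int a, float *b) {
    void *args[] = {&a, &b}; // arguments gathered in a local array
    dim3 grid, block;
    size_t shmem;
    cudaStream_t stream;
    __cudaPopCallConfiguration(&grid, &block, &shmem, &stream);
    cudaLaunchKernel((const void *)__stub_kern, grid, block, args, shmem, stream);
  }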
Differential Revision: https://reviews.llvm.org/D57488
llvm-svn: 352799
Clang can use CUDA-9.1 now, though new APIs are not implemented yet.
The major change is that the headers in CUDA-9.1 went through a
substantial reorganization that started in CUDA-9.0, which in turn
required substantial changes in the CUDA compatibility headers provided
by clang.
There are two major issues:
* CUDA SDK no longer provides declarations for libdevice functions.
* A lot of device-side functions have become nvcc's builtins and
CUDA headers no longer contain their implementations.
This patch changes the way CUDA headers are handled if we compile
with CUDA 9.x. Both 9.0 and 9.1 are affected.
* Clang provides its own declarations of libdevice functions.
* For CUDA-9.x, clang now provides implementations of device-side
'standard library' functions using libdevice.
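The pattern is roughly as follows (a minimal sketch; the real wrapper
headers are considerably more involved):

  // Clang's own declaration of a libdevice bitcode function ...
  extern "C" __device__ double __nv_sin(double);
  // ... and the device-side 'standard library' function built on top of it:
  __device__ inline double sin(double x) { return __nv_sin(x); }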
This patch should not affect compilation with CUDA-8. There may be
some observable differences for CUDA-9.0, though they are not expected
to affect functionality.
Tested: CUDA test-suite tests for all supported combinations of:
CUDA: 7.0,7.5,8.0,9.0,9.1
GPU: sm_20, sm_35, sm_60, sm_70
Differential Revision: https://reviews.llvm.org/D42513
llvm-svn: 323713
CUDA-9 headers check for a specific libc++ version and ifdef out
some of the definitions we need if _LIBCPP_VERSION >= 3800.
Differential Revision: https://reviews.llvm.org/D40198
llvm-svn: 319485
In CUDA-9, some of the device-side math functions that we need are
conditionally defined within '#if _GLIBCXX_MATH_H'. We need to
temporarily undo the guard around the inclusion of math_functions.hpp.
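A sketch of the macro dance, assuming the needed definitions are gated
on the macro being set (push/pop shown for safety; details
illustrative):

  #pragma push_macro("_GLIBCXX_MATH_H")
  #define _GLIBCXX_MATH_H 1 // pretend libstdc++'s math header was seen, so
                            // the guarded device-side functions get defined
  #include "math_functions.hpp"
  #pragma pop_macro("_GLIBCXX_MATH_H")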
Differential Revision: https://reviews.llvm.org/D37906
llvm-svn: 313369
For now CUDA-9 is not included in the list of CUDA versions clang
searches for, so the path to CUDA-9 must be explicitly passed
via --cuda-path=.
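For example (paths illustrative):
  clang++ -x cuda foo.cu --cuda-path=/usr/local/cuda-9.0 --cuda-gpu-arch=sm_70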
On the LLVM side, NVPTX added the sm_70 GPU type, which bumps the
required PTX version to 6.0 but is otherwise equivalent to sm_62 at the
moment.
Differential Revision: https://reviews.llvm.org/D37576
llvm-svn: 312734
CUDA-8.0 comes with new headers which nvcc pre-includes via
cuda_runtime.h. Clang now makes them available as well.
Differential Revision: https://reviews.llvm.org/D28301
llvm-svn: 290982
Previously, these were always included -- after this change, you have to
#include <new>, which is consistent with how things ought to work.
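Concretely, device code using placement new now needs the explicit
include (a minimal sketch):

  #include <new> // required for placement new after this change

  __device__ void init(void *buf) {
    int *p = new (buf) int(42); // device-side placement new
  }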
llvm-svn: 285251
These were reverted in r283753 and r283747.
The first patch added a header to the root 'Headers' install directory,
instead of into 'Headers/cuda_wrappers'. This was fixed in the second
patch, but by then the damage was done: The bad header stayed in the
'Headers' directory, continuing to break the build.
We reverted both patches in an attempt to fix things, but that still
didn't get rid of the header, so the Windows bootstrap build remained
broken.
It's probably worth fixing up our cmake logic to remove things from the
install dirs, but in the meantime, re-land these patches, since we
believe they no longer have this bug.
llvm-svn: 283907
Breaks bootstrap builds on (at least) Windows:
In file included from D:\buildslave\clang-x64-ninja-win7\llvm\lib\Support\Allocator.cpp:14:
In file included from D:\buildslave\clang-x64-ninja-win7\llvm\include\llvm/Support/Allocator.h:24:
In file included from D:\buildslave\clang-x64-ninja-win7\llvm\include\llvm/ADT/SmallVector.h:20:
In file included from D:\buildslave\clang-x64-ninja-win7\llvm\include\llvm/Support/MathExtras.h:19:
D:\buildslave\clang-x64-ninja-win7\stage1.install\bin\..\lib\clang\4.0.0\include\algorithm(63,8) :
error: unknown type name '__device__'
inline __device__ const __T &
llvm-svn: 283747
Summary:
We do this by wrapping <complex> and <algorithm>.
Tests are in the test-suite.
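The wrapper idiom is roughly the following sketch (simplified; the real
wrapper provides __device__ overloads for the handful of functions
device code commonly needs, such as std::min and std::max):

  // cuda_wrappers/algorithm (sketch):
  #include_next <algorithm> // the real standard library header

  namespace std {
  template <class __T>
  inline __device__ const __T &
  max(const __T &__a, const __T &__b) {
    return __a < __b ? __b : __a;
  }
  } // namespace std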
Reviewers: tra
Subscribers: jhen, beanz, cfe-commits, mgorny
Differential Revision: https://reviews.llvm.org/D24979
llvm-svn: 283680
Summary: This matches the idiom we use for our other CUDA wrapper headers.
Reviewers: tra
Subscribers: beanz, mgorny, cfe-commits
Differential Revision: https://reviews.llvm.org/D24978
llvm-svn: 283679
Summary:
Previously these sort of worked, because they didn't end up resulting
in calls at the PTX layer. But I'm adding stricter checks that would
break placement new without these changes.
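The declarations in question are roughly these (a sketch, assuming the
usual placement forms):

  __device__ inline void *operator new(__SIZE_TYPE__, void *__ptr) noexcept {
    return __ptr; // placement new just returns the provided storage
  }
  __device__ inline void operator delete(void *, void *) noexcept {}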
Reviewers: tra
Subscribers: cfe-commits
Differential Revision: https://reviews.llvm.org/D23239
llvm-svn: 278194
Summary: Clang changes to make use of the LLVM intrinsics added in D21160.
Reviewers: tra
Subscribers: jholewinski, cfe-commits
Differential Revision: http://reviews.llvm.org/D21162
llvm-svn: 272299
Summary:
Previously it was implemented as inline asm in the CUDA headers.
This change allows us to use the [addr+imm] addressing mode when
executing ld.global.nc instructions. This translates into a 1.3x
speedup on some benchmarks that call this instruction from within an
unrolled loop.
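For example (a minimal sketch):

  __device__ float sum2(const float *p) {
    // With the builtin, both loads can use the [addr+imm] form of
    // ld.global.nc instead of computing each address separately:
    return __ldg(p) + __ldg(p + 1);
  }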
Reviewers: tra, rsmith
Subscribers: jhen, cfe-commits, jholewinski
Differential Revision: http://reviews.llvm.org/D19990
llvm-svn: 270150
Since r265060, LLVM infers correct __nvvm_reflect attributes, so an
explicit declaration of __nvvm_reflect() is no longer needed.
Differential Revision: http://reviews.llvm.org/D19074
llvm-svn: 267062
Summary:
See the comments in the patch; we were assuming that some stdlib math
functions would be defined in namespace std, when in fact the spec says
they should be defined in the global namespace. libstdc++ 4.9 became
more conforming and broke us.
This new implementation seems to cover the known knowns.
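The conforming pattern is roughly this sketch:

  // Declare the device overload in the global namespace, where the spec
  // puts the C math functions ...
  __device__ inline float fabs(float __x) { return ::fabsf(__x); }
  // ... and let std:: pick it up, as <cmath> does:
  namespace std { using ::fabs; }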
Reviewers: rsmith
Subscribers: cfe-commits, tra
Differential Revision: http://reviews.llvm.org/D18882
llvm-svn: 265751
Summary:
This is necessary for a future patch which will make all constexpr
functions implicitly host+device. cmath may declare constexpr
functions, but these we do *not* want to be host+device. The forward
declares added in this patch prevent this (because the rule will be,
constexpr functions become implicitly host+device unless they're
preceeded by a decl with __device__).
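An illustration of the upcoming rule (function names hypothetical):

  __device__ float __f(float);               // prior decl with __device__,
  constexpr float __f(float x) { return x; } // so this stays host-only
  constexpr float __g(float x) { return x; } // no preceding __device__ decl:
                                             // implicitly host+device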
Reviewers: tra
Subscribers: cfe-commits, rnk, rsmith
Differential Revision: http://reviews.llvm.org/D18539
llvm-svn: 264963
Summary:
We decided this makes life too difficult for code authors. For example,
people may want to detect NVCC and disable variadic templates, which
NVCC does not support, but which we do.
Since people are going to have to change compiler flags *anyway* in
order to compile with clang, if they really want the old behavior, they
can pass -D__NVCC__.
Tested with tensorflow and thrust, no apparent problems.
Reviewers: tra
Subscribers: cfe-commits
Differential Revision: http://reviews.llvm.org/D18417
llvm-svn: 264205
Summary:
This lets you write, e.g.
uint3 a = threadIdx;
uint3 b = blockIdx;
dim3 c = gridDim;
dim3 d = blockDim;
which is legal in nvcc, but was not legal in clang.
The fact that e.g. the type of threadIdx is not actually uint3 is still
observable, but now you have to try to observe it.
Reviewers: tra
Subscribers: echristo, cfe-commits
Differential Revision: http://reviews.llvm.org/D17561
llvm-svn: 261777
Summary:
curand.h includes curand_mtgp32_kernel.h. In host mode, this header
redefines threadIdx and blockDim, giving them their "proper" types of
uint3 and dim3, respectively.
clang has its own plan for these variables -- their types are magic
builtin classes. So these redefinitions are incompatible.
As a hack, we force-include the offending CUDA header and use #defines
to get the right types for threadIdx and blockDim.
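The hack looks roughly like this (a sketch; the stand-in objects are
hypothetical, not the actual macro definitions):

  // Give the header the "proper" types it expects in host mode:
  static const uint3 __stub_threadIdx = {};
  static const dim3 __stub_blockDim;
  #define threadIdx __stub_threadIdx
  #define blockDim __stub_blockDim
  #include "curand_mtgp32_kernel.h" // force-included with the defines live
  #undef threadIdx
  #undef blockDim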
Reviewers: tra
Subscribers: echristo, cfe-commits
Differential Revision: http://reviews.llvm.org/D17562
llvm-svn: 261776
CUDA expects math functions in the std:: namespace to work on the
device side. In order to make this work with clang, without allowing
device-side code generation for functions lacking appropriate target
attributes, this patch provides device-side implementations for <cmath>
functions. Most of them call global-scope math functions provided by
the CUDA headers. In a few cases we use clang builtins.
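Two representative wrappers in the style described (a sketch, modeled
on clang's CUDA <cmath> support header):

  // Most wrappers forward to global-scope CUDA math functions:
  __device__ inline float sqrt(float __x) { return ::sqrtf(__x); }
  // A few use clang builtins instead:
  __device__ inline bool isinf(float __x) { return __builtin_isinf(__x); }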
Tested out-of-tree by compiling and running thrust's unit tests:
https://github.com/thrust/thrust/tree/master/testing
Differential Revision: http://reviews.llvm.org/D16593
llvm-svn: 258880
* Pull in host-only implementations of a few CUDA-specific math
functions.
* #include <cmath> early to prevent its inclusion from CUDA headers
after they've messed with the __THROW macro.
llvm-svn: 255933
Currently it's easy to break CUDA compilation by passing
"-isystem /path/to/cuda/include" to the compiler, which leads to the
compiler including the real cuda_runtime.h from there instead of the
wrapper we need.
Renaming the wrapper ensures that we can include the wrapper
regardless of user-specified include paths and files.
Differential Revision: http://reviews.llvm.org/D15534
llvm-svn: 255802