Visual Studio's C++ standard library headers include intrin.h, so the intrinsic
headers get included a lot more often in Microsoft mode than elsewhere. The
AVX512 intrinsics are a lot of code (0.7 MB, causing 30% compile time overhead
for small programs including e.g. <string> and 6% compile time overhead for
larger projects like e.g. v8). Since multiversioning can't be relied on in
Microsoft mode (cl.exe doesn't support it), having faster compiles seems like
the much better tradeoff until we have a better intrinsic story going forward
(which we'll need for e.g. PR19898).
Actually using intrinsics on Windows already requires the right /arch:
settings, so this patch should have no big behavior change.
See also thread "The intrinsics headers (especially avx512) are too big. What
to do about it?" on cfe-dev.
http://reviews.llvm.org/D20291
llvm-svn: 269675
(1) Removed \code.. \endcode tags around the instruction name. This matches the doxygen format for all other intrinsics.
(2) Did a better formatting for the comments (to fit into 80 columns more compactly).
llvm-svn: 267676
Since r265060 LLVM infers correct __nvvm_reflect attributes, so
explicit declaration of __nvvm_reflect() is no longer needed.
Differential Revision: http://reviews.llvm.org/D19074
llvm-svn: 267062
xmmintrin.h a bit more directed. If for whatever reason modules are enabled but
we textually include one of these headers, don't deploy the special case for
modules. To make this work cleanly, extend __building_module to be defined
even when modules is disabled.
llvm-svn: 266945
The doxygen comments are automatically generated based on Sony's intrinsics document.
I got an OK from Eric Christopher to commit doxygen comments without prior code review upstream. This patch was internally reviewed by Paul Robinson.
llvm-svn: 265844
Summary:
See comments in patch; we were assuming that some stdlib math functions
would be defined in namespace std, when in fact the spec says they
should be defined in the global namespace. libstdc++4.9 became more
conforming and broke us.
This new implementation seems to cover the known knowns.
Reviewers: rsmith
Subscribers: cfe-commits, tra
Differential Revision: http://reviews.llvm.org/D18882
llvm-svn: 265751
The module.modulemap file in the lib/Headers directory was missing the LLVM
copyright notice. This patch adds the copyright notice just like the rest of
the files in this directory.
Differential Revision: http://reviews.llvm.org/D18709
llvm-svn: 265325
Summary:
This is necessary for a future patch which will make all constexpr
functions implicitly host+device. cmath may declare constexpr
functions, but these we do *not* want to be host+device. The forward
declares added in this patch prevent this (because the rule will be,
constexpr functions become implicitly host+device unless they're
preceeded by a decl with __device__).
Reviewers: tra
Subscribers: cfe-commits, rnk, rsmith
Differential Revision: http://reviews.llvm.org/D18539
llvm-svn: 264963
Summary:
We decided this makes life too difficult for code authors. For example,
people may want to detect NVCC and disable variadic templates, which
NVCC does not support, but which we do.
Since people are going to have to change compiler flags *anyway* in
order to compile with clang, if they really want the old behavior, they
can pass -D__NVCC__.
Tested with tensorflow and thrust, no apparent problems.
Reviewers: tra
Subscribers: cfe-commits
Differential Revision: http://reviews.llvm.org/D18417
llvm-svn: 264205
Only around 25% of the intrinsics in this file are documented here. The patches for the other half will be sent out later.
The doxygen comments are automatically generated based on Sony's intrinsics document.
I got an OK from Eric Christopher to commit doxygen comments without prior code review upstream.
llvm-svn: 263175
Only half of the intrinsics in this file is documented here. The patch for the other half will be sent out later.
The doxygen comments are automatically generated based on Sony's intrinsics document.
I got an OK from Eric Christopher to commit doxygen comments without prior code review upstream.
llvm-svn: 263098
Includes new built-in, conversion of built-in to target-independent intrinsic
and update in the header file. Tests are also updated. There is a second part in
the backend for which I will post a separate code-review. BACKEND PART SHOULD BE
COMMITTED FIRST.
Phabricator: http://reviews.llvm.org/D17816
llvm-svn: 263051
The doxygen comments are automatically generated based on Sony's intrinsics document.
I got an OK from Eric Christopher to commit doxygen comments without prior code review upstream.
llvm-svn: 262895
The doxygen comments are automatically generated based on Sony's intrinsics document.
I got an OK from Eric Christopher to commit doxygen comments without prior code review upstream.
llvm-svn: 262565
The doxygen comments are automatically generated based on Sony's intrinsics documentation.
Differential Revision: http://reviews.llvm.org/D17550
llvm-svn: 262385
Issue: https://llvm.org/bugs/show_bug.cgi?id=26720
Fix compile error when building ffmpeg for PowerPC64LE because of some
vec_vsx_ld/vec_vsx_st intrinsics are not supported by current clang.
New added intrinsics:
(vector) {signed|unsigned} {short|char} vec_vsx_ld: (total: 8)
bool vec_vsx_ld: (total: 1)
(vector) {signed|unsigned} {short|char} vec_vsx_st: (total: 8)
bool vec_vsx_st: (total: 1)
Total: 18 intrinsics
Phabricator: http://reviews.llvm.org/D17637
llvm-svn: 262359
Adds a number of constants, defined in the ARM EHABI spec, to the Clang
lib/Headers/unwind.h header. This is prerequisite for landing
http://reviews.llvm.org/D15781, as previously discussed there.
Patch by Timon Van Overveldt.
llvm-svn: 262178
Summary:
This lets you write, e.g.
uint3 a = threadIdx;
uint3 b = blockIdx;
dim3 c = gridDim;
dim3 d = blockDim;
which is legal in nvcc, but was not legal in clang.
The fact that e.g. the type of threadIdx is not actually uint3 is still
observable, but now you have to try to observe it.
Reviewers: tra
Subscribers: echristo, cfe-commits
Differential Revision: http://reviews.llvm.org/D17561
llvm-svn: 261777
Summary:
curand.h includes curand_mtgp32_kernel.h. In host mode, this header
redefines threadIdx and blockDim, giving them their "proper" types of
uint3 and dim3, respectively.
clang has its own plan for these variables -- their types are magic
builtin classes. So these redefinitions are incompatible.
As a hack, we force-include the offending CUDA header and use #defines
to get the right types for threadIdx and blockDim.
Reviewers: tra
Subscribers: echristo, cfe-commits
Differential Revision: http://reviews.llvm.org/D17562
llvm-svn: 261776
Fixing problem with the lib/include/avx512vlintrin.h file.
Adding one more _ to the prefix of _extension__ -> __extension__.
Differential Revision: http://reviews.llvm.org/D16985
llvm-svn: 261518
The doxygen comments are automatically generated based on Sony's intrinsics document.
Differential Revision: http://reviews.llvm.org/D16562
llvm-svn: 259275
CUDA expects math functions in std:: namespace to work on device side.
In order to make it work with clang without allowing device-side code
generation for functions w/o appropriate target attributes, this patch
provides device-side implementations for <cmath> functions. Most of
them call global-scope math functions provided by CUDA headers. In few
cases we use clang builtins.
Tested out-of tree by compiling and running thrust's unit_tests.
https://github.com/thrust/thrust/tree/master/testing
Differential Revision: http://reviews.llvm.org/D16593
llvm-svn: 258880
Summary:
This patch is provided in preparation for removing autoconf on 1/26. The proposal to remove autoconf on 1/26 was discussed on the llvm-dev thread here: http://lists.llvm.org/pipermail/llvm-dev/2016-January/093875.html
"This is the way [autoconf] ends
Not with a bang but a whimper."
-T.S. Eliot
Reviewers: chandlerc, grosbach, bob.wilson, echristo
Subscribers: klimek, cfe-commits
Differential Revision: http://reviews.llvm.org/D16472
llvm-svn: 258862
The Intel manual documents both an unsigned form (_mm_popcnt_u32)
and a signed form (_popcnt32) of the intrinsic. Add the missing signed form.
Differential Revision: http://reviews.llvm.org/D15568
llvm-svn: 256121
* Pull in host-only implementations of few CUDA-specific math functions.
* #nclude <cmath> early to prevent its inclusion from CUDA headers after
they've messed with __THROW macro.
llvm-svn: 255933
Currently it's easy to break CUDA compilation by passing
"-isystem /path/to/cuda/include" to compiler which leads to
compiler including real cuda_runtime.h from there instead
of the wrapper we need.
Renaming the wrapper ensures that we can include the wrapper
regardless of user-specified include paths and files.
Differential Revision: http://reviews.llvm.org/D15534
llvm-svn: 255802
This more closely matches their locations as described by Intel
documentation, and lets us remove a pair of redundant typedefs.
Differential Revision: http://reviews.llvm.org/D15127
llvm-svn: 254528
Use undefined instead of setzero as the pass through input since its going to be fully overwritten. Use cmpeq of two zero vectors to produce the all 1s vector. Casting -1 to a double and vectorizing causes a constant load of a -1.0 floating point value.
llvm-svn: 254389
Header files that come with CUDA are assuming split host/device
compilation and are not usable by clang out of the box.
With a bit of preprocessor magic it's possible to twist them
into something clang can use.
This wrapper always includes CUDA headers exactly the same way during
host and device compilation passes and produces identical preprocessed
content during host and device side compilation for sm_35 GPUs. Device
compilation passes for older GPUs will see a smaller subset of device
functions supported by particular GPU.
The wrapper assumes specific contents of CUDA header files and works
only with CUDA 7.0 and 7.5.
Differential Revision: http://reviews.llvm.org/D13171
llvm-svn: 253388
The tzcnt intrinsics are used non non-BMI targets by code (e.g. ffmpeg)
that uses it as a potentially faster BSF.
The TZCNT instruction is special in that it's encoded in a
backward-compatible way and behaves as BSF on non-BMI targets.
Differential Revision: http://reviews.llvm.org/D14748
llvm-svn: 253358
These two intrinsics are defined in arm_acle.h.
__rev16l needs to rotate by 16 bits, bit it was actually rotating by 2 bits.
For AArch64, where long is 64 bits, this would still be wrong.
__rev16ll was incorrect, it reversed the bytes in each 32-bit word, rather than
each 16-bit halfword. The correct implementation is to apply __rev16 to the top
and bottom words of the 64-bit value.
For AArch32 targets, these get compiled down to the hardware rev16 instruction
at -O1 and above. For AArch64 targets, the 64-bit ones get compiled to two
32-bit rev16 instructions, because there is not currently a pattern for the
64-bit rev16 instruction.
Differential Revision: http://reviews.llvm.org/D14609
llvm-svn: 253211