Previously we negated the whole vector after splatting infinity. But its better to negate the infinity before splatting. This generates IR with the negate already folded with the infinity constant.
llvm-svn: 333062
These were included in emmintrin.h to match Intel Intrinsics Guide documentation. But this is because icc is capable of emulating them on targets that don't support F16C using library calls. Clang/LLVM doesn't have this emulation support. So it makes more sense to include them in immintrin.h instead.
I've left a comment behind to hopefully deter someone from trying to move them again in the future.
llvm-svn: 333033
Intel documents the 128-bit versions as being in emmintrin.h and the 256-bit version as being in immintrin.h.
This patch makes a new __emmtrin_f16c.h to hold the 128-bit versions to be included from emmintrin.h. And makes the existing f16cintrin.h contain the 256-bit versions and include it from immintrin.h with an error if its included directly.
Differential Revision: https://reviews.llvm.org/D47174
llvm-svn: 333014
I believe this is safe assuming default default FP environment. The conversion might be inexact, but it can never overflow the FP type so this shouldn't be undefined behavior for the uitofp/sitofp instructions.
We already do something similar for scalar conversions.
Differential Revision: https://reviews.llvm.org/D46863
llvm-svn: 332882
Summary:
These look to be a couple things that weren't removed when we switched to target attribute.
The popcnt makes including just smmintrin.h also include popcntintrin.h. The popcnt file itself already contains target attrributes.
The prefetch ones are just wrappers around __builtin_prefetch which we have graceful fallbacks for in the backend if the exact instruction isn't available. So there's no reason to hide them. And it makes them available in functions that have the write target attribute but not a -march command line flag.
Reviewers: echristo, RKSimon, spatel, DavidKreitzer
Reviewed By: echristo
Subscribers: cfe-commits
Differential Revision: https://reviews.llvm.org/D47029
llvm-svn: 332830
As long as the destination type is a 256 or 128 bit vector with the same number of elements we can use __builtin_convertvector to directly generate trunc IR instruction which will be handled natively by the backend.
Differential Revision: https://reviews.llvm.org/D46742
llvm-svn: 332266
If we're using default rounding mode we can let __builtin_convertvector to generate an fpextend. This matches 128 and 256 bit.
If we're using the version that takes an explicit rounding mode argument we would need to look at the immediate to see if its CUR_DIRECTION.
llvm-svn: 332210
We can use direct C code for these that will use uitofp and insertelement instructions.
For the versions that take an explicit rounding mode we can't do this.
llvm-svn: 332203
This is unnecessary for AVX512VL supporting CPUs like SKX. We can just emit a 128-bit masked load/store here no matter what. The backend will widen it to 512-bits on KNL CPUs.
Fixes the frontend portion of PR37386. Need to fix the backend to optimize the new sequences well.
llvm-svn: 331958
This is similar to the LLVM change https://reviews.llvm.org/D46290.
We've been running doxygen with the autobrief option for a couple of
years now. This makes the \brief markers into our comments
redundant. Since they are a visual distraction and we don't want to
encourage more \brief markers in new code either, this patch removes
them all.
Patch produced by
for i in $(git grep -l '\@brief'); do perl -pi -e 's/\@brief //g' $i & done
for i in $(git grep -l '\\brief'); do perl -pi -e 's/\\brief //g' $i & done
Differential Revision: https://reviews.llvm.org/D46320
llvm-svn: 331834
Without this we throw an error on the header file instead of the user code when the right features aren't enabled in clang.
Rename the other DEFAULT_FN_ATTRS defines to _Z for 512-bit since I used _Y for this case.
llvm-svn: 331682
It reverts r331378 as it caused test failures
ThreadSanitizer-x86_64 :: Darwin/gcd-groups-destructor.mm
ThreadSanitizer-x86_64 :: Darwin/libcxx-shared-ptr-stress.mm
ThreadSanitizer-x86_64 :: Darwin/xpc-race.mm
Only clang part of the change is reverted, libc++ part remains as is because it
emits error less aggressively.
llvm-svn: 331392
Atomics in C and C++ are incompatible at the moment and mixing the
headers can result in confusing error messages.
Emit an error explicitly telling about the incompatibility. Introduce
the macro `__ALLOW_STDC_ATOMICS_IN_CXX__` that allows to choose in C++
between C atomics and C++ atomics.
rdar://problem/27435938
Reviewers: rsmith, EricWF, mclow.lists
Reviewed By: mclow.lists
Subscribers: jkorous-apple, christof, bumblebritches57, JonChesterfield, smeenai, cfe-commits
Differential Revision: https://reviews.llvm.org/D45470
llvm-svn: 331378
On AVX512F targets we'll produce an emulated sequence using 3 pmuludqs with shifts and adds. On AVX512DQ we'll use vpmulld.
Fixes PR37140.
llvm-svn: 330923
The unmasked versions already didn't have this restrction. I don't think gcc or icc limit these to 64-bit mode so we shouldn't either.
llvm-svn: 330681
A previously missing intrinsic for an old instruction.
Reviewers: craig.topper, echristo
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D45311
llvm-svn: 329937
The WBNOINVD instruction writes back all modified
cache lines in the processor’s internal cache to main memory
but does not invalidate (flush) the internal caches.
Reviewers: craig.topper, zvi, ashlykov
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D43817
llvm-svn: 329848
Found via codespell -q 3 -I ../clang-whitelist.txt
Where whitelist consists of:
archtype
cas
classs
checkk
compres
definit
frome
iff
inteval
ith
lod
methode
nd
optin
ot
pres
statics
te
thru
Patch by luzpaz! (This is a subset of D44188 that applies cleanly with a few
files that have dubious fixes reverted.)
Differential revision: https://reviews.llvm.org/D44188
llvm-svn: 329399
- Fix instruction mappings/listings for various intrinsics
This patch was made by Craig Flores
Differential Revision: https://reviews.llvm.org/D41517
llvm-svn: 327090
Summary:
The _get_ssp intrinsic can be used to retrieve the
shadow stack pointer, independent of the current arch -- in
contract with the rdsspd and the rdsspq intrinsics.
Also, this intrinsic returns zero on CPUs which don't
support CET. The rdssp[d|q] instruction is decoded as nop,
essentially just returning the input operand, which is zero.
Example result of compilation:
```
xorl %eax, %eax
movl %eax, %ecx
rdsspq %rcx # NOP when CET is not supported
movq %rcx, %rax # return zero
```
Reviewers: craig.topper
Subscribers: cfe-commits
Differential Revision: https://reviews.llvm.org/D43814
llvm-svn: 326689
Initial commit missed sincos(float), llabs() and few atomics that we
used to pull in from device_functions.hpp, which we no longer include.
Differential Revision: https://reviews.llvm.org/D43602
llvm-svn: 325814
Clang can use CUDA-9.1 now, though new APIs (are not implemented yet.
The major change is that headers in CUDA-9.1 went through substantial
changes that started in CUDA-9.0 which required substantial changes
in the cuda compatibility headers provided by clang.
There are two major issues:
* CUDA SDK no longer provides declarations for libdevice functions.
* A lot of device-side functions have become nvcc's builtins and
CUDA headers no longer contain their implementations.
This patch changes the way CUDA headers are handled if we compile
with CUDA 9.x. Both 9.0 and 9.1 are affected.
* Clang provides its own declarations of libdevice functions.
* For CUDA-9.x clang now provides implementation of device-side
'standard library' functions using libdevice.
This patch should not affect compilation with CUDA-8. There may be
some observable differences for CUDA-9.0, though they are not expected
to affect functionality.
Tested: CUDA test-suite tests for all supported combinations of:
CUDA: 7.0,7.5,8.0,9.0,9.1
GPU: sm_20, sm_35, sm_60, sm_70
Differential Revision: https://reviews.llvm.org/D42513
llvm-svn: 323713
- Fix inaccurate instruction listings.
- Fix small issues in _mm_getcsr and _mm_setcsr.
- Fix description of NaN handling in comparison intrinsics.
- Fix inaccurate description of _mm_movemask_pi8.
- Fix inaccurate instruction mappings.
- Fix typos.
- Clarify wording on some descriptions.
- Fix bit ranges in return value.
- Fix typo in _mm_move_ms intrinsic instruction since it operates on singe-precision values, not double.
- This patch was made by Craig Flores
Differential Revision: https://reviews.llvm.org/D41523
llvm-svn: 322778
Summary:
kunpck intrinsics were removed in favor of native IR a few months ago. The implementation lowers them as by operation on the integer types passed to the intrinsic and then just shifting, masking, and oring them together. A special X86 DAG combine was added to recognize this patter and turn it into a concat_vector operation.
I think it makes more sense to keep the IR implementation closer to vector operations on vXi1. Given that we expect these builtins to be used around other builtins that operate on k-registers which we try to represent in IR with vXi1. InstCombine should be able to get rid of the bitcasts between integers and vXi1 leaving only the vector operations.
Reviewers: RKSimon, spatel, zvi, jina.nahias
Reviewed By: RKSimon
Subscribers: cfe-commits
Differential Revision: https://reviews.llvm.org/D42016
llvm-svn: 322461
- Fix incorrect wording in various intrinsic descriptions. Previously the descriptions used "low-order" and "high-order" when the intended meaning was "even-indexed" and "odd-indexed".
- Fix a few typos and errors found during review.
- Restore new line endings.
This patch was made by Craig Flores
llvm-svn: 322027
- Fix formatting issue due to hyphenated terms at line breaks.
- Fix typo
This patch was made by Craig Flores
Differential Revision: https://reviews.llvm.org/D41520
llvm-svn: 321671
- Fix incorrect wording in various intrinsic descriptions. Previously the descriptions used "low-order" and "high-order" when the intended meaning was "even-indexed" and "odd-indexed".
This patch was made by Craig Flores
Differential Revision: https://reviews.llvm.org/D41518
llvm-svn: 321670
- Fixed innaccurate instruction mappings for various intrinsics.
- Fixed description of NaN handling in comparison intrinsics.
- Unify description of _mm_store_pd1 to match _mm_store1_pd.
- Fix incorrect wording in various intrinsic descriptions. Previously the descriptions used "low-order" and "high-order" when the intended meaning was "even-indexed" and "odd-indexed".
- Fix typos.
- Add missing italics command (\a) for params and fixed some parameter spellings.
This patch was made by Craig Flores
Differential Revision: https://reviews.llvm.org/D41516
llvm-svn: 321669
added vbmi2 feature recognition
added intrinsics support for vbmi2 instructions
_mm[128,256,512]_mask[z]_compress_epi[16,32]
_mm[128,256,512]_mask_compressstoreu_epi[16,32]
_mm[128,256,512]_mask[z]_expand_epi[16,32]
_mm[128,256,512]_mask[z]_expandloadu_epi[16,32]
_mm[128,256,512]_mask[z]_sh[l,r]di_epi[16,32,64]
_mm[128,256,512]_mask_sh[l,r]dv_epi[16,32,64]
matching a similar work on the backend (D40206)
Differential Revision: https://reviews.llvm.org/D41557
llvm-svn: 321487
added vpclmulqdq feature recognition
added intrinsics support for vpclmulqdq instructions
_mm256_clmulepi64_epi128
_mm512_clmulepi64_epi128
matching a similar work on the backend (D40101)
Differential Revision: https://reviews.llvm.org/D41573
llvm-svn: 321480
added vaes feature recognition
added intrinsics support for vaes instructions, matching a similar work on the backend (D40078)
_mm256_aesenc_epi128
_mm512_aesenc_epi128
_mm256_aesenclast_epi128
_mm512_aesenclast_epi128
_mm256_aesdec_epi128
_mm512_aesdec_epi128
_mm256_aesdeclast_epi128
_mm512_aesdeclast_epi128
llvm-svn: 321474
* __shfl_{up,down}* uses unsigned int for the third parameter.
* added [unsigned] long overloads for non-sync shuffles.
Differential Revision: https://reviews.llvm.org/D41521
llvm-svn: 321326
This patch, together with a matching llvm patch (https://reviews.llvm.org/D39720), implements the lowering of X86 kunpack intrinsics to IR.
Differential Revision: https://reviews.llvm.org/D39719
Change-Id: Id5d3cb394ad33b98be79a6783d1d15569e2b798d
llvm-svn: 319777
Use this function to create the install targets rather than doing so
manually, which gains us the `-stripped` install targets to perform
stripped installations.
Differential Revision: https://reviews.llvm.org/D40675
llvm-svn: 319489
CUDA-9 headers check for specific libc++ version and ifdef out
some of the definitions we need if LIBCPP_VERSION >= 3800.
Differential Revision: https://reviews.llvm.org/D40198
llvm-svn: 319485
Shadow stack solution introduces a new stack for return addresses only.
The stack has a Shadow Stack Pointer (SSP) that points to the last address to which we expect to return.
If we return to a different address an exception is triggered.
This patch includes shadow stack intrinsics as well as the corresponding CET header.
It includes CET clang flags for shadow stack and Indirect Branch Tracking.
For more information, please see the following:
https://software.intel.com/sites/default/files/managed/4d/2a/control-flow-enforcement-technology-preview.pdf
Differential Revision: https://reviews.llvm.org/D40224
Change-Id: I79ad0925a028bbc94c8ecad75f6daa2f214171f1
llvm-svn: 318995
fma4 instructions zero the upper bits of the xmm register. fma3 instructions leave the bits unmodified. This requires separate builtins for the different semantics.
While we're cleaning up the scalar builtins this also removes the fma3 fmsub/fnmadd/fnmsub builtins by using negates in the header file.
llvm-svn: 318985
Summary:
__builtin_nexttoward lowers to a libcall, e.g. nexttowardf(), that CUDA
does not have.
Rather than try to implement it, we simply remove these functions --
nvcc doesn't support them either, and nextafter, which does work, does
essentially the same thing on GPUs, because GPUs don't have long double.
Reviewers: tra
Subscribers: cfe-commits, sanjoy
Differential Revision: https://reviews.llvm.org/D40152
llvm-svn: 318494
Change Header files of the intrinsics for lowering test and testn intrinsics to IR code.
Removed test and testn builtins from clang
Differential Revision: https://reviews.llvm.org/D38737
llvm-svn: 318035
This patch, together with a matching llvm patch (https://reviews.llvm.org/D38671), implements the lowering of X86 shuffle i/f intrinsics to IR.
Differential Revision: https://reviews.llvm.org/D38672
Change-Id: I9b3c2f2b34323bd9ccb21d0c1832f848b88ec047
llvm-svn: 318025
Summary:
How embarrassing.
This is tested in the test-suite -- fix to come there in a separate
patch.
Reviewers: tra
Subscribers: sanjoy, cfe-commits
Differential Revision: https://reviews.llvm.org/D39817
llvm-svn: 317961
The backend should be able to combine the negates to create fmsub, fnmadd, and fnmsub. faddsub converting to fsubadd still needs work I think, but should be very doable.
This matches what we already do for the masked builtins.
This only covers the packed builtins. Scalar builtins will be done after FMA4 is fixed.
llvm-svn: 317873
I think we need to use different builtins for the FMA4 instructions since those instructions zero the upper bits and FMA3 instructions pass the bits through.
So this moves the existing builtins to be the FMA3 versions. New versions will be added for FMA4.
llvm-svn: 317766
Summary: According to Intel docs this should take void const *. We had char*. The lack of const is the main issue.
Reviewers: RKSimon, zvi, igorb
Reviewed By: igorb
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D38782
llvm-svn: 315470
The __builtin_ia32_pbroadcastq512_mem_mask we were previously trying to use in 32-bit mode is not implemented in the x86 backend and causes isel to fail in release builds. In debug builds it fails even earlier during legalization with an llvm_unreachable.
While there add the missing test case for this intrinsic for this for 64-bit mode.
This fixes PR34631. D37668 should be able to recover this for 32-bit mode soon. But I wanted to fix the crash ahead of that.
llvm-svn: 313392
In CUDA-9 some of device-side math functions that we need are conditionally
defined within '#if _GLIBCXX_MATH_H'. We need to temporarily undo the guard
around inclusion of math_functions.hpp.
Differential Revision: https://reviews.llvm.org/D37906
llvm-svn: 313369
For now CUDA-9 is not included in the list of CUDA versions clang
searches for, so the path to CUDA-9 must be explicitly passed
via --cuda-path=.
On LLVM side NVPTX added sm_70 GPU type which bumps required
PTX version to 6.0, but otherwise is equivalent to sm_62 at the moment.
Differential Revision: https://reviews.llvm.org/D37576
llvm-svn: 312734
Summary:
Tests have to live in the test-suite, and so will come in a separate
patch.
Fixes PR34360.
Reviewers: tra
Subscribers: llvm-commits, sanjoy
Differential Revision: https://reviews.llvm.org/D37539
llvm-svn: 312681
Based off the Intel Intrinsics guide, we should expect a void const* argument.
Prevents 'passing 'const void *' to parameter of type 'void *' discards qualifiers' warnings.
Differential Revision: https://reviews.llvm.org/D37449
llvm-svn: 312523
This patch implements the broadcastf32x2/broadcasti32x2 intrinsics using __builtin_shufflevector.
Differential Revision: https://reviews.llvm.org/D37287
llvm-svn: 312135
GCC will interpret `__attribute__((__aligned__))` as 8-byte alignment on
ARM, but clang will not. Explicitly specify the alignment. This
mirrors the declaration in libunwind.
llvm-svn: 311576
The C++ ABI requires that the exception object (which under AEABI is the
`_Unwind_Control_Block`) is double-word aligned. The attribute was
applied to the `_Unwind_Exception` type, but not the
`_Unwind_Control_Block`. This should fix the libunwind test for the
alignment of the exception type.
llvm-svn: 311563
OpenCL spec v2.0 s6.13.6:
gentype select (gentype a,
gentype b,
igentype c)
gentype select (gentype a,
gentype b,
ugentype c)
igentype and ugentype must have the same number
of elements and bits as gentype.
Differential Revision: https://reviews.llvm.org/D36259
llvm-svn: 310160
OpenCL 2.0 atomic builtin functions have a scope argument which is ideally
represented as synchronization scope argument in LLVM atomic instructions.
Clang supports translating Clang atomic builtin functions to LLVM atomic
instructions. However it currently does not support synchronization scope
of LLVM atomic instructions. Without this, users have to use LLVM assembly
code to implement OpenCL atomic builtin functions.
This patch adds OpenCL 2.0 atomic builtin functions as Clang builtin
functions, which supports generating LLVM atomic instructions with
synchronization scope operand.
Currently only constant memory scope argument is supported. Support of
non-constant memory scope argument will be added later.
Differential Revision: https://reviews.llvm.org/D28691
llvm-svn: 310082
Clang specifies a max type alignment of 16 bytes on darwin targets (annoyingly in the driver not via cc1), meaning that the builtin nontemporal stores don't correctly align the loads/stores to 32 or 64 bytes when required, resulting in lowering to temporal unaligned loads/stores.
This patch casts the vectors to explicitly aligned types prior to the load/store to ensure that the require alignment is respected.
Differential Revision: https://reviews.llvm.org/D35996
llvm-svn: 309488
The EHABI definition was being inlined into the users even when EHABI
was not in use. Adjust the condition to ensure that the right version
is defined.
llvm-svn: 309327
Ensure that we define the `_Unwind_Control_Block` structure used on ARM
EHABI targets. This is needed for building libc++abi with the unwind.h
from the resource dir. A minor fallout of this is that we needed to
create a typedef for _Unwind_Exception to work across ARM EHABI and
non-EHABI targets. The structure definitions here are based originally
on the documentation from ARM under the "Exception Handling ABI for the
ARM® Architecture" Section 7.2. They are then adjusted to more closely
reflect the definition in libunwind from LLVM. Those changes are
compatible in layout but permit easier use in libc++abi and help
maintain compatibility between libunwind and the compiler provided
definition.
llvm-svn: 309226
This patch updates the vecintrin.h header file to provide the new
set of high-level vector built-in functions. This matches the
updated definition implemented by other compilers for the platform,
indicated by the pre-defined macro __VEC__ == 10302.
Note that some of the new functions (notably those involving the
vector float data type) are only available with -march=z14
(indicated by __ARCH__ == 12).
llvm-svn: 308199
Corrected several typos and incorrect parameters description that Sony
's techinical writer found during review.
I got an OK from Eric Christopher to commit doxygen comments without prior code
review upstream.
llvm-svn: 307838
Summary:
The _bit_scan_forward and _bit_scan_reverse intrinsics were accidentally
masked under the preprocessor checks that prune intrinsics definitions for the
benefit of faster compile-time on Windows. This patch moves the
definitons out of that region.
Fixes pr33722
Reviewers: craig.topper, aaboud, thakis
Reviewed By: craig.topper
Subscribers: cfe-commits
Differential Revision: https://reviews.llvm.org/D35184
llvm-svn: 307524
The second argument must be a constant, otherwise instruction selection
will fail. always_inline is not enough for isel to always fold
everything away at -O0.
Sadly the overloading turned this into a big macro mess. Fixes PR33212.
llvm-svn: 304205
AVX512_VPOPCNTDQ is a new feature set that was published by Intel.
The patch represents the Clang side of the addition of six intrinsics for two new machine instructions (vpopcntd and vpopcntq).
It also includes the addition of the new feature set.
Differential Revision: https://reviews.llvm.org/D33170
llvm-svn: 303857
(2) Removed uncessary anymore \c commands, since the same effect will be achived by <c> ... </c> sequence.
I got an OK from Eric Christopher to commit doxygen comments without prior code
review upstream.
llvm-svn: 303228
Separated very long brief sections into two sections.
I got an OK from Eric Christopher to commit doxygen comments without prior code
review upstream.
llvm-svn: 303031
This patch adds support for the the LightWeight Profiling (LWP) instructions which are available on all AMD Bulldozer class CPUs (bdver1 to bdver4).
Differential Revision: https://reviews.llvm.org/D32770
llvm-svn: 302418
Implemented the remaining integer data processing intrinsics from
the ARM ACLE v2.1 spec, such as parallel arithemtic and DSP style
multiplications.
Differential Revision: https://reviews.llvm.org/D32282
llvm-svn: 302131
- I removed doxygen comments for the intrinsics that "alias" the other existing documented intrinsics and that only sligtly differ in spelling (single underscores vs. double underscores).
#define _tzcnt_u16(a) (__tzcnt_u16((a)))
It will be very hard to keep the documentation for these "aliases" in sync with the documentation for the intrinsics they alias to. Out of sync documentation will be more confusing than no documentation.
I got an OK from Eric Christopher to commit doxygen comments without prior code
review upstream.
llvm-svn: 301652
size_t is usually defined as unsigned long, but on 64-bit platforms,
stdint.h currently defines SIZE_MAX using "ull" (unsigned long long).
Although this is the same width, it doesn't necessarily have the same
alignment or calling convention. It also triggers printf warnings when
using the format flag "%zu" to print SIZE_MAX.
This changes SIZE_MAX to reuse the compiler-provided __SIZE_MAX__, and
provides similar fixes for the other integers:
- INTPTR_MIN
- INTPTR_MAX
- UINTPTR_MAX
- PTRDIFF_MIN
- PTRDIFF_MAX
- INTMAX_MIN
- INTMAX_MAX
- UINTMAX_MAX
- INTMAX_C()
- UINTMAX_C()
... and fixes the typedefs for intptr_t and uintptr_t to use
__INTPTR_TYPE__ and __UINTPTR_TYPE__ instead of int32_t, effectively
reverting r89224, r89226, and r89237 (r89221 already having been
effectively reverted).
We can probably also kill __INTPTR_WIDTH__, __INTMAX_WIDTH__, and
__UINTMAX_WIDTH__ in a follow-up, but I was hesitant to delete all the
per-target CHECK lines in this commit since those might serve their own
purpose.
rdar://problem/11811377
llvm-svn: 301593
Summary: This patch makes the header `stdatomic.h` work when `-fms-compatibility` is specified.
Reviewers: rsmith
Reviewed By: rsmith
Subscribers: cfe-commits
Differential Revision: https://reviews.llvm.org/D32322
llvm-svn: 300919
- To be consistent with the rest of the intrinsics headers, I removed the tags <i> .. </i> for marking instruction names in italics in in smmintrin.h.
- Formatting changes to fit into 80 characters.
I got an OK from Eric Christopher to commit doxygen comments without prior code
review upstream.
llvm-svn: 300578
MOVNTDQA non-temporal aligned vector loads can be correctly represented using generic builtin loads, allowing us to remove the existing x86 intrinsics.
LLVM companion patch: D31767.
Differential Revision: https://reviews.llvm.org/D31766
llvm-svn: 300326
It's used by MS headers in VS 2017 without including intrin.h, so we
can't implement it in the header anymore.
Differential Revision: https://reviews.llvm.org/D31736
llvm-svn: 299782
It seems MS headers have started using __readgsqword, and since it's
used in a header that doesn't include intrin.h, we can't implement it as
an inline function anymore.
That was already the case for __readfsdword, which Saleem added support
for in r220859. This patch reuses that codegen to implement all of
__read[fg]s{byte,word,dword,qword}.
Differential Revision: https://reviews.llvm.org/D31248
llvm-svn: 298538
I made some small changes in smmintrin.h and emmintrin.h intrinsics.
- changed some regular comments '//' into doxygen-style comments '///' where necessary
- removed some trailing spaces in doxygen comments.
I got an OK from Eric Christopher to commit doxygen comments without prior code
review upstream.
llvm-svn: 298371
- Fix a variable naming mismatch
- Fix gcc extension pointer arithmetic on void to cast to char *.
- Test that the header (and htmintrin.h) parse.
llvm-svn: 298318
Reapply r289181 but rename the include guard to avoid
conflict with the one from Darwin.
Allow darwin to provide additional definitions and implementation
specifc values for tgmath.h on Apple platforms.
rdar://problem/19019845
llvm-svn: 298013
The DAZ feature introduces the denormal zero support for x86.
Currently the definitions are located under SSE3 header, however there are some SSE2 targets that support the feature as well.
Differential Revision: https://reviews.llvm.org/D30194
llvm-svn: 296296
Note: The doxygen comments are automatically generated based on Sony's intrinsic
s document.
I got an OK from Eric Christopher to commit doxygen comments without prior code
review upstream.
llvm-svn: 295404
Removed ndrange_t as Clang builtin type and added
as a struct type in the OpenCL header.
Use type name to do the Sema checking in enqueue_kernel
and modify IR generation accordingly.
Review: D28058
Patch by Dmitry Borisenkov!
llvm-svn: 295311
__fastfail terminates the process immediately with a special system
call. It does not run any process shutdown code or exception recovery
logic.
Fixes PR31854
llvm-svn: 294606
1. Adds the command line flag for clzero.
2. Includes the clzero flag under znver1.
3. Defines the macro for clzero.
4. Adds a new file which has the intrinsic definition for clzero instruction.
Patch by Ganesh Gopalasubramanian with some additional tests from me.
Differential revision: https://reviews.llvm.org/D29386
llvm-svn: 294559
Added doxygen comments to prfchwintrin.h's intrinsics.
Note: The doxygen comments are automatically generated based on Sony's intrinsic
s document.
I got an OK from Eric Christopher to commit doxygen comments without prior code
review upstream.
llvm-svn: 293745
Prior to OpenCL 2.0, image3d_t can only be used with the write_only
access qualifier when the cl_khr_3d_image_writes extension is enabled,
see e.g. OpenCL 1.1 s6.8b.
Require the extension for write_only image3d_t types and guard uses of
write_only image3d_t in the OpenCL header.
Patch by Sven van Haastregt!
Review: https://reviews.llvm.org/D28860
llvm-svn: 293050
For a << b (as original vec_sl does), if b >= sizeof(a) * 8, the
behavior is undefined. However, Power instructions do define the
behavior, which is equivalent to a << (b % (sizeof(a) * 8)).
This patch changes altivec.h to use a << (b % (sizeof(a) * 8)), to
ensure the consistent semantic of the instructions. Then it combines
the generated multiple instructions back to a single shift.
This patch handles left shift only. Right shift, on the other hand, is
more complicated, considering arithematic/logical right shift.
Differential Revision: https://reviews.llvm.org/D28037
llvm-svn: 292659
Added doxygen comments for the newly added intrinsics in avxintrin.h, namely _mm256_cvtsd_f64, _mm256_cvtsi256_si32 and _mm256_cvtss_f32
Added doxygen comments for the new intrinsics in emmintrin.h, namely _mm_loadu_si64 and _mm_load_sd.
Explicit parameter names were added for _mm_clflush and _mm_setcsr
The rest of the changes are editorial, removing trailing spaces at the end of the lines.
Differential Revision: https://reviews.llvm.org/D28503
llvm-svn: 291876
Add builtins for the functions and custom codegen mapping the builtins to their
corresponding intrinsics and handling the endian related swapping.
https://reviews.llvm.org/D26546
llvm-svn: 291179
Summary:
MSVC seems to use "__in" and "__out" for its own purposes, so we have to
pick different names in this macro.
Reviewers: tra
Subscribers: cfe-commits
Differential Revision: https://reviews.llvm.org/D28325
llvm-svn: 291138
Summary:
These duplicate declarations cause a problem for CUDA compiles on
Windows. All implicitly-defined functions are host+device, and this
applies to the declarations in Builtin.def. But then when we see the
declarations in intrin.h, they have no attributes, so are host-only
functions. This is an error.
(A better fix might be to make these builtins host-only, but that is a
much bigger change.)
Reviewers: rnk
Subscribers: cfe-commits, echristo
Differential Revision: https://reviews.llvm.org/D28317
llvm-svn: 291128
CUDA-8.0 comes with new headers which nvcc pre-includes via cuda_runtime.h
Clang now makes them available as well.
Differential Revision: https://reviews.llvm.org/D28301
llvm-svn: 290982
Added \n commands to insert a line breaks where necessary, since one long line of documentation is nearly unreadable.
Formatted comments to fit into 80 chars.
In some cases added \a command in front of the parameter names to display them in italics.
llvm-svn: 290619
Improved doxygen comments for the following intrinsics headers: __wmmintrin_pclmul.h, bmiintrin.h, emmintrin.h, f16cintrin.h, immintrin.h, mmintrin.h, pmmintrin.h, tmmintrin.h
Added \n commands to insert a line breaks where necessary, since one long line of documentation is nearly unreadable.
Formatted comments to fit into 80 chars.
In some cases added \a command in front of the parameter names to display them in italics.
llvm-svn: 290561
According to extended asm syntax, a case where the clobber list includes a variable from the inputs or outputs should be an error - conflict.
for example:
const long double a = 0.0;
int main()
{
char b;
double t1 = a;
__asm__ ("fucompp": "=a" (b) : "u" (t1), "t" (t1) : "cc", "st", "st(1)");
return 0;
}
This should conflict with the output - t1 which is st, and st which is st aswell.
The patch fixes it.
Commit on behald of Ziv Izhar.
Differential Revision: https://reviews.llvm.org/D15075
llvm-svn: 290539
Added \n commands to insert a line breaks where necessary to make the documentation more readable.
Formatted comments to fit into 80 chars.
llvm-svn: 290458
Tagged parameter names with \a doxygen command to display parameters in italics.
Added \n commands to insert a line break to make the documentation more readable.
Formatted comments to fit into 80 chars.
llvm-svn: 290455
Added a map to associate types and declarations with extensions.
Refactored existing diagnostic for disabled types associated with extensions and extended it to declarations for generic situation.
Fixed some bugs for types associated with extensions.
Allow users to use pragma to declare types and functions for supported extensions, e.g.
#pragma OPENCL EXTENSION the_new_extension_name : begin
// declare types and functions associated with the extension here
#pragma OPENCL EXTENSION the_new_extension_name : end
Differential Revision: https://reviews.llvm.org/D21698
llvm-svn: 289979
Reverts r289181: it's currently breaking modules using simd.h in
10.12 SDK.
This reverts commit 6e73e3464e96a4e00492c24aa790d36e1adb5702.
llvm-svn: 289487
This will allow the backend to constant fold these to generic shuffle vectors like 128-bit and 256-bit without having to working about handling masking.
llvm-svn: 289351
This will allow the backend to constant fold these to generic shuffle vectors like 128-bit and 256-bit without having to working about handling masking.
llvm-svn: 289345
Tagged instruction names with <c> INSTR_NAME </c> to display them in typewriter font.
In the past, \c command was used, unfortunately it applied to only one word.
<c> .. </c> has the same meaning, but applies to all words in between the tags.
llvm-svn: 289249
Allow darwin to provide additional definitions and implementation
specifc values for tgmath.h on Apple platforms.
rdar://problem/19019845
llvm-svn: 289181
Improved doxygen comments for fxsrintrin.h and mmintrin.h intrinsics by taagging parameter names with \a doxygen command to display parameters in italics.
Formatted comments to fit into 80 chars.
llvm-svn: 289154
Improved doxygen comments for __wmmintrin_pclmul.h and ammintrin.h intrinsics by taagging parameter names with \a doxygen command to display parameters in italics.
Formatted comments to fit into 80 chars.
llvm-svn: 289083
Documentation for some of the avxintrin.h's intrinsics errorneously said that
non VEX-prefixed instructions could be generated. This was fixed.
I tried several different solutions to achieve pretty printing of unordered lists (nested and non-nested) in param sections in doxygen.
llvm-svn: 287990
(commit again after fixing the buildbot failures)
This adds various overloads of the following builtins to altivec.h:
vec_neg
vec_nabs
vec_adde
vec_addec
vec_sube
vec_subec
vec_subc
Note that for vec_sub builtins on 32 bit integers, the semantics is similar to
what ISA describes for instructions like vsubecuq that work on quadwords: the
first operand is added to the one's complement of the second operand. (As
opposed to two's complement which I expected).
llvm-svn: 287872
(commit again after fixing the buildbot failures)
This adds various overloads of the following builtins to altivec.h:
vec_neg
vec_nabs
vec_adde
vec_addec
vec_sube
vec_subec
vec_subc
Note that for vec_sub builtins on 32 bit integers, the semantics is similar to
what ISA describes for instructions like vsubecuq that work on quadwords: the
first operand is added to the one's complement of the second operand. (As
opposed to two's complement which I expected).
llvm-svn: 287795
This adds various overloads of the following builtins to altivec.h:
vec_neg
vec_nabs
vec_adde
vec_addec
vec_sube
vec_subec
vec_subc
Note that for vec_sub builtins on 32 bit integers, the semantics is similar to
what ISA describes for instructions like vsubecuq that work on quadwords: the
first operand is added to the one's complement of the second operand. (As
opposed to two's complement which I expected).
llvm-svn: 287772
The doxygen comments are automatically generated based on Sony's intrinsics docu
ment.
I got an OK from Eric Christopher to commit doxygen comments without prior code
review upstream. This patch was internally reviewed by Charles Li.
llvm-svn: 287483
Added doxygen comments to avxintrin.h's intrinsics. As of now, all the intrinsics in this file that were documented by Sony's intrinsics guide should have corresponding doxygen comments.
Note: The doxygen comments are automatically generated based on Sony's intrinsic
s document.
I got an OK from Eric Christopher to commit doxygen comments without prior code
review upstream.
Reviewed by Wolfgang Pieb.
llvm-svn: 287436
The doxygen comments are automatically generated based on Sony's intrinsics document.
I got an OK from Eric Christopher to commit doxygen comments without prior code
review upstream. This patch was internally reviewed by Charles Li.
llvm-svn: 287317
The doxygen comments are automatically generated based on Sony's intrinsics document.
I got an OK from Eric Christopher to commit doxygen comments without prior code
review upstream. This patch was internally reviewed by Paul Robinson and Charles Li.
llvm-svn: 287295
I made several changes for consistency with the rest of x86 instrinsics header files. Some of these changes help to render doxygen comments better.
1. avxintrin.h – Moved the opening bracket on a separate line for several
intrinsics (for consistency with the rest of the intrinsics).
2. emmintrin.h - Moved the doxygen comment next to the body of the function;
- Added braces after extern "C" even though there is only
one declaration each time
3. xmmintrin.h - Moved the doxygen comment next to the body of the function;
- Added intrinsic prototypes for a couple of macro definitions
into the doxygen comment;
- Added braces after extern "C" even though there is only one
declaration each time
4. ammintrin.h – Removed extra line between the doxygen comment and the body
of the functions (for consistency with the rest of the files).
Desk reviewed by Paul Robinson.
llvm-svn: 287278
Both the (V)CVTDQ2PD (i32 to f64) and (V)CVTUDQ2PD (u32 to f64) conversion instructions are lossless and can be safely represented as generic __builtin_convertvector calls instead of x86 intrinsics without affecting final codegen.
This patch removes the clang builtins and their use in the headers - a future patch will deal with removing the llvm intrinsics.
This is an extension patch to D20528 which dealt with the equivalent sse/avx cases.
Differential Revision: https://reviews.llvm.org/D26686
llvm-svn: 287088
This patch implements all the overloads for vec_xl_be and vec_xst_be. On BE,
they behaves exactly the same with vec_xl and vec_xst, therefore they are
simply implemented by defining a matching macro. On LE, they are implemented
by defining new builtins and intrinsics. For int/float/long long/double, it
is just a load (lxvw4x/lxvd2x) or store(stxvw4x/stxvd2x). For char/char/short,
we also need some extra shuffling before or after call the builtins to get the
desired BE order. For int128, simply call vec_xl or vec_xst.
llvm-svn: 286971
Adds 2 vector functions for converting from a vector of unsigned short to a
vector of float. One converts the low 4 halfwords and one converts the high
4 halfwords.
Differential Revision: https://reviews.llvm.org/D26534
llvm-svn: 286863
Add vector extract exponent/significand functions to altivec.h, as well as
functions (and related constants) to test the data class of vector float
and vector double.
Differential Revision: https://reviews.llvm.org/D26271
llvm-svn: 286830
This is part of a set of changes to allow InstCombine in the backend to optimize variable shifts without having to know about masking.
llvm-svn: 286757
Summary: Inverting the mask argument does not reflect the intended semantics of the intrinsic.
Reviewers: igorb, delena
Subscribers: cfe-commits
Differential Revision: https://reviews.llvm.org/D26019
llvm-svn: 286733
Added doxygen comments to avxintrin.h's intrinsics. As of now, around 75% of the
intrinsics in this file are documented here. The patches for the other 25% will be se
nt out later.
Removed extra spaces in emmitrin.h.
Note: The doxygen comments are automatically generated based on Sony's intrinsics document.
I got an OK from Eric Christopher to commit doxygen comments without prior code
review upstream.
llvm-svn: 286336
Certain OpenCL builtin functions are supposed to be executed by all threads in a work group or sub group. Such functions should not be made divergent during transformation. It makes sense to mark them with convergent attribute.
The adding of convergent attribute is based on Ettore Speziale's work and the original proposal and patch can be found at https://www.mail-archive.com/cfe-commits@lists.llvm.org/msg22271.html.
Differential Revision: https://reviews.llvm.org/D25343
llvm-svn: 285725
Unfortunately, the backend currently doesn't fold masks into the instructions correctly when they come from these shufflevectors. I'll work on that in a future commit.
llvm-svn: 285667
Unfortunately, the backend currently doesn't fold masks into the instructions correctly when they come from these shufflevectors. I'll work on that in a future commit.
llvm-svn: 285540
After LGTM and Check-all
Vector-reduction arithmetic accepts vectors as inputs and produces
scalars as outputs.This class of vector operation forms the basis
of many scientific computations. In vector-reduction arithmetic,
the evaluation off is independent of the order of the input elements of V.
Reviewer: 1. craig.topper
2. igorb
Differential Revision: https://reviews.llvm.org/D25988
llvm-svn: 285493
OpenCL disallows using variadic arguments (s6.9.e and s6.12.5 OpenCL v2.0)
apart from some exceptions:
- printf
- enqueue_kernel
This change adds error diagnostic for variadic functions but accepts printf
and any compiler internal function (which should cover __enqueue_kernel_XXX cases).
It also unifies diagnostic with block prototype and adds missing uncaught cases for blocks.
llvm-svn: 285395
Previously, these were always included -- after this change, you have to
#include <new>, which is consistent with how things ought to work.
llvm-svn: 285251
Summary: The preserved input should be the first argument and the vector inputs should be in the same order as the intrinsics it is used to implement.
Reviewers: igorb, delena
Subscribers: cfe-commits
Differential Revision: https://reviews.llvm.org/D25902
llvm-svn: 285175
Committed after LGTM and check-all
Vector-reduction arithmetic accepts vectors as inputs and produces scalars as outputs.
This class of vector operation forms the basis of many scientific computations.
In vector-reduction arithmetic, the evaluation off is independent of the order of the input elements of V.
Used bisection method. At each step, we partition the vector with previous
step in half, and the operation is performed on its two halves.
This takes log2(n) steps where n is the number of elements in the vector.
Reviwer: 1. igorb
2. craig.topper
Differential Revision: https://reviews.llvm.org/D25527
llvm-svn: 285054
Committed after LGTM and check-all
Vector-reduction arithmetic accepts vectors as inputs and produces scalars as outputs.
This class of vector operation forms the basis of many scientific computations.
In vector-reduction arithmetic, the evaluation off is independent of the order of the input elements of V.
Used bisection method. At each step, we partition the vector with previous
step in half, and the operation is performed on its two halves.
This takes log2(n) steps where n is the number of elements in the vector.
Differential Revision: https://reviews.llvm.org/D25527
llvm-svn: 284963
With this patch, all intrinsics in this file (with an exception of a handful of a recently added ones) will be documented. I will send out a patch for 4 missining intrisics later.
The doxygen comments are automatically generated based on Sony's intrinsics document.
I got an OK from Eric Christopher to commit doxygen comments without prior code
review upstream. This patch was internally reviewed by Yunzhong Gao.
llvm-svn: 284934
With this patch, 75% of the intrinsics in this file will be documented now. The patches for the rest of the intrisics in this file will be send out later.
The doxygen comments are automatically generated based on Sony's intrinsics document.
I got an OK from Eric Christopher to commit doxygen comments without prior code review upstream. This patch was internally reviewed by Yunzhong Gao.
llvm-svn: 284754
Summary: We need `__stosb` to be an intrinsic, because SecureZeroMemory function uses it without including intrin.h. Implementing it as a volatile memset is not consistent with MSDN specification, but it gives us target-independent IR while keeping the most important properties of `__stosb`.
Reviewers: rnk, hans, thakis, majnemer
Subscribers: cfe-commits
Differential Revision: https://reviews.llvm.org/D25334
llvm-svn: 284253
Summary: Previously global 64-bit versions of _Interlocked functions broke buildbots on i386, so now I'm adding them as builtins for x86-64 and ARM only (should they be also on AArch64? I had problems with testing it for AArch64, so I left it)
Reviewers: hans, majnemer, mstorsjo, rnk
Subscribers: cfe-commits, aemerson
Differential Revision: https://reviews.llvm.org/D25576
llvm-svn: 284172
Summary: _BitScan intrinsics (and some others, for example _Interlocked and _bittest) are supposed to work on both ARM and x86. This is an attempt to isolate them, avoiding repeating their code or writing separate function for each builtin.
Reviewers: hans, thakis, rnk, majnemer
Subscribers: RKSimon, cfe-commits, aemerson
Differential Revision: https://reviews.llvm.org/D25264
llvm-svn: 284060
These were reverted in r283753 and r283747.
The first patch added a header to the root 'Headers' install directory,
instead of into 'Headers/cuda_wrappers'. This was fixed in the second
patch, but by then the damage was done: The bad header stayed in the
'Headers' directory, continuing to break the build.
We reverted both patches in an attempt to fix things, but that still
didn't get rid of the header, so the Windows boostrap build remained
broken.
It's probably worth fixing up our cmake logic to remove things from the
install dirs, but in the meantime, re-land these patches, since we
believe they no longer have this bug.
llvm-svn: 283907
Breaks bootstrap builds on (at least) Windows:
In file included from D:\buildslave\clang-x64-ninja-win7\llvm\lib\Support\Allocator.cpp:14:
In file included from D:\buildslave\clang-x64-ninja-win7\llvm\include\llvm/Support/Allocator.h:24:
In file included from D:\buildslave\clang-x64-ninja-win7\llvm\include\llvm/ADT/SmallVector.h:20:
In file included from D:\buildslave\clang-x64-ninja-win7\llvm\include\llvm/Support/MathExtras.h:19:
D:\buildslave\clang-x64-ninja-win7\stage1.install\bin\..\lib\clang\4.0.0\include\algorithm(63,8) :
error: unknown type name '__device__'
inline __device__ const __T &
llvm-svn: 283747
Summary:
We do this by wrapping <complex> and <algorithm>.
Tests are in the test-suite.
Reviewers: tra
Subscribers: jhen, beanz, cfe-commits, mgorny
Differential Revision: https://reviews.llvm.org/D24979
llvm-svn: 283680
Summary: This matches the idiom we use for our other CUDA wrapper headers.
Reviewers: tra
Subscribers: beanz, mgorny, cfe-commits
Differential Revision: https://reviews.llvm.org/D24978
llvm-svn: 283679
Summary:
Currently we declare our inline __device__ math functions in namespace
std. But libstdc++ and libc++ declare these functions in an inline
namespace inside namespace std. We need to match this because, in a
later patch, we want to get e.g. <complex> to use our device overloads,
and it only will if those overloads are in the right inline namespace.
Reviewers: tra
Subscribers: cfe-commits, jhen
Differential Revision: https://reviews.llvm.org/D24977
llvm-svn: 283678
Summary: We need x86-64-specific builtins if we want to implement some of the MS intrinsics - winnt.h contains definitions of some functions for i386, but not for x86-64 (for example _InterlockedOr64), which means that we cannot treat them as builtins for both i386 and x86-64, because then we have definitions of builtin functions in winnt.h on i386.
Reviewers: thakis, majnemer, hans, rnk
Subscribers: cfe-commits
Differential Revision: https://reviews.llvm.org/D24598
llvm-svn: 283264
This patch corresponds to review:
https://reviews.llvm.org/D24397
It adds the __POWER9_VECTOR__ macro and the -mpower9-vector option along with
a number of altivec.h functions (refer to the code review for a list).
llvm-svn: 282481
On ARM, there are multiple versions of each of the intrinsics, with
acquire/relaxed/release barrier semantics.
The newly added ones are provided as inline functions here instead of builtins,
since they should only be available on certain archs (arm/aarch64).
This is necessary in order to compile C++ code for ARM in MSVC mode.
Patch by Martin Storsjö!
llvm-svn: 282447
This patch adds the msa.h header file containing the shorter names for the
MSA instrinsics, e.g. msa_sll_b for builtin_msa_sll_b.
Reviewers: vkalintiris, zoran.jovanovic
Differential Review: https://reviews.llvm.org/D24674
llvm-svn: 281975
Summary:
We need to add a bunch more "using"s, which weren't necessary with
libstdc++.
Once this is in I can check in a test to the test-suite.
Reviewers: tra
Subscribers: cfe-commits
Differential Revision: https://reviews.llvm.org/D24588
llvm-svn: 281544
Summary: There was no definition for __nop function - added inline
assembly.
Patch by Albert Gutowski!
Reviewers: rnk, thakis
Subscribers: cfe-commits
Differential Revision: https://reviews.llvm.org/D24286
llvm-svn: 280826
This adds support for modules that require (non-)freestanding
environment, such as the compiler builtin mm_malloc submodule.
Differential Revision: https://reviews.llvm.org/D23871
llvm-svn: 280613
These will be reused when removing some builtins from avx512vldqintrin.h and this will make the tests for that change show a better number of vector elements.
llvm-svn: 280196
This adds support for modules that require (no-)gnu-inline-asm
environment, such as the compiler builtin cpuid submodule.
This is the gnu-inline-asm variant of https://reviews.llvm.org/D23871
Differential Revision: https://reviews.llvm.org/D23905
rdar://problem/26931199
llvm-svn: 280159
Summary: Make is_valid_event and create_user_event overloadable like other built-ins.
Patch by Evgeniy Tyurin.
Reviewers: bader, yaxunl
Subscribers: Anastasia, cfe-commits
Differential Revision: https://reviews.llvm.org/D23914
llvm-svn: 280097
Summary:
A bunch of related changes here to our CUDA math headers.
- The second arg to nexttoward is a double (well, technically, long
double, but we don't have that), not a float.
- Add a forward-declare of llround(float), which is defined in the CUDA
headers. We need this for the same reason we need most of the other
forward-declares: To prevent a constexpr function in our standard
library from becoming host+device.
- Add nexttowardf implementation.
- Pull "foobarf" functions defined by the CUDA headers in the global
namespace into namespace std. This lets you do e.g. std::sinf.
- Add overloads for math functions accepting integer types. This lets
you do e.g. std::sin(0) without having an ambiguity between the
overload that takes a float and the one that takes a double.
With these changes, we pass testcases derived from libc++ for cmath and
math.h. We can check these testcases in to the test-suite once support
for CUDA lands there.
Reviewers: tra
Subscribers: cfe-commits
Differential Revision: https://reviews.llvm.org/D23627
llvm-svn: 279140
Summary:
Previously these sort of worked because they didn't end up resulting in
calls at the ptx layer. But I'm adding stricter checks that break
placement new without these changes.
Reviewers: tra
Subscribers: cfe-commits
Differential Revision: https://reviews.llvm.org/D23239
llvm-svn: 278194
This fixes compiling with headers from the Windows SDK for ARM, where the
YieldProcessor function (in winnt.h) refers to _ARM_BARRIER_ISHST.
The actual MSVC armintr.h contains a lot more definitions, but this is enough to
build code that uses the Windows SDK but doesn't use ARM intrinsics directly.
An alternative would to just keep the addition to intrin.h (to include
armintr.h), but not actually ship armintr.h, instead having clang's intrin.h
include armintr.h from MSVC's include directory. (That one works fine with
clang, at least for building code that uses the Windows SDK.)
Patch by Martin Storsjö!
llvm-svn: 277928
There should be no native_ builtin functions with double type arguments.
Patch by Aaron En Ye Shi.
Differential Revision : https://reviews.llvm.org/D23071
llvm-svn: 277754
Summary:
Some cpuid bit defines are named slightly different from how gcc's
cpuid.h calls them.
Define a few more compatibility names to appease software built for gcc:
* `bit_PCLMUL` alias of `bit_PCLMULQDQ`
* `bit_SSE4_1` alias of `bit_SSE41`
* `bit_SSE4_2` alias of `bit_SSE42`
* `bit_AES` alias of `bit_AESNI`
* `bit_CMPXCHG8B` alias of `bit_CX8`
While here, add the misssing 29th bit, `bit_F16C` (which is how gcc
calls this bit).
Reviewers: joerg, rsmith
Subscribers: bruno, cfe-commits
Differential Revision: https://reviews.llvm.org/D22010
llvm-svn: 277307
Added CLK_ABGR definition for get_image_channel_order return value inside opencl-c.h file.
Patch by Aaron En Ye Shi.
Differential Revision: https://reviews.llvm.org/D22767
llvm-svn: 277179
Only around 50% of the intrinsics in this file are documented now. The patches for the rest of the intrisics in this file will be send out later.
The doxygen comments are automatically generated based on Sony's intrinsics docu
ment.
I got an OK from Eric Christopher to commit doxygen comments without prior code
review upstream. This patch was internally reviewed by Paul Robinson.
llvm-svn: 276499
D20859 and D20860 attempted to replace the SSE (V)CVTTPS2DQ and VCVTTPD2DQ truncating conversions with generic IR instead.
It turns out that the behaviour of these intrinsics is different enough from generic IR that this will cause problems, INF/NAN/out of range values are guaranteed to result in a 0x80000000 value - which plays havoc with constant folding which converts them to either zero or UNDEF. This is also an issue with the scalar implementations (which were already generic IR and what I was trying to match).
This patch changes both scalar and packed versions back to using x86-specific builtins.
It also deals with the other scalar conversion cases that are runtime rounding mode dependent and can have similar issues with constant folding.
Differential Revision: https://reviews.llvm.org/D22105
llvm-svn: 276102