Commit Graph

1181 Commits

Author SHA1 Message Date
Michael Zuckerman 25eb420233 [X86][AVX512][Clang][Intrinsics][reduce] Adding missing reduce (max|min) intrinsics to Clang .
After LGTM and Check-all 

Vector-reduction arithmetic accepts vectors as inputs and produces 
scalars as outputs.This class of vector operation forms the basis 
of many scientific computations. In vector-reduction arithmetic, 
the evaluation off is independent of the order of the input elements of V.

Reviewer: 1. craig.topper 
          2. igorb

Differential Revision: https://reviews.llvm.org/D25988

llvm-svn: 285493
2016-10-29 10:29:20 +00:00
Nemanja Ivanovic 931bc548e6 [PPC] add float and double overloads for vec_orc and vec_nand in altivec.h
This patch corresponds to review https://reviews.llvm.org/D25950.
Committing on behalf of Sean Fertile.

llvm-svn: 285439
2016-10-28 20:04:53 +00:00
Nemanja Ivanovic 4f69f924df Implement vector count leading/trailing bytes with zero lsb and vector parity
builtins - clang portion

This patch corresponds to review: https://reviews.llvm.org/D26002
Committing on behalf of Zaara Syeda.

llvm-svn: 285436
2016-10-28 19:49:03 +00:00
Michael Zuckerman edd99eb07a 1. Fixing small types issue (PD|PS) (reduce) .
2. Cosmetic changes

llvm-svn: 285405
2016-10-28 15:16:03 +00:00
Anastasia Stulova 7c30533362 [OpenCL] Diagnose variadic arguments
OpenCL disallows using variadic arguments (s6.9.e and s6.12.5 OpenCL v2.0)
apart from some exceptions:
- printf
- enqueue_kernel

This change adds error diagnostic for variadic functions but accepts printf
and any compiler internal function (which should cover __enqueue_kernel_XXX cases).

It also unifies diagnostic with block prototype and adds missing uncaught cases for blocks.

llvm-svn: 285395
2016-10-28 12:59:39 +00:00
Nemanja Ivanovic 09dd423a7d [PPC] add vector byte reverse functions to altivec.h
This patch corresponds to review https://reviews.llvm.org/D25915.
Committing on behalf of Sean Fertile.

llvm-svn: 285268
2016-10-27 06:23:57 +00:00
Justin Lebar ebeeab87a1 [CUDA] Move device placement new definitions into a wrapper header.
Previously, these were always included -- after this change, you have to
 #include <new>, which is consistent with how things ought to work.

llvm-svn: 285251
2016-10-26 22:13:26 +00:00
Justin Lebar 6f5ec7ee88 [CUDA] Switch cuda_wrappers/complex to use a proper include guard instead of #pragma once.
This is consistent with the rest of our internal headers.

llvm-svn: 285250
2016-10-26 22:13:20 +00:00
Nemanja Ivanovic 3de0a385c9 [PowerPC] Implement vector_insert_exp builtins - clang portion
This patch corresponds to review https://reviews.llvm.org/D25956.
Committing on behalf of Zaara Syeda.

llvm-svn: 285229
2016-10-26 19:27:11 +00:00
Nemanja Ivanovic 85a28dcc5d [PPC] Implement vector reverse elements builtins (vec_reve)
This patch corresponds to review https://reviews.llvm.org/D25906.
Committing on behalf of Tony Jiang.

llvm-svn: 285218
2016-10-26 18:25:45 +00:00
Craig Topper f202365910 [AVX-512] Fix the operand order for all calls to __builtin_ia32_vfmaddss3_mask.
Summary: The preserved input should be the first argument and the vector inputs should be in the same order as the intrinsics it is used to implement.

Reviewers: igorb, delena

Subscribers: cfe-commits

Differential Revision: https://reviews.llvm.org/D25902

llvm-svn: 285175
2016-10-26 05:35:38 +00:00
Yaxun Liu a49bd14843 [OpenCL] Add missing atom_xor for 64 bit to opencl-c.h
Differential Revision: https://reviews.llvm.org/D25954

llvm-svn: 285125
2016-10-25 21:37:05 +00:00
Michael Zuckerman facb37cabf [X86][AVX512][Clang][Intrinsics][reduce] Adding missing reduce (Operators: +,*,&&,||) intrinsics to Clang
Committed after LGTM and check-all

 
Vector-reduction arithmetic accepts vectors as inputs and produces scalars as outputs.
This class of vector operation forms the basis of many scientific computations.
In vector-reduction arithmetic, the evaluation off is independent of the order of the input elements of V.
 
Used bisection method. At each step, we partition the vector with previous
step in half, and the operation is performed on its two halves.
This takes log2(n) steps where n is the number of elements in the vector.
 
Reviwer: 1. igorb
         2. craig.topper    
Differential Revision: https://reviews.llvm.org/D25527

llvm-svn: 285054
2016-10-25 07:56:04 +00:00
Michael Zuckerman 33bd5b235b revert r284963
because new test file is failing in some OS.
test/CodeGen/avx512-reduceIntrin.c

llvm-svn: 284967
2016-10-24 11:30:23 +00:00
Michael Zuckerman 98cb041891 [X86][AVX512][Clang][Intrinsics][reduce] Adding missing reduce (Operators: +,*,&&,||) intrinsics to Clang
Committed after LGTM and check-all

Vector-reduction arithmetic accepts vectors as inputs and produces scalars as outputs.
This class of vector operation forms the basis of many scientific computations.
In vector-reduction arithmetic, the evaluation off is independent of the order of the input elements of V.

Used bisection method. At each step, we partition the vector with previous
step in half, and the operation is performed on its two halves.
This takes log2(n) steps where n is the number of elements in the vector.

Differential Revision: https://reviews.llvm.org/D25527

llvm-svn: 284963
2016-10-24 10:53:20 +00:00
Craig Topper eee7c0520c [AVX-512] Replace masked 128/256-bit byte, word, and dword min/max builtins with selects and the older unmasked builtins.
llvm-svn: 284954
2016-10-23 23:57:30 +00:00
Craig Topper 0c5da26572 [AVX-512] Replace 512-bit pmovzx/sx builtins with native IR.
llvm-svn: 284936
2016-10-23 07:35:47 +00:00
Craig Topper 4ef879ac2c [AVX-512] Remove masked 128/256-bit packss/packus builtins and replace with selects and the older unmasked builtins.
llvm-svn: 284935
2016-10-23 07:35:39 +00:00
Ekaterina Romanova 06477bf035 Add more doxygen comments to emmintrin.h's intrinsics.
With this patch, all intrinsics in this file (with an exception of a handful of a recently added ones) will be documented. I will send out a patch for 4 missining intrisics later.

The doxygen comments are automatically generated based on Sony's intrinsics document.

I got an OK from Eric Christopher to commit doxygen comments without prior code
review upstream. This patch was internally reviewed by Yunzhong Gao.

llvm-svn: 284934
2016-10-23 07:30:50 +00:00
Craig Topper 4d63dfc286 [AVX-512] Replace masked 128/256-bit pavg builtins and replace with select and older unmasked builtins.
llvm-svn: 284929
2016-10-22 21:24:56 +00:00
Craig Topper 622c63614d [AVX-512] Replace masked 128/256-bit saturating add/sub builtins with select and older unmasked builtins.
llvm-svn: 284928
2016-10-22 21:24:52 +00:00
Craig Topper 11dda92405 [AVX-512] Replace masked 128/256-bit vpmovzx/vpmovsx builtins with native IR.
llvm-svn: 284927
2016-10-22 21:24:48 +00:00
Craig Topper eb1c0afa90 [AVX-512] Remove masked 128/256-bit pshufb builtins. Replace with a select and the older unmaksed builtins.
llvm-svn: 284925
2016-10-22 21:24:42 +00:00
Craig Topper 78a9c40326 [AVX-512] Remove builtins for 128/256-bit pabsb/pabsw. We can use a select and the older non-masked versions instead.
llvm-svn: 284924
2016-10-22 21:24:38 +00:00
Craig Topper c2c7e42bfe [AVX-512] Add typecasts to alignr intrinsics that were modified in r284920.
llvm-svn: 284923
2016-10-22 21:24:34 +00:00
Craig Topper f6373bc6fd [AVX-512] Remove masked 128/256-bit palignr builtins. We can just use a select in the header file with the older unmasked versions instead.
llvm-svn: 284920
2016-10-22 18:32:33 +00:00
Ekaterina Romanova 493091fdef Add more doxygen comments to emmintrin.h's intrinsics.
With this patch, 75% of the intrinsics in this file will be documented now. The patches for the rest of the intrisics in this file will be send out later.

The doxygen comments are automatically generated based on Sony's intrinsics document.

I got an OK from Eric Christopher to commit doxygen comments without prior code review upstream. This patch was internally reviewed by Yunzhong Gao.

llvm-svn: 284754
2016-10-20 17:59:15 +00:00
Albert Gutowski 1deab38717 Implement __stosb intrinsic as a volatile memset
Summary: We need `__stosb` to be an intrinsic, because SecureZeroMemory function uses it without including intrin.h. Implementing it as a volatile memset is not consistent with MSDN specification, but it gives us target-independent IR while keeping the most important properties of `__stosb`.

Reviewers: rnk, hans, thakis, majnemer

Subscribers: cfe-commits

Differential Revision: https://reviews.llvm.org/D25334

llvm-svn: 284253
2016-10-14 17:33:05 +00:00
Albert Gutowski 5e08df0266 Add 64-bit MS _Interlocked functions as builtins again
Summary: Previously global 64-bit versions of _Interlocked functions broke buildbots on i386, so now I'm adding them as builtins for x86-64 and ARM only (should they be also on AArch64? I had problems with testing it for AArch64, so I left it)

Reviewers: hans, majnemer, mstorsjo, rnk

Subscribers: cfe-commits, aemerson

Differential Revision: https://reviews.llvm.org/D25576

llvm-svn: 284172
2016-10-13 22:35:07 +00:00
Albert Gutowski 397d81bb9a Implement MS _ReturnAddress and _AddressOfReturnAddress intrinsics
Reviewers: rnk, thakis, majnemer, hans

Subscribers: cfe-commits

Differential Revision: https://reviews.llvm.org/D25540

llvm-svn: 284131
2016-10-13 16:03:42 +00:00
Yunzhong Gao d9fa56a4fb [NFC] Fixing the description for _mm_store_ps and _mm_store_ps1.
It seems that the doxygen description of these two intrinsics were swapped by
mistake.

llvm-svn: 284080
2016-10-12 23:27:27 +00:00
Albert Gutowski 2a0621e58a Implement MS _BitScan intrinsics
Summary: _BitScan intrinsics (and some others, for example _Interlocked and _bittest) are supposed to work on both ARM and x86. This is an attempt to isolate them, avoiding repeating their code or writing separate function for each builtin.

Reviewers: hans, thakis, rnk, majnemer

Subscribers: RKSimon, cfe-commits, aemerson

Differential Revision: https://reviews.llvm.org/D25264

llvm-svn: 284060
2016-10-12 22:01:05 +00:00
Yunzhong Gao c37e2231ad [NFC] Trial change to remove a redundant blank line.
llvm-svn: 284033
2016-10-12 19:33:33 +00:00
Justin Lebar 49ec14692a [CUDA] Re-land support for <complex> (r283683 and r283680).
These were reverted in r283753 and r283747.

The first patch added a header to the root 'Headers' install directory,
instead of into 'Headers/cuda_wrappers'.  This was fixed in the second
patch, but by then the damage was done: The bad header stayed in the
'Headers' directory, continuing to break the build.

We reverted both patches in an attempt to fix things, but that still
didn't get rid of the header, so the Windows boostrap build remained
broken.

It's probably worth fixing up our cmake logic to remove things from the
install dirs, but in the meantime, re-land these patches, since we
believe they no longer have this bug.

llvm-svn: 283907
2016-10-11 17:36:03 +00:00
Albert Gutowski fcea61c563 Implement MS read/write barriers and __faststorefence intrinsic
Reviewers: hans, rnk, majnemer

Subscribers: cfe-commits

Differential Revision: https://reviews.llvm.org/D25442

llvm-svn: 283793
2016-10-10 19:40:51 +00:00
Albert Gutowski 7216f17653 Implement __emul, __emulu, _mul128 and _umul128 MS intrinsics
Reviewers: rnk, thakis, majnemer, hans

Subscribers: cfe-commits

Differential Revision: https://reviews.llvm.org/D25353

llvm-svn: 283785
2016-10-10 18:09:27 +00:00
Nico Weber 21b9c7a6dc Revert r283683 because r283680 got reverted.
llvm-svn: 283753
2016-10-10 14:20:35 +00:00
Nico Weber 67dd74ef89 Revert r283680.
Breaks bootstrap builds on (at least) Windows:
In file included from D:\buildslave\clang-x64-ninja-win7\llvm\lib\Support\Allocator.cpp:14:
In file included from D:\buildslave\clang-x64-ninja-win7\llvm\include\llvm/Support/Allocator.h:24:
In file included from D:\buildslave\clang-x64-ninja-win7\llvm\include\llvm/ADT/SmallVector.h:20:
In file included from D:\buildslave\clang-x64-ninja-win7\llvm\include\llvm/Support/MathExtras.h:19:
D:\buildslave\clang-x64-ninja-win7\stage1.install\bin\..\lib\clang\4.0.0\include\algorithm(63,8) :
    error: unknown type name '__device__'
    inline __device__ const __T &

llvm-svn: 283747
2016-10-10 14:10:00 +00:00
Justin Lebar 3b593f56fc [CUDA] Don't install cuda_wrappers/{algorithm,complex} into the main include dir.
This is obviously wrong -- if we do this, then all compiles will pick up
these wrappers, which is not what we want.

llvm-svn: 283683
2016-10-09 00:27:39 +00:00
Justin Lebar d3c5d2a4de [CUDA] Support <complex> and std::min/max on the device.
Summary:
We do this by wrapping <complex> and <algorithm>.

Tests are in the test-suite.

Reviewers: tra

Subscribers: jhen, beanz, cfe-commits, mgorny

Differential Revision: https://reviews.llvm.org/D24979

llvm-svn: 283680
2016-10-08 22:16:12 +00:00
Justin Lebar 2dfbe9a3b4 [CUDA] Rename cuda_builtin_vars.h to __clang_cuda_builtin_vars.h.
Summary: This matches the idiom we use for our other CUDA wrapper headers.

Reviewers: tra

Subscribers: beanz, mgorny, cfe-commits

Differential Revision: https://reviews.llvm.org/D24978

llvm-svn: 283679
2016-10-08 22:16:08 +00:00
Justin Lebar e9eb792a0f [CUDA] Declare our __device__ math functions in the same inline namespace as our standard library.
Summary:
Currently we declare our inline __device__ math functions in namespace
std.  But libstdc++ and libc++ declare these functions in an inline
namespace inside namespace std.  We need to match this because, in a
later patch, we want to get e.g. <complex> to use our device overloads,
and it only will if those overloads are in the right inline namespace.

Reviewers: tra

Subscribers: cfe-commits, jhen

Differential Revision: https://reviews.llvm.org/D24977

llvm-svn: 283678
2016-10-08 22:16:03 +00:00
Michael Zuckerman 9e43ccfe68 [Clang][AVX512][BuiltIn]Adding missing intrinsics move_{sd|ss} to clang
Differential Revision: http://reviews.llvm.org/D21021

llvm-svn: 283314
2016-10-05 12:56:06 +00:00
Albert Gutowski f3a0bce155 Separate builtins for x84-64 and i386; implement __mulh and __umulh
Summary: We need x86-64-specific builtins if we want to implement some of the MS intrinsics - winnt.h contains definitions of some functions for i386, but not for x86-64 (for example _InterlockedOr64), which means that we cannot treat them as builtins for both i386 and x86-64, because then we have definitions of builtin functions in winnt.h on i386.

Reviewers: thakis, majnemer, hans, rnk

Subscribers: cfe-commits

Differential Revision: https://reviews.llvm.org/D24598

llvm-svn: 283264
2016-10-04 22:29:49 +00:00
Craig Topper c4a8228bcc [AVX-512] Use native IR for masked 512-bit add/sub/mul/div ps/pd intrinsics when rounding mode isn't used.
llvm-svn: 283073
2016-10-02 17:43:00 +00:00
Artem Belevich d4d9dc8252 [CUDA] Added support for CUDA-8
Differential Revision: https://reviews.llvm.org/D24946

llvm-svn: 282610
2016-09-28 17:47:40 +00:00
Martin Storsjo 963f75efc2 [Headers] Replace stray indentation with tabs with spaces. NFC.
This matches the rest of the surrounding file.

llvm-svn: 282569
2016-09-28 09:34:51 +00:00
Ayman Musa 17a2819b05 Update to commit r282488, fix the buildboot failure.
llvm-svn: 282492
2016-09-27 15:37:31 +00:00
Ayman Musa 2e250e8845 [avx512] Add aliases to some missing avx512 intrinsics.
Differential Revision:https: //reviews.llvm.org/D24961

llvm-svn: 282488
2016-09-27 14:06:32 +00:00
Nemanja Ivanovic 10e2b5dcaa [Power9] Builtins for ELF v.2 ABI conformance - front end portion
This patch corresponds to review:
https://reviews.llvm.org/D24397

It adds the __POWER9_VECTOR__ macro and the -mpower9-vector option along with
a number of altivec.h functions (refer to the code review for a list).

llvm-svn: 282481
2016-09-27 10:45:22 +00:00
Saleem Abdulrasool eae64f8a62 headers: add missing Windows ARM Interlocked intrinsics
On ARM, there are multiple versions of each of the intrinsics, with
acquire/relaxed/release barrier semantics.

The newly added ones are provided as inline functions here instead of builtins,
since they should only be available on certain archs (arm/aarch64).

This is necessary in order to compile C++ code for ARM in MSVC mode.

Patch by Martin Storsjö!

llvm-svn: 282447
2016-09-26 22:12:43 +00:00
Simon Dardis 3d9c763816 [mips] MSA intrinsics header file
This patch adds the msa.h header file containing the shorter names for the
MSA instrinsics, e.g. msa_sll_b for builtin_msa_sll_b.

Reviewers: vkalintiris, zoran.jovanovic

Differential Review: https://reviews.llvm.org/D24674

llvm-svn: 281975
2016-09-20 15:07:36 +00:00
Justin Lebar e3612a039f [CUDA] Make __clang_cuda_cmath.h compatible with libc++.
Summary:
We need to add a bunch more "using"s, which weren't necessary with
libstdc++.

Once this is in I can check in a test to the test-suite.

Reviewers: tra

Subscribers: cfe-commits

Differential Revision: https://reviews.llvm.org/D24588

llvm-svn: 281544
2016-09-14 21:50:14 +00:00
Albert Gutowski 727ab8a803 Add some MS aliases for existing intrinsics
Reviewers: thakis, compnerd, majnemer, rsmith, rnk

Subscribers: alexshap, cfe-commits

Differential Revision: https://reviews.llvm.org/D24330

llvm-svn: 281540
2016-09-14 21:19:43 +00:00
Albert Gutowski fc19fa3721 Temporary fix for MS _Interlocked intrinsics
llvm-svn: 281401
2016-09-13 21:51:37 +00:00
Albert Gutowski 9918cb6573 Reverse commit 281375 (breaks building Chromium)
llvm-svn: 281399
2016-09-13 21:24:51 +00:00
Albert Gutowski ce7a9a47b2 Add bunch of _Interlocked builtins
Reviewers: compnerd, thakis, Prazek, majnemer, rnk

Subscribers: cfe-commits

Differential Revision: https://reviews.llvm.org/D24153

llvm-svn: 281378
2016-09-13 19:43:33 +00:00
Albert Gutowski ae3fb3113f Add some MS aliases for existing intrinsics
Reviewers: thakis, compnerd, majnemer, rsmith, rnk

Subscribers: cfe-commits

Differential Revision: https://reviews.llvm.org/D24330

llvm-svn: 281375
2016-09-13 19:26:42 +00:00
Albert Gutowski b6a11acb53 Implement MS _rot intrinsics
Reviewers: thakis, Prazek, compnerd, rnk

Subscribers: majnemer, cfe-commits

Differential Revision: https://reviews.llvm.org/D24311

llvm-svn: 280997
2016-09-08 22:32:19 +00:00
Reid Kleckner 5de2bcdcf6 Add MS __nop intrinsic to intrin.h
Summary: There was no definition for __nop function - added inline
assembly.

Patch by Albert Gutowski!

Reviewers: rnk, thakis

Subscribers: cfe-commits

Differential Revision: https://reviews.llvm.org/D24286

llvm-svn: 280826
2016-09-07 16:55:12 +00:00
Craig Topper 2dfab63bb3 [AVX-512] Remove 128-bit and 256-bit masked floating point add/sub/mul/div builtins and replace with native operations.
We can't do the 512-bit ones because they take a rounding mode argument that we can't represent.

llvm-svn: 280635
2016-09-04 18:30:17 +00:00
Elad Cohen fb6358d2b5 [Modules] Add 'freestanding' to the 'requires-declaration' feature-list.
This adds support for modules that require (non-)freestanding
environment, such as the compiler builtin mm_malloc submodule.

Differential Revision: https://reviews.llvm.org/D23871

llvm-svn: 280613
2016-09-04 06:00:42 +00:00
Joerg Sonnenberger b50b2fac9f Trailing dot that shouldn't have been committed.
llvm-svn: 280609
2016-09-04 00:51:02 +00:00
Joerg Sonnenberger 82216f0faa PR 27200: Fix names of the atomic lock-free macros.
llvm-svn: 280607
2016-09-04 00:44:10 +00:00
Craig Topper f43e4a1728 [AVX-512] Remove masked integer mullo builtins and replace with native IR.
llvm-svn: 280597
2016-09-03 19:19:49 +00:00
Craig Topper 0e18976b8d [AVX-512] Remove masked integer add/sub builtins and replace with native IR.
llvm-svn: 280596
2016-09-03 18:29:35 +00:00
Craig Topper a815f488d5 [AVX-512] Implement masked floating point logical operations with native IR and remove the builtins.
llvm-svn: 280197
2016-08-31 05:38:58 +00:00
Craig Topper d0681d528d [X86] Use v2i64 vectors to implement _mm_and/andn/or/xor_pd.
These will be reused when removing some builtins from avx512vldqintrin.h and this will make the tests for that change show a better number of vector elements.

llvm-svn: 280196
2016-08-31 05:38:55 +00:00
Bruno Cardoso Lopes 6736e199c7 [Modules] Add 'gnuinlineasm' to the 'requires-declaration' feature-list.
This adds support for modules that require (no-)gnu-inline-asm
environment, such as the compiler builtin cpuid submodule.

This is the gnu-inline-asm variant of https://reviews.llvm.org/D23871

Differential Revision: https://reviews.llvm.org/D23905

rdar://problem/26931199

llvm-svn: 280159
2016-08-30 21:25:42 +00:00
Alexey Bader b5d90e57dc [OpenCL] Make is_valid_event, create_user_event overloadable.
Summary: Make is_valid_event and create_user_event overloadable like other built-ins.

Patch by Evgeniy Tyurin.

Reviewers: bader, yaxunl

Subscribers: Anastasia, cfe-commits

Differential Revision: https://reviews.llvm.org/D23914

llvm-svn: 280097
2016-08-30 14:42:54 +00:00
Asaf Badouh 356bb76809 [X86][AVX512F] minor fix of the parameter names
add "__" prefix

Bug 28842 https://llvm.org/bugs/show_bug.cgi?id=29040

Differential Revision: https://reviews.llvm.org/D23753


 

llvm-svn: 279392
2016-08-21 07:56:47 +00:00
Justin Lebar cb20a09f54 [CUDA] Improve handling of math functions.
Summary:
A bunch of related changes here to our CUDA math headers.

- The second arg to nexttoward is a double (well, technically, long
  double, but we don't have that), not a float.

- Add a forward-declare of llround(float), which is defined in the CUDA
  headers.  We need this for the same reason we need most of the other
  forward-declares: To prevent a constexpr function in our standard
  library from becoming host+device.

- Add nexttowardf implementation.

- Pull "foobarf" functions defined by the CUDA headers in the global
  namespace into namespace std.  This lets you do e.g. std::sinf.

- Add overloads for math functions accepting integer types.  This lets
  you do e.g. std::sin(0) without having an ambiguity between the
  overload that takes a float and the one that takes a double.

With these changes, we pass testcases derived from libc++ for cmath and
math.h.  We can check these testcases in to the test-suite once support
for CUDA lands there.

Reviewers: tra

Subscribers: cfe-commits

Differential Revision: https://reviews.llvm.org/D23627

llvm-svn: 279140
2016-08-18 20:43:13 +00:00
Yaxun Liu 3317446301 [OpenCL] AMDGPU: Add extensions cl_amd_media_ops and cl_amd_media_ops2
Differential Revision: https://reviews.llvm.org/D23322

llvm-svn: 278851
2016-08-16 20:49:49 +00:00
Reid Kleckner 66e7717b46 Revert "[X86] Add xgetbv/x[X86] Add xgetbv xsetbv intrinsics to non-windows platforms"
This reverts commit r278783.  It breaks usage of _xgetbv on Windows.

llvm-svn: 278814
2016-08-16 16:04:14 +00:00
Marina Yatsina 197b65f833 [X86] Add xgetbv/x[X86] Add xgetbv xsetbv intrinsics to non-windows platforms
commit on behalf of guyblank

Differential Revision: https://reviews.llvm.org/D21959

llvm-svn: 278783
2016-08-16 08:13:36 +00:00
Lama Saba 5d01f224cf [X86][AVX512] lower __mm512_andnot_ps/__mm512_andnot_pd to IR
Differential revision: https://reviews.llvm.org/D23262
 

llvm-svn: 278209
2016-08-10 10:34:45 +00:00
Justin Lebar 2ef3dabd45 [CUDA] Add __device__ overloads for placement new and delete.
Summary:
Previously these sort of worked because they didn't end up resulting in
calls at the ptx layer.  But I'm adding stricter checks that break
placement new without these changes.

Reviewers: tra

Subscribers: cfe-commits

Differential Revision: https://reviews.llvm.org/D23239

llvm-svn: 278194
2016-08-10 01:09:14 +00:00
Asaf Badouh 2f344b788c [AVX512] integer comparisions enumeration.
fix Bug 28842 https://llvm.org/bugs/show_bug.cgi?id=28842

Differential Revision: https://reviews.llvm.org/D22212

 

llvm-svn: 277955
2016-08-07 10:43:04 +00:00
Saleem Abdulrasool afdef205d8 Headers: Add ARM support to intrin.h for MSVC compatibility
This fixes compiling with headers from the Windows SDK for ARM, where the
YieldProcessor function (in winnt.h) refers to _ARM_BARRIER_ISHST.

The actual MSVC armintr.h contains a lot more definitions, but this is enough to
build code that uses the Windows SDK but doesn't use ARM intrinsics directly.

An alternative would to just keep the addition to intrin.h (to include
armintr.h), but not actually ship armintr.h, instead having clang's intrin.h
include armintr.h from MSVC's include directory. (That one works fine with
clang, at least for building code that uses the Windows SDK.)

Patch by Martin Storsjö!

llvm-svn: 277928
2016-08-06 17:58:24 +00:00
Yaxun Liu c489e39eca [OpenCL] Remove extra native_ functions from opencl-c.h
There should be no native_ builtin functions with double type arguments.

Patch by Aaron En Ye Shi.

Differential Revision : https://reviews.llvm.org/D23071

llvm-svn: 277754
2016-08-04 19:30:54 +00:00
Dimitry Andric f8099f256d Add more gcc compatibility names to clang's cpuid.h
Summary:
Some cpuid bit defines are named slightly different from how gcc's
cpuid.h calls them.

Define a few more compatibility names to appease software built for gcc:

* `bit_PCLMUL`      alias of `bit_PCLMULQDQ`
* `bit_SSE4_1`      alias of `bit_SSE41`
* `bit_SSE4_2`      alias of `bit_SSE42`
* `bit_AES`         alias of `bit_AESNI`
* `bit_CMPXCHG8B`   alias of `bit_CX8`

While here, add the misssing 29th bit, `bit_F16C` (which is how gcc
calls this bit).

Reviewers: joerg, rsmith

Subscribers: bruno, cfe-commits

Differential Revision: https://reviews.llvm.org/D22010

llvm-svn: 277307
2016-07-31 20:23:23 +00:00
Eric Christopher b638558e12 Remove unused variable.
Fixes PR28761.

llvm-svn: 277221
2016-07-29 22:11:11 +00:00
Yaxun Liu c944e65a24 [OpenCL] Added CLK_ABGR definition for get_image_channel_order return value
Added CLK_ABGR definition for get_image_channel_order return value inside opencl-c.h file.

Patch by Aaron En Ye Shi.

Differential Revision: https://reviews.llvm.org/D22767

llvm-svn: 277179
2016-07-29 17:50:10 +00:00
Craig Topper 351ed42795 [X86] Block pbroadcastq instructions on 32-bit targets instead of pbroadcastb.
Thanks to Simon Pilgrim for catching the mistake.

llvm-svn: 276564
2016-07-24 14:58:06 +00:00
Ekaterina Romanova a84c24f39c Add doxygen comments to emmintrin.h's intrinsics.
Only around 50% of the intrinsics in this file are documented now. The patches for the rest of the intrisics in this file will be send out later.

The doxygen comments are automatically generated based on Sony's intrinsics docu
ment.

I got an OK from Eric Christopher to commit doxygen comments without prior code
review upstream. This patch was internally reviewed by Paul Robinson.

llvm-svn: 276499
2016-07-22 23:49:37 +00:00
Craig Topper 45db56c375 [X86] Add missing __x86_64__ qualifiers on a bunch of intrinsics that assume 64-bit GPRs are available.
Usages of these intrinsics in a 32-bit build results in assertions in the backend.

llvm-svn: 276249
2016-07-21 07:38:39 +00:00
Simon Pilgrim e3b9ee0645 [X86][SSE] Reimplement SSE fp2si conversion intrinsics instead of using generic IR
D20859 and D20860 attempted to replace the SSE (V)CVTTPS2DQ and VCVTTPD2DQ truncating conversions with generic IR instead.

It turns out that the behaviour of these intrinsics is different enough from generic IR that this will cause problems, INF/NAN/out of range values are guaranteed to result in a 0x80000000 value - which plays havoc with constant folding which converts them to either zero or UNDEF. This is also an issue with the scalar implementations (which were already generic IR and what I was trying to match).

This patch changes both scalar and packed versions back to using x86-specific builtins.

It also deals with the other scalar conversion cases that are runtime rounding mode dependent and can have similar issues with constant folding.

Differential Revision: https://reviews.llvm.org/D22105

llvm-svn: 276102
2016-07-20 10:18:01 +00:00
Asaf Badouh a0b6f8fb56 [X86][AVX512F] minor fix of the parameter names
add "__" prefix

llvm-svn: 275384
2016-07-14 08:40:30 +00:00
Michael Zuckerman 3378653f8d [Clang][AVX512] Making cosmetic changes
llvm-svn: 275169
2016-07-12 12:42:27 +00:00
Craig Topper 4d61a3c2d8 [AVX512] Replace masked AND/OR/XOR intrinsics with native code and remove the builtins.
llvm-svn: 275049
2016-07-11 06:14:18 +00:00
Craig Topper 6e76fb61a7 [X86] Use __butilin_shufflevector for 512-bit shufps intrinsics.
llvm-svn: 275012
2016-07-10 05:57:21 +00:00
Craig Topper 95b61b0544 [X86] Use __builtin_ia32_vec_ext_v4hi and __builtin_ia32_vec_set_v4hi to implement pextrw/pinsertw MMX intrinsics instead of trying to use native IR.
Without this we end up generating code that doesn't use mmx registers and probably doesn't work well with other mmx intrinsics.

llvm-svn: 274968
2016-07-09 05:30:41 +00:00
Justin Bogner 2d5de7e568 NVPTX: Use the nvvm builtins to read SRegs rather than the legacy ptx ones
The ptx spellings were removed from LLVM in r274769.

llvm-svn: 274770
2016-07-07 16:41:08 +00:00
Justin Bogner 2f8de9fb4f NVPTX: Rename __builtin_ptx_shfl -> __nvvm_shfl
To match "NVPTX: Make the llvm.nvvm.shfl intrinsics and builtin names
consistent" in LLVM.

llvm-svn: 274663
2016-07-06 19:52:32 +00:00
Michael Zuckerman b920665493 [Clang][Feature] Adding CLFLUSHOPT feature and intrinsic to clang
Differential Revision: http://reviews.llvm.org/D21792

llvm-svn: 274559
2016-07-05 15:56:03 +00:00
Simon Pilgrim f5a8837e1b [X86][AVX512] Converted the VBROADCAST intrinsics to generic IR
llvm-svn: 274544
2016-07-05 12:59:33 +00:00
Asaf Badouh 136332888a [X86][AVX512F] add float/double abs intrinsics
add abs intrinsics that use native LLVM-IR.
change _mm512_mask[z]_and_epi{32|64} to use select intrinsic

Differential Revision: http://reviews.llvm.org/D21973

llvm-svn: 274542
2016-07-05 12:24:14 +00:00
Asaf Badouh f9cdb8de7a [AVX512] minor fix in sqrt{ss|sd} intrinsics arguments
Differential Revision: http://reviews.llvm.org/D21988

llvm-svn: 274541
2016-07-05 11:36:21 +00:00
Anastasia Stulova db7a31cce7 [OpenCL] An implementation of device side enqueue (DSE) from OpenCL v2.0 s6.13.17.
- Added new Builtins: enqueue_kernel, get_kernel_work_group_size
and get_kernel_preferred_work_group_size_multiple.

These Builtins use custom check to diagnose parameters of the passed Blocks
i. e. variable number of 'local void*' type params, and check different
overloads specified in Table 6.31 of OpenCL v2.0.

- IR is generated as an internal library call for each OpenCL Builtin,
reusing ObjC Block implementation.

Review: http://reviews.llvm.org/D20249
llvm-svn: 274540
2016-07-05 11:31:24 +00:00
Michael Zuckerman a72b49efe4 ntrinsics _mm256_permutexvar_epi64 doesn't accept three parameters as specify bellow.
I deleted the extra mask parameter.

__m256i _mm256_permutexvar_epi64 (__m256i idx, __m256i a)
#include "immintrin.h"
Instruction: vpermq
CPUID Flags: AVX512VL + AVX512F
Description
Shuffle 64-bit integers in a across lanes using the corresponding index in idx, and store the results in dst.
Operation
FOR j := 0 to 3
  i := j*64
    id := idx[i+1:i]*64
      dst[i+63:i] := a[id+63:id]
      ENDFOR
      dst[MAX:256] := 0
      dst[MAX:256] := 0
      
(From: Intel intrinsics guide)        

llvm-svn: 274539
2016-07-05 11:30:31 +00:00
Michael Zuckerman 7dac6fbdf8 [Clang][BuiltIn][AVX512] adding _mm{|256|512}_mask_cvt{s|us|}epi16_storeu_epi8 intrinsics
Differential Revision: http://reviews.llvm.org/D21729

llvm-svn: 274532
2016-07-05 08:08:01 +00:00
Craig Topper 2a383c9273 [X86] Use undefined instead of setzero in shufflevector based intrinsics when the second source is unused. Rewrite immediate extractions in shuffle intrinsics to be in ((c >> x) & y) form instead of ((c & z) >> x). This way only x varies between each use instead of having to vary x and z.
llvm-svn: 274525
2016-07-04 22:18:01 +00:00
Simon Pilgrim 427154db2a [X86][AVX512] Converted the VSHUFPD intrinsics to generic IR
llvm-svn: 274523
2016-07-04 21:30:47 +00:00
Simon Pilgrim 30db811526 [X86][AVX512] Converted the VPERMPD/VPERMQ intrinsics to generic IR
llvm-svn: 274502
2016-07-04 13:34:44 +00:00
Simon Pilgrim 17388f2569 [X86][AVX512] Converted the VPERMILPD/VPERMILPS intrinsics to generic IR
llvm-svn: 274492
2016-07-04 11:06:15 +00:00
Simon Pilgrim 275d721485 [X86][AVX512] Converted the MOVDDUP/MOVSLDUP/MOVSHDUP masked intrinsics to generic IR
llvm companion patch imminent

llvm-svn: 274442
2016-07-02 17:16:25 +00:00
Craig Topper b3a4477b13 [X86] Replace 128-bit and 256 masked vpermilps/vpermilpd builtins with native IR.
llvm-svn: 274425
2016-07-02 05:36:43 +00:00
Michael Zuckerman 3f316abdce [Clang][Intrinsics][AVX512][BuiltIn] adding intrinsics for vrangesd instruction set
Differential Revision: http://reviews.llvm.org/D21734

llvm-svn: 274218
2016-06-30 08:05:46 +00:00
Alexey Bader e5b3aebfb5 [OpenCL] Add attribute 'pure' to read_image built-in functions to enable optimizations.
Reviewers: Anastasia, yaxunl

Subscribers: pekka.jaaskelainen, pxli168, cfe-commits

Differential Revision: http://reviews.llvm.org/D21795

llvm-svn: 274122
2016-06-29 12:30:26 +00:00
David Majnemer 2916a612cd [intrin.h] Certain _Interlocked intrinsics return the old value
This fixes PR28326.

llvm-svn: 273986
2016-06-28 02:54:43 +00:00
Asaf Badouh 57819aa185 [X86] add _mm_loadu_si64
Differential Revision: http://reviews.llvm.org/D21504

llvm-svn: 273812
2016-06-26 13:51:54 +00:00
Craig Topper 50e3dfe9d0 [X86] Fix pslldq/psrldq intrinsics to not fail compilation with immediates larger than 16. This was accidentally broken in r272246.
llvm-svn: 273775
2016-06-25 07:31:14 +00:00
Craig Topper 79f53ca0b5 [AVX512] Replace masked unpack builtins with shufflevector and selects.
llvm-svn: 273533
2016-06-23 06:36:42 +00:00
Michael Zuckerman 716859aa64 [Clang][bmi][intrinsics] Adding _mm_tzcnt_64 _mm_tzcnt_32 intrinsics to clang.
Differential Revision: http://reviews.llvm.org/D21373

llvm-svn: 273401
2016-06-22 12:32:43 +00:00
Craig Topper 9ce3ddf2e6 [AVX512] Use a __v8hi vector inside of _mm_setzero_hi to match its name. Probably no real functional change.
llvm-svn: 273389
2016-06-22 06:36:23 +00:00
Craig Topper 08181f795f [AVX512] Fix _mm_setzero_di to not require avx512vl since its used by the avx512dqintrin.h. Also update the avx512dq test to not enable avx512vl feature so we can ensure correct dependencies.
llvm-svn: 273388
2016-06-22 06:36:21 +00:00
Craig Topper c89dda5938 [AVX512] Add missing typecasts to intrinsics.
llvm-svn: 273386
2016-06-22 06:36:16 +00:00
Craig Topper 879b0978f4 [AVX512] Move the 128-bit and 256-bit lzcnt intrinsics to avx512vlcdintrin.h where they belong.
llvm-svn: 273249
2016-06-21 06:53:58 +00:00
Yaxun Liu 143f083e4b [OpenCL] Include opencl-c.h by default as a clang module
Include opencl-c.h by default as a module to utilize the automatic AST caching mechanism of clang modules.

Add an option -finclude-default-header to enable default header for OpenCL, which is off by default.

Differential Revision: http://reviews.llvm.org/D20444

llvm-svn: 273191
2016-06-20 19:26:00 +00:00
Zvi Rackover 453d734201 [X86] _MM_ALIGN16 attribute support for non-windows targets
Summary:
This patch adds support for the _MM_ALIGN16 attribute on non-windows targets. This aligns Clang with ICC which supports the attribute on all targets.

Fixes PR28056

Reviewers: aaboud, echristo, cfe-commits, mkuper

Subscribers: zvi, mehdi_amini

Projects: #clang-c

Differential Revision: http://reviews.llvm.org/D21173

llvm-svn: 273095
2016-06-18 20:01:07 +00:00
Saleem Abdulrasool 5065d8cfc9 Headers: wordsmith error message
Use the marketing name for the MSVC release as pointed out by Nico Weber!

llvm-svn: 272979
2016-06-17 00:27:02 +00:00
Saleem Abdulrasool 13f3baf572 Headers: tweak for MSVC[<1800]
Earlier versions of MSVC did not include inttypes.h.  Ensure that we dont try to
include_next on those releases.

llvm-svn: 272741
2016-06-15 00:28:15 +00:00
Hans Wennborg f8b91f8336 s/Intrin.h/intrin.h/, trying to fix the build after r272701
llvm-svn: 272702
2016-06-14 20:14:24 +00:00
Nico Weber 73384a8f76 Rename Intrin.h to intrin.h, that's how all the documentation calls it.
llvm-svn: 272701
2016-06-14 19:54:40 +00:00
Michael Zuckerman c49f6ce3e1 [Clang][avx512][Intrinsics] adding prefetch gather intrinsics
Differential Revision: http://reviews.llvm.org/D21322

llvm-svn: 272667
2016-06-14 13:45:17 +00:00
Michael Zuckerman 223676d2cc [Clang][AVX512][intrinsics] Adding missing intrinsics div_pd and div_ps
Differential Revision: http://reviews.llvm.org/D20626

llvm-svn: 272658
2016-06-14 12:38:58 +00:00
David Majnemer d423574fde [immintrin] Reimplement _bit_scan_{forward,reverse}
There is no need to use a target-specific intrinsic to implement
_bit_scan_forward or _bit_scan_reverse, reimplementing them using
generic intrinsics makes it more likely that the middle end will
understand what's going on.

llvm-svn: 272564
2016-06-13 17:26:16 +00:00
Asaf Badouh 880f0c252b [X86][AVX512F] bugfix - sqrtps should get __mask16 as mask parameter
CR: Michael Zuckerman
llvm-svn: 272549
2016-06-13 15:15:57 +00:00
Simon Pilgrim beca5f295c [Clang][X86] Convert non-temporal store builtins to generic __builtin_nontemporal_store in headers
We can now use __builtin_nontemporal_store instead of target specific builtins for naturally aligned nontemporal stores which avoids the need for handling in CGBuiltin.cpp

The scalar integer nontemporal (unaligned) store builtins will have to wait as __builtin_nontemporal_store currently assumes natural alignment and doesn't accept the 'packed struct' trick that we use for normal unaligned load/stores.

The nontemporal loads require further backend support before we can safely convert them to __builtin_nontemporal_load

Differential Revision: http://reviews.llvm.org/D21272

llvm-svn: 272540
2016-06-13 09:57:52 +00:00
Craig Topper fc07498e4a [AVX512] Masked pcmpeqd, pcmpeqq, pcmpgtd, and pcmpgtq don't require avx512bw, just avx512vl.
llvm-svn: 272532
2016-06-13 04:15:11 +00:00
Craig Topper 7cc9263ec2 [AVX512] Implement masked and 512-bit pshufd intrinsics directly with __builtin_shufflevector and __builtin_ia32_select.
llvm-svn: 272467
2016-06-11 12:50:19 +00:00
Craig Topper 26d5b87316 [X86] Add explicit typecasts to some intrinsics.
llvm-svn: 272466
2016-06-11 12:50:12 +00:00
Craig Topper 68738332b8 [AVX512] Implement 512-bit and masked shufflelo and shufflehi intrinsics directly with __builtin_shufflevector and __builtin_ia32_select. Also improve the formatting of the AVX2 version.
llvm-svn: 272452
2016-06-11 03:31:13 +00:00
Craig Topper d4273a425e [AVX512] Add _mm512_bsrli_epi128 and _mm512_bslli_epi128 intrinsics.
llvm-svn: 272451
2016-06-11 03:31:07 +00:00
Ekaterina Romanova 71a68c928a Add doxygen comments to mmintrin.h's intrinsics.
The doxygen comments are automatically generated based on Sony's intrinsics docu
ment.

I got an OK from Eric Christopher to commit doxygen comments without prior code
review upstream.

llvm-svn: 272350
2016-06-10 00:10:40 +00:00
Justin Lebar 4fb5711751 [CUDA] Implement __shfl* intrinsics in clang headers.
Summary: Clang changes to make use of the LLVM intrinsics added in D21160.

Reviewers: tra

Subscribers: jholewinski, cfe-commits

Differential Revision: http://reviews.llvm.org/D21162

llvm-svn: 272299
2016-06-09 20:04:57 +00:00
Craig Topper 2769bb5753 [X86] Handle AVX2 pslldqi and psrldqi intrinsics shufflevector creation directly in the header file instead of in CGBuiltin.cpp. Simplify the sse2 equivalents as well.
llvm-svn: 272246
2016-06-09 05:15:12 +00:00
Craig Topper 3a0c7260f4 [X86] Add void to the argument list of intrinsics that don't take arguments since empty argument list mean something else in C.
llvm-svn: 272244
2016-06-09 05:14:28 +00:00
Igor Breger aadb876200 [AVX512] Emit select instruction instead of using x86 specific instrinsics.
This will allow us to remove the x86 instrinics from the backend.

Differential Revision: http://reviews.llvm.org/D21060

llvm-svn: 272141
2016-06-08 13:59:20 +00:00
Michael Zuckerman c4ae8537cf [Clang][AVX512][BUILTIN]Adding intrinsics for range_round_{sd|ss}
Differential Revision: http://reviews.llvm.org/D21002

llvm-svn: 272123
2016-06-08 08:19:27 +00:00
Ekaterina Romanova 50e94a3b34 Add doxygen comments to xmmintrin.h's intrinsics.
Only half of the intrinsics in this file is documented here. The patch for the o
ther half will be sent out later.

The doxygen comments are automatically generated based on Sony's intrinsics docu
ment.

I got an OK from Eric Christopher to commit doxygen comments without prior code
review upstream.

llvm-svn: 272121
2016-06-08 07:34:31 +00:00
Craig Topper f3efec65bb [AVX512] Reformat macro intrinsics, ensure arguments have proper typecasts, ensure result is typecasted back to the generic types.
llvm-svn: 272119
2016-06-08 06:08:07 +00:00
Craig Topper 605894985f [X86] Put parentheses around macro arguments in intrinsics.
llvm-svn: 272118
2016-06-08 06:08:04 +00:00
Michael Zuckerman 96d0399658 [clang][AVX512][Intrinsics] Adding intrinsics reduce_[round]_{ss|sd} to clang
Differential Revision: http://reviews.llvm.org/D21014

llvm-svn: 272012
2016-06-07 14:00:20 +00:00
Michael Zuckerman 1a7889f203 Fixing problem with rsqrt28_sd
maskz_rsqrt28_sd mapped to mask_rsqrt28_sd and not to the maskz.

llvm-svn: 271836
2016-06-05 15:57:49 +00:00
Michael Zuckerman 95721ac863 [Clang][AVX512]Adding set4 intrinsics
Differential Revision: http://reviews.llvm.org/D20866

llvm-svn: 271835
2016-06-05 15:43:30 +00:00
Michael Zuckerman f36f6eb036 [Clang][AVX512][Intrinsics] Adding two definitions _mm512_setzero and _mm512_setzero_epi32
Differential Revision: http://reviews.llvm.org/D20871

llvm-svn: 271832
2016-06-05 15:12:52 +00:00
Craig Topper 6a77b62640 [X86] Use unsigned types for vector arithmetic in intrinsics to avoid undefined behavior for signed integer overflow.
This is really only needed for addition, subtraction, and multiplication, but I did the bitwise ops too for overall consistency. Clang currently doesn't set NSW for signed vector operations so the undefined behavior shouldn't happen today.

llvm-svn: 271778
2016-06-04 05:43:41 +00:00
Craig Topper 406d5cdf7c [AVX512] Remove space in -1 constants. NFC
llvm-svn: 271777
2016-06-04 05:43:37 +00:00
Asaf Badouh 89f657611c [X86][AVX512] add intrinsics of Scalar FP to integer
Differential Revision: http://reviews.llvm.org/D20861

llvm-svn: 271499
2016-06-02 08:11:35 +00:00