These intrinsics are special because they directly take a memory operand (AVX2
adds the register counterparts). Typically, other non-memop intrinsics take
registers and then it's left to isel to fold memory operands.
In order to LICM intrinsics directly reading memory, we require that no stores
are in the loop (LICM) or that the folded load accesses constant memory
(MachineLICM). When neither is the case we fail to hoist a loop-invariant
broadcast.
We can work around this limitation if we expose the load as a regular load and
then just implement the broadcast using the vector initializer syntax. This
exposes the load to LICM and other optimizations.
At the IR level this is translated into a series of insertelements. The
sequence is already recognized as a broadcast so there is no impact on the
quality of codegen.
_mm256_broadcast_pd and _mm256_broadcast_ps are not updated by this patch
because right now we lack the DAG-combiner smartness to recover the broadcast
instructions. This will be tackled in a follow-on.
There will be completing changes on the LLVM side to remove the LLVM
intrinsics and to auto-upgrade bitcode files.
Fixes <rdar://problem/16494520>
llvm-svn: 209846
The last step of _mm_cvtps_pi16 should use _mm_packs_pi32, which is a function
that reads two __m64 values and packs four 32-bit values into four 16-bit
values.
<rdar://problem/16873717>
llvm-svn: 209489
Summary:
Most of the clang header patch by Simon Pilgrim @ SCEE.
Also fixed (or added) clang tests for these intrinsics.
LLVM tests to make sure we get the blend instruction out of these
shufflevectors are at http://reviews.llvm.org/D3600
Reviewers: eli.friedman, craig.topper, rafael
Subscribers: cfe-commits
Differential Revision: http://reviews.llvm.org/D3601
llvm-svn: 208664
glibc expects that stddef.h only defines a single thing if either of these
defines is set. For example, before this change, a C file containing
#include <stdlib.h>
int ptrdiff_t = 0;
would compile with gcc but not with clang. Now it compiles with clang too.
This also fixes PR12997, where older versions of the Linux headers would define
NULL incorrectly, and glibc would define __need_NULL and expect stddef.h to
redefine NULL with the correct definition.
llvm-svn: 207606
See the bug and the cfe-commits thread "[patch] Let stddef.h redefine NULL if
__need_NULL is set" for discussion.
Fixes PR12997 and is similar to the __need_wint_t bits already in this file.
llvm-svn: 207482
This patch:
1. Adds a definition for two new GCCBuiltins in BuiltinsX86.def:
__builtin_ia32_rdtsc;
__builtin_ia32_rdtscp;
2. Replaces the already existing definition of intrinsic __rdtsc in
ia32intrin.h with a simple call to the new GCC builtin __builtin_ia32_rdtsc.
3. Adds a definition for the new intrinsic __rdtscp in ia32intrin.h
llvm-svn: 207132
Don't include input and output regs in clobbers. Prefix some
identifiers with __. Add a memory constraint to __readcr3 to prevent
reordering. This constraint is heavy handed, but conservatively
correct.
Thanks to PaX Team for the suggestions.
llvm-svn: 205778
This adds Clang support for the ARM64 backend. There are definitely
still some rough edges, so please bring up any issues you see with
this patch.
As with the LLVM commit though, we think it'll be more useful for
merging with AArch64 from within the tree.
llvm-svn: 205100
They're already defined in ia32intrin.h, and this would cause including Intrin.h
in 64-bit mode to fail because of conflicting types. Update ms-intrin.cpp to
also run in 64-bit mode to catch things like this.
llvm-svn: 203714
Summary:
Our usual definition of max_align_t wouldn't match up with MSVC if it
was used in a template argument.
Reviewers: chandlerc, rsmith, rnk
Reviewed By: chandlerc
CC: cfe-commits
Differential Revision: http://llvm-reviews.chandlerc.com/D2924
llvm-svn: 202911
and one for PCLMUL support. The current immintrin.h header only includes
wmmintrin.h if AES support is enabled. It should include it if either AES or
PCLMUL is enabled (GCC's version of immintrin.h does this).
Patch by John Baldwin!
llvm-svn: 202871
Because GCC incorrectly defines _mm_prefetch to take anything that casts
to void*, people have started using that behavior. The previous patch
that made _mm_prefetch actually take a const char * broke compatibility
with existing code. This update to the patch leaves the macro that
defines _mm_prefetch with the (void*) cast when _MSC_VER is not defined.
llvm-svn: 201901
This breaks backwards compatibility with existing code. Previously, this
was defined as
#define _mm_prefetch(a, sel) (__builtin_prefetch((void *)(a), 0, (sel)))
Which basically accepts any pointer. Changing this to char* simply
breaks a lot of existing code. I have tried changing char* to
"const void*", which seems to be the right thing as per Intel
specification this should work on basically any pointer. However,
apparently this breaks windows compatibility (because of a conflicting
declaration in windows.h).
So, we probably need to #ifdef this based on whether clang is compiling
for windows. According to Chandler, this might be done by introducing an
additional symbol to a fake type in BuiltinsX86.def and then condition
the type expansion on the platform.
llvm-svn: 201775
This patch adds several built-ins that are required for ms
compatibility. _mm_prefetch must be a built-in because it takes a
compile-time constant argument and our prior approach of using a #define
to the current built-in doesn't work in the presence of re-declaration
of _mm_prefetch. The others can be obtained by including the windows
system headers. If a user includes the windows system headers but not
intrin.h they still need to work and therefore must be built-in because
we don't get a chance to implement them in intrin.h in this case.
llvm-svn: 201734
This definition is not chosen idly. There is an unfortunate reality with
max_align_t -- the specific nature of its definition leaks into the ABI
almost immediately. Because it is part of C11 and C++11 it becomes
essential for it to match with other systems on that ABI. There is an
effort to discourage any further use of this construct as a consequence
-- using max_align_t introduces an immediate ABI problem. We can never
update it to have larger alignment even as the microarchitecture changes
to necessitate higher alignment. =/
The particular definition here exactly matches the ABI of GCC's chosen
::max_align_t definition, for better or worse. This was written with the
help of Richard Smith who was decoding the exact ABI implications of the
selected definition in GCC. Notably, in-register arguments are impacted
by the particular definition chosen. =/
No one is under the illusion that this is a "good" or "useful"
definition of max_align_t, and we are working with the standards
committee to specify a more useful interface to address this need.
llvm-svn: 201729
The two identical implementations of __cpuid for X86 / X86_64 were
leftovers from my first iteration on the patch that implemented it.
llvm-svn: 200568
This failed the ms-intrin.cpp test.
This reverts commit r200237.
This also comments out the _setjmpex declaration for now so that
intrin.h will work on x64 targets.
llvm-svn: 200243
The _cpuid() implementation is the same as in lib/Headers/cpuid.h
with the parameter names adjusted to match the interface.
_xgetbv just does what the Intel manual says.
Differential Revision: http://llvm-reviews.chandlerc.com/D2564
llvm-svn: 199439
We have been seeing nasty directory layout with CMake multiconfig, such as,
bin/Release/clang.exe
lib/clang/3.x/...
lib/Release/clang/3.x/.. (duplicated)
Move the layout similar to autoconf's;
Release/bin/clang.exe
Release/lib/clang/3.x/...
Checked on Visual Studio 10. Could you guys please confirm my change on XCode(and other multiconfig builders)?
Note: Don't set variables CMAKE_*_OUTPUT_DIRECTORY any more, or a certain builder, for eaxample, msbuild.exe, would be confused.
llvm-svn: 198205
Summary:
These are deprecated in VS 2012 according to MSDN. They don't actually
compile down to any code. They prevent the compiler from reordering
memory accesses across the barrier, which is what a memory-clobbering
volatile asm does.
Reviewers: echristo
CC: cfe-commits
Differential Revision: http://llvm-reviews.chandlerc.com/D1954
llvm-svn: 192860
Fixes <rdar://problem/10679282>.
I'm not completely satisfied with this patch. Sprinkling "diagnostic ignored"
_Pragmas throughout this file is gross, but I couldn't suppress
it for the entire file.
llvm-svn: 192143
These symbols were showing up as undefined when trying to link programs on
Android. We should match libgcc's behaviour and provide inline definitions
of these on ARM.
It seems unwind.h on ARM/Darwin doesn't provide inline definitions, so we
just declare them for that platform.
llvm-svn: 191406
Intrinsics added shaintrin.h, which is included from x86intrin.h if __SHA__ is
enabled. SHA implies SSE2, which is needed for the __m128i type.
Also add the -msha/-mno-sha option.
llvm-svn: 190999
__cpuid_count() as macros to be compatible with GCC's cpuid.h. It also adds
bit_<foo> constants for the various feature bits as described in version 039
(May 2011) of Intel's SDM Volume 2 in the description of the CPUID
instruction. The list of bit_<foo> constants is a bit exhaustive (GCC
doesn't do near this many). More bits could be added from a newer version of
SDM if desired.
Patch by John Baldwin!
llvm-svn: 186696
that these headers should not be included more than once, they are in fact
included twice when building our builtins module (in order for it to generate
submodules for them), and without this, any modular build enabling AVX and
including any builtin header fails.
Testing this is tricky because including any of these headers in a modular
build is liable to fail, due to unrelated builtin headers in the same module
including headers which might not be available on the system running the tests.
Suggestion on that front are welcome (but we're getting close to being able to
run a buildbot that has modules enabled for all tests, which would nicely solve
the testing problem).
llvm-svn: 186275
These intrinsics should return the comparision result in the low bits and keep
the high bits of the first source operand.
When calling to builtin functions, the source operands are swapped and the high
bits of the second source operand are kept. To fix the issue, an extra
shufflevector is used.
rdar://14153896
llvm-svn: 184110
This adds a test to make sure we define _WCHAR_T_DEFINED and
_NATIVE_WCHAR_T_DEFINED correctly in the preprocessor, and updates
stddef.h to set it when typedeffing wchar_t.
llvm-svn: 180918
Microsoft's Source Annotation Language (SAL) defines a bunch of keywords
for annotating the inputs and outputs of functions. Empty definitions
for the keywords are provided by <stdlib.h> -> <crtdefs.h> -> <sal.h>.
This makes it basically impossible to include MSVC's stdlib.h and
Clang's *mmintrin.h headers at the same time if they have variables
named __in. As a workaround, I've renamed those variables.
This fixes the Modules/compiler_builtins.m test which was XFAILed,
presumably due to this conflict.
llvm-svn: 179860
implementation of C99's attempt to control the C++ standard. *sigh*
The C99 standard says that certain macros in <stdint.h>, such as SIZE_MAX,
should not be defined when the header is included in C++ mode, unless
__STDC_LIMIT_MACROS and __STDC_CONSTANT_MACROS are defined. The C++11 standard
says "Thanks, but no thanks" and C11 removed this rule, but various C library
implementations (such as glibc) follow C99 anyway.
g++ prior to 4.8 worked around the C99 / glibc behavior by defining
__STDC_*_MACROS in <cstdint>, which was incorrect, because <stdint.h> is
supposed to provide these macros too. g++ 4.8 works around it by defining
__STDC_*_MACROS in its builtin <stdint.h> header.
This change makes Clang act like g++ 4.8 in this regard: our <stdint.h> now
countermands any attempt by the C library to implement the undesired C99 rules,
by defining the __STDC_*_MACROS first. Unlike g++, we do this even in C++98
mode, since that was the intent of the C++ committee, matches the behavior
required in C11, and matches our built-in implementation of <stdint.h>.
llvm-svn: 179419
Per feedback by Doug, we should avoid platform-specific implementations
in lib/Headers as much as possible.
This reverts commit r178110.
llvm-svn: 178181
Module "sse" implicitly exports module "sse2".
This is bad because we also have module "sse2" export module "sse" (as intended) so we end up with a cycle
in the module import graph:
1. sse2 -> (also imports) sse
2. sse -> (also imports) sse2
To eliminate the cycle remove 2.; importing module "sse2" will also import module "sse", but just importing
module "sse" will not also import module "sse2".
rdar://13240552
llvm-svn: 178117
- Add head 'prfchwintrin.h' to define '_m_prefetchw' which is mapped to
LLVM/clang prefetch builtin
- Add option '-mprfchw' to enable PRFCHW feature and pre-define '__PRFCHW__'
macro
llvm-svn: 178041
Clang's <stddef.h> provides definitions for the C standard library
types size_t, ptrdiff_t, and wchar_t. However, the system's C standard
library headers tend to provide the same typedefs, and the two
generally avoid each other using the macros
_SIZE_T/_PTRDIFF_T/_WCHAR_T. With modules, however, we need to see
*all* of the places where these types are defined, so provide the
typedefs (ignoring the macros) when modules are enabled.
llvm-svn: 177686
being included in C++. Don't define alignof or alignas in this case. Note that
the C++11 standard is broken in various ways here (it refers to the contents
of <stdalign.h> in C99, where that header did not exist, and doesn't mention
the alignas macro at all), but we do our best to do what it intended.
llvm-svn: 175708
Several of the intrinsic headers were using plain non-reserved identifiers.
C++11 17.6.4.3.2 [global.names] p1 reservers names containing a double
begining with an underscore followed by an uppercase letter for any use.
I think I got them all, but open to being corrected. For the most part I
didn't bother updating function-like macro parameter names because I don't
believe they're subject to any such collission - though some function-like
macros already follow this convention (I didn't update them in part because
the churn was more significant as several function-like macros use the double
underscore prefixed version of the same name as a parameter in their
implementation)
llvm-svn: 172666
- New options '-mrtm'/'-mno-rtm' are added to enable/disable RTM feature
- Builtin macro '__RTM__' is defined if RTM feature is enabled
- RTM intrinsic header is added and introduces 3 new intrinsics, namely
'_xbegin', '_xend', and '_xabort'.
- 3 new builtins are added to keep compatible with gcc, namely
'__builtin_ia32_xbegin', '__builtin_ia32_xend', and '__builtin_ia32_xabort'.
- Test cases for pre-defined macro and new intrinsic codegen are added.
llvm-svn: 167665
While we're here, extend the module map to cover most of the
newly-added instrinsic headers. Only wmmintrin.h is missing, because
it needs to be split into AES/PCLMUL subheaders (as a separate commit).
llvm-svn: 167398
Corrected type for index of _mm256_mask_i32gather_pd
from 256-bit to 128-bit
Corrected types for src|dst|mask of _mm256_mask_i64gather_ps
from 256-bit to 128-bit
Support the following intrinsics:
_mm_mask_i32gather_epi64, _mm256_mask_i32gather_epi64,
_mm_mask_i64gather_epi64, _mm256_mask_i64gather_epi64,
_mm_mask_i32gather_epi32, _mm256_mask_i32gather_epi32,
_mm_mask_i64gather_epi32, _mm256_mask_i64gather_epi32
llvm-svn: 159403
Support the following intrinsics:
_mm_mask_i32gather_pd, _mm256_mask_i32gather_pd, _mm_mask_i64gather_pd
_mm256_mask_i64gather_pd, _mm_mask_i32gather_ps, _mm256_mask_i32gather_ps
_mm_mask_i64gather_ps, _mm256_mask_i64gather_ps
llvm-svn: 159222