The X86 clang/test/CodeGen/*builtins.c tests define the mm_malloc.h include
guard as a hack for avoiding its inclusion (mm_malloc.h requires a hosted
environment since it expects stdlib.h to be available - which is not the case
in these internal clang codegen tests).
This patch removes this hack and instead passes -ffreestanding to clang cc1.
Differential Revision: https://reviews.llvm.org/D24825
llvm-svn: 282581
I'm about to commit a patch for:
http://reviews.llvm.org/D8567
That patch will break this one existing test case in Clang.
I'm not sure if this file is intending to create a Clang
dependency on the LLVM IR optimizer, but that's the
consequence of specifying -O3 on this test file.
My hope is to avoid buildbot rage by removing this check,
committing the LLVM patch, and then fixing this check.
I don't know how to make a simultaneous commit to Clang
and LLVM.
I will commit the correct CHECK line fix for this test
shortly.
llvm-svn: 233109
This is very much like D8088 (checked in at r231792).
Now that we've replaced the vinsertf128 intrinsics,
do the same for their extract twins.
Differential Revision: http://reviews.llvm.org/D8275
llvm-svn: 232052
We want to replace as much custom x86 shuffling via intrinsics
as possible because pushing the code down the generic shuffle
optimization path allows for better codegen and less complexity
in LLVM.
This is the sibling patch for the LLVM half of this change:
http://reviews.llvm.org/D8086
Differential Revision: http://reviews.llvm.org/D8088
llvm-svn: 231792
These intrinsics are special because they directly take a memory operand (AVX2
adds the register counterparts). Typically, other non-memop intrinsics take
registers and then it's left to isel to fold memory operands.
In order to LICM intrinsics directly reading memory, we require that no stores
are in the loop (LICM) or that the folded load accesses constant memory
(MachineLICM). When neither is the case we fail to hoist a loop-invariant
broadcast.
We can work around this limitation if we expose the load as a regular load and
then just implement the broadcast using the vector initializer syntax. This
exposes the load to LICM and other optimizations.
At the IR level this is translated into a series of insertelements. The
sequence is already recognized as a broadcast so there is no impact on the
quality of codegen.
_mm256_broadcast_pd and _mm256_broadcast_ps are not updated by this patch
because right now we lack the DAG-combiner smartness to recover the broadcast
instructions. This will be tackled in a follow-on.
There will be completing changes on the LLVM side to remove the LLVM
intrinsics and to auto-upgrade bitcode files.
Fixes <rdar://problem/16494520>
llvm-svn: 209846