This implements the FastLowerCall hook, which is based on the DoSelectCall
function. The implementation is very similar, but the target-independent call
lowering part has been factored out.
This should also enable patchpoint intrinsic lowering for FastISel on X86.
Related to <rdar://problem/17427052>.
llvm-svn: 213049
Revert "[FastISel][X86] Implement the FastLowerIntrinsicCall hook."
Revert "[FastISel][X86] Implement the FastLowerCall hook."
This reverts commit r213035, r213036, and r213037 to make the
buildbots happy again.
llvm-svn: 213048
The constant pool entry code for WinCOFF assumed that vector constants
would be formed using ConstantDataVector, it did not expect to see a
ConstantVector. Furthermore, it did not expect undef as one of the
elements of the vector.
ConstantVectors should be handled like ConstantDataVectors, treat Undef
as zero.
llvm-svn: 213038
This implements the FastLowerCall hook, which is based on the DoSelectCall
function. The implementation is very similar, but the target-independent call
lowering part has been factored out.
This should also enable patchpoint intrinsic lowering for FastISel on X86.
Related to <rdar://problem/17427052>.
llvm-svn: 213035
No functional change.
The offsets for the other bitfields are specified symbolically. I need to
increase the size for one of the earlier fields which is easier after this
cleanup.
Why these bits are relative to VEXShift is a bit strange but that is for
another cleanup.
I made sure that the values for the enums are unchanged after this change.
llvm-svn: 213011
COFF lacks a feature that other object file formats support: mergeable
sections.
To work around this, MSVC sticks constant pool entries in special COMDAT
sections so that each constant is in it's own section. This permits
unused constants to be dropped and it also allows duplicate constants in
different translation units to get merged together.
This fixes PR20262.
Differential Revision: http://reviews.llvm.org/D4482
llvm-svn: 213006
We would emit a libcall for a 64-bit atomic on x86 after SVN r212119. This was
due to the misuse of hasCmpxchg16 to indicate if cmpxchg8b was supported on a
32-bit target. They were added at different times and would result in the
border condition being mishandled.
This fixes the border case to emit the cmpxchg8b instruction for 64-bit atomic
operations on x86 at the cost of restoring a long-standing bug in the codegen.
We emit a cmpxchg8b on all x86 targets even where the CPU does not support this
instruction (pre-Pentium CPUs). Although this bug should be fixed, this was
present prior to SVN r212119 and this change, so this is not really introducing
a regression.
llvm-svn: 212956
We construct a temporary "atomicrmw xchg" instruction when lowering atomic
stores for widths that aren't supported natively. This isn't on the top-level
worklist though, so it won't be removed automatically and we have to do it
ourselves once that itself has been lowered.
Thanks Saleem for pointing this out!
llvm-svn: 212948
Rename member variables and functions for the MCStreamer for DWARF-like
unwinding management. Rename the Windows ones as well and make the naming and
handling similar across the two. No functional change intended.
llvm-svn: 212912
No functional change. As I was trying to understand this function, I found
that variables were reused with confusing names and the broadcast case was a
bit too implicit. Hopefully, this is an improvement.
llvm-svn: 212795
It was computing the VL/n case as:
MemObjSize = VectorByteSize / ElemByteSize / Divider * ElemByteSize
ElemByteSize not only falls out but VectorByteSize/Divider now actually
matches the definition of VL/n.
Also some formatting fixes.
llvm-svn: 212794
This is minimal change for backend required to have "hello world" compiled and working on x32 target (x86_64-linux-gnux32). More patches for x32 will follow.
Differential Revision: http://reviews.llvm.org/D4181
llvm-svn: 212716
shuffle lowering: match shuffle patterns equivalent to an unpcklwd or
unpckhwd instruction.
This allows us to use generic lowering code for v8i16 shuffles and match
the unpack pattern late.
llvm-svn: 212705
combine into half-shuffles through unpack instructions that expand the
half to a whole vector without messing with the dword lanes.
This fixes some redundant instructions in splat-like lowerings for
v16i8, which are now getting to be *really* nice.
llvm-svn: 212695
that splat i8s into i16s.
Previously, we would try much too hard to arrange a sequence of i8s in
one half of the input such that we could unpack them into i16s and
shuffle those into place. This isn't always going to be a cheaper i8
shuffle than our other strategies. The case where it is always going to
be cheaper is when we can arrange all the necessary inputs into one half
using just i16 shuffles. It happens that viewing the problem this way
also makes it much easier to produce an efficient set of shuffles to
move the inputs into one half and then unpack them.
With this, our splat code gets one step closer to being not terrible
with the new experimental lowering strategy. It also exposes two
combines missing which I will add next.
llvm-svn: 212692
shuffles specifically for cases where a small subset of the elements in
the input vector are actually used.
This is specifically targetted at improving the shuffles generated for
trunc operations, but also helps out splat-like operations.
There is still some really low-hanging fruit here that I want to address
but this is a huge step in the right direction.
llvm-svn: 212680
don't need to set it manually.
This is based on feedback from Tom who pointed out that if every target
needs to handle this we need to reach out to those maintainers. In fact,
it doesn't make sense to duplicate everything when anything other than
expand seems unlikely at this stage.
llvm-svn: 212661
This lets us experiment with 512-bit vectorization without passing
force-vector-width manually.
The code generated for a simple integer memset loop is properly vectorized.
Disassembly is still broken for it though :(.
llvm-svn: 212634
Turns out my trick of using the same masks for SSE4.1 and AVX2 didn't work out
as we have to blend two vectors. While there remove unecessary cross-lane moves
from the shuffles so the backend can lower it to palignr instead of vperm.
Fixes PR20118, a miscompilation of vector sdiv by constant on AVX2.
llvm-svn: 212611
vector types to be legal and a ZERO_EXTEND node is encountered.
When we use widening to legalize vector types, extend nodes are a real
challenge. Either the input or output is likely to be legal, but in many
cases not both. As a consequence, we don't really have any way to
represent this situation and the prior code in the widening legalization
framework would just scalarize the extend operation completely.
This patch introduces a new DAG node to represent doing a zero extend of
a vector "in register". The core of the idea is to allow legal but
different vector types in the input and output. The output vector must
have fewer lanes but wider elements. The operation is defined to zero
extend the low elements of the input to the size of the output elements,
and drop all of the high elements which don't have a corresponding lane
in the output vector.
It also includes generic expansion of this node in terms of blending
a zero vector into the high elements of the vector and bitcasting
across. This in turn yields extremely nice code for x86 SSE2 when we use
the new widening legalization logic in conjunction with the new shuffle
lowering logic.
There is still more to do here. We need to support sign extension, any
extension, and potentially int-to-float conversions. My current plan is
to continue using similar synthetic nodes to model each of these
transitions with generic lowering code for each one.
However, with this patch LLVM already reaches performance parity with
GCC for the core C loops of the x264 code (assuming you disable the
hand-written assembly versions) when compiling for SSE2 and SSE3
architectures and enabling the new widening and lowering logic for
vectors.
Differential Revision: http://reviews.llvm.org/D4405
llvm-svn: 212610
has settled without incident, removing the x86-specific and overly
strict 'isVectorSplat' routine in favor of generic and more powerful
splat detection.
The primary motivation and result of this is that the x86 backend can
now see through splats which contain undef elements. This is essential
if we are using a widening form of legalization and I've updated a test
case to also run in that mode as before this change the generated code
for the test case was completely scalarized.
This version of the patch much more carefully handles the undef lanes.
- We aren't overly conservative about them in the shift lowering
(where we will never use the splat itself).
- One place where the splat would have been re-used by the existing code
now explicitly constructs a new constant splat that will be safe.
- The broadcast lowering is much more reasonable with undefs by doing
a correct check of whether the splat is the only user of a loaded
value, checking that the splat actually crosses multiple lanes before
using a broadcast, and handling broadcasts of non-constant splats.
As a consequence of the last bullet, the weird usage of vpshufd instead
of vbroadcast is gone, and we actually can lower an AVX splat with
vbroadcastss where before we emitted a really strange pattern of
a vector load and a manual splat across the vector.
llvm-svn: 212602
aggressively from the x86 shuffle lowering to the generic SDAG vector
shuffle formation code.
This code already tried to fold away shuffles of splats! It just had
lots of bugs and couldn't handle the case my new x86 shuffle lowering
needed.
First, it failed to correctly compute whether N2 was undef because it
pre-computed this, then did transformations which could *make* N2 undef,
then failed to ever re-consider the precomputed state.
Second, it didn't look through bitcasts at all, even in the safe cases
where they are just element-type bitcasts with no change to the number
of elements.
Third, it didn't handle all-zero bit casts nicely the way my code in the
x86 side of things did, which is essential to getting good zext-shuffle
lowerings.
But all of these are generic. I just ported the code down to this layer
and fixed the surrounding bugs. Tests exercising this in the x86 backend
still pass and some silly code in widen_cast-6.ll gets better. I updated
that test to be a bit more precise but it's still pretty unclear what
the value of the test is in this day and age.
llvm-svn: 212517
As destination k0 is allowed but not as predicate/writemask.
I also modified the test to allow checking of error messages by the assembler.
I applied a similar approach to the test ret.s in the same directory.
llvm-svn: 212504
When combining a sequence of two PSHUFD dag nodes into a single PSHUFD,
make sure that we assign the correct type to the resulting PSHUFD.
X86ISD::PSHUFD dag nodes can be either MVT::v4i32 or MVT::v4f32.
Before this change, an assertion failure was triggered in method
'DAGCombinerInfo::CombineTo' when trying to combine the shuffles from the test
below into a single PSHUFD.
define <4 x float> @test1(<4 x float> %V) {
%1 = shufflevector <4 x float> %V, <4 x float> undef, <4 x i32> <i32 3, i32 0, i32 2, i32 1>
%2 = shufflevector <4 x float> %1, <4 x float> undef, <4 x i32> <i32 3, i32 0, i32 2, i32 1>
ret <4 x float> %2
}
llvm-svn: 212498
Add custom lowering code for signed multiply instruction selection, because the
default FastISel instruction selection for ISD::MUL will use unsigned multiply
for the i8 type and signed multiply for all other types. This would set the
incorrect flags for the overflow check.
This fixes <rdar://problem/17549300>
llvm-svn: 212493
lanes in vector splats.
The core problem here is that undef lanes can't *unilaterally* be
considered to contribute to splats. Their handling needs to be more
cautious. There is also a reported failure of the nightly testers
(thanks Tobias!) that may well stem from the same core issue. I'm going
to fix this theoretical issue, factor the APIs a bit better, and then
verify that I don't see anything bad with Tobias's reduction from the
test suite before recommitting.
Original commit message for r212324:
[x86] Generalize BuildVectorSDNode::getConstantSplatValue to work for
any constant, constant FP, or undef splat and to tolerate any undef
lanes in a splat, then replace all uses of isSplatVector in X86's
lowering with it.
This fixes issues where undef lanes in an otherwise splat vector would
prevent the splat logic from firing. It is a touch more awkward to use
this interface, but it is much more accurate. Suggestions for better
interface structuring welcome.
With this fix, the code generated with the widening legalization
strategy for widen_cast-4.ll is *dramatically* improved as the special
lowering strategies for a v16i8 SRA kick in even though the high lanes
are undef.
We also get a slightly different choice for broadcasting an aligned
memory location, and use vpshufd instead of vbroadcastss. This looks
like a minor win for pipelining and domain crossing, but a minor loss
for the number of micro-ops. I suspect its a wash, but folks can
easily tweak the lowering if they want.
llvm-svn: 212475
essentially a DAG combine that never gets a chance to run.
We might typically expect DAG combining to remove shuffles-of-splats and
other similar patterns, but we don't get a chance to run the DAG
combiner when we recursively form sub-shuffles during the lowering of
a shuffle. So instead hand-roll a really important combine directly into
the lowering code to detect shuffles-of-splats, especially shuffles of
an all-zero splat which needn't even have the same element width, etc.
This lets the new vector shuffle lowering handle shuffles which
implement things like zero-extension really nicely. This will become
even more important when I wire the legalization of zero-extension to
vector shuffles with the new widening legalization strategy.
llvm-svn: 212444
We've been performing the wrong operation on ARM for "atomicrmw nand" for
years, since "a NAND b" is "~(a & b)" rather than ARM's very tempting "a & ~b".
This bled over into the generic expansion pass.
So I assume no-one has ever actually tried to do an atomic nand in the real
world. Oh well.
llvm-svn: 212443
any constant, constant FP, or undef splat and to tolerate any undef
lanes in a splat, then replace all uses of isSplatVector in X86's
lowering with it.
This fixes issues where undef lanes in an otherwise splat vector would
prevent the splat logic from firing. It is a touch more awkward to use
this interface, but it is much more accurate. Suggestions for better
interface structuring welcome.
With this fix, the code generated with the widening legalization
strategy for widen_cast-4.ll is *dramatically* improved as the special
lowering strategies for a v16i8 SRA kick in even though the high lanes
are undef.
We also get a slightly different choice for broadcasting an aligned
memory location, and use vpshufd instead of vbroadcastss. This looks
like a minor win for pipelining and domain crossing, but a minor loss
for the number of micro-ops. I suspect its a wash, but folks can easily
tweak the lowering if they want.
llvm-svn: 212324
Silvermont can only decode one instruction per cycle if the instruction exceeds 8 bytes.
Also in Silvermont instructions with more than 3 prefixes will cause 3 cycle penalty.
Maximum nop length is limited to 7 bytes when used for padding on Silvermont.
For other x86 processors max nop length remains unchanged 15 bytes.
Differential Revision: http://reviews.llvm.org/D4374
llvm-svn: 212321
This patch:
1) Improves the cost model for x86 alternate shuffles (originally
added at revision 211339);
2) Teaches the Cost Model Analysis pass how to analyze alternate shuffles.
Alternate shuffles are a special kind of blend; on x86, we can often
easily lowered alternate shuffled into single blend
instruction (depending on the subtarget features).
The existing cost model didn't take into account subtarget features.
Also, it had a couple of "dead" entries for vector types that are never
legal (example: on x86 types v2i32 and v2f32 are not legal; those are
always either promoted or widened to 128-bit vector types).
The new x86 cost model takes into account what target features we have
before returning the shuffle cost (i.e. the number of instructions
after the blend is lowered/expanded).
This patch also teaches the Cost Model Analysis how to identify and analyze
alternate shuffles (i.e. 'SK_Alternate' shufflevector instructions):
- added function 'isAlternateVectorMask';
- added some logic to check if an instruction is a alternate shuffle and, in
case, call the target specific TTI to get the corresponding shuffle cost;
- added a test to verify the cost model analysis on alternate shuffles.
llvm-svn: 212296
This patch adds tablegen patterns to select F16C float-to-half-float
conversion instructions from 'f32_to_f16' and 'f16_to_f32' dag nodes.
If the target doesn't have F16C, then 'f32_to_f16' and 'f16_to_f32'
are expanded into library calls.
llvm-svn: 212293
mode.
This also runs the test in that mode which would reproduce the crash.
What I love is that *every single FIXME* in the test is addressed by
switching to widening.
llvm-svn: 212254
Finkel, Eric Christopher, and a bunch of other people I'm probably
forgetting (sorry), add an option to the x86 backend to widen vectors
during type legalization rather than promote them.
This still would promote vNi1 vectors to get the masks right, but would
widen other vectors. A lot of experiments are piling up right now
showing that widening should probably be the default legalization
strategy outside of vNi1 cases, but it is very hard to test the
rammifications of that and fix bugs in widening-based legalization
without an option that enables it. I'll be checking in tests shortly
that use this option to exercise cases where widening doesn't work well
and hopefully we'll be able to switch fully to this soon.
llvm-svn: 212249
This new multiclass, avx512_perm_table_3src derives from the current one and
provides the Pat<>. The next patch will add another Pat<> that uses the
writemask.
Note that I dropped the type annotation from the intrinsic call, i.e.: (v16f32
VR512:$src1) -> R512:$src1. I think that this should be fine (at least many
intrinsic calls don't provide them) and it greatly reduces the number of
template arguments.
llvm-svn: 212222
This includes assembler and codegen support (see the new tests in
avx512-encodings.s and avx512-shuffle.ll).
<rdar://problem/17492620>
llvm-svn: 212221
Otherwise they get freed and the implicit "isa<XYZ>" tests following
turn out badly (at least under sanitizers).
Also corrects the ordering of unordered atomic stores.
llvm-svn: 212136
The argument list vector is never used after it has been passed to the
CallLoweringInfo and moving it to the CallLoweringInfo is cleaner and
pretty much as cheap as keeping a pointer to it.
llvm-svn: 212135
On targets without cmpxchg16b or cmpxchg8b, the borderline atomic
operations were slipping through the gaps.
X86AtomicExpand.cpp was delegating to ISelLowering. Generic
ISelLowering was delegating to X86ISelLowering and X86ISelLowering was
asserting. The correct behaviour is to expand to a libcall, preferably
in generic ISelLowering.
This can be achieved by X86ISelLowering deciding it doesn't want the
faff after all.
llvm-svn: 212134
The logic for expanding atomics that aren't natively supported in
terms of cmpxchg loops is much simpler to express at the IR level. It
also allows the normal optimisations and CodeGen improvements to help
out with atomics, instead of using a limited set of possible
instructions..
rdar://problem/13496295
llvm-svn: 212119
For now I only updated the _alt variants. The main variants are used by
codegen and that will need a bit more work to trigger.
<rdar://problem/17492620>
llvm-svn: 212114
Adding a writemask variant would require a third asm string to be passed to
the template. Generate the AsmString in the template instead.
No change in X86.td.expanded.
llvm-svn: 212113
seh_stackalloc 0 is not representable in Win64 SEH info, so emitting it
is a bug.
Reviewers: rnk
Differential Revision: http://reviews.llvm.org/D4334
Patch by Vadim Chugunov!
llvm-svn: 212081
This patch adds support for a new builtin instruction called
__builtin_ia32_rdpmc.
Builtin '__builtin_ia32_rdpmc' is defined as a 'GCC builtin'; on X86, it can
be used to read performance monitoring counters. It takes as input the index
of the performance counter to read, and returns the value of the specified
performance counter as a 64-bit number.
Calls to this new builtin will map to instruction RDPMC.
The index in input to the builtin call is moved to register %ECX. The result
of the builtin call is the value of the specified performance counter (RDPMC
would return that quantity in registers RDX:RAX).
This patch:
- Adds builtin int_x86_rdpmc as a GCCBuiltin;
- Adds a new x86 DAG node called 'RDPMC_DAG';
- Teaches how to lower this new builtin;
- Adds an ISel pattern to select instruction RDPMC;
- Fixes the definition of instruction RDPMC adding %RAX and %RDX as
implicit definitions, and adding %ECX as implicit use;
- Adds a LLVM test to verify that the new builtin is correctly selected.
llvm-svn: 212049
This exception format is not specific to Windows x64. A similar approach is
taken on nearly all architectures. Generalise the name to reflect reality.
This will eventually be used for Windows on ARM data emission as well.
Switch the enum and namespace into an enum class.
llvm-svn: 212000
Rename the routines to reflect the reality that they are more related to call
frame information than to Win64 EH. Although EH is implemented in an intertwined
manner by augmenting with an exception handler and an associated parameter, the
majority of these routines emit information required to unwind the frames. This
also helps identify that these routines are generic for most windows platforms
(they apply equally to nearly all architectures except x86) although the
encoding of the information is architecture dependent.
Unwinding data is emitted via EmitWinCFI* and exception handling information via
EmitWinEH*.
llvm-svn: 211994
lowering for v16i8.
ASan and some bots caught this bug with existing test cases. Fixing it
even fixed a miscompile with one of the test cases. I'm still a bit
suspicious of this test case as I've not taken a proper amount of time
to think about it, but the fix here is strict goodness.
llvm-svn: 211976
These show up really frequently, not the least with actual splats. =] We
lowered these quite badly before. The new code path tries to widen i8
shuffles to i16 shuffles in a splat-like way. There are still some
inefficiencies in our i16 splat logic though, so we aren't really done
here.
Also, for certain patterns (bit of a gather-and-splat) we still
generate pretty silly code, and I've left a fixme for addressing it.
However, I'm not actually worried about this code pattern as much. The
old shuffle lowering generates a 29 instruction monstrosity for it that
should execute much more slowly.
llvm-svn: 211974
lowering.
For maximum irony, I had already discovered this bug, diagnosed it, and
left FIXMEs about it in the test cases. =[ I just failed to go back over
those until after i had reduced a bootstrap miscompile down to a single
TU, stared at the assembly for an hour, and figured out the bug. Again.
Oh well.
llvm-svn: 211955
a bootstrap.
I managed to mis-remember how PACKUS worked on x86, and was using undef
for the high bytes instead of zero. The fix is fairly obvious.
llvm-svn: 211922
Summary:
This allows it to fold pshufd instructions across intervening
half-shuffles and other noise. This pattern actually shows up in the
generic lowering tests, but I've also added direct tests using
intrinsics to make sure that the specific desired functionality is
working even if the lowering stuff changes in the future.
Differential Revision: http://reviews.llvm.org/D4292
llvm-svn: 211892
half-shuffles, even looking through intervening instructions in a chain.
Summary:
This doesn't happen to show up with any test cases I've found for the current
shuffle lowering, but previous attempts would benefit from this and it seems
generally useful. I've tested it directly using intrinsics, which also shows
that it will work with hand vectorized code as well.
Note that even though pshufd isn't directly used in these tests, it gets
exercised because we combine some of the half shuffles into a pshufd
first, and then merge them.
Differential Revision: http://reviews.llvm.org/D4291
llvm-svn: 211890
trivially redundant.
This fixes several cases in the new vector shuffle lowering algorithm
which would generate redundant shuffle instructions for the sake of
simplicity.
I'm also deleting a testcase which was somewhat ridiculous. It was
checking for a bug in 2007 about incorrectly transforming shuffles by
looking for the string "-86" in the output of a pretty substantial
function. This test case doesn't seem to have any value at this point.
Differential Revision: http://reviews.llvm.org/D4240
llvm-svn: 211889
x86 backend.
This sketches out a new code path for vector lowering, hidden behind an
off-by-default flag while it is under development. The fundamental idea
behind the new code path is to aggressively break down the problem space
in ways that ease selecting the odd set of instructions available on
x86, and carefully avoid scalarizing code even when forced to use older
ISAs. Notably, this starts off restricting itself to SSE2 and implements
the complete vector shuffle and blend space for 128-bit vectors in SSE2
without scalarizing. The plan is to layer on top of this ISA extensions
where we can bail out of the complex SSE2 lowering and opt for
a cheaper, specialized instruction (or set of instructions). It also
needs to be generalized to AVX and AVX512 vector widths.
Currently, this does a decent but not perfect job for SSE2. There are
some specific shortcomings that I plan to address:
- We need a peephole combine to fold together shuffles where possible.
There are cases where a previous shuffle could be modified slightly to
arrange for elements to be in the correct position and a later shuffle
eliminated. Doing this eagerly added quite a bit of complexity, and
so my plan is to combine away these redundancies afterward.
- There are a lot more clever ways to use unpck and pack that need to be
added. This is essential for real world shuffles as it turns out...
Once SSE2 is polished a bit I should be able to get interesting numbers
on performance improvements on benchmarks conducive to vectorization.
All of this will be off by default until it is functionally equivalent
of course.
Differential Revision: http://reviews.llvm.org/D4225
llvm-svn: 211888
For now I used a separate template for these sub-vector/tuple broadcasts
rather than sharing the mem variants with avx512_int_broadcast_rm.
<rdar://problem/17402869>
llvm-svn: 211828
This patch teaches the backend how to canonicalize a shuffle vectors
according to the rule:
- (shuffle (FADD A, B), (FSUB A, B), Mask) ->
(shuffle (FSUB A, -B), (FADD A, -B), Mask)
Where 'Mask' is:
<0,5,2,7> ;; for v4f32 and v4f64 shuffles.
<0,3> ;; for v2f64 shuffles.
<0,9,2,11,4,13,6,15> ;; for v8f32 shuffles.
In general, ISel only knows how to pattern-match a canonical
'fadd + fsub + blendi' dag node sequence into an ADDSUB instruction.
This new rule allows to convert a non-canonical dag sequence into a
canonical one that will be matched by a single ADDSUB at ISel stage.
The idea of converting a non-canonical ADDSUB into a canonical one by
swapping the first two operands of the shuffle, and then negating the
second operand of the FADD and FSUB, was originally proposed by Hal Finkel.
llvm-svn: 211771
The *_alt defs for vcmp are used by the InstParser (the asm string in the main
def is used by the InstPrinter) . The former was accepting vector registers
as destination rather than mask registers.
llvm-svn: 211750
string_ostream is a safe and efficient string builder that combines opaque
stack storage with a built-in ostream interface.
small_string_ostream<bytes> additionally permits an explicit stack storage size
other than the default 128 bytes to be provided. Beyond that, storage is
transferred to the heap.
This convenient class can be used in most places an
std::string+raw_string_ostream pair or SmallString<>+raw_svector_ostream pair
would previously have been used, in order to guarantee consistent access
without byte truncation.
The patch also converts much of LLVM to use the new facility. These changes
include several probable bug fixes for truncated output, a programming error
that's no longer possible with the new interface.
llvm-svn: 211749
If the cmp is in a different basic block, then it is possible that not all
operands of that compare have defined registers. This can happen when one of
the operands to the cmp is a load and the load gets folded into the cmp. In
this case FastISel will skip the load instruction and the vreg is never
defined.
llvm-svn: 211730
This patch teaches method 'LowerVECTOR_SHUFFLE' to give higher precedence to
the check for 'isBlendMask'; the idea is that, when possible, we should firstly
check if a shuffle performs a blend, and in case, try to lower it into a BLENDI
instead of selecting a SHUFP or (worse) a VPERM2X128.
In general:
- AVX VBLENDPS/D always have better latency and throughput than VPERM2F128;
- BLENDPS/D instructions tend to always have better 'reciprocal throughput'
than the equivalent SHUFPS/D;
- Both BLENDPS/D and SHUFPS/D are often decoded into the same number of
m-ops; however, a m-op obtained from a BLENDPS/D can be scheduled to more
than one execution port.
This patch:
- Moves the check for 'isBlendMask' immediately before the check for
'isSHUFPMask' within method 'LowerVECTOR_SHUFFLE';
- Updates existing tests for sse/avx shuffle/blend instructions to verify
that we select (v)blendps/d when possible (instead of (v)shufps/d or
vperm2f128).
llvm-svn: 211720
--
This patch enables LLVM to emit Win64-native unwind info rather than
DWARF CFI. It handles all corner cases (I hope), including stack
realignment.
Because the unwind info is not flexible enough to describe stack frames
with a gap of unknown size in the middle, such as the one caused by
stack realignment, I modified register spilling code to place all spills
into the fixed frame slots, so that they can be accessed relative to the
frame pointer.
Patch by Vadim Chugunov!
Reviewed By: rnk
Differential Revision: http://reviews.llvm.org/D4081
llvm-svn: 211691
This patch teaches the backend how to combine a build_vector that implements
an 'addsub' between packed float vectors into a sequence of vector add
and vector sub followed by a VSELECT.
The new VSELECT is expected to be lowered into a BLENDI.
At ISel stage, the sequence 'vector add + vector sub + BLENDI' is
pattern-matched against ISel patterns added at r211427 to select
'addsub' instructions.
Added three more ISel patterns for ADDSUB.
Added test sse3-avx-addsub-2.ll to verify that we correctly emit 'addsub'
instructions.
llvm-svn: 211679
Optimize the codegen of select and branch instructions to directly use the
EFLAGS from the {s|u}{add|sub|mul}.with.overflow intrinsics.
llvm-svn: 211645
V' bit in the P2 byte of the EVEX prefix provides the top bit of the NDD and
NDS register fields. This was simply not used in the decoder until now.
Fixes <rdar://problem/17402661>
llvm-svn: 211565
The extends the select lowering coverage by emiting pseudo cmov
instructions. These insturction will be later on lowered to control-flow to
simulate the select.
llvm-svn: 211545
This extends the select lowering to support floating-point selects. The
lowering depends on SSE instructions and that the conditon comes from a
floating-point compare. Under this conditions it is possible to emit an
optimized instruction sequence that doesn't require any branches to
simulate the select.
llvm-svn: 211544