Look for patterns of the form (store (load ...), ...) in which the two
locations are known not to partially overlap. (Identical locations are OK.)
These sequences are better implemented by MVC unless either the load or
the store could use RELATIVE LONG instructions.
The testcase showed that we weren't using LHRL and LGHRL for extload16,
only sextloadi16. The patch fixes that too.
llvm-svn: 185919
Use "STC;MVC" for memsets that are too big for two STCs or MV...Is yet
small enough for a single MVC. As with memcpy, I'm leaving longer cases
till later.
The number of tests might seem excessive, but f33 & f34 from memset-04.ll
failed the first cut because I'd not added the "?:" on the calculation
of Size1.
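The "STC;MVC" sequence works because MVC processes its operands one byte at a time, left to right, so a copy from the destination to destination+1 keeps re-reading the byte it has just written. The following is a minimal Python sketch of that behaviour on a toy byte array. The helper names `mvc` and `memset_stc_mvc` are hypothetical and are not taken from the patch.

```python
def mvc(mem, dst, src, length):
    """Toy model of MVC: copy `length` bytes one at a time, left to right."""
    assert 1 <= length <= 256, "a single MVC moves 1 to 256 bytes"
    for i in range(length):
        mem[dst + i] = mem[src + i]


def memset_stc_mvc(mem, addr, value, size):
    """Hypothetical helper: store one byte, then spread it with an overlapping MVC."""
    mem[addr] = value & 0xFF                # STC: store the fill byte once
    if size > 1:
        mvc(mem, addr + 1, addr, size - 1)  # each source byte was just written


mem = bytearray(16)
memset_stc_mvc(mem, 4, 0xAB, 8)
print(mem.hex())
```

Under this model a single STC plus a single MVC covers fills of 2 to 257 bytes.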
llvm-svn: 185918
The following transforms are valid if -C is a power of 2:
(icmp ugt (xor X, C), ~C) -> (icmp ult X, C)
(icmp ult (xor X, C), -C) -> (icmp uge X, C)
These are nice: they get rid of the xor.
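As a sanity check, both transforms can be verified exhaustively at a small bit width. The sketch below assumes 8-bit unsigned arithmetic and uses hypothetical helper names. It illustrates the identities and is not code from the patch.

```python
BITS = 8
MASK = (1 << BITS) - 1


def is_power_of_two(v):
    return v != 0 and v & (v - 1) == 0


def check_transforms():
    """Check both identities for every 8-bit C where -C is a power of 2."""
    checked = 0
    for c in range(1 << BITS):
        neg_c = -c & MASK   # two's-complement negation of C
        if not is_power_of_two(neg_c):
            continue
        not_c = ~c & MASK   # bitwise complement of C
        for x in range(1 << BITS):
            # (icmp ugt (xor X, C), ~C) -> (icmp ult X, C)
            assert ((x ^ c) > not_c) == (x < c)
            # (icmp ult (xor X, C), -C) -> (icmp uge X, C)
            assert ((x ^ c) < neg_c) == (x >= c)
        checked += 1
    return checked


print(check_transforms(), "constants checked")
```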
llvm-svn: 185915
This adds support for the .llong PowerPC-specific assembler directive.
In doing so, I noticed that .word is currently incorrect: it is
supposed to define a 2-byte data element, not a 4-byte one.
llvm-svn: 185911
This fixes another bug found by llvm-stress!
If we happen to be doing an i64 load or store into a stack slot that has less
than a 4-byte alignment, then the frame-index elimination may need to use an
indexed load or store instruction (because the offset may not be a multiple of
4, a requirement of the STD/LD instructions). The extra register needed to hold
the offset comes from the register scavenger, and it is possible that the
scavenger will need to use an emergency spill slot. As a result, we need to
make sure that a spill slot is allocated when doing an i64 load/store into a
less-than-4-byte-aligned stack slot.
Because test cases for things like this tend to be fairly fragile, I've
concatenated a few small bugpoint-reduced test cases together to form the
regression test.
llvm-svn: 185907
The Mach-O linker has been able to support the weak-def bit on any symbol for
quite a while now. The compiler however continued to place these symbols into a
"coal" section, which required the linker to map them back to the base section
name.
Replace the sections like this:
__TEXT/__textcoal_nt instead use __TEXT/__text
__TEXT/__const_coal instead use __TEXT/__const
__DATA/__datacoal_nt instead use __DATA/__data
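The renaming above amounts to a small lookup table. The following is a sketch only: `COAL_TO_BASE` and `base_section` are hypothetical names, and the real change lives in the compiler's section selection code.

```python
# (segment, section) pairs taken from the list above.
COAL_TO_BASE = {
    ("__TEXT", "__textcoal_nt"): ("__TEXT", "__text"),
    ("__TEXT", "__const_coal"): ("__TEXT", "__const"),
    ("__DATA", "__datacoal_nt"): ("__DATA", "__data"),
}


def base_section(segment, section):
    """Return the base section for a "coal" section; pass others through."""
    return COAL_TO_BASE.get((segment, section), (segment, section))


print(base_section("__TEXT", "__textcoal_nt"))
```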
<rdar://problem/14265330>
llvm-svn: 185872
A setting in MCAsmInfo defines the "assembler dialect" to use. This is used
by common code to choose between alternatives in a multi-alternative GNU
inline asm statement like the following:
__asm__ ("{sfe|subfe} %0,%1,%2" : "=r" (out) : "r" (in1), "r" (in2));
The meaning of these dialects is platform specific, and GCC defines those
for PowerPC to use dialect 0 for old-style (POWER) mnemonics and 1 for
new-style (PowerPC) mnemonics, like in the example above.
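To make the dialect selection concrete, here is a toy Python expansion of the `{a|b}` alternatives for a given dialect number. It is a simplified illustration, not LLVM's actual inline asm parser: the function name `select_dialect` is hypothetical, and escaping and nested braces are ignored.

```python
import re


def select_dialect(asm_template, dialect):
    """Replace each {alt0|alt1|...} group with the alternative for `dialect`."""
    def pick(match):
        alternatives = match.group(1).split("|")
        # Assumption of this sketch: a missing alternative expands to nothing.
        return alternatives[dialect] if dialect < len(alternatives) else ""
    return re.sub(r"\{([^{}]*)\}", pick, asm_template)


template = "{sfe|subfe} %0,%1,%2"
print(select_dialect(template, 0))  # old-style (POWER) mnemonic
print(select_dialect(template, 1))  # new-style (PowerPC) mnemonic
```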
To be compatible with inline asm used with GCC, LLVM ought to do the same.
Specifically, this means we should always use assembler dialect 1 since
old-style mnemonics really aren't supported on any current platform.
However, the current LLVM back-end uses:
AssemblerDialect = 1; // New-Style mnemonics.
in PPCMCAsmInfoDarwin, and
AssemblerDialect = 0; // Old-Style mnemonics.
in PPCLinuxMCAsmInfo.
The Linux setting really isn't correct; we should be using new-style
mnemonics everywhere. This is changed by this commit.
Unfortunately, the setting of this variable is overloaded in the back-end
to decide whether or not we are on a Darwin target. This is done in
PPCInstPrinter (the "SyntaxVariant" is initialized from the MCAsmInfo
AssemblerDialect setting), and also in PPCMCExpr. Setting AssemblerDialect
to 1 for both Darwin and Linux no longer allows us to make this distinction.
Instead, this patch uses the MCSubtargetInfo passed to createPPCMCInstPrinter
to distinguish Darwin targets, and ignores the SyntaxVariant parameter.
As to PPCMCExpr, this patch adds an explicit isDarwin argument that needs
to be passed in by the caller when creating a target MCExpr. (To do so
this patch implicitly also reverts commit 184441.)
llvm-svn: 185858
Another bug found by llvm-stress! This fixes hitting
llvm_unreachable("Invalid integer vector compare condition");
at the end of getVCmpInst in PPCISelDAGToDAG.
llvm-svn: 185855
This adds support for the old-style time base instructions;
while new programs are supposed to use mfspr, the mftb instructions
are still supported and in use by existing assembler files.
llvm-svn: 185829
This adds support for the basic mnemonics (with the L operand) for the
fixed-point compare instructions. These are defined as aliases for the
already existing CMPW/CMPD patterns, depending on the value of L.
This requires use of InstAlias patterns with immediate literal operands.
To make this work, we need two further changes:
- define a RegisterPrefix, because otherwise literals 0 and 1 would
be parsed as literal register names
- provide a PPCAsmParser::validateTargetOperandClass routine to
recognize immediate literals (like ARM does)
llvm-svn: 185826
PPCTargetLowering::LowerFP_TO_INT() expects its source operand to be
either an f32 or f64, but this is not checked. A long double
(ppcf128) operand will normally be custom-lowered to a conversion to
f64 in this context. However, this isn't the case for an UNDEF node.
This patch recognizes a ppcf128 as a legal source operand for
FP_TO_INT only if it's an undef, in which case it creates an undef of
the target type.
At some point we might want to do a wholesale custom lowering of
ISD::UNDEF when the type is ppcf128, but it's not really clear that's
a great idea, and probably more work than it's worth for a situation
that only arises in the case of a programming error. At this point I
think simple is best.
The test case comes from PR16556, and is a crash-test only.
llvm-svn: 185821
I shaved this yak because I mistakenly thought that this was one of the
last grep tests. Turns out my search was skipping .ll files, for which
there are ~1200 more tests using grep.
llvm-svn: 185819
Back in r179493 we determined that two transforms collided with each
other. The fix back then was to reorder the transforms so that the
preferred transform would give it a try and then we would try the
secondary transform. However, it was noted that the best approach would
canonicalize one transform into the other, removing the collision and
allowing us to optimize IR given to us in that form.
llvm-svn: 185808
This fixes a bug (found by llvm-stress) in
DAGTypeLegalizer::PromoteIntRes_BUILD_VECTOR where it assumed that the result
type would always be larger than the original operands. This is not always
true, however, with boolean vectors. For example, promoting a node of type v8i1
(where the operands will be of type i32, the type to which i1 is promoted) will
yield a node with a result vector element type of i16 (and operands of type
i32). As a result, we cannot blindly assume that we can ANY_EXTEND the operands
to the result type.
llvm-svn: 185794
This fixes an oversight that Intrinsic::nearbyint was not being mapped to
ISD::FNEARBYINT (thus fixing the over-optimistic cost we were assigning to
nearbyint calls for some targets).
llvm-svn: 185783
This is a complete rewrite of the bottom-up vectorization class.
Before this commit we scanned the instruction tree three times: first in search of merge points for the trees, second to estimate the cost, and finally for vectorization.
There was a lot of code duplication and adding the DCE exposed bugs. The new design is simpler and DCE was a part of the design.
In this implementation we build the tree once. After that we estimate the cost by scanning the different entries in the constructed tree (in any order). The vectorization phase also works on the built tree.
llvm-svn: 185774
For alignment purposes, the instruction array will always have an even
number of entries, with the final entry potentially unused (in which
case the array will be one longer than indicated by the count of unwind
codes field).
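The padding rule can be written as rounding the count up to the next even number. This is a minimal sketch, and `unwind_code_slots` is a hypothetical name rather than a function from the patch.

```python
def unwind_code_slots(count_of_codes):
    """Number of array entries emitted for `count_of_codes` unwind codes."""
    return (count_of_codes + 1) & ~1  # round up to an even number


for count in range(5):
    print(count, "->", unwind_code_slots(count))
```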
Reviewed by Charles Davis and Nico Rieck.
llvm-svn: 185760
data structures.
The Win64 EH data structures must be of type IMAGE_REL_AMD64_ADDR32NB
instead of IMAGE_REL_AMD64_ADDR32. This is easily achieved by adding
the VK_COFF_IMGREL32 modifier to the symbol reference.
Also change the references to the start and end of a function's SEH range
to be offsets from the start of the function.
Reviewed by Charles Davis and Nico Rieck.
llvm-svn: 185759
The code offset for unwind code SET_FPREG is wrong because it is set
to constant 0. The fix is to do the same as for the other unwind
codes: emit a label and later the absolute difference between the
label and the beginning of the prologue.
Also enables the failing test case MC/COFF/seh.s.
Reviewed by Charles Davis and Nico Rieck.
llvm-svn: 185758
ReduceLoadWidth unconditionally drops extensions from loads. Limit it to the
case when all of the bits the extension would otherwise produce are dropped by
the shrink. It would be possible to shrink the load in more cases by merging
the extensions, but this isn't trivial and is a very rare case. I left a TODO for
that case.
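One way to read the new condition: if the shrink keeps `kept_bits` bits starting at bit `shift`, the extension can be ignored only when that window lies entirely inside the bits actually loaded from memory. The Python sketch below is a toy model of that reading with hypothetical names. It is not the DAGCombiner code.

```python
def sign_extend(value, from_bits, to_bits):
    """Sign-extend a `from_bits`-wide value to `to_bits` bits (as unsigned)."""
    sign = 1 << (from_bits - 1)
    return ((value ^ sign) - sign) & ((1 << to_bits) - 1)


def shrink(value, shift, kept_bits):
    """Keep `kept_bits` bits of `value` starting at bit `shift`."""
    return (value >> shift) & ((1 << kept_bits) - 1)


def extension_bits_dropped(mem_bits, shift, kept_bits):
    """True if no bit produced by the extension survives the shrink."""
    return shift + kept_bits <= mem_bits


# Safe: bits 4..7 of an i8 sextload to i32 all come from memory.
byte = 0x80
assert extension_bits_dropped(8, 4, 4)
assert shrink(sign_extend(byte, 8, 32), 4, 4) == shrink(byte, 4, 4)

# Unsafe: bits 4..11 include sign bits, so dropping the extension changes the result.
assert not extension_bits_dropped(8, 4, 8)
print(hex(shrink(sign_extend(byte, 8, 32), 4, 8)), hex(shrink(byte, 4, 8)))
```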
Fixes PR16551.
llvm-svn: 185755
This prevents the emission of DAG-generated vreg definitions after a
tail call by dropping them entirely (on the grounds that nothing could
use them anyway, and they interfere with O0 CodeGen).
llvm-svn: 185754