Issue outcomes from DAGCombiner::MergeConsequtiveStores, more precisely from
mem-ops sequence sorting.
Consider, how MergeConsequtiveStores works for next example:
store i8 1, a[0]
store i8 2, a[1]
store i8 3, a[1] ; a[1] again.
return ; DAG starts here
1. Method will collect all the 3 stores.
2. It sorts them by distance from the base pointer (farthest with highest
index).
3. It takes first consecutive non-overlapping stores and (if possible) replaces
them with a single store instruction.
The point is, we can't determine here which 'store' instruction
would be the second after sorting ('store 2' or 'store 3').
It happens that 'store 3' would be the second, and 'store 2' would be the third.
So after merging we have the next result:
store i16 (1 | 3 << 8), base ; is a[0] but bit-casted to i16
store i8 2, a[1]
So actually we swapped 'store 3' and 'store 2' and got wrong contents in a[1].
Fix: In sort routine just also take into account mem-op sequence number.
llvm-svn: 200201
There are currently two issues, of which I currently know, that prevent TBAA
from being correctly usable in CodeGen:
1. Stack coloring does not update TBAA when merging allocas. This is easy
enough to fix, but is not the largest problem.
2. CGP inserts ptrtoint/inttoptr pairs when sinking address computations.
Because BasicAA does not handle inttoptr, we'll often miss basic type punning
idioms that we need to catch so we don't miscompile real-world code (like LLVM).
I don't yet have a small test case for this, but this fixes self hosting a
non-asserts build of LLVM on PPC64 when using -enable-aa-sched-mi and -misched=shuffle.
llvm-svn: 200093
This option (which is !NDEBUG only) allows restricting the use of alias
analysis in DAGCombiner to a specific function. This has proved extremely
valuable to isolating bugs related to this feature, and mirrors the
misched-only-func option provided by the new instruction scheduler.
llvm-svn: 200088
This commit caused -Woverloaded-virtual warnings. The two new
TargetTransformInfo::getIntImmCost functions were only added to the superclass,
and to the X86 subclass. The other targets were not updated, and the
warning highlighted this by pointing out that e.g. ARMTTI::getIntImmCost was
hiding the two new getIntImmCost variants.
We could pacify the warning by adding "using TargetTransformInfo::getIntImmCost"
to the various subclasses, or turning it off, but I suspect that it's wrong to
leave the functions unimplemnted in those targets. The default implementations
return TCC_Free, which I don't think is right e.g. for ARM.
llvm-svn: 200058
Retry commit r200022 with a fix for the build bot errors. Constant expressions
have (unlike instructions) module scope use lists and therefore may have users
in different functions. The fix is to simply ignore these out-of-function uses.
llvm-svn: 200034
DAGCombiner::GatherAllAliases, which is only used when AA used is enabled
during DAGCombine, had a fundamentally incorrect assumption for which this
change compensates. GatherAllAliases, which is used to find aliasing
predecessor chain nodes (so that a better chain can be selected for a load or
store to enable subsequent optimizations) assumed that walking up the chain
would always catch all possibly-aliasing loads and stores. This is not true: To
really find all aliases, we also need to search for aliases through the value
operand of a store, etc. Consider the following situation:
Token1 = ...
L1 = load Token1, %52
S1 = store Token1, L1, %51
L2 = load Token1, %52+8
S2 = store Token1, L2, %51+8
Token2 = Token(S1, S2)
L3 = load Token2, %53
S3 = store Token2, L3, %52
L4 = load Token2, %53+8
S4 = store Token2, L4, %52+8
If we search for aliases of S3 (which loads address %52), and we look only
through the chain, then we'll miss the trivial dependence on L1 (which loads
from %52). We then might change all loads and stores to use Token1 as their
chain operand, which could result in copying %53 into %52 before copying
%52 into %51 (which should happen first).
The problem is, however, that searching for such data dependencies can become
expensive, and the cost is not directly related to the chain depth. Instead,
we'll rule out such configurations by insisting that we've visited all chain
users (except for users of the original chain, which is not necessary). When
doing this, we need to look through nodes we don't care about (otherwise,
things like register copies will interfere with trivial use cases).
Unfortunately, I don't have a small test case for this problem. Creating the
underlying situation is not hard (a pair of memcpys will do it), but arranging
for the default instruction schedule to be incorrect is very fragile.
This unbreaks self hosting on PPC64 when using
-mllvm -combiner-global-alias-analysis -mllvm -combiner-alias-analysis.
llvm-svn: 200033
These transformations obviously won't work for indexed (pre/post-inc) loads and
stores. In practice, I'm not sure there is any benefit to enabling them for
indexed nodes because other transformations that these might enable likely also
won't handle indexed nodes.
I don't have an in-tree test case that hits this problem, but an upcoming bug
fix will make it much more likely.
llvm-svn: 200023
This pass identifies expensive constants to hoist and coalesces them to
better prepare it for SelectionDAG-based code generation. This works around the
limitations of the basic-block-at-a-time approach.
First it scans all instructions for integer constants and calculates its
cost. If the constant can be folded into the instruction (the cost is
TCC_Free) or the cost is just a simple operation (TCC_BASIC), then we don't
consider it expensive and leave it alone. This is the default behavior and
the default implementation of getIntImmCost will always return TCC_Free.
If the cost is more than TCC_BASIC, then the integer constant can't be folded
into the instruction and it might be beneficial to hoist the constant.
Similar constants are coalesced to reduce register pressure and
materialization code.
When a constant is hoisted, it is also hidden behind a bitcast to force it to
be live-out of the basic block. Otherwise the constant would be just
duplicated and each basic block would have its own copy in the SelectionDAG.
The SelectionDAG recognizes such constants as opaque and doesn't perform
certain transformations on them, which would create a new expensive constant.
This optimization is only applied to integer constants in instructions and
simple (this means not nested) constant cast experessions. For example:
%0 = load i64* inttoptr (i64 big_constant to i64*)
Reviewed by Eric
llvm-svn: 200022
This is a horrible bit of code. We're calling a simplification routine *in the middle* of type legalization. We tell the
simplification routine that it's running after legalization, but some of the types it will encounter will be illegal! The
fix is only to invoke the simplification if the types in question were legal, so that none of its invariants will be violated.
llvm-svn: 199847
This fixes a regression intruced by r199135.
Revision 199135 tried to simplify part of the logic in method
DAGCombiner::SimplifyVBinOp introducing calls to method BuildVectorSDNode::isConstant().
However, that revision wrongly changed the check performed by method
SimplifyVBinOp to identify dag nodes that can be folded.
Before revision 199135, that method only tried to simplify vector binary operations
if both operands were build_vector of Constant/ConstantFP/Undef only.
After revision 199135, method SimplifyVBinop tried to
simplify also vector binary operations with only one constant operand.
This fixes the problem restoring the old behavior of SimplifyVBinOp.
llvm-svn: 199328
When creating a virtual register for a def, the value type should be
used to pick the register class. If we only use the register class
constraint on the instruction, we might pick a too large register class.
Some registers can store values of different sizes. For example, the x86
xmm registers can hold f32, f64, and 128-bit vectors. The three
different value sizes are represented by register classes with identical
register sets: FR32, FR64, and VR128. These register classes have
different spill slot sizes, so it is important to use the right one.
The register class constraint on an instruction doesn't necessarily care
about the size of the value its defining. The value type determines
that.
This fixes a problem where InstrEmitter was picking 32-bit register
classes for 64-bit values on SPARC.
llvm-svn: 199187
This commit teaches DAG to reassociate vector ops, which in turn enables
constant folding of vector op chains that appear later on during custom lowering
and DAG combine.
Reviewed by Andrea Di Biagio
llvm-svn: 199135
This is a very confusing option for a feature that will go away.
-enable-misched is exposed instead to help triage issues with the new
scheduler.
llvm-svn: 199133
At the moment we expect rotates to have the form:
(or (shl X, Y), (shr X, Z))
where Y == bitsize(X) - Z or Z == bitsize(X) - Y. This form means that
the (or ...) is undefined for Y == 0 or Z == 0. This undefinedness can
be avoided by using Y == (C * bitsize(X) - Z) & (bitsize(X) - 1) or
Z == (C * bitsize(X) - Y) & (bitsize(X) - 1) for any integer C
(including 0, the most natural choice).
llvm-svn: 198861
InstCombine converts (sub 32, (add X, C)) into (sub 32-C, X),
so a rotate left of a 32-bit Y by X+C could appear as either:
(or (shl Y, (add X, C)), (shr Y, (sub 32, (add X, C))))
without InstCombine or:
(or (shl Y, (add X, C)), (shr Y, (sub 32-C, X)))
with it.
We already matched the first form. This patch handles the second too.
llvm-svn: 198860
operand into the Value interface just like the core print method is.
That gives a more conistent organization to the IR printing interfaces
-- they are all attached to the IR objects themselves. Also, update all
the users.
This removes the 'Writer.h' header which contained only a single function
declaration.
llvm-svn: 198836
are part of the core IR library in order to support dumping and other
basic functionality.
Rename the 'Assembly' include directory to 'AsmParser' to match the
library name and the only functionality left their -- printing has been
in the core IR library for quite some time.
Update all of the #includes to match.
All of this started because I wanted to have the layering in good shape
before I started adding support for printing LLVM IR using the new pass
infrastructure, and commandline support for the new pass infrastructure.
llvm-svn: 198688
There is a wrong assumption that the vector element type and the
type of each ConstantSDNode in the build_vector were the same.
However, when promoting the integer operand of a legally typed
build_vector, the operand type and the vector element type do not
need to be the same
(See method 'DAGTypeLegalizer::PromoteIntOp_BUILD_VECTOR' in
LegalizeIntegerTypes.cpp).
in AArch64 backend, the following dag sequence:
C0: i1 = Constant<0>
C1: i1 = Constant<-1>
V: v8i1 = BUILD_VECTOR C1, C1, C0, C0, C0, C0, C0, C0
is type-legalized into:
NewC0: i32 = Constant<0>
NewC1: i32 = Constant<1>
V: v8i8 = BUILD_VECTOR NewC1, NewC1, NewC0, NewC0, NewC0, NewC0, NewC0, NewC0
Forcing a getZeroExtend to VTBits to ensure that the new constant
is correctly.
llvm-svn: 198582
This moves the check up into the parent class so that all targets can use it
without having to copy (and keep in sync) the same error message.
llvm-svn: 198579
For AArch64 backend, if DAGCombiner see "sext(setcc)", it will
combine them together to a single setcc with extended value type.
Then if it see "zext(setcc)", it assumes setcc is Vxi1, and try to
create "(and (vsetcc), (1, 1, ...)". While setcc isn't Vxi1,
DAGcombiner will create wrong node and get wrong code emitted.
llvm-svn: 198190
ConstantSDNodes (or UNDEFs) into a simple BUILD_VECTOR.
For example, given the following sequence of dag nodes:
i32 C = Constant<1>
v4i32 V = BUILD_VECTOR C, C, C, C
v4i32 Result = SIGN_EXTEND_INREG V, ValueType:v4i1
The SIGN_EXTEND_INREG node can be folded into a build_vector since
the vector in input is a BUILD_VECTOR of constants.
The optimized sequence is:
i32 C = Constant<-1>
v4i32 Result = BUILD_VECTOR C, C, C, C
llvm-svn: 198084
This changes the MachineFrameInfo API to use the new SSPLayoutKind information
produced by the StackProtector pass (instead of a boolean flag) and updates a
few pass dependencies (to preserve the SSP analysis).
The stack layout follows the same approach used prior to this change - i.e.,
only LargeArray stack objects will be placed near the canary and everything
else will be laid out normally. After this change, structures containing large
arrays will also be placed near the canary - a case previously missed by the
old implementation.
Out of tree targets will need to update their usage of
MachineFrameInfo::CreateStackObject to remove the MayNeedSP argument.
The next patch will implement the rules for sspstrong and sspreq. The end goal
is to support ssp-strong stack layout rules.
WIP.
Differential Revision: http://llvm-reviews.chandlerc.com/D2158
llvm-svn: 197653
This optional register liveness analysis pass can be enabled with either
-enable-stackmap-liveness, -enable-patchpoint-liveness, or both. The pass
traverses each basic block in a machine function. For each basic block the
instructions are processed in reversed order and if a patchpoint or stackmap
instruction is encountered the current live-out register set is encoded as a
register mask and attached to the instruction.
Later on during stackmap generation the live-out register mask is processed and
also emitted as part of the stackmap.
This information is optional and intended for optimization purposes only. This
will enable a client of the stackmap to reason about the registers it can use
and which registers need to be preserved.
Reviewed by Andy
llvm-svn: 197317
This reverts commit r197254.
This was an accidental merge of Juergen's patch. It will be checked in
shortly, but wasn't meant to go in quite yet.
Conflicts:
include/llvm/CodeGen/StackMaps.h
lib/CodeGen/StackMaps.cpp
test/CodeGen/X86/stackmap-liveness.ll
llvm-svn: 197260
DAGCombiner could fold (truncate (load)) -> smaller load if the original
load was the width of the truncation result or wider. This patch extends
it to handle cases where the original load was narrower (and so the
extension type stays the same).
llvm-svn: 197030
This re-lands commit r196876, which was reverted in r196879.
The tests have been fixed to pass on platforms with a stack alignment
larger than 4.
Update to clang side tests will land shortly.
llvm-svn: 196939
One unusual feature of the z architecture is that the result of a
previous load can be reused indefinitely for subsequent loads, even if
a cache-coherent store to that location is performed by another CPU.
A special serializing instruction must be used if you want to force
a load to be reattempted.
Since volatile loads are not supposed to be omitted in this way,
we should insert a serializing instruction before each such load.
The same goes for atomic loads.
The patch implements this at the IR->DAG boundary, in a similar way
to atomic fences. It is a no-op for targets other than SystemZ.
llvm-svn: 196905
For stack frames requiring realignment, three pointers may be needed:
- ebp to address incoming arguments
- esi (could be any callee-saved register) to address locals
- esp to address outgoing arguments
We would use esi unconditionally without verifying that it did not
conflict with inline assembly.
This change doesn't do the verification, it simply emits a fatal error
on functions that use stack realignment, dynamic SP adjustments, and
inline assembly.
Because stack realignment is common on Windows, we also no longer assume
that MS inline assembly clobbers esp. Instead, we analyze the inline
instructions for implicit definitions and check if esp is there. If so,
we require the use of a base pointer and consider it in the condition
above.
Mostly fixes PR16830, but we could try harder to find a non-conflicting
base pointer.
Reviewers: sunfish
Differential Revision: http://llvm-reviews.chandlerc.com/D1317
llvm-svn: 196876
This just extends the existing hack. It should be enough to get a reproducible bootstrap
on 32 bits.
I will open a bug to track getting a real fix for this.
llvm-svn: 196462
A Direct stack map location records the address of frame index. This
address is itself the value that the runtime requested. This differs
from IndirectMemRefOp locations, which refer to a stack locations from
which the requested values must be loaded. Direct locations can
directly communicate the address if an alloca, while IndirectMemRefOp
handle register spills.
For example:
entry:
%a = alloca i64...
llvm.experimental.stackmap(i32 <ID>, i32 <shadowBytes>, i64* %a)
Since both the alloca and stackmap intrinsic are in the entry block,
and the intrinsic takes the address of the alloca, the runtime can
assume that LLVM will not substitute alloca with any intervening
value. This must be verified by the runtime by checking that the stack
map's location is a Direct location type. The runtime can then
determine the alloca's relative location on the stack immediately after
compilation, or at any time thereafter. This differs from Register and
Indirect locations, because the runtime can only read the values in
those locations when execution reaches the instruction address of the
stack map.
llvm-svn: 195712
Summary:
Moved the requirement for SelectionDAG::getConstant() to return legally
typed nodes slightly earlier. There were two optional DAGCombine passes
that were missed out and were required to produce type-legal DAGs.
Simplified a code-path in tryFoldToZero() to use SelectionDAG::getConstant().
This provides support for both promoted and expanded vector types whereas the
previous code only supported promoted vector types.
Fixes a "Type for zero vector elements is not legal" assertion detected by
an llvm-stress generated test.
Reviewers: resistor
CC: llvm-commits
Differential Revision: http://llvm-reviews.chandlerc.com/D2251
llvm-svn: 195635
Improvements over r195317:
- Set/restore EnableFastISel flag instead of just running FastISel within
SelectAllBasicBlocks; the flag is checked in various places, and
FastISel won't run properly if those places don't do the right thing.
- Test looks for normal ISel versus FastISel behavior, and not
something more subtle that doesn't work everywhere.
Based on work by Andrea Di Biagio.
llvm-svn: 195491
The legalizer can now do this type of expansion for more
type combinations without loading and storing to and
from the stack.
NOTE: This is a candidate for the 3.4 branch.
llvm-svn: 195398
This patch is a rewrite of the original patch commited in r194542. Instead of
relying on the type legalizer to do the splitting for us, we now peform the
splitting ourselves in the DAG combiner. This is necessary for the case where
the vector mask is a legal type after promotion and still wouldn't require
splitting.
Patch by: Juergen Ributzka
NOTE: This is a candidate for the 3.4 branch.
llvm-svn: 195397
Summary:
LegalizeSetCCCondCode can now legalize SETEQ and SETNE by returning the inverse
condition and requesting that the caller invert the result of the condition.
The caller of LegalizeSetCCCondCode must handle the inverted CC, and they do
so as follows:
SETCC, BR_CC:
Invert the result of the SETCC with SelectionDAG::getNOT()
SELECT_CC:
Swap the true/false operands.
This is necessary for MSA which lacks an integer SETNE instruction.
Reviewers: resistor
CC: llvm-commits
Differential Revision: http://llvm-reviews.chandlerc.com/D2229
llvm-svn: 195355
It broke, at least, i686 target. It is reproducible with "llc -mtriple=i686-unknown".
FYI, it didn't appear to add either "-O0" or "-fast-isel".
llvm-svn: 195339
Stop folding constant adds into GEP when the type size doesn't match.
Otherwise, the adds' operands are effectively being promoted, changing the
conditions of an overflow. Results are different when:
sext(a) + sext(b) != sext(a + b)
Problem originally found on x86-64, but also fixed issues with ARM and PPC,
which used similar code.
<rdar://problem/15292280>
Patch by Duncan Exon Smith!
llvm-svn: 194840
Summary:
When getConstant() is called for an expanded vector type, it is split into
multiple scalar constants which are then combined using appropriate build_vector
and bitcast operations.
In addition to the usual big/little endian differences, the case where the
element-order of the vector does not have the same endianness as the elements
themselves is also accounted for. For example, for v4i32 on big-endian MIPS,
the byte-order of the vector is <3210,7654,BA98,FEDC>. For little-endian, it is
<0123,4567,89AB,CDEF>.
Handling this case turns out to be a nop since getConstant() returns a splatted
vector (so reversing the element order doesn't change the value)
This fixes a number of cases in MIPS MSA where calling getConstant() during
operation legalization introduces illegal types (e.g. to legalize v2i64 UNDEF
into a v2i64 BUILD_VECTOR of illegal i64 zeros). It should also handle bigger
differences between illegal and legal types such as legalizing v2i64 into v8i16.
lowerMSASplatImm() in the MIPS backend no longer needs to avoid calling
getConstant() so this function has been updated in the same patch.
For the sake of transparency, the steps I've taken since the review are:
* Added 'virtual' to isVectorEltOrderLittleEndian() as requested. This revealed
that the MIPS tests were falsely passing because a polymorphic function was
not actually polymorphic in the reviewed patch.
* Fixed the tests that were now failing. This involved deleting the code to
handle the MIPS MSA element-order (which was previously doing an byte-order
swap instead of an element-order swap). This left
isVectorEltOrderLittleEndian() unused and it was deleted.
* Fixed build failures caused by rebasing beyond r194467-r194472. These build
failures involved the bset, bneg, and bclr instructions added in these commits
using lowerMSASplatImm() in a way that was no longer valid after this patch.
Some of these were fixed by calling SelectionDAG::getConstant() instead,
others were fixed by a new function getBuildVectorSplat() that provided the
removed functionality of lowerMSASplatImm() in a more sensible way.
Reviewers: bkramer
Reviewed By: bkramer
CC: llvm-commits
Differential Revision: http://llvm-reviews.chandlerc.com/D1973
llvm-svn: 194811
This is to avoid this transformation in some cases:
fold (conv (load x)) -> (load (conv*)x)
On architectures that don't natively support some vector
loads efficiently casting the load to a smaller vector of
larger types and loading is more efficient.
Patch by Micah Villmow.
llvm-svn: 194783
If a null call target is provided, don't emit a dummy call. This
allows the runtime to reserve as little nop space as it needs without
the requirement of emitting a call.
llvm-svn: 194676
This patch reapplies r193676 with an additional fix for the Hexagon backend. The
SystemZ backend has already been fixed by r194148.
The Type Legalizer recognizes that VSELECT needs to be split, because the type
is to wide for the given target. The same does not always apply to SETCC,
because less space is required to encode the result of a comparison. As a result
VSELECT is split and SETCC is unrolled into scalar comparisons.
This commit fixes the issue by checking for VSELECT-SETCC patterns in the DAG
Combiner. If a matching pattern is found, then the result mask of SETCC is
promoted to the expected vector mask type for the given target. Now the type
legalizer will split both VSELECT and SETCC.
This allows the following X86 DAG Combine code to sucessfully detect the MIN/MAX
pattern. This fixes PR16695, PR17002, and <rdar://problem/14594431>.
Reviewed by Nadav
llvm-svn: 194542
This patch moves the jump address materialization inside the noop slide. This
enables patching of the materialization itself or its complete removal. This
patch also adds the ability to define scratch registers that can be used safely
by the code called from the patchpoint intrinsic. At least one scratch register
is required, because that one is used for the materialization of the jump
address. This patch depends on D2009.
Differential Revision: http://llvm-reviews.chandlerc.com/D2074
Reviewed by Andy
llvm-svn: 194306
The idea of the AnyReg Calling Convention is to provide the call arguments in
registers, but not to force them to be placed in a paticular order into a
specified set of registers. Instead it is up tp the register allocator to assign
any register as it sees fit. The same applies to the return value (if
applicable).
Differential Revision: http://llvm-reviews.chandlerc.com/D2009
Reviewed by Andy
llvm-svn: 194293
MorphNodeTo is not safe to call during DAG building. It eagerly
deletes dependent DAG nodes which invalidates the NodeMap. We could
expose a safe interface for morphing nodes, but I don't think it's
worth it. Just create a new MachineNode and replaceAllUsesWith.
My understaning of the SD design has been that we want to support
early target opcode selection. That isn't very well supported, but
generally works. It seems reasonable to rely on this feature even if
it isn't widely used.
llvm-svn: 194102
When an extend more than doubles the size of the elements (e.g., a zext
from v16i8 to v16i32), the normal legalization method of splitting the
vectors will run into problems as by the time the destination vector is
legal, the source vector is illegal. The end result is the operation
often becoming scalarized, with the typical horrible performance. For
example, on x86_64, the simple input of:
define void @bar(<16 x i8> %a, <16 x i32>* %p) nounwind {
%tmp = zext <16 x i8> %a to <16 x i32>
store <16 x i32> %tmp, <16 x i32>*%p
ret void
}
Generates:
.section __TEXT,__text,regular,pure_instructions
.section __TEXT,__const
.align 5
LCPI0_0:
.long 255 ## 0xff
.long 255 ## 0xff
.long 255 ## 0xff
.long 255 ## 0xff
.long 255 ## 0xff
.long 255 ## 0xff
.long 255 ## 0xff
.long 255 ## 0xff
.section __TEXT,__text,regular,pure_instructions
.globl _bar
.align 4, 0x90
_bar:
vpunpckhbw %xmm0, %xmm0, %xmm1
vpunpckhwd %xmm0, %xmm1, %xmm2
vpmovzxwd %xmm1, %xmm1
vinsertf128 $1, %xmm2, %ymm1, %ymm1
vmovaps LCPI0_0(%rip), %ymm2
vandps %ymm2, %ymm1, %ymm1
vpmovzxbw %xmm0, %xmm3
vpunpckhwd %xmm0, %xmm3, %xmm3
vpmovzxbd %xmm0, %xmm0
vinsertf128 $1, %xmm3, %ymm0, %ymm0
vandps %ymm2, %ymm0, %ymm0
vmovaps %ymm0, (%rdi)
vmovaps %ymm1, 32(%rdi)
vzeroupper
ret
So instead we can check if there are legal types that enable us to split
more cleverly when the input vector is already legal such that we don't
turn it into an illegal type. If the extend is such that it's more than
doubling the size of the input we check if
- the number of vector elements is even,
- the source type is legal,
- the type of a split source is illegal,
- the type of an extended (by doubling element size) source is legal, and
- the type of that extended source when split is legal.
If the conditions are met, instead of just splitting both the
destination and the source types, we create an extend that only goes up
one "step" (doubling the element width), and the continue legalizing the
rest of the operation normally. The result is that this operates as a
new, more effecient, termination condition for the loop of "split the
operation until the destination type is legal."
With this change, the above example now compiles to:
_bar:
vpxor %xmm1, %xmm1, %xmm1
vpunpcklbw %xmm1, %xmm0, %xmm2
vpunpckhwd %xmm1, %xmm2, %xmm3
vpunpcklwd %xmm1, %xmm2, %xmm2
vinsertf128 $1, %xmm3, %ymm2, %ymm2
vpunpckhbw %xmm1, %xmm0, %xmm0
vpunpckhwd %xmm1, %xmm0, %xmm3
vpunpcklwd %xmm1, %xmm0, %xmm0
vinsertf128 $1, %xmm3, %ymm0, %ymm0
vmovaps %ymm0, 32(%rdi)
vmovaps %ymm2, (%rdi)
vzeroupper
ret
This generalizes a custom lowering that was added a while back to the
ARM backend. That lowering is no longer necessary, and is removed. The
testcases for it, however, provide excellent ARM tests for this change
and so remain.
rdar://14735100
llvm-svn: 193727
The Type Legalizer recognizes that VSELECT needs to be split, because the type
is to wide for the given target. The same does not always apply to SETCC,
because less space is required to encode the result of a comparison. As a result
VSELECT is split and SETCC is unrolled into scalar comparisons.
This commit fixes the issue by checking for VSELECT-SETCC patterns in the DAG
Combiner. If a matching pattern is found, then the result mask of SETCC is
promoted to the expected vector mask type for the given target. This mask has
usually the same size as the VSELECT return type (except for Intel KNL). Now the
type legalizer will split both VSELECT and SETCC.
This allows the following X86 DAG Combine code to sucessfully detect the MIN/MAX
pattern. This fixes PR16695, PR17002, and <rdar://problem/14594431>.
Reviewed by Nadav
llvm-svn: 193676
Making useAA() default to true for SystemZ showed that the combiner alias
analysis wasn't handling volatile accesses. This hit many of the SystemZ
tests, but I arbitrarily picked one for the purpose of this patch.
llvm-svn: 193518
Most SelectionDAG code drops the TBAA info when creating a new form of a
load and store (e.g. during legalization, or when converting a plain
load to an extending one). This patch tries to catch all cases where
the TBAA information can legitimately be carried over.
The patch adds alternative forms of getLoad() and getExtLoad() that take
a MachineMemOperand instead of individual fields. (The corresponding
getTruncStore() already exists.) The idea is to use the MachineMemOperand
forms when all fields are carried over (size, pointer info, isVolatile,
isNonTemporal, alignment and TBAA info). If some adjustment is being
made, e.g. to narrow the load, then we still pass the individual fields
but also pass the TBAA info.
llvm-svn: 193517
ARM processors without ldrex/strex need to be able to make libcalls for all
atomic operations, including the newer min/max versions.
The alternative would probably be expanding these operations in terms of
cmpxchg (as x86 does always), but in the configurations where this matters
code-size tends to be paramount so the libcall is more desirable.
llvm-svn: 193398
This optimization is not SSE specific so I am moving it to DAGco.
The new scalar_to_vector dag node exposed a missing pattern in the AArch64 target that I needed to add.
llvm-svn: 193393
For some targets, it is useful to be able to look at the original
type of an argument without having to dig through the original IR.
This also fixes a bug in SelectionDAGBuilder where InputArg.PartOffset
was not taking into account the offset of structure elements.
Patch by: Justin Holewinski
Tom Stellard:
- Changed the type of ArgVT to EVT, so it can store non-simple types
like v3i32.
llvm-svn: 193214
VTList has a long life cycle through the module and getVTList is frequently called. In current getVTList, sequential search over a std::vector is used, this is inefficient in big module.
This patch use FoldingSet to implement hashing mechanism when searching.
Reviewer: Nadav Rotem
Test : Pass unit tests & LNT test suite
llvm-svn: 193150
PR17168 describes a test case that fails when compiling for debug with
fast-isel. Investigation showed that the test was failing because a DBG_VALUE
machine instruction was placed prior to a PHI.
For this problem to occur requires the following:
* Compile for debug
* Compile with fast-isel
* In a block B, fast-isel must partially succeed before punting to DAG-isel
* B must start with a PHI
* The first unhandled node in the DAG must not generate a machine instruction
* A debug value with an order less than that of that first node exists
When all of these circumstances apply, the existing test that an instruction
was not inserted won't fire. Currently it tests whether the block is empty,
or whether the last instruction generated is a phi. When fast-isel has
partially succeeded, the last instruction generated will not be a phi.
Instead, we need to check whether the current insert position is immediately
following a phi. This patch adds that check, and adds the test case from the
PR as a regression test.
llvm-svn: 192976
There are targets that support i128 sized scalars but cannot emit
instructions that modify them directly. The proper thing to do is to
emit a libcall.
This fixes PR17481.
llvm-svn: 192957
When canonicalizing dags according to the rule
(shl (zext (shr X, c1) ), c1) ==> (zext (shl (shr X, c1), c1))
remember to add the new shl dag to the DAGCombiner worklist of nodes.
If we don't explicitly add it to the worklist of nodes to visit, we
may not trigger later on the rule that folds the shift left + logical
shift right into a AND instruction with bitmask.
llvm-svn: 192883
This happens e.g. with <2 x i64> -1 on x86_32. It cannot be generated directly
because i64 is illegal. It would be nice if getNOT would handle this
transparently, but I don't see a way to generate a legal constant there right
now. Fixes PR17487.
llvm-svn: 192795
This is really an extension of the current (shl (shr ...)) -> shl optimization.
The main difference is that certain upper bits must also not be demanded.
The motivating examples are the first two in the testcase, which occur
in llvmpipe output.
llvm-svn: 192783
This should fix the buildbots.
Original commit message:
[DAGCombiner] Slice a big load in two loads when the element are next to each
other in memory and the target has paired load and performs post-isel loads
combining.
E.g., this optimization will transform something like this:
a = load i64* addr
b = trunc i64 a to i32
c = lshr i64 a, 32
d = trunc i64 c to i32
into:
b = load i32* addr1
d = load i32* addr2
Where addr1 = addr2 +/- sizeof(i32), if the target supports paired load and
performs post-isel loads combining.
One should overload TargetLowering::hasPairedLoad to provide this information.
The default is false.
<rdar://problem/14477220>
llvm-svn: 192476
other in memory and the target has paired load and performs post-isel loads
combining.
E.g., this optimization will transform something like this:
a = load i64* addr
b = trunc i64 a to i32
c = lshr i64 a, 32
d = trunc i64 c to i32
into:
b = load i32* addr1
d = load i32* addr2
Where addr1 = addr2 +/- sizeof(i32), if the target supports paired load and
performs post-isel loads combining.
One should overload TargetLowering::hasPairedLoad to provide this information.
The default is false.
<rdar://problem/14477220>
llvm-svn: 192471
DAGCombiner::visitFP_EXTEND will apply the following transformation:
fold (fpext (load x)) -> (fpext (fptrunc (extload x)))
but the implementation does not handle indexed loads (pre/post inc.), but did
not specifically ignore them either (unlike for extending loads, which it
already ignored), causing an assert when the transformation was applied to an
indexed load. This is the minimal fix for correctness (causing the
transformation to be skipped for indexed loads).
Unfortunately, I don't have an in-tree test case.
llvm-svn: 191989
SelectionDAG will now attempt to inverse an illegal conditon in order to
find a legal one and if that doesn't work, it will attempt to swap the
operands using the inverted condition.
There are no new test cases for this, but a nubmer of the existing R600
tests hit this path.
llvm-svn: 191602
This is useful for targets like R600, which only support GT, GE, NE, and EQ
condition codes as it removes the need to handle unsupported condition
codes in target specific code.
There are no tests with this commit, but R600 has been updated to take
advantage of this new feature, so its existing selectcc tests are now
testing the swapped operands path.
llvm-svn: 191601
Interpreting the results of this function is not very intuitive, so I
cleaned it up to make it more clear whether or not a SETCC op was
legalized and how it was legalized (either by swapping LHS and RHS or
replacing with AND/OR).
This patch does change functionality in the LHS and RHS swapping case,
but unfortunately there are no in-tree tests for this. However, this
patch is a prerequisite for R600 to take advantage of the LHS and RHS
swapping, so tests will be added in subsequent commits.
llvm-svn: 191600
This change fixes the problem reported in pr17380 and re-add the dagcombine
transformation ensuring that the value types are always legal if the
transformation is triggered after Legalization took place.
Added the test case from pr17380.
llvm-svn: 191509
(shl (zext (shr A, X)), X) => (zext (shl (shr A, X), X)).
The rule only triggers when there are no other uses of the
zext to avoid materializing more instructions.
This helps the DAGCombiner understand that the shl/shr
sequence can then be converted into an and instruction.
llvm-svn: 191393
Patch by Ana Pazos.
1.Added support for v1ix and v1fx types.
2.Added Scalar Pairwise Reduce instructions.
3.Added initial implementation of Scalar Arithmetic instructions.
llvm-svn: 191263
Sometimes a copy from a vreg -> vreg sneaks into the middle of a terminator
sequence. It is safe to slice this into the stack protector success bb.
This fixes PR16979.
llvm-svn: 191260
Previously, the DAGISel function WalkChainUsers was spotting that it
had entered already-selected territory by whether a node was a
MachineNode (amongst other things). Since it's fairly common practice
to insert MachineNodes during ISelLowering, this was not the correct
check.
Looking around, it seems that other nodes get their NodeId set to -1
upon selection, so this makes sure the same thing happens to all
MachineNodes and uses that characteristic to determine whether we
should stop looking for a loop during selection.
This should fix PR15840.
llvm-svn: 191165
The Type Legalizer recognizes that VSELECT needs to be split, because the type
is to wide for the given target. The same does not always apply to SETCC,
because less space is required to encode the result of a comparison. As a result
VSELECT is split and SETCC is unrolled into scalar comparisons.
This commit fixes the issue by checking for VSELECT-SETCC patterns in the DAG
Combiner. If a matching pattern is found, then the result mask of SETCC is
promoted to the expected vector mask for the given target. This mask has usually
te same size as the VSELECT return type (except for Intel KNL). Now the type
legalizer will split both VSELECT and SETCC.
This allows the following X86 DAG Combine code to sucessfully detect the MIN/MAX
pattern. This fixes PR16695, PR17002, and <rdar://problem/14594431>.
llvm-svn: 191130
C-like languages promote types like unsigned short to unsigned int before
performing an arithmetic operation. Currently the rotate matcher in the
DAGCombiner does not consider this situation.
This commit extends the DAGCombiner in the way that the pattern
(or (shl ([az]ext x), (*ext y)), (srl ([az]ext x), (*ext (sub 32, y))))
is folded into
([az]ext (rotl x, y))
The matching is restricted to aext and zext because in this cases the upper
bits are either undefined or known. Test case is included.
This fixes PR16726.
llvm-svn: 191049
C-like languages promote types like unsigned short to unsigned int before
performing an arithmetic operation. Currently the rotate matcher in the
DAGCombiner does not consider this situation.
This commit extends the DAGCombiner in the way that the pattern
(or (shl ([az]ext x), (*ext y)), (srl ([az]ext x), (*ext (sub 32, y))))
is folded into
([az]ext (rotl x, y))
The matching is restricted to aext and zext because in this cases the upper
bits are either undefined or known. Test case is included.
This fixes PR16726.
llvm-svn: 191045
Use the DIVariable::isIndirect() flag set by the frontend instead of
guessing whether to set the machine location's indirection bit.
Paired commit with CFE.
llvm-svn: 190961
When a truncate node defines a legal vector type but uses an illegal
vector type, the legalization process was splitting the vector until
<1 x vector> type, but then it was failing to scalarize the node because
it did not know how to handle TRUNCATE.
<rdar://problem/14989896>
llvm-svn: 190830
DAGCombiner::isAlias can be called with SrcValue1 or SrcValue2 null, and we
can't use AA in this case (if we try, then the casting code in AA will assert).
llvm-svn: 190763
The vselect mask isn't a setcc.
This breaks in the case when the result of getSetCCResultType
is larger than the vector operands
e.g. %tmp = select i1 %cmp <2 x i8> %a, <2 x i8> %b
when getSetCCResultType returns <2 x i32>, the assertion
that the (MaskTy.getSizeInBits() == Op1.getValueType().getSizeInBits())
is hit.
No test since I don't think I can hit this with any of the current
targets. The R600/SI implementation would break, since it returns a
vector of i1 for this, but it doesn't reach ExpandSELECT for other
reasons.
llvm-svn: 190376
The work on this project was left in an unfinished and inconsistent state.
Hopefully someone will eventually get a chance to implement this feature, but
in the meantime, it is better to put things back the way the were. I have
left support in the bitcode reader to handle the case-range bitcode format,
so that we do not lose bitcode compatibility with the llvm 3.3 release.
This reverts the following commits: 155464, 156374, 156377, 156613, 156704,
156757, 156804 156808, 156985, 157046, 157112, 157183, 157315, 157384, 157575,
157576, 157586, 157612, 157810, 157814, 157815, 157880, 157881, 157882, 157884,
157887, 157901, 158979, 157987, 157989, 158986, 158997, 159076, 159101, 159100,
159200, 159201, 159207, 159527, 159532, 159540, 159583, 159618, 159658, 159659,
159660, 159661, 159703, 159704, 160076, 167356, 172025, 186736
llvm-svn: 190328
Occasionally DAGCombiner can spot that a SETCC operation is completely
redundant and reduce it to "all true" or "all false". If this happens to a
vector, the value produced has to take account of what a normal comparison
would have produced, which may be an all-1s bitmask.
The fix in SelectionDAG.cpp is tested, however, as far as I can see the code in
TargetLowering.cpp is possibly unreachable and almost certainly irrelevant when
triggered so there are no tests. However, I believe it's still clearly the
right change and may save someone else some hassle if it suddenly becomes
reachable. So I'm doing it anyway.
llvm-svn: 190147
This uses the TargetSubtargetInfo::useAA() function to control the defaults of
the -combiner-alias-analysis and -combiner-global-alias-analysis options.
llvm-svn: 189564
We want to convert code like (or (srl N, 8), (shl N, 8)) into (srl (bswap N),
const), but this is only valid if the bits above 16 on the source pattern are
0, the checks we were doing on this were slightly wrong before.
llvm-svn: 189348
If we have a binary operation like ISD:ADD, we can set the result type
equal to the result type of one of its operands rather than using
TargetLowering::getPointerTy().
Also, any use of DAG.getIntPtrConstant(C) as an operand for a binary
operation can be replaced with:
DAG.getConstant(C, OtherOperand.getValueType());
llvm-svn: 189227
This adds minimal support to the SelectionDAG for handling address spaces
with different pointer sizes. The SelectionDAG should now correctly
lower pointer function arguments to the correct size as well as generate
the correct code when lowering getelementptr.
This patch also updates the R600 DataLayout to use 32-bit pointers for
the local address space.
v2:
- Add more helper functions to TargetLoweringBase
- Use CHECK-LABEL for tests
llvm-svn: 189221
When truncated vector stores were being custom lowered in
VectorLegalizer::LegalizeOp(), the old (illegal) and new (legal) node pair
was not being added to LegalizedNodes list. Instead of the legalized
result being passed to VectorLegalizer::TranslateLegalizeResult(),
the result was being passed back into VectorLegalizer::LegalizeOp(),
which ended up adding a (new, new) pair to the list instead.
This was causing an assertion failure when a custom lowered truncated
vector store was the last instruction a basic block and the VectorLegalizer
was unable to find it in the LegalizedNodes list when updating the
DAG root.
llvm-svn: 188953
The small utility function that pattern matches Base + Index +
Offset patterns for loads and stores fails to recognize the base
pointer for loads/stores from/into an array at offset 0 inside a
loop. As a result DAGCombiner::MergeConsecutiveStores was not able
to merge all stores.
This commit fixes the issue by adding an additional pattern match
and also a test case.
Reviewer: Nadav
llvm-svn: 188936
SystemZTargetLowering::emitStringWrapper() previously loaded the character
into R0 before the loop and made R0 live on entry. I'd forgotten that
allocatable registers weren't allowed to be live across blocks at this stage,
and it confused LiveVariables enough to cause a miscompilation of f3 in
memchr-02.ll.
This patch instead loads R0 in the loop and leaves LICM to hoist it
after RA. This is actually what I'd tried originally, but I went for
the manual optimisation after noticing that R0 often wasn't being hoisted.
This bug forced me to go back and look at why, now fixed as r188774.
We should also try to optimize null checks so that they test the CC result
of the SRST directly. The select between null and the SRST GPR result could
then usually be deleted as dead.
llvm-svn: 188779
Previously, generation of stack protectors was done exclusively in the
pre-SelectionDAG Codegen LLVM IR Pass "Stack Protector". This necessitated
splitting basic blocks at the IR level to create the success/failure basic
blocks in the tail of the basic block in question. As a result of this,
calls that would have qualified for the sibling call optimization were no
longer eligible for optimization since said calls were no longer right in
the "tail position" (i.e. the immediate predecessor of a ReturnInst
instruction).
Then it was noticed that since the sibling call optimization causes the
callee to reuse the caller's stack, if we could delay the generation of
the stack protector check until later in CodeGen after the sibling call
decision was made, we get both the tail call optimization and the stack
protector check!
A few goals in solving this problem were:
1. Preserve the architecture independence of stack protector generation.
2. Preserve the normal IR level stack protector check for platforms like
OpenBSD for which we support platform specific stack protector
generation.
The main problem that guided the present solution is that one can not
solve this problem in an architecture independent manner at the IR level
only. This is because:
1. The decision on whether or not to perform a sibling call on certain
platforms (for instance i386) requires lower level information
related to available registers that can not be known at the IR level.
2. Even if the previous point were not true, the decision on whether to
perform a tail call is done in LowerCallTo in SelectionDAG which
occurs after the Stack Protector Pass. As a result, one would need to
put the relevant callinst into the stack protector check success
basic block (where the return inst is placed) and then move it back
later at SelectionDAG/MI time before the stack protector check if the
tail call optimization failed. The MI level option was nixed
immediately since it would require platform specific pattern
matching. The SelectionDAG level option was nixed because
SelectionDAG only processes one IR level basic block at a time
implying one could not create a DAG Combine to move the callinst.
To get around this problem a few things were realized:
1. While one can not handle multiple IR level basic blocks at the
SelectionDAG Level, one can generate multiple machine basic blocks
for one IR level basic block. This is how we handle bit tests and
switches.
2. At the MI level, tail calls are represented via a special return
MIInst called "tcreturn". Thus if we know the basic block in which we
wish to insert the stack protector check, we get the correct behavior
by always inserting the stack protector check right before the return
statement. This is a "magical transformation" since no matter where
the stack protector check intrinsic is, we always insert the stack
protector check code at the end of the BB.
Given the aforementioned constraints, the following solution was devised:
1. On platforms that do not support SelectionDAG stack protector check
generation, allow for the normal IR level stack protector check
generation to continue.
2. On platforms that do support SelectionDAG stack protector check
generation:
a. Use the IR level stack protector pass to decide if a stack
protector is required/which BB we insert the stack protector check
in by reusing the logic already therein. If we wish to generate a
stack protector check in a basic block, we place a special IR
intrinsic called llvm.stackprotectorcheck right before the BB's
returninst or if there is a callinst that could potentially be
sibling call optimized, before the call inst.
b. Then when a BB with said intrinsic is processed, we codegen the BB
normally via SelectBasicBlock. In said process, when we visit the
stack protector check, we do not actually emit anything into the
BB. Instead, we just initialize the stack protector descriptor
class (which involves stashing information/creating the success
mbbb and the failure mbb if we have not created one for this
function yet) and export the guard variable that we are going to
compare.
c. After we finish selecting the basic block, in FinishBasicBlock if
the StackProtectorDescriptor attached to the SelectionDAGBuilder is
initialized, we first find a splice point in the parent basic block
before the terminator and then splice the terminator of said basic
block into the success basic block. Then we code-gen a new tail for
the parent basic block consisting of the two loads, the comparison,
and finally two branches to the success/failure basic blocks. We
conclude by code-gening the failure basic block if we have not
code-gened it already (all stack protector checks we generate in
the same function, use the same failure basic block).
llvm-svn: 188755
This adds a llvm.copysign intrinsic; We already have Libfunc recognition for
copysign (which is turned into the FCOPYSIGN SDAG node). In order to
autovectorize calls to copysign in the loop vectorizer, we need a corresponding
intrinsic as well.
In addition to the expected changes to the language reference, the loop
vectorizer, BasicTTI, and the SDAG builder (the intrinsic is transformed into
an FCOPYSIGN node, just like the function call), this also adds FCOPYSIGN to a
few lists in LegalizeVector{Ops,Types} so that vector copysigns can be
expanded.
In TargetLoweringBase::initActions, I've made the default action for FCOPYSIGN
be Expand for vector types. This seems correct for all in-tree targets, and I
think is the right thing to do because, previously, there was no way to generate
vector-values FCOPYSIGN nodes (and most targets don't specify an action for
vector-typed FCOPYSIGN).
llvm-svn: 188728
- split WidenVecRes_Binary into WidenVecRes_Binary and WidenVecRes_BinaryCanTrap
- WidenVecRes_BinaryCanTrap preserves the original behaviour for operations
that can trap
- WidenVecRes_Binary simply widens the operation and improves codegen for
3-element vectors by allowing widening and promotion on x86 (matches the
behaviour of unary and ternary operation widening)
- use WidenVecRes_Binary for operations on integers.
Reviewed by: nrotem
llvm-svn: 188699
We had previously been asserting when faced with a FCOPYSIGN f64, ppcf128 node
because there was no way to expand the FCOPYSIGN node. Because ppcf128 is the
sum of two doubles, and the first double must have the larger magnitude, we
can take the sign from the first double. As a result, in addition to fixing the
crash, this is also an optimization.
llvm-svn: 188655
Teach the generic instruction selection helper functions to constrain
the register classes of their input operands. For non-physical register
references, the generic code needs to be careful not to mess that up
when replacing references to result registers. As the comment indicates
for MachineRegisterInfo::replaceRegWith(), it's important to call
constrainRegClass() first.
rdar://12594152
llvm-svn: 188593
Generalize r188163 to cope with return types other than MVT::i32, just
as the existing visitMemCmpCall code did. I've split this out into a
subroutine so that it can be used for other upcoming patches.
I also noticed that I'd used the wrong API to record the out chain.
It's a load that uses DAG.getRoot() rather than getRoot(), so the out
chain should go on PendingLoads. I don't have a testcase for that because
we don't do any interesting scheduling on z yet.
llvm-svn: 188540
A common idiom is to use zero and all-ones as sentinal values and to
check for both in a single conditional ("x != 0 && x != (unsigned)-1").
That generates code, for i32, like:
testl %edi, %edi
setne %al
cmpl $-1, %edi
setne %cl
andb %al, %cl
With this transform, we generate the simpler:
incl %edi
cmpl $1, %edi
seta %al
Similar improvements for other integer sizes and on other platforms. In
general, combining the two setcc instructions into one is better.
rdar://14689217
llvm-svn: 188315
LowerCallTo returns a pair with the return value of the call as the first
element and the chain associated with the return value as the second element. If
we lower a call that has a void return value, LowerCallTo returns an SDValue
with a NULL SDNode and the chain for the call. Thus makeLibCall by just
returning the first value makes it impossible for you to set up the chain so
that the call is not eliminated as dead code.
I also updated all references to makeLibCall to reflect the new return type.
llvm-svn: 188300
Previously the asserts were only checking that RHS and LHS were the same type and had the same element type as the result. All downstream code for ISD::VECTOR_SHUFFLE requires the types to be the same.
Also removed one unnecessary check of matched element counts that was present in the code.
llvm-svn: 188051
All libm floating-point rounding functions, except for round(), had their own
ISD nodes. Recent PowerPC cores have an instruction for round(), and so here I'm
adding ISD::FROUND so that round() can be custom lowered as well.
For the most part, this is straightforward. I've added an intrinsic
and a matching ISD node just like those for nearbyint() and friends. The
SelectionDAG pattern I've named frnd (because ISD::FP_ROUND has already claimed
fround).
This will be used by the PowerPC backend in a follow-up commit.
llvm-svn: 187926
This virtual function can be implemented by targets to specify the type
to use for the index operand of INSERT_VECTOR_ELT, EXTRACT_VECTOR_ELT,
INSERT_SUBVECTOR, EXTRACT_SUBVECTOR. The default implementation returns
the result from TargetLowering::getPointerTy()
The previous code was using TargetLowering::getPointerTy() for vector
indices, because this is guaranteed to be legal on all targets. However,
using TargetLowering::getPointerTy() can be a problem for targets with
pointer sizes that differ across address spaces. On such targets,
when vectors need to be loaded or stored to an address space other than the
default 'zero' address space (which is the address space assumed by
TargetLowering::getPointerTy()), having an index that
is a different size than the pointer can lead to inefficient
pointer calculations, (e.g. 64-bit adds for a 32-bit address space).
There is no intended functionality change with this patch.
llvm-svn: 187748
For a testcase like the following:
typedef unsigned long uint64_t;
typedef struct {
uint64_t lo;
uint64_t hi;
} blob128_t;
void add_128_to_128(const blob128_t *in, blob128_t *res) {
asm ("PAND %1, %0" : "+Q"(*res) : "Q"(*in));
}
where we'll fail to allocate the register for the output constraint,
our matching input constraint will not find a register to match,
and could try to search past the end of the current operands array.
On the idea that we'd like to attempt to keep compilation going
to find more errors in the module, change the error cases when
we're visiting inline asm IR to return immediately and avoid
trying to create a node in the DAG. This leaves us with only
a single error message per inline asm instruction, but allows us
to safely keep going in the general case.
llvm-svn: 187470
This patch prevents the following combine when the input vector is used more
than once.
insert_vector_elt (build_vector elt0, ..., eltN), NewEltIdx, idx
=>
build_vector elt0, ..., NewEltIdx, ..., eltN
The reasons are:
- Building a vector may be expensive, so try to reuse the existing part of a
vector instead of creating a new one (think big vectors).
- elt0 to eltN now have two users instead of one. This may prevent some other
optimizations.
llvm-svn: 187396
Adds unit tests for it too.
Split BasicBlockUtils into an analysis-half and a transforms-half, and put the
analysis bits into a new Analysis/CFG.{h,cpp}. Promote isPotentiallyReachable
into llvm::isPotentiallyReachable and move it into Analysis/CFG.
llvm-svn: 187283
CustomLowerNode was not being called during SplitVectorOperand,
meaning custom legalization could not be used by targets.
This also adds a test case for NVPTX that depends on this custom
legalization.
Differential Revision: http://llvm-reviews.chandlerc.com/D1195
Attempt to fix the buildbots by making the X86 test I just added platform independent
llvm-svn: 187202