MERGE_VALUES nodes. Replacing the result values with the
operands in one MERGE_VALUES node may cause another
MERGE_VALUES node be CSE'd with the first one, and bring
its uses along, so that the first one isn't dead, as this
code expects. Fix this by iterating until the node is
really dead. This fixes PR4699.
llvm-svn: 78619
Blackfin supports and/or/xor on i32 but not on i16. Teach
DAGCombiner::SimplifyBinOpWithSameOpcodeHands to not produce illegal nodes
after legalize ops.
llvm-svn: 78497
and high-bits values in ways that weren't correct for integer
types wider than 64 bits. This fixes a miscompile in
PPMacroExpansion.cpp in clang on x86-64.
llvm-svn: 78295
support. This isn't immediately interesting, because Legalize
ends up lowering SELECT_CC if the target doesn't support it,
but this simplifies the process.
Also, if the SELECT_CC would be expanded in Legalize, it
can potentially end up with two copies of the condition
expression. By leaving it as SELECT+SETCC, the SELECT can be
expanded into two SELECTs that use a single SETCC.
The two comparisons are usually CSE'd, but depending on
when various expressions get legalized, the comparison
expression could involve calls to library functions, such
that the comparison expression may not be able to be CSE'd.
This will be needed by a future patch.
llvm-svn: 77896
This adds location info for all llvm_unreachable calls (which is a macro now) in
!NDEBUG builds.
In NDEBUG builds location info and the message is off (it only prints
"UREACHABLE executed").
llvm-svn: 75640
Make llvm_unreachable take an optional string, thus moving the cerr<< out of
line.
LLVM_UNREACHABLE is now a simple wrapper that makes the message go away for
NDEBUG builds.
llvm-svn: 75379
VSETCC must define all bits, which is different than it was documented
to before. Since all targets that implement VSETCC already have this
behavior, and we don't optimize based on this, just change the
documentation. We now get nice code for vec_compare.ll
llvm-svn: 74978
build vectors with i64 elements will only appear on 32b x86 before legalize.
Since vector widening occurs during legalize, and produces i64 build_vector
elements, the dag combiner is never run on these before legalize splits them
into 32b elements.
Teach the build_vector dag combine in x86 back end to recognize consecutive
loads producing the low part of the vector.
Convert the two uses of TLI's consecutive load recognizer to pass LoadSDNodes
since that was required implicitly.
Add a testcase for the transform.
Old:
subl $28, %esp
movl 32(%esp), %eax
movl 4(%eax), %ecx
movl %ecx, 4(%esp)
movl (%eax), %eax
movl %eax, (%esp)
movaps (%esp), %xmm0
pmovzxwd %xmm0, %xmm0
movl 36(%esp), %eax
movaps %xmm0, (%eax)
addl $28, %esp
ret
New:
movl 4(%esp), %eax
pmovzxwd (%eax), %xmm0
movl 8(%esp), %eax
movaps %xmm0, (%eax)
ret
llvm-svn: 72957
instcombine doesn't know when it's safe. To partially compensate
for this, introduce new code to do this transformation in
dagcombine, which can use UnsafeFPMath.
llvm-svn: 72872
ADDC/ADDE use MVT::i1 (later, whatever it gets legalized to)
instead of MVT::Flag. Remove CARRY_FALSE in favor of 0; adjust
all target-independent code to use this format.
Most targets will still produce a Flag-setting target-dependent
version when selection is done. X86 is converted to use i32
instead, which means TableGen needs to produce different code
in xxxGenDAGISel.inc. This keys off the new supportsHasI1 bit
in xxxInstrInfo, currently set only for X86; in principle this
is temporary and should go away when all other targets have
been converted. All relevant X86 instruction patterns are
modified to represent setting and using EFLAGS explicitly. The
same can be done on other targets.
The immediate behavior change is that an ADC/ADD pair are no
longer tightly coupled in the X86 scheduler; they can be
separated by instructions that don't clobber the flags (MOV).
I will soon add some peephole optimizations based on using
other instructions that set the flags to feed into ADC.
llvm-svn: 72707
e.g.
orl $65536, 8(%rax)
=>
orb $1, 10(%rax)
Since narrowing is not always a win, e.g. i32 -> i16 is a loss on x86, dag combiner consults with the target before performing the optimization.
llvm-svn: 72507
The DAGCombiner created a negative shiftamount, stored in an
unsigned variable. Later the optimizer eliminated the shift entirely as being
undefined.
Example: (srl (shl X, 56) 48). ShiftAmt is 4294967288.
Fix it by checking that the shiftamount is positive, and storing in a signed
variable.
llvm-svn: 72331
Massive check in. This changes the "-fast" flag to "-O#" in llc. If you want to
use the old behavior, the flag is -O0. This change allows for finer-grained
control over which optimizations are run at different -O levels.
Most of this work was pretty mechanical. The majority of the fixes came from
verifying that a "fast" variable wasn't used anymore. The JIT still uses a
"Fast" flag. I'll change the JIT with a follow-up patch.
llvm-svn: 70343
use the old behavior, the flag is -O0. This change allows for finer-grained
control over which optimizations are run at different -O levels.
Most of this work was pretty mechanical. The majority of the fixes came from
verifying that a "fast" variable wasn't used anymore. The JIT still uses a
"Fast" flag. I'm not 100% sure if it's necessary to change it there...
llvm-svn: 70270
PR2957
ISD::VECTOR_SHUFFLE now stores an array of integers representing the shuffle
mask internal to the node, rather than taking a BUILD_VECTOR of ConstantSDNodes
as the shuffle mask. A value of -1 represents UNDEF.
In addition to eliminating the creation of illegal BUILD_VECTORS just to
represent shuffle masks, we are better about canonicalizing the shuffle mask,
resulting in substantially better code for some classes of shuffles.
llvm-svn: 70225
ISD::VECTOR_SHUFFLE now stores an array of integers representing the shuffle
mask internal to the node, rather than taking a BUILD_VECTOR of ConstantSDNodes
as the shuffle mask. A value of -1 represents UNDEF.
In addition to eliminating the creation of illegal BUILD_VECTORS just to
represent shuffle masks, we are better about canonicalizing the shuffle mask,
resulting in substantially better code for some classes of shuffles.
A clean up of x86 shuffle code, and some canonicalizing in DAGCombiner is next.
llvm-svn: 69952
type as the vector element type: allow them to be of
a wider integer type than the element type all the way
through the system, and not just as far as LegalizeDAG.
This should be safe because it used to be this way
(the old type legalizer would produce such nodes), so
backends should be able to handle it. In fact only
targets which have legal vector types with an illegal
promoted element type will ever see this (eg: <4 x i16>
on ppc). This fixes a regression with the new type
legalizer (vec_splat.ll). Also, treat SCALAR_TO_VECTOR
the same as BUILD_VECTOR. After all, it is just a
special case of BUILD_VECTOR.
llvm-svn: 69467
promoted to legal types without changing the type of the vector. This is
following a suggestion from Duncan
(http://lists.cs.uiuc.edu/pipermail/llvmdev/2009-February/019923.html).
The transformation that used to be done during type legalization is now
postponed to DAG legalization. This allows the BUILD_VECTORs to be optimized
and potentially handled specially by target-specific code.
It turns out that this is also consistent with an optimization done by the
DAG combiner: a BUILD_VECTOR and INSERT_VECTOR_ELT may be combined by
replacing one of the BUILD_VECTOR operands with the newly inserted element;
but INSERT_VECTOR_ELT allows its scalar operand to be larger than the
element type, with any extra high bits being implicitly truncated. The
result is a BUILD_VECTOR where one of the operands has a type larger the
the vector element type.
Any code that operates on BUILD_VECTORs may now need to be aware of the
potential type discrepancy between the vector element type and the
BUILD_VECTOR operands. This patch updates all of the places that I could
find to handle that case.
llvm-svn: 68996
in addition to ZERO_EXTEND and SIGN_EXTEND. Fix a bug in the
way it checked for live-out values, and simplify the way it
find users by using SDNode::use_iterator's (relatively) new
features. Also, make it slightly more permissive on targets
with free truncates.
In SelectionDAGBuild, avoid creating ANY_EXTEND nodes that are
larger than necessary. If the target's SwitchAmountTy has
enough bits, use it. This exposes the truncate to optimization
early, enabling more optimizations.
llvm-svn: 68670
with SUBREG_TO_REG, teach SimpleRegisterCoalescing to coalesce
SUBREG_TO_REG instructions (which are similar to INSERT_SUBREG
instructions), and teach the DAGCombiner to take advantage of this on
targets which support it. This eliminates many redundant
zero-extension operations on x86-64.
This adds a new TargetLowering hook, isZExtFree. It's similar to
isTruncateFree, except it only applies to actual definitions, and not
no-op truncates which may not zero the high bits.
Also, this adds a new optimization to SimplifyDemandedBits: transform
operations like x+y into (zext (add (trunc x), (trunc y))) on targets
where all the casts are no-ops. In contexts where the high part of the
add is explicitly masked off, this allows the mask operation to be
eliminated. Fix the DAGCombiner to avoid undoing these transformations
to eliminate casts on targets where the casts are no-ops.
Also, this adds a new two-address lowering heuristic. Since
two-address lowering runs before coalescing, it helps to be able to
look through copies when deciding whether commuting and/or
three-address conversion are profitable.
Also, fix a bug in LiveInterval::MergeInClobberRanges. It didn't handle
the case that a clobber range extended both before and beyond an
existing live range. In that case, multiple live ranges need to be
added. This was exposed by the new subreg coalescing code.
Remove 2008-05-06-SpillerBug.ll. It was bugpoint-reduced, and the
spiller behavior it was looking for no longer occurrs with the new
instruction selection.
llvm-svn: 68576
x * 40
=>
shlq $3, %rdi
leaq (%rdi,%rdi,4), %rax
This has the added benefit of allowing more multiply to be folded into addressing mode. e.g.
a * 24 + b
=>
leaq (%rdi,%rdi,2), %rax
leaq (%rsi,%rax,8), %rax
llvm-svn: 67917
vector shuffle mask. Forced the mask to be built using i32. Note: this will
be irrelevant once vector_shuffle no longer takes a build vector for the
shuffle mask.
llvm-svn: 67076
if FPConstant is legal because if the FPConstant doesn't need to be stored
in a constant pool, the transformation is unlikely to be profitable.
llvm-svn: 66994
1. ConstantPoolSDNode alignment field is log2 value of the alignment requirement. This is not consistent with other SDNode variants.
2. MachineConstantPool alignment field is also a log2 value.
3. However, some places are creating ConstantPoolSDNode with alignment value rather than log2 values. This creates entries with artificially large alignments, e.g. 256 for SSE vector values.
4. Constant pool entry offsets are computed when they are created. However, asm printer group them by sections. That means the offsets are no longer valid. However, asm printer uses them to determine size of padding between entries.
5. Asm printer uses expensive data structure multimap to track constant pool entries by sections.
6. Asm printer iterate over SmallPtrSet when it's emitting constant pool entries. This is non-deterministic.
Solutions:
1. ConstantPoolSDNode alignment field is changed to keep non-log2 value.
2. MachineConstantPool alignment field is also changed to keep non-log2 value.
3. Functions that create ConstantPool nodes are passing in non-log2 alignments.
4. MachineConstantPoolEntry no longer keeps an offset field. It's replaced with an alignment field. Offsets are not computed when constant pool entries are created. They are computed on the fly in asm printer and JIT.
5. Asm printer uses cheaper data structure to group constant pool entries.
6. Asm printer compute entry offsets after grouping is done.
7. Change JIT code to compute entry offsets on the fly.
llvm-svn: 66875
related transformations out of target-specific dag combine into the
ARM backend. These were added by Evan in r37685 with no testcases
and only seems to help ARM (e.g. test/CodeGen/ARM/select_xform.ll).
Add some simple X86-specific (for now) DAG combines that turn things
like cond ? 8 : 0 -> (zext(cond) << 3). This happens frequently
with the recently added cp constant select optimization, but is a
very general xform. For example, we now compile the second example
in const-select.ll to:
_test:
movsd LCPI2_0, %xmm0
ucomisd 8(%esp), %xmm0
seta %al
movzbl %al, %eax
movl 4(%esp), %ecx
movsbl (%ecx,%eax,4), %eax
ret
instead of:
_test:
movl 4(%esp), %eax
leal 4(%eax), %ecx
movsd LCPI2_0, %xmm0
ucomisd 8(%esp), %xmm0
cmovbe %eax, %ecx
movsbl (%ecx), %eax
ret
This passes multisource and dejagnu.
llvm-svn: 66779
alignment of the generated constant pool entry to the
desired alignment of a type. If we don't do this, we end up
trying to do movsd from 4-byte alignment memory. This fixes
450.soplex and 456.hmmer.
llvm-svn: 66641
extracts + build_vector into a shuffle would fail, because the
type of the new build_vector would not be legal. Try harder to
create a legal build_vector type. Note: this will be totally
irrelevant once vector_shuffle no longer takes a build_vector for
shuffle mask.
New:
_foo:
xorps %xmm0, %xmm0
xorps %xmm1, %xmm1
subps %xmm1, %xmm1
mulps %xmm0, %xmm1
addps %xmm0, %xmm1
movaps %xmm1, 0
Old:
_foo:
xorps %xmm0, %xmm0
movss %xmm0, %xmm1
xorps %xmm2, %xmm2
unpcklps %xmm1, %xmm2
pshufd $80, %xmm1, %xmm1
unpcklps %xmm1, %xmm2
pslldq $16, %xmm2
pshufd $57, %xmm2, %xmm1
subps %xmm0, %xmm1
mulps %xmm0, %xmm1
addps %xmm0, %xmm1
movaps %xmm1, 0
llvm-svn: 65791
instruction. The class also consolidates the code for detecting constant
splats that's shared across PowerPC and the CellSPU backends (and might be
useful for other backends.) Also introduces SelectionDAG::getBUID_VECTOR() for
generating new BUILD_VECTOR nodes.
llvm-svn: 65296
(Note: Eventually, commits like this will be handled via a pre-commit hook that
does this automagically, as well as expand tabs to spaces and look for 80-col
violations.)
llvm-svn: 64827
crashes or wrong code with codegen of large integers:
eliminate the legacy getIntegerVTBitMask and
getIntegerVTSignBit methods, which returned their
value as a uint64_t, so couldn't handle huge types.
llvm-svn: 63494
returned by getShiftAmountTy may be too small
to hold shift values (it is an i8 on x86-32).
Before and during type legalization, use a large
but legal type for shift amounts: getPointerTy;
afterwards use getShiftAmountTy, fixing up any
shift amounts with a big type during operation
legalization. Thanks to Dan for writing the
original patch (which I shamelessly pillaged).
llvm-svn: 63482
dagcombines that help it match in several more cases. Add
several more cases to test/CodeGen/X86/bt.ll. This doesn't
yet include matching for BT with an immediate operand, it
just covers more register+register cases.
llvm-svn: 63266
new isOperationLegalOrCustom, which does what isOperationLegal
previously did.
Update a bunch of callers to use isOperationLegalOrCustom
instead of isOperationLegal. In some case it wasn't obvious
which behavior is desired; when in doubt I changed then to
isOperationLegalOrCustom as that preserves their previous
behavior.
This is for the second half of PR3376.
llvm-svn: 63212
a uint64_t to verify that the value is in range for the given type,
to help catch accidental overflow. Fix a few places that relied on
getConstant implicitly truncating the value.
llvm-svn: 63128
tidy up SDUse and related code.
- Replace the operator= member functions with a set method, like
LLVM Use has, and variants setInitial and setNode, which take
care up updating use lists, like LLVM Use's does. This simplifies
code that calls these functions.
- getSDValue() is renamed to get(), as in LLVM Use, though most
places can either use the implicit conversion to SDValue or the
convenience functions instead.
- Fix some more node vs. value terminology issues.
Also, eliminate the one remaining use of SDOperandPtr, and
SDOperandPtr itself.
llvm-svn: 62995
testcase from PR3376, and in fact is sufficient to completely
avoid the problem in that testcase.
There's an underlying problem though; TLI.isOperationLegal
considers Custom to be Legal, which might be ok in some
cases, but that's what DAGCombiner is using in many places
to test if something is legal when LegalOperations is true.
When DAGCombiner is running after legalize, this isn't
sufficient. I'll address this in a separate commit.
llvm-svn: 62860
to "C ^ 1" is only valid when C is known to be either 0 or 1. Most of the
similar foldings in this function only handle "i1" types, but this one appears
intentionally written to handle larger integer types. If C has an integer
type larger than "i1", this needs to check if the high bits of a boolean
are known to be zero. I also changed the comment to describe this folding as
"C ^ 1" instead of "~C", since that is what the code does and since the latter
would only be valid for "i1" types. The good news is that most LLVM targets
use TargetLowering::ZeroOrOneBooleanContent so this change will not disable
the optimization; the bad news is that I've been unable to come up with a
testcase to demonstrate the problem.
I have also removed a "FIXME" comment for folding "select C, X, 0" to "C & X",
since the code looks correct to me. It could be made more aggressive by not
limiting the type to "i1", but that would then require checking for
TargetLowering::ZeroOrNegativeOneBooleanContent. Similar changes could be
done for the other SELECT foldings, but it was decided to be not worth the
trouble and complexity (see e.g., r44663).
llvm-svn: 62790
Simplify x+0 to x in unsafe-fp-math mode. This avoids a bunch of
redundant work in many cases, because in unsafe-fp-math mode,
ISD::FADD with a constant is considered free to negate, so the
DAGCombiner often negates x+0 to -0-x thinking it's free, when
in reality the end result is -x, which is more expensive than x.
Also, combine x*0 to 0.
This fixes PR3374.
llvm-svn: 62789
special cases after producing the new reduced-width load, because the
new load already has the needed adjustments built into it. This fixes
several bugs due to the special cases, including PR3317.
llvm-svn: 62692
uses are added to the From node while it is processing From's
use list, because of automatic local CSE. The fix is to avoid
visiting any new uses.
Fix a few places in the DAGCombiner that assumed that after
a RAUW call, the From node has no users and may be deleted.
This fixes PR3018.
llvm-svn: 62533
and into the ScheduleDAGInstrs class, so that they don't get
destructed and re-constructed for each block. This fixes a
compile-time hot spot in the post-pass scheduler.
To help facilitate this, tidy and do some minor reorganization
in the scheduler constructor functions.
llvm-svn: 62275
promote from i1 all the way up to the canonical SetCC type.
In order to discover an appropriate type to use, pass
MVT::Other to getSetCCResultType. In order to be able to
do this, change getSetCCResultType to take a type as an
argument, not a value (this is also more logical).
llvm-svn: 61542
ISD::ADD to emit an implicit EFLAGS. This was horribly broken. Instead, replace
the intrinsic with an ISD::SADDO node. Then custom lower that into an
X86ISD::ADD node with a associated SETCC that checks the correct condition code
(overflow or carry). Then that gets lowered into the correct X86::ADDOvf
instruction.
Similar for SUB and MUL instructions.
llvm-svn: 60915
(this doesn't happen that often, since most code
does not use illegal types) then follow it by a
DAG combiner run that is allowed to generate
illegal operations but not illegal types. I didn't
modify the target combiner code to distinguish like
this between illegal operations and illegal types,
so it will not produce illegal operations as well
as not producing illegal types.
llvm-svn: 59960
"ISD::ADDO". ISD::ADDO is lowered into a target-independent form that does the
addition and then checks if the result is less than one of the operands. (If it
is, then there was an overflow.)
llvm-svn: 59779
The CC was changed, but wasn't checked to see if it was legal if the DAG
combiner was being run after legalization. Threw in a couple of checks just to
make sure that it's okay. As far as the PR is concerned, no back-end target
actually exhibited this problem, so there isn't an associated testcase.
llvm-svn: 59035
elements. Otherwise LegalizeTypes will, reasonably
enough, legalize the mask, which may result in it
no longer being a BUILD_VECTOR node (LegalizeDAG
simply ignores the legality or not of vector masks).
llvm-svn: 57782
and add a TargetLowering hook for it to use to determine when this
is legal (i.e. not in PIC mode, etc.)
This allows instruction selection to emit folded constant offsets
in more cases, such as the included testcase, eliminating the need
for explicit arithmetic instructions.
This eliminates the need for the C++ code in X86ISelDAGToDAG.cpp
that attempted to achieve the same effect, but wasn't as effective.
Also, fix handling of offsets in GlobalAddressSDNodes in several
places, including changing GlobalAddressSDNode's offset from
int to int64_t.
The Mips, Alpha, Sparc, and CellSPU targets appear to be
unaware of GlobalAddress offsets currently, so set the hook to
false on those targets.
llvm-svn: 57748
shift counts, and patterns that match dynamic shift counts
when the subtract is obscured by a truncate node.
Add DAGCombiner support for recognizing rotate patterns
when the shift counts are defined by truncate nodes.
Fix and simplify the code for commuting shld and shrd
instructions to work even when the given instruction doesn't
have a parent, and when the caller needs a new instruction.
These changes allow LLVM to use the shld, shrd, rol, and ror
instructions on x86 to replace equivalent code using two
shifts and an or in many more cases.
llvm-svn: 57662
the SelectionDAG and DAGCombiner code. The only functionality change is that now
the DAG combiner is performing the constant folding for these operations instead
of being a no-op.
This is *not* in response to a bug, so there isn't a testcase.
llvm-svn: 56550
ConstantFP* instead of APInt and APFloat directly.
This reduces the amount of time to create ConstantSDNode
and ConstantFPSDNode nodes when ConstantInt* and ConstantFP*
respectively are already available, as is the case in
SelectionDAGBuild.cpp. Also, it reduces the amount of time
to legalize constants into constant pools, and the amount of
time to add ConstantFP operands to MachineInstrs, due to
eliminating ConstantInt::get and ConstantFP::get calls.
It increases the amount of work needed to create new constants
in cases where the client doesn't already have a ConstantInt*
or ConstantFP*, such as legalize expanding 64-bit integer constants
to 32-bit constants. And it adds a layer of indirection for the
accessor methods. But these appear to be outweight by the benefits
in most cases.
It will also make it easier to make ConstantSDNode and
ConstantFPNode more consistent with ConstantInt and ConstantFP.
llvm-svn: 56162
before. This is taken care of in the selection DAG pass. In my opinion, this
should be in one place or the other. I.e., it should probably be removed from
the DAG combiner (along with the other arithmetic transformations on constants
that are essentially no-ops).
llvm-svn: 55889
its work by putting all nodes in the worklist, requiring a big
dynamic allocation. Now, DAGCombiner just iterates over the AllNodes
list and maintains a worklist for nodes that are newly created or
need to be revisited. This allows the worklist to stay small in most
cases, so it can be a SmallVector.
This has the side effect of making DAGCombine not miss a folding
opportunity in alloca-align-rounding.ll.
llvm-svn: 55498
generic SDNode's (nodes with their own constructors
should do sanity checking in the constructor). Add
sanity checks for BUILD_VECTOR and fix all the places
that were producing bogus BUILD_VECTORs, as found by
"make check". My favorite is the BUILD_VECTOR with
only two operands that was being used to build a
vector with four elements!
llvm-svn: 53850
the night realising that it was wrong :) I
think the reason the same type was being used
for the shufflevec of indices as for the actual
indices is so that if one of them needs splitting
then so does the other. After my patch it might
be that the indices need splitting but not the
rest, yet there is no good way of handling that.
I think the right solution is to not have the
shufflevec be an operand at all: just have it
be the list of numbers it actually is, stored
as extra info in the node.
llvm-svn: 53768
mask. These are just indices into the shuffled vector
so their type is unrelated to the type of the
shuffled elements (which is what was being used before).
This fixes vec_shuffle-11.ll when using LegalizeTypes.
What seems to have happened is that Dan's recent change
r53687, which corrected the result type of the shuffle,
somehow caused LegalizeTypes to notice that the mask
operand was a BUILD_VECTOR with a legal type but elements
of an illegal type (i64). LegalizeTypes legalized this
by introducing a new BUILD_VECTOR of i32 and bitcasting
it to the old type. But the mask operand is not supposed
to be a bitcast but a straight BUILD_VECTOR of constants,
causing a crash.
llvm-svn: 53729
SelectionDAG::allnodes_size is linear, but that doesn't appear to
outweigh the benefit of reducing heap traffic. If it does become a
problem, we should teach SelectionDAG to keep a count of how many
nodes are live, because there are several other places where that
information would be useful as well.
llvm-svn: 52926
still excluding types like i1 (not byte sized)
and i120 (loading an i120 requires loading an i64,
an i32, an i16 and an i8, which is expensive).
llvm-svn: 52310
wrong for volatile loads and stores. In fact this
is almost all of them! There are three types of
problems: (1) it is wrong to change the width of
a volatile memory access. These may be used to
do memory mapped i/o, in which case a load can have
an effect even if the result is not used. Consider
loading an i32 but only using the lower 8 bits. It
is wrong to change this into a load of an i8, because
you are no longer tickling the other three bytes. It
is also unwise to make a load/store wider. For
example, changing an i16 load into an i32 load is
wrong no matter how aligned things are, since the
fact of loading an additional 2 bytes can have
i/o side-effects. (2) it is wrong to change the
number of volatile load/stores: they may be counted
by the hardware. (3) it is wrong to change a volatile
load/store that requires one memory access into one
that requires several. For example on x86-32, you
can store a double in one processor operation, but to
store an i64 requires two (two i32 stores). In a
multi-threaded program you may want to bitcast an i64
to a double and store as a double because that will
occur atomically, and be indivisible to other threads.
So it would be wrong to convert the store-of-double
into a store of an i64, because this will become two
i32 stores - no longer atomic. My policy here is
to say that the number of processor operations for
an illegal operation is undefined. So it is alright
to change a store of an i64 (requires at least two
stores; but could be validly lowered to memcpy for
example) into a store of double (one processor op).
In short, if the new store is legal and has the same
size then I say that the transform is ok. It would
also be possible to say that transforms are always
ok if before they were illegal, whether after they
are illegal or not, but that's more awkward to do
and I doubt it buys us anything much.
However this exposed an interesting thing - on x86-32
a store of i64 is considered legal! That is because
operations are marked legal by default, regardless of
whether the type is legal or not. In some ways this
is clever: before type legalization this means that
operations on illegal types are considered legal;
after type legalization there are no illegal types
so now operations are only legal if they really are.
But I consider this to be too cunning for mere mortals.
Better to do things explicitly by testing AfterLegalize.
So I have changed things so that operations with illegal
types are considered illegal - indeed they can never
map to a machine operation. However this means that
the DAG combiner is more conservative because before
it was "accidentally" performing transforms where the
type was illegal because the operation was nonetheless
marked legal. So in a few such places I added a check
on AfterLegalize, which I suppose was actually just
forgotten before. This causes the DAG combiner to do
slightly more than it used to, which resulted in the X86
backend blowing up because it got a slightly surprising
node it wasn't expecting, so I tweaked it.
llvm-svn: 52254
maps can be deleted. This happens when RAUW
replaces a node N with another equivalent node
E, deleting the first node. Solve this by
adding (N, E) to ReplacedNodes, which is already
used to remap nodes to replacements. This means
that deleted nodes are being allowed in maps,
which can be delicate: the memory may be reused
for a new node which might get confused with the
old deleted node pointer hanging around in the
maps, so detect this and flush out maps if it
occurs (ExpungeNode). The expunging operation
is expensive, however it never occurs during
a llvm-gcc bootstrap or anywhere in the nightly
testsuite. It occurs three times in "make check":
Alpha/illegal-element-type.ll,
PowerPC/illegal-element-type.ll and
X86/mmx-shift.ll. If expunging proves to be too
expensive then there are other more complicated
ways of solving the problem.
In the normal case this patch adds the overhead
of a few more map lookups, which is hopefully
negligable.
llvm-svn: 52214
of integer types. Fix the isMask APInt method to
actually work (hopefully) rather than crashing
because it adds apints of different bitwidths.
It looks like isShiftedMask is also broken, but
I'm leaving that one to the APInt people (it is
not used anywhere).
llvm-svn: 52142
of apint codegen failure is the DAG combiner doing
the wrong thing because it was comparing MVT's using
< rather than comparing the number of bits. Removing
the < method makes this mistake impossible to commit.
Instead, add helper methods for comparing bits and use
them.
llvm-svn: 52098
and better control the abstraction. Rename the type
to MVT. To update out-of-tree patches, the main
thing to do is to rename MVT::ValueType to MVT, and
rewrite expressions like MVT::getSizeInBits(VT) in
the form VT.getSizeInBits(). Use VT.getSimpleVT()
to extract a MVT::SimpleValueType for use in switch
statements (you will get an assert failure if VT is
an extended value type - these shouldn't exist after
type legalization).
This results in a small speedup of codegen and no
new testsuite failures (x86-64 linux).
llvm-svn: 52044
Rename SDOperandImpl back to SDOperand.
Introduce the SDUse class that represents a use of the SDNode referred by
an SDOperand. Now it is more similar to Use/Value classes.
Patch is approved by Dan Gohman.
llvm-svn: 49795
LLVM Value/Use does and MachineRegisterInfo/MachineOperand does.
This allows constant time for all uses list maintenance operations.
The idea was suggested by Chris. Reviewed by Evan and Dan.
Patch is tested and approved by Dan.
On normal use-cases compilation speed is not affected. On very big basic
blocks there are compilation speedups in the range of 15-20% or even better.
llvm-svn: 48822
X86 lowering normalize vector 0 to v4i32. However DAGCombine can fold (sub x, x) -> 0 after legalization. It can create a zero vector of a type that's not expected (e.g. v8i16). We don't want to disable the optimization since leaving a (sub x, x) is really bad. Add isel patterns for other types of vector 0 to ensure correctness. It's highly unlikely to happen other than in bugpoint reduced test cases.
llvm-svn: 48279
Change several cases in SimplifyDemandedMask that don't ever do any
simplifying to reuse the logic in ComputeMaskedBits instead of
duplicating it.
llvm-svn: 47648
after legalize. Just because a constant is legal (e.g. 0.0 in SSE)
doesn't mean that its negated value is legal (-0.0). We could make
this stronger by checking to see if the negated constant is actually
legal post negation, but it doesn't seem like a big deal.
llvm-svn: 47591
DAGUpdateListener object pointer instead of just returning a vector
of deleted nodes. This makes the interfaces more efficient (no more
allocating a vector [at least a malloc], filling it in, then walking
it) and more clean. This also allows the client to be notified of
nodes that are *changed* but not deleted.
llvm-svn: 46677
and StoreSDNode into their common base class LSBaseSDNode. Member
functions getLoadedVT and getStoredVT are replaced with the common
getMemoryVT to simplify code that will handle both loads and stores.
llvm-svn: 46538
registers if used by a bitconvert or using a bitconvert. This allows us to
avoid constant pool loads and use cheaper integer instructions when the
values come from or end up in integer regs anyway. For example, we now
compile CodeGen/X86/fp-in-intregs.ll to:
_test1:
movl $2147483648, %eax
xorl 4(%esp), %eax
ret
_test2:
movl $1065353216, %eax
orl 4(%esp), %eax
andl $3212836864, %eax
ret
Instead of:
_test1:
movss 4(%esp), %xmm0
xorps LCPI2_0, %xmm0
movd %xmm0, %eax
ret
_test2:
movss 4(%esp), %xmm0
andps LCPI3_0, %xmm0
movss LCPI3_1, %xmm1
andps LCPI3_2, %xmm1
orps %xmm0, %xmm1
movd %xmm1, %eax
ret
bitconverts can happen due to various calling conventions that require
fp values to passed in integer regs in some cases, e.g. when returning
a complex.
llvm-svn: 46414
delete a node even if it was not dead in some cases. Instead, just add it to
the worklist. Also, make sure to use the CombineTo methods, as it was doing
things that were unsafe: the top level combine loop could touch dangling memory.
This fixes CodeGen/Generic/2008-01-25-dag-combine-mul.ll
llvm-svn: 46384
1. we already know the value is dead, so don't bother replacing
it with undef.
2. The very case the comment describes actually makes the load
live which asserts in deletenode. If we do the replacement
and the node becomes live, just treat it as new. This fixes
a failure on X86/2008-01-16-InvalidDAGCombineXform.ll with
some local changes in my tree.
llvm-svn: 46306
dead stuff around. This gets fed into the isel pass and causes certain foldings from
happening because nodes have extraneous uses floating around. For example, if we turned
foo(bar(x)) -> baz(x), we sometimes left bar(x) around.
llvm-svn: 46305
1. Legalize now always promotes truncstore of i1 to i8.
2. Remove patterns and gunk related to truncstore i1 from targets.
3. Rename the StoreXAction stuff to TruncStoreAction in TLI.
4. Make the TLI TruncStoreAction table a 2d table to handle from/to conversions.
5. Mark a wide variety of invalid truncstores as such in various targets, e.g.
X86 currently doesn't support truncstore of any of its integer types.
6. Add legalize support for truncstores with invalid value input types.
7. Add a dag combine transform to turn store(truncate) into truncstore when
safe.
The later allows us to compile CodeGen/X86/storetrunc-fp.ll to:
_foo:
fldt 20(%esp)
fldt 4(%esp)
faddp %st(1)
movl 36(%esp), %eax
fstps (%eax)
ret
instead of:
_foo:
subl $4, %esp
fldt 24(%esp)
fldt 8(%esp)
faddp %st(1)
fstps (%esp)
movl 40(%esp), %eax
movss (%esp), %xmm0
movss %xmm0, (%eax)
addl $4, %esp
ret
llvm-svn: 46140
and switch various codegen pieces and the X86 backend over
to using it.
* Add some comments to SelectionDAGNodes.h
* Introduce a second argument to FP_ROUND, which indicates
whether the FP_ROUND changes the value of its input. If
not it is safe to xform things like fp_extend(fp_round(x)) -> x.
llvm-svn: 46125