rL310372 enabled simplifyShuffleMask to support undef shuffle mask inputs, but its causing hangs.
Removing support until I can triage the problem
llvm-svn: 310699
Summary:
Preserve chain dependecies between old and new loads constructed to
prevent loads from reordering below later stores.
Fixes PR34088.
Reviewers: craig.topper, spatel, RKSimon, efriedma
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D36528
llvm-svn: 310604
In FoldConstantArithmetic, handle BUILD_VECTOR nodes that do implicit truncation on the elements.
This is similar to what is done in FoldConstantVectorArithmetic.
Differential Revision:
https://reviews.llvm.org/D36506
llvm-svn: 310593
a legal cond operand.
When scalarizing the result of a vselect, the legalizer currently expects
to already have scalarized the operands. While this is true for the true/false
operands (which have the same type as the result), it is not case for the
condition operand. On X86 AVX512, v1i1 is legal - this leads to operations such
as '< N x type> vselect < N x i1> < N x type> < N x type>' where < N x type > is
illegal to hit an assertion during the scalarization.
The handling is similar to r205625.
This also exposes the fact that (v1i1 extract_subvector) should be legal
and selectable on AVX512 - We do this by custom lowering to vector_extract_elt.
This still leaves us in some cases with redundant dag nodes which will be
combined in a separate soon to come patch.
This fixes pr33349.
Differential revision: https://reviews.llvm.org/D36511
llvm-svn: 310552
Relanding after case to insert explicit truncation as necessary.
Allow SCALAR_TO_VECTOR of EXTRACT_VECTOR_ELT to reduce to
EXTRACT_SUBVECTOR of vector shuffle when output is smaller. Marginally
improves vector shuffle computations.
Reviewers: efriedma, RKSimon, spatel
Subscribers: javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D35566
llvm-svn: 310256
The NewNodesMustHaveLegalTypes flag is set to false at the beginning of CodeGenAndEmitDAG, and set to true after legalizing types.
But before calling CodeGenAndEmitDAG we build the DAG for the basic block.
So for the first basic block NewNodesMustHaveLegalTypes would be 'false' during the SDAG building, and for all other basic blocks it would be 'true'.
This patch sets the flag to false before SDAG building each basic block.
Differential Revision:
https://reviews.llvm.org/D33435
llvm-svn: 310239
We can convert any select-of-constants to math ops:
http://rise4fun.com/Alive/d7d
For this patch, I'm enhancing an existing x86 transform that uses fake multiplies
(they always become shl/lea) to avoid cmov or branching. The current code misses
cases where we have a negative constant and a positive constant, so this is just
trying to plug that hole.
The DAGCombiner diff prevents us from hitting a terrible inefficiency: we can start
with a select in IR, create a select DAG node, convert it into a sext, convert it
back into a select, and then lower it to sext machine code.
Some notes about the test diffs:
1. 2010-08-04-MaskedSignedCompare.ll - We were creating control flow that didn't exist in the IR.
2. memcmp.ll - Choose -1 or 1 is the case that got me looking at this again. I
think we could avoid the push/pop in some cases if we used 'movzbl %al' instead of an xor on
a different reg? That's a post-DAG problem though.
3. mul-constant-result.ll - The trade-off between sbb+not vs. setne+neg could be addressed if
that's a regression, but I think those would always be nearly equivalent.
4. pr22338.ll and sext-i1.ll - These tests have undef operands, so I don't think we actually care about these diffs.
5. sbb.ll - This shows a win for what I think is a common case: choose -1 or 0.
6. select.ll - There's another borderline case here: cmp+sbb+or vs. test+set+lea? Also, sbb+not vs. setae+neg shows up again.
7. select_const.ll - These are motivating cases for the enhancement; replace cmov with cheaper ops.
Assembly differences between movzbl and xor to avoid a partial reg stall are caused later by the X86 Fixup SetCC pass.
Differential Revision: https://reviews.llvm.org/D35340
llvm-svn: 310208
If all the operands of a BUILD_VECTOR extract elements from same vector then split the vector efficiently based on the maximum vector access index.
Committed on behalf of @jbhateja (Jatin Bhateja)
Differential Revision: https://reviews.llvm.org/D35788
llvm-svn: 310058
During store merge we construct a sorted list of consecutive store
candidates and consider subsequences for merging into a single
store. For each subsequence we check if the stored value type is legal
the merged store would have valid and fast and if the constructed
value to be stored is valid. The only properties that affect this
check between subsequences is the size of the subsequence, the
alignment of the first store, the alignment of the stored load value
(when merging stores-of-loads), and whether the merged value is a
constant zero.
If we do not find a viable mergeable subsequence starting from the
first store of length N, we know that a subsequence starting at a
later store of length N will also fail unless the new store's
alignment, the new load's alignment (if we're merging store-of-loads),
or we've dropped stores of nonzero value and could construct a merged
stores of zero (for merging constants).
As a result if we fail to find a valid subsequence starting from the
first store we can safely skip considering subsequences that start
with subsequent stores unless one of the above properties is
true. This significantly (2x) improves compile time in some
pathological cases.
Reviewers: RKSimon, efriedma, zvi, spatel, waltl
Subscribers: grandinj, llvm-commits
Differential Revision: https://reviews.llvm.org/D35901
llvm-svn: 309830
This pattern shows up when lowering byval copies on AMDGPU.
The byval object access is split into 4-byte chunks, adding a
constant offset to the FixedStack base. When some of the offsets
turn into ors, this prevents combining the constant offsets.
This makes it not apparent that the object is there when matching
addressing modes, so it ends up using a scratch wave offset
relative access and the lengthy frame index expansion for that.
llvm-svn: 309775
Summary:
We already have information about static alloca stack locations in our
side table. Emitting instructions for them is inefficient, and it only
happens when the address of the alloca has been materialized within the
current block, which isn't often.
Reviewers: aprantl, probinson, dblaikie
Subscribers: jfb, dschuff, sbc100, jgravelle-google, hiraditya, llvm-commits, aheejin
Differential Revision: https://reviews.llvm.org/D36117
llvm-svn: 309729
https://reviews.llvm.org/D31536 didn't really solve the problem it was
trying to solve; it got rid of the assertion failure, but we were still
scheduling the DAG incorrectly (mixing together instructions from
different calls), leading to a MachineVerifier failure.
In order to schedule the DAG correctly, we have to make sure we don't
schedule a node which should be blocked by an interference. Fix
ScheduleDAGRRList::PickNodeToScheduleBottomUp so it doesn't pick a node
like that.
The added call to FindAvailableNode() is the key change here; this makes
sure we don't try to schedule a call while we're in the middle of
scheduling a different call. I'm not sure this is the right approach; in
particular, I'm not sure how to prove we don't end up with an infinite
loop of repeatedly backtracking.
This also reverts the code change from D31536. It doesn't do anything
useful: we should never schedule an ADJCALLSTACKDOWN unless we've
already scheduled the corresponding ADJCALLSTACKUP.
Differential Revision: https://reviews.llvm.org/D33818
llvm-svn: 309642
PR33883 shows that calls to intrinsic functions should not have their vector
arguments or returns subject to ABI changes required by the target.
This resolves PR33883.
Thanks to Alex Crichton for reporting the issue!
Reviewers: zoran.jovanovic, atanasyan
Differential Revision: https://reviews.llvm.org/D35765
llvm-svn: 309561
This patch is in 2 parts:
1 - replace combineBT's use of SimplifyDemandedBits (hasOneUse only) with SelectionDAG::GetDemandedBits to more aggressively determine the lower bits used by BT.
2 - update SelectionDAG::GetDemandedBits to support ANY_EXTEND - if the demanded bits are only in the non-extended portion, then peek through and demand from the source value and then ANY_EXTEND that if we found a match.
Differential Revision: https://reviews.llvm.org/D35896
llvm-svn: 309486
There is no situation where this rarely-used argument cannot be
substituted with a DIExpression and removing it allows us to simplify
the DWARF backend. Note that this patch does not yet remove any of
the newly dead code.
rdar://problem/33580047
Differential Revision: https://reviews.llvm.org/D35951
llvm-svn: 309426
Improve DAGTypeLegalizer::convertMask's isSETCCorConvertedSETCC assertion to properly check for any mixture of SETCC or BUILD_VECTOR of constants, or a logical mask op of them.
llvm-svn: 309302
This patch moves the DAGCombiner::GetDemandedBits function to SelectionDAG::GetDemandedBits as a first step towards making it easier for targets to get to the source of any demanded bits without the limitations of SimplifyDemandedBits.
Differential Revision: https://reviews.llvm.org/D35841
llvm-svn: 308983
Check the actual memory type stored and not the extended value size
when considering if truncated store merge is worthwhile.
Reviewers: efriedma, RKSimon, spatel, jyknight
Reviewed By: efriedma
Subscribers: llvm-commits, nhaehnle
Differential Revision: https://reviews.llvm.org/D35623
llvm-svn: 308833
Summary:
When pushing an extension of a constant bitwise operator on a load
into the load, change other uses of the load value if they exist to
prevent the old load from persisting.
Reviewers: spatel, RKSimon, efriedma
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D35030
llvm-svn: 308618