This check currently doesn't seem to do anything useful on any in-tree target:
On non-x86, it always evaluates to false, so we never hit the code path that
creates the shuffle with zero.
On x86, it just forwards to isShuffleMaskLegal(), which is a reasonable thing to
query in general, but doesn't make sense if only restricted to zero blends.
Differential Revision: https://reviews.llvm.org/D24625
llvm-svn: 282567
This allows us to, in some cases, create a vector_shuffle out of a build_vector, when
the inputs to the build are extract_elements from two different vectors, at least one
of which is wider than the output. (E.g. a <8 x i16> being constructed out of
elements from a <16 x i16> and a <8 x i16>).
Differential Revision: https://reviews.llvm.org/D24491
llvm-svn: 281402
To avoid assertion, we must ensure that the inner shift constant is within range before calling ConstantSDNode::getZExtValue(). We already know that the outer shift constant is in range.
Followup to D23007
llvm-svn: 281362
This should make it easier to add cases that we currently don't cover,
like supporting more kinds of type mismatches and more than 2 input vectors.
llvm-svn: 281283
Summary:
An IR load can be invariant, dereferenceable, neither, or both. But
currently, MI's notion of invariance is IR-invariant &&
IR-dereferenceable.
This patch splits up the notions of invariance and dereferenceability at
the MI level. It's NFC, so adds some probably-unnecessary
"is-dereferenceable" checks, which we can remove later if desired.
Reviewers: chandlerc, tstellarAMD
Subscribers: jholewinski, arsenm, nemanjai, llvm-commits
Differential Revision: https://reviews.llvm.org/D23371
llvm-svn: 281151
I might have called this "r246507, the sequel". It fixes the same issue, as the
issue has cropped up in a few more places. The underlying problem is that
isSetCCEquivalent can pick up select_cc nodes with a result type that is not
legal for a setcc node to have, and if we use that type to create new setcc
nodes, nothing fixes that (and so we've violated the contract that the
infrastructure has with the backend regarding setcc node types).
Fixes PR30276.
For convenience, here's the commit message from r246507, which explains the
problem is greater detail:
[DAGCombine] Fixup SETCC legality checking
SETCC is one of those special node types for which operation actions (legality,
etc.) is keyed off of an operand type, not the node's value type. This makes
sense because the value type of a legal SETCC node is determined by its
operands' value type (via the TLI function getSetCCResultType). When the
SDAGBuilder creates SETCC nodes, it either creates them with an MVT::i1 value
type, or directly with the value type provided by TLI.getSetCCResultType.
The first problem being fixed here is that DAGCombine had several places
querying TLI.isOperationLegal on SETCC, but providing the return of
getSetCCResultType, instead of the operand type directly. This does not mean
what the author thought, and "luckily", most in-tree targets have SETCC with
Custom lowering, instead of marking them Legal, so these checks return false
anyway.
The second problem being fixed here is that two of the DAGCombines could create
SETCC nodes with arbitrary (integer) value types; specifically, those that
would simplify:
(setcc a, b, op1) and|or (setcc a, b, op2) -> setcc a, b, op3
(which is possible for some combinations of (op1, op2))
If the operands of the and|or node are actual setcc nodes, then this is not an
issue (because the and|or must share the same type), but, the relevant code in
DAGCombiner::visitANDLike and DAGCombiner::visitORLike actually calls
DAGCombiner::isSetCCEquivalent on each operand, and that function will
recognise setcc-like select_cc nodes with other return types. And, thus, when
creating new SETCC nodes, we need to be careful to respect the value-type
constraint. This is even true before type legalization, because it is quite
possible for the SELECT_CC node to have a legal type that does not happen to
match the corresponding TLI.getSetCCResultType type.
To be explicit, there is nothing that later fixes the value types of SETCC
nodes (if the type is legal, but does not happen to match
TLI.getSetCCResultType). Creating SETCCs with an MVT::i1 value type seems to
work only because, either MVT::i1 is not legal, or it is what
TLI.getSetCCResultType returns if it is legal. Fixing that is a larger change,
however. For the time being, restrict the relevant transformations to produce
only SETCC nodes with a value type matching TLI.getSetCCResultType (or MVT::i1
prior to type legalization).
Fixes PR24636.
llvm-svn: 280767
For the store of a wide value merged from a pair of values, especially int-fp pair,
sometimes it is more efficent to split it into separate narrow stores, which can
remove the bitwise instructions or sink them to colder places.
Now the feature is only enabled on x86 target, and only store of int-fp pair is
splitted. It is possible that the application scope gets extended with perf evidence
support in the future.
Differential Revision: https://reviews.llvm.org/D22840
llvm-svn: 280505
This fixes a regression introduced by revision 268094.
Revision 268094 added the following dag combine rule:
// trunc (shl x, K) -> shl (trunc x), K => K < vt.size / 2
That rule converts a truncate of a shift-by-constant into a shift of a truncated
value. We do this only if the shift count is less than half the size in bits of
the truncated value (K < vt.size / 2).
The problem is that the constraint on the shift count is incorrect, so the rule
doesn't work well in some cases involving vector types. The combine rule should
have been written instead like this:
// trunc (shl x, K) -> shl (trunc x), K => K < vt.getScalarSizeInBits()
Basically, if K is smaller than the "scalar size in bits" of the truncated value
then we know that by "sinking" the truncate into the operand of the shift we
would never accidentally make the shift undefined.
This patch fixes the check on the shift count, and adds test cases to make sure
that we don't regress the behavior.
Differential Revision: https://reviews.llvm.org/D24154
llvm-svn: 280482
Legalization tends to create anyext(trunc) patterns. This should always be
combined - into either a single trunc, a single ext, or nothing if the
types match exactly. But if we happen to combine the trunc first, we may pull
the trunc away from the anyext or make it implicit (e.g. the truncate(extract)
-> extract(bitcast) fold).
To prevent this, we can avoid doing the fold, similarly to how we already handle
fpround(fpextend).
Differential Revision: https://reviews.llvm.org/D23893
llvm-svn: 280386
If the input vector to INSERT_SUBVECTOR is another INSERT_SUBVECTOR, and this inserted subvector replaces the last insertion, then insert into the common source vector.
i.e.
INSERT_SUBVECTOR( INSERT_SUBVECTOR( Vec, SubOld, Idx ), SubNew, Idx ) --> INSERT_SUBVECTOR( Vec, SubNew, Idx )
Differential Revision: https://reviews.llvm.org/D23330
llvm-svn: 278211
As detailed on D22726, much of the shift combining code assume constant values will fit into a uint64_t value and calls ConstantSDNode::getZExtValue where it probably shouldn't (leading to asserts). Using APInt directly avoids this problem but we encounter other assertions if we attempt to compare/operate on 2 APInt of different bitwidths.
This patch adds a helper function to ensure that 2 APInt values are zero extended as required so that they can be safely used together. I've only added an initial example use for this to the '(SHIFT (SHIFT x, c1), c2) --> (SHIFT x, (ADD c1, c2))' combines. Further cases can easily be added as required.
Differential Revision: https://reviews.llvm.org/D23007
llvm-svn: 278141
On modern Intel processors hardware SQRT in many cases is faster than RSQRT
followed by Newton-Raphson refinement. The patch introduces a simple heuristic
to choose between hardware SQRT instruction and Newton-Raphson software
estimation.
The patch treats scalars and vectors differently. The heuristic is that for
scalars the compiler should optimize for latency while for vectors it should
optimize for throughput. It is based on the assumption that throughput bound
code is likely to be vectorized.
Basically, the patch disables scalar NR for big cores and disables NR completely
for Skylake. Firstly, scalar SQRT has shorter latency than NR code in big cores.
Secondly, vector SQRT has been greatly improved in Skylake and has better
throughput compared to NR.
Differential Revision: https://reviews.llvm.org/D21379
llvm-svn: 277725
We used to combine "sext(setcc x, y, cc) -> (select (setcc x, y, cc), -1, 0)"
Instead, we should combine to (select (setcc x, y, cc), T, 0) where the value
of T is 1 or -1, depending on the type of the setcc, and getBooleanContents()
for the type if it is not i1.
This fixes PR28504.
llvm-svn: 277371
Using getZExtValue() will assert if the value doesn't fit into uint64_t - SHL was already doing this, I've just updated ASHR/LSHR to match
As mentioned on D22726
llvm-svn: 276855
Summary:
Instead, we take a single flags arg (a bitset).
Also add a default 0 alignment, and change the order of arguments so the
alignment comes before the flags.
This greatly simplifies many callsites, and fixes a bug in
AMDGPUISelLowering, wherein the order of the args to getLoad was
inverted. It also greatly simplifies the process of adding another flag
to getLoad.
Reviewers: chandlerc, tstellarAMD
Subscribers: jholewinski, arsenm, jyknight, dsanders, nemanjai, llvm-commits
Differential Revision: http://reviews.llvm.org/D22249
llvm-svn: 275592
As a result, the urem instruction will not be expanded to a sequence of umull,
lsrs, muls and sub instructions, but just a call to __aeabi_uidivmod.
Differential Revision: http://reviews.llvm.org/D22131
llvm-svn: 274843
Summary:
findBetterNeighborChains may or may not find a better chain for each node it finds, which include the node ("St") that visitSTORE is currently processing. If no better chain is found for St, visitSTORE should continue instead of return SDValue(St, 0), as if it's CombinedTo'ed.
This fixes bug 28130. There might be other ways to make the test pass (see D21409). I think both of the patches are fixing actual bugs revealed by the same testcase.
Reviewers: echristo, wschmidt, hfinkel, kbarton, amehsan, arsenm, nemanjai, bogner
Subscribers: mehdi_amini, nemanjai, llvm-commits
Differential Revision: http://reviews.llvm.org/D21692
llvm-svn: 274644
This reverts commit r259387 because it inserts illegal code after legalization
in some backends where i64 OR type is illegal for example.
llvm-svn: 274573
For the most part this simplifies all callers. There were two places in X86 that needed an explicit makeArrayRef to shorten a statically sized array.
llvm-svn: 274337
Recommiting after correcting over-eager Debug Value transfer fixing PR28270.
[DAG] Previously debug values would transfer debuginfo for the selected
start node for a replacement which allows for debug to be dropped.
Push debug value transfer to occur with node/value replacement in
SelectionDAG, remove now extraneous transfers of debug values.
This refixes PR9817 which was being incompletely checked in the
testsuite.
Reviewers: jyknight
Subscribers: dblaikie, llvm-commits
Differential Revision: http://reviews.llvm.org/D21037
llvm-svn: 273585
Recommiting after fixing over-aggressive assertion
[DAG] Previously debug values would transfer debuginfo for the selected
start node for a replacement which allows for debug to be dropped.
Push debug value transfer to occur with node/value replacement in
SelectionDAG, remove now extraneous transfers of debug values.
This refixes PR9817 which was being incompletely checked in the
testsuite.
Reviewers: jyknight
Subscribers: dblaikie, llvm-commits
Differential Revision: http://reviews.llvm.org/D21037
llvm-svn: 273456
When calculating a square root using Newton-Raphson with two constants,
a naive implementation is to use five multiplications (four muls to calculate
reciprocal square root and another one to calculate the square root itself).
However, after some reassociation and CSE the same result can be obtained
with only four multiplications. Unfortunately, there's no reliable way to do
such a reassociation in the back-end. So, the patch modifies NR code itself
so that it directly builds optimal code for SQRT and doesn't rely on any
further reassociation.
Patch by Nikolai Bozhenov!
Differential Revision: http://reviews.llvm.org/D21127
llvm-svn: 272920