Commit Graph

2401 Commits

Author SHA1 Message Date
Krzysztof Parzyszek 623eb54361 [SelectionDAG] Add missing closing parentheses in comments, NFC
llvm-svn: 333907
2018-06-04 14:54:53 +00:00
Nirav Dave fc9a700f94 [DAG] Avoid checking for consecutive stores in store merge. NFCI.
llvm-svn: 333766
2018-06-01 15:05:55 +00:00
Nirav Dave 39ece11ae5 [DAG] Simplify Expression. NFC.
llvm-svn: 333765
2018-06-01 15:05:30 +00:00
Nirav Dave 0fc27acaa2 [DAG] Remove untriggerable check. NFCI.
Candidate check precludes this check.

llvm-svn: 333764
2018-06-01 15:05:05 +00:00
Nirav Dave a74921a696 [DAG] Prune store merge legal store check to stop invalid size. NFCI.
Do not consider store sizes large than the maximum legal store size.

llvm-svn: 333763
2018-06-01 15:04:40 +00:00
Krzysztof Parzyszek 0b6187c1a9 [SelectionDAG] Expand UADDO/USUBO into ADD/SUBCARRY if legal for target
Additionally, implement handling of ADD/SUBCARRY on Hexagon, utilizing
the UADDO/USUBO expansion.

Differential Revision: https://reviews.llvm.org/D47559

llvm-svn: 333751
2018-06-01 14:00:32 +00:00
Roman Lebedev 9f65d16d5d [DAGCombiner] isAllOnesConstantOrAllOnesSplatConstant(): look through bitcasts
Summary:
As pointed out in D46528, we errneously transform cases like `xor X, -1`,
even though we use said function.
It's because the `-1` is actually a bitcast there.
So i think we can just look through it in the function.

Differential Revision: https://reviews.llvm.org/D47156

llvm-svn: 332905
2018-05-21 21:41:10 +00:00
Roman Lebedev 7772de25d0 [DAGCombine][X86][AArch64] Masked merge unfolding: vector edition.
Summary:
This **appears** to be the last missing piece for the masked merge pattern handling in the backend.

This is [[ https://bugs.llvm.org/show_bug.cgi?id=37104 | PR37104 ]].

[[ https://bugs.llvm.org/show_bug.cgi?id=6773 | PR6773 ]] will introduce an IR canonicalization that is likely bad for the end assembly.
Previously, `andps`+`andnps` / `bsl` would be generated. (see `@out`)
Now, they would no longer be generated  (see `@in`), and we need to make sure that they are generated.

Differential Revision: https://reviews.llvm.org/D46528

llvm-svn: 332904
2018-05-21 21:41:02 +00:00
Craig Topper 25444c852a [DAGCombiner] Use computeKnownBits to match rotate patterns that have had their amount masking modified by simplifyDemandedBits
SimplifyDemandedBits can remove bits from the masks for the shift amounts we need to see to detect rotates.

This patch uses zeroes from computeKnownBits to fill in some of these mask bits to make the match work.

As currently written this calls computeKnownBits even when the mask hasn't been simplified because it made the code simpler. If we're worried about compile time performance we can improve this.

I know we're talking about making a rotate intrinsic, but hopefully we can go ahead and do this change and just make sure the rotate intrinsic also handles it.

Differential Revision: https://reviews.llvm.org/D47116

llvm-svn: 332895
2018-05-21 21:09:18 +00:00
Nirav Dave 11fd14c1ac [DAG] Prune cycle check in store merge.
As part of merging stores we check that fusing the nodes does not
cause a cycle due to one candidate store being indirectly dependent on
another store (this may happen via chained memory copies). This is
done by searching if a store is a predecessor to another store's
value.

Prune the search at the candidate search's root node which is a
predecessor to all candidate stores. This reduces the
size of the subgraph searched in large basic blocks.

Reviewers: jyknight

Subscribers: llvm-commits, hiraditya

Differential Revision: https://reviews.llvm.org/D46955

llvm-svn: 332490
2018-05-16 16:48:20 +00:00
Nirav Dave d9d86cb738 [DAG] Defer merge store cycle checking to just before merge. NFCI.
llvm-svn: 332489
2018-05-16 16:47:54 +00:00
Nirav Dave a87de7d846 [DAGCombine] Move load checks on store of loads into candidate
search. NFCI.

Migrate single-use and non-volatility, non-indexed requirements on
stores of immediate store values to candidate collection pass from
later stage.

llvm-svn: 332392
2018-05-15 20:31:53 +00:00
Nicola Zaghen d34e60ca85 Rename DEBUG macro to LLVM_DEBUG.
The DEBUG() macro is very generic so it might clash with other projects.
The renaming was done as follows:
- git grep -l 'DEBUG' | xargs sed -i 's/\bDEBUG\s\?(/LLVM_DEBUG(/g'
- git diff -U0 master | ../clang/tools/clang-format/clang-format-diff.py -i -p1 -style LLVM
- Manual change to APInt
- Manually chage DOCS as regex doesn't match it.

In the transition period the DEBUG() macro is still present and aliased
to the LLVM_DEBUG() one.

Differential Revision: https://reviews.llvm.org/D43624

llvm-svn: 332240
2018-05-14 12:53:11 +00:00
Vedant Kumar 99d5c072f0 [DAGCombiner] Set the right SDLoc on extended SETCC uses (7/N)
ExtendSetCCUses updates SETCC nodes which use a load (OriginalLoad) to
reflect a simplification to the load (ExtLoad).

Based on my reading, ExtendSetCCUses may create new nodes to extend a
constant attached to a SETCC. It also creates fresh SETCC nodes which
refer to any updated operands.

ISTM that the location applied to the new constant and SETCC nodes
should be the same as the location of the ExtLoad.

This was suggested by Adrian in https://reviews.llvm.org/D45995.

Part of: llvm.org/PR37262

Differential Revision: https://reviews.llvm.org/D46216

llvm-svn: 332119
2018-05-11 18:40:10 +00:00
Vedant Kumar fd340a4047 [DAGCombiner] Set the right SDLoc on a newly-created sextload (6/N)
This teaches tryToFoldExtOfLoad to set the right location on a
newly-created extload. With that in place, the logic for performing a
certain ([s|z]ext (load ...)) combine becomes identical for sexts and
zexts, and we can get rid of one copy of the logic.

The test case churn is due to dependencies on IROrders inherited from
the wrong SDLoc.

Part of: llvm.org/PR37262

Differential Revision: https://reviews.llvm.org/D46158

llvm-svn: 332118
2018-05-11 18:40:08 +00:00
Vedant Kumar f0e5f7c45e [DAGCombiner] Factor out duplicated logic for an extload combine, NFC (5/N)
Part of the logic for combining (zext (load ...)) and (sext (load ...))
is duplicated. This creates problems because bugs in one version have to
be fixed again in the other version.

To address this, as a first step, I've extracted the duplicate logic
into a helper. I'll fix the debug location bug in the helper and
eliminate the copy of its logic in a followup.

Part of: llvm.org/PR37262

Differential Revision: https://reviews.llvm.org/D46157

llvm-svn: 332117
2018-05-11 18:40:02 +00:00
Nirav Dave a5ad417589 [DAG] Avoid using deleted node in rebuildSetCC
Summary:
The combine in rebuildSetCC may be combined to another
node leaving our references stale. Keep a handle on
it to avoid stale references.

Fixes PR36602.

Reviewers: dbabokin, RKSimon, eli.friedman, davide

Subscribers: hiraditya, uabelho, JesperAntonsson, qcolombet, llvm-commits

Differential Revision: https://reviews.llvm.org/D46404

llvm-svn: 331985
2018-05-10 14:28:54 +00:00
Craig Topper 176ec8506f [DAGCombiner] In visitBITCAST when trying to constant fold the bitcast, only call getBitcast if its an fp->int or int->fp conversion even when before legalize ops.
Previously if !LegalOperations we would blindly call getBitcast and hope that getNode would constant fold it. But if the conversion is between a vector and a scalar, getNode has no simplification.

This means we would just get back the original N. We would then return that N which would make the caller of visitBITCAST think that we used CombineTo and did our own worklist management. This prevents target specific optimizations from being called for vector/scalar bitcasts until after legal operations.

llvm-svn: 331896
2018-05-09 17:14:27 +00:00
Amara Emerson 4e66142f14 [DAGCombine] Change store merge candidates check cut off to 1024.
The previous value of 8192 resulted in severe compile time hits in
some pathological cases.

rdar://39781410

Differential Revision: https://reviews.llvm.org/D46581

llvm-svn: 331888
2018-05-09 15:53:06 +00:00
Roman Lebedev 9bd6067db6 [DAGCombiner] Masked merge: enhance handling of 'andn' with immediates
Summary:
Split off from D46031.

The previous patch, D46493, completely disabled unfolding in case of immediates.
But we can do better:
{F6120274} {F6120277}

https://rise4fun.com/Alive/xJS

Reviewers: spatel, craig.topper

Reviewed By: spatel

Subscribers: andreadb, llvm-commits

Differential Revision: https://reviews.llvm.org/D46494

llvm-svn: 331685
2018-05-07 21:52:22 +00:00
Roman Lebedev cc42d08b1d [DagCombiner] Not all 'andn''s work with immediates.
Summary:
Split off from D46031.

In masked merge case, this degrades IPC by decreasing instruction count.
{F6108777}
The next patch should be able to recover and improve this.

This also affects the transform @spatel have added in D27489 / rL289738,
and the test coverage for X86 was missing.
But after i have added it, and looked at the changes in MCA, i'm somewhat confused.
{F6093591} {F6093592} {F6093593}
I'd say this regression is an improvement, since `IPC` increased in that case?

Reviewers: spatel, craig.topper

Reviewed By: spatel

Subscribers: andreadb, llvm-commits, spatel

Differential Revision: https://reviews.llvm.org/D46493

llvm-svn: 331684
2018-05-07 21:52:11 +00:00
Roman Lebedev cb1af9134a [NFC][DAGCombine] unfoldMaskedMerge(): rename two variables
The current names can be confused with the A and B sides
of the canonical masked merge pattern.

llvm-svn: 331609
2018-05-06 20:02:22 +00:00
Roman Lebedev a3b0b59f54 [DAGCombiner] Masked merge: don't touch "not" xor's.
Summary:
Split off form D46031.

It seems we don't want to transform the pattern if the `xor`'s are actually `not`'s.
In vector case, this breaks `andnpd` / `vandnps` patterns.

That being said, we may want to re-visit this `not` handling, maybe in D46073.

Reviewers: spatel, craig.topper, javed.absar

Reviewed By: spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D46492

llvm-svn: 331595
2018-05-05 15:45:40 +00:00
Roman Lebedev 49ada82fa7 [NFC][DagCombiner] unfoldMaskedMerge(): improve readability.
llvm-svn: 331588
2018-05-05 10:39:54 +00:00
Craig Topper 781aa181ab Fix a bunch of places where operator-> was used directly on the return from dyn_cast.
Inspired by r331508, I did a grep and found these.

Mostly just change from dyn_cast to cast. Some cases also showed a dyn_cast result being converted to bool, so those I changed to isa.

llvm-svn: 331577
2018-05-05 01:57:00 +00:00
Michael Berg 7acc81b744 Fast Math Flag mapping into SDNode
Summary: Adding support for Fast flags in the SDNode to leverage fast math sub flag usage.

Reviewers: spatel, arsenm, jbhateja, hfinkel, escha, qcolombet, echristo, wristow, javed.absar

Reviewed By: spatel

Subscribers: llvm-commits, rampitec, nhaehnle, tstellar, FarhanaAleen, nemanjai, javed.absar, jbhateja, hfinkel, wdng

Differential Revision: https://reviews.llvm.org/D45710

llvm-svn: 331547
2018-05-04 18:48:20 +00:00
Vedant Kumar e23173b677 [DAGCombiner] Fix SDLoc in a (zext (zextload x)) combine (4/N)
The logic for this combine is almost identical to the logic for a
(sext (sextload x)) combine.

This commit factors out the logic so it can be shared by both combines,
and corrects the SDLoc assigned in the zext version of the combine.

Prior to this patch, for the given test case, we would apply the
location associated with the udiv instruction to instructions which
perform the load.

Part of: llvm.org/PR37262

llvm-svn: 331303
2018-05-01 19:51:15 +00:00
Vedant Kumar d7117ed0f9 [DAGCombiner] Fix SDLoc in a (sext (sextload x)) combine (3/N)
Prior to this patch, for the given test case, we would apply the
location associated with the sdiv instruction to instructions which
perform the load.

Part of: llvm.org/PR37262.

Differential Revision: https://reviews.llvm.org/D46222

llvm-svn: 331302
2018-05-01 19:51:15 +00:00
Vedant Kumar cc7b2a55c2 [DAGCombiner] Change the SDLoc on split extloads (2/N)
In DAGCombiner, we try to simplify this pattern:

  ([s|z]ext (load ...))

Conceptually, a new extload which is created while splitting the load
should have the same debug location as the load.

Making this change affects the IROrder of the new load, causing some
test case churn.

In practice, the new location is never different from the location of
the [s|z]ext, at least not during check-llvm or a stage2 build.

Part of: llvm.org/PR37262

Differential Revision: https://reviews.llvm.org/D46156

llvm-svn: 331301
2018-05-01 19:29:15 +00:00
Vedant Kumar ee4bfcaa5a [DAGCombiner] Set the right SDLoc on a newly-created zextload (1/N)
Setting the right SDLoc on a newly-created zextload fixes a line table
bug which resulted in non-linear stepping behavior.

Several backend tests contained CHECK lines which relied on the IROrder
inherited from the wrong SDLoc. This patch breaks that dependence where
feasbile and regenerates test cases where not.

In some cases, changing a node's IROrder may alter register allocation
and spill behavior. This can affect performance. I have chosen not to
prevent this by applying a "known good" IROrder to SDLocs, as this may
hide a more general bug in the scheduler, or cause regressions on other
test inputs.

rdar://33755881, Part of: llvm.org/PR37262

Differential Revision: https://reviews.llvm.org/D45995

llvm-svn: 331300
2018-05-01 19:26:15 +00:00
Adrian Prantl 5f8f34e459 Remove \brief commands from doxygen comments.
We've been running doxygen with the autobrief option for a couple of
years now. This makes the \brief markers into our comments
redundant. Since they are a visual distraction and we don't want to
encourage more \brief markers in new code either, this patch removes
them all.

Patch produced by

  for i in $(git grep -l '\\brief'); do perl -pi -e 's/\\brief //g' $i & done

Differential Revision: https://reviews.llvm.org/D46290

llvm-svn: 331272
2018-05-01 15:54:18 +00:00
Sanjay Patel 1babf5ff32 [DAGCombiner] rename function attribute for disabling ftrunc transform
This is the matching name change for the Clang patch at:
D46236
rL331209

Differential Revision: https://reviews.llvm.org/D46237 

llvm-svn: 331210
2018-04-30 18:20:33 +00:00
Heejin Ahn d20d0648ed [DAGCombiner] Fix a case of 1 in non-splat vector pow2 divisor
Summary:
D42479 (rL329525) enabled SDIV combine for pow2 non-splat vector
dividers. But when there is a 1 in a vector, the instruction sequence to
be generated involves shifting a value by the number of its bit widths,
which is undefined
(c64f4dbfe3/lib/CodeGen/SelectionDAG/DAGCombiner.cpp (L6000-L6006)).

Especially, in architectures that do not support vector instructions,
each of element in a vector will be computed separately using scalar
operations, and then the resulting value will be undef for '1' values
in a vector.

(All 1's vector is fine; only vectors mixed with 1 and others will be
affected.)

Reviewers: RKSimon, jgravelle-google

Subscribers: jfb, dschuff, sbc100, jgravelle-google, llvm-commits

Differential Revision: https://reviews.llvm.org/D46161

llvm-svn: 331092
2018-04-27 22:23:11 +00:00
Sanjay Patel 5a90285bd9 [DAGCombiner] limit ftrunc optimizations with function attribute
As noted, the attribute name is subject to change once we have
the clang side implemented, but it's clear that we need some
kind of attribute-based predication here based on the discussion
for:
rL330437

llvm-svn: 330951
2018-04-26 16:04:44 +00:00
Sanjay Patel a5da086386 [DAGCombiner] refactor FP->int->FP folds; NFC
As discussed in the post-review comments for rL330437,
we need to guard this fold to allow existing code to
keep working with the undefined behavior that they've
come to rely on.

That would mean duplicating more code than we already 
have, so let's fix that first. 

llvm-svn: 330947
2018-04-26 15:20:18 +00:00
Craig Topper f3cefad255 [DAGCombiner][X86] When promoting loads don't use ZEXTLOAD even its legal
We were previously prefering ZEXTLOAD over EXTLOAD if it is legal. This triggers during X86's promotion of i16->i32. Not sure about other targets.

Using ZEXTLOAD can prevent folding it to SEXTLOAD later if we were to promote a sign extended operand like we would need for SRA. However, X86 doesn't currently promote i16 SRA. I was looking into doing that which is how I found this issue.

This is also blocking our ability to fold 4 byte aligned EXTLOADs with "loadi32". This is what caused most of the test changes here.

Differential Revision: https://reviews.llvm.org/D45585#inline-402825

llvm-svn: 330781
2018-04-24 22:35:27 +00:00
Roman Lebedev 95c6eaf530 [DAGCombiner] Unfold scalar masked merge if profitable
Summary:
This is [[ https://bugs.llvm.org/show_bug.cgi?id=37104 | PR37104 ]].

[[ https://bugs.llvm.org/show_bug.cgi?id=6773 | PR6773 ]] will introduce an IR canonicalization that is likely bad for the end assembly.
Previously, `andl`+`andn`/`andps`+`andnps` / `bic`/`bsl` would be generated. (see `@out`)
Now, they would no longer be generated  (see `@in`).
So we need to make sure that they are still generated.

If the mask is constant, we do nothing. InstCombine should have unfolded it.
Else, i use `hasAndNot()` TLI hook.

For now, only handle scalars.

https://rise4fun.com/Alive/bO6

----

I *really* don't like the code i wrote in `DAGCombiner::unfoldMaskedMerge()`.
It is super fragile. Is there something like IR Pattern Matchers for this?

Reviewers: spatel, craig.topper, RKSimon, javed.absar

Reviewed By: spatel

Subscribers: andreadb, courbet, kristof.beyls, javed.absar, rengolin, nemanjai, llvm-commits

Differential Revision: https://reviews.llvm.org/D45733

llvm-svn: 330646
2018-04-23 20:38:49 +00:00
Sanjay Patel 3d453ad711 [DAGCombine] (float)((int) f) --> ftrunc (PR36617)
This was originally committed at rL328921 and reverted at rL329920 to
investigate failures in Chrome. This time I've added to the ReleaseNotes
to warn users of the potential of exposing UB and let me repeat that
here for more exposure:

  Optimization of floating-point casts is improved. This may cause surprising
  results for code that is relying on undefined behavior. Code sanitizers can
  be used to detect affected patterns such as this:

    int main() {
      float x = 4294967296.0f;
      x = (float)((int)x);
      printf("junk in the ftrunc: %f\n", x);
      return 0;
    }

    $ clang -O1 ftrunc.c -fsanitize=undefined ; ./a.out
    ftrunc.c:5:15: runtime error: 4.29497e+09 is outside the range of 
                   representable values of type 'int'
    junk in the ftrunc: 0.000000


Original commit message:

fptosi / fptoui round towards zero, and that's the same behavior as ISD::FTRUNC,
so replace a pair of casts with the equivalent node. We don't have to account for
special cases (NaN, INF) because out-of-range casts are undefined.

Differential Revision: https://reviews.llvm.org/D44909

llvm-svn: 330437
2018-04-20 15:07:55 +00:00
Gerolf Hoflehner 5b4a67af1b [DAGCombiner] Fix for oss-fuzz bug
llvm-svn: 330178
2018-04-17 07:22:34 +00:00
Sanjay Patel 6c3af659a2 [DAGCombiner, PowerPC] allow X - (fpext(-Y) --> X + fpext(Y) with multiple uses
This is a transform that I limited in instcombine in rL329821 because it was 
creating more instructions in IR when the cast has multiple uses.

But if the cast is free, then we can do the transform regardless of other
uses because it improves the potential throughput of the calculation by
removing a dependency on the fneg.

Differential Revision: https://reviews.llvm.org/D45598

llvm-svn: 330098
2018-04-15 16:43:48 +00:00
Sanjay Patel a54e7d1a6d [DAGCombiner] simplify code; NFC
llvm-svn: 329964
2018-04-12 22:14:58 +00:00
Sanjay Patel 5ace2b765a revert r328921 - [DAGCombine] (float)((int) f) --> ftrunc (PR36617)
This change is exposing UB in source code - as was warned/predicted. :)
See D44909 for discussion. Reverting while we figure out how to fix things.

llvm-svn: 329920
2018-04-12 15:27:01 +00:00
Sam Parker 1f4f4d9a08 [DAGCombine] Improve ReduceLoad for SRL
Recommitting r329283, third time lucky...

If the SRL node is only used by an AND, we may be able to set the
ExtVT to the width of the mask, making the AND redundant. To support
this, another check has been added in isLegalNarrowLoad which queries
whether the load is valid.

Differential Revision: https://reviews.llvm.org/D41350

llvm-svn: 329551
2018-04-09 08:16:11 +00:00
Zvi Rackover 7a53f169f1 DAGCombiner: Combine SDIV with non-splat vector pow2 divisor
Summary:
Extend existing SDIV combine for pow2 constant divider to handle
non-splat vectors of pow2 constants.

Reviewers: RKSimon, craig.topper, spatel, hfinkel, efriedma

Reviewed By: RKSimon

Subscribers: magabari, llvm-commits

Differential Revision: https://reviews.llvm.org/D42479

llvm-svn: 329525
2018-04-08 11:35:20 +00:00
Guozhi Wei 0eb86c8efc [DAGCombiner] Fold (zext (and/or/xor (shl/shr (load x), cst), cst))
In our real world application, we found the following optimization is missed in DAGCombiner

(zext (and/or/xor (shl/shr (load x), cst), cst)) -> (and/or/xor (shl/shr (zextload x), (zext cst)), (zext cst))

If the user of original zext is an add, it may enable further lea optimization on x86.

This patch add a new function CombineZExtLogicopShiftLoad to do this optimization.

Differential Revision: https://reviews.llvm.org/D44402

llvm-svn: 329516
2018-04-07 23:36:10 +00:00
Craig Topper 5b95eae1c3 [DAGCombiner] Add a combine to turn a build vector of zero extends of extract vector elts into a vector zero extend and possibly an extract subvector.
llvm-svn: 329509
2018-04-07 19:09:50 +00:00
Mandeep Singh Grang e92f0cfe34 [CodeGen] Change std::sort to llvm::sort in response to r327219
Summary:
r327219 added wrappers to std::sort which randomly shuffle the container before sorting.
This will help in uncovering non-determinism caused due to undefined sorting
order of objects having the same key.

To make use of that infrastructure we need to invoke llvm::sort instead of std::sort.

Note: This patch is one of a series of patches to replace *all* std::sort to llvm::sort.
Refer the comments section in D44363 for a list of all the required patches.

Reviewers: bogner, rnk, MatzeB, RKSimon

Reviewed By: rnk

Subscribers: JDevlieghere, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D45133

llvm-svn: 329435
2018-04-06 18:08:42 +00:00
Sam Parker 0e7deb8104 [DAGCombine] Revert r329160
Again, broke the big endian stage 2 builders.

llvm-svn: 329283
2018-04-05 13:46:17 +00:00
Sam Parker 7ec722d603 [DAGCombine] Improve ReduceLoadWidth for SRL
Recommitting rL321259. Previosuly this caused an issue with PPCBE but
I didn't receieve a reproducer and didn't have the time to follow up.
If the issue appears again, please provide a reproducer so I can fix
it.

Original commit message:

If the SRL node is only used by an AND, we may be able to set the
ExtVT to the width of the mask, making the AND redundant. To support
this, another check has been added in isLegalNarrowLoad which queries
whether the load is valid.

Differential Revision: https://reviews.llvm.org/D41350

llvm-svn: 329160
2018-04-04 09:26:56 +00:00
Sanjay Patel 6124cae8f7 [DAGCombine] (float)((int) f) --> ftrunc (PR36617)
fptosi / fptoui round towards zero, and that's the same behavior as ISD::FTRUNC, 
so replace a pair of casts with the equivalent node. We don't have to account for 
special cases (NaN, INF) because out-of-range casts are undefined.

Differential Revision: https://reviews.llvm.org/D44909

llvm-svn: 328921
2018-03-31 17:55:44 +00:00
Sanjay Patel e09b7dcf3d [SelectionDAG] Removing FABS folding from DAGCombiner
The code has bugs dealing with -0.0.

Since D44550 introduced FABS pattern folding in InstCombine, 
this patch removes the now-redundant code that causes 
https://bugs.llvm.org/show_bug.cgi?id=36600.

Patch by Mikhail Dvoretckii!

Differential Revision: https://reviews.llvm.org/D44683

llvm-svn: 328872
2018-03-30 15:42:52 +00:00
Craig Topper 2fa1436206 [IR][CodeGen] Remove dependency on EVT from IR/Function.cpp. Move EVT to CodeGen layer.
Currently EVT is in the IR layer only because of Function.cpp needing a very small piece of the functionality of EVT::getEVTString(). The rest of EVT is used in codegen making CodeGen a better place for it.

The previous code converted a Type* to EVT and then called getEVTString. This was only expected to handle the primitive types from Type*. Since there only a few primitive types, we can just print them as strings directly.

Differential Revision: https://reviews.llvm.org/D45017

llvm-svn: 328806
2018-03-29 17:21:10 +00:00
David Blaikie 36a0f226b1 Fix layering by moving ValueTypes.h from CodeGen to IR
ValueTypes.h is implemented in IR already.

llvm-svn: 328397
2018-03-23 23:58:31 +00:00
David Blaikie 13e77db2df Fix layering of MachineValueType.h by moving it from CodeGen to Support
This is used by llvm tblgen as well as by LLVM Targets, so the only
common place is Support for now. (maybe we need another target for these
sorts of things - but for now I'm at least making them correct & we can
make them better if/when people have strong feelings)

llvm-svn: 328395
2018-03-23 23:58:25 +00:00
Martin Storsjo db75aa96d3 Revert "[DAGCombiner] Fold (zext (and/or/xor (shl/shr (load x), cst), cst))"
This reverts commit r328252. This change broke building a number
of projects when targeting ARM and AArch64, see PR36873.

llvm-svn: 328297
2018-03-23 08:36:47 +00:00
Guozhi Wei 17ff975eb1 [DAGCombiner] Fold (zext (and/or/xor (shl/shr (load x), cst), cst))
In our real world application, we found the following optimization is missed in DAGCombiner

(zext (and/or/xor (shl/shr (load x), cst), cst)) -> (and/or/xor (shl/shr (zextload x), (zext cst)), (zext cst))

If the user of original zext is an add, it may enable further lea optimization on x86.

This patch add a new function CombineZExtLogicopShiftLoad to do this optimization.

Differential Revision: https://reviews.llvm.org/D44402

llvm-svn: 328252
2018-03-22 21:47:25 +00:00
Craig Topper 956fec2a4a [DAGCombiner] Fix type in comment. NFC
llvm-svn: 327916
2018-03-19 22:25:26 +00:00
Craig Topper b36cb20ef9 [X86] Teach X86TargetLowering::targetShrinkDemandedConstant to set non-demanded bits if it helps created an and mask that can be matched as a zero extend.
I had to modify the bswap recognition to allow unshrunk masks to make this work.

Fixes PR36689.

Differential Revision: https://reviews.llvm.org/D44442

llvm-svn: 327530
2018-03-14 16:55:15 +00:00
Craig Topper 4aeec51986 [DAGCombiner] Allow visitEXTRACT_SUBVECTOR to combine with BUILD_VECTORS between LegalizeVectorOps and LegalizeDAG.
BUILD_VECTORs aren't themselves legalized until LegalizeDAG so we should still be able to create an "illegal" one before that. This helps combine with BUILD_VECTORS that are introduced during LegalizeVectorOps due to unrolling.

llvm-svn: 327446
2018-03-13 20:36:28 +00:00
Simon Pilgrim 9855b39380 [DAGCombine] visitREM - Don't assume that one divrem isn't driving another
Under some circumstances the divrems won't have been combined together before getting to this code.

So replace the assertion with a if() guard to not expand to X-((X/C)*C) to give the other combine chance to happen.

Reduced from OSS-Fuzz #6883
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=6883

llvm-svn: 327424
2018-03-13 17:17:15 +00:00
Nirav Dave 775f07d121 Make early exit hasPredecessorHelper return true. NFCI.
All uses conservatively assume in early exit case that it will be a
predecessor. Changing default removes checking code in all uses.

llvm-svn: 327169
2018-03-09 20:56:51 +00:00
Craig Topper 4196dd12a2 [DAGCombiner] Add a peekThroughBitcast to MergeStoresOfConstantsOrVecElts to fix a crash if we are storing a bitcast of a constant.
Loading a constant into a k-register in AVX512 requires a bitcast from a scalar constant. In the test case here we have a k-register store that gets split into multiple parts of KNL. MergeConsecutiveStores sees each of these pieces as a consecutive store and looks through the bitcast to find the underly scalar constant. But when we went to create the combined store we didn't look through the same bitcast.

llvm-svn: 326677
2018-03-04 18:51:46 +00:00
Craig Topper e7ca6f5456 [DAGCombiner] When combining zero_extend of a truncate, only mask before extending for vectors.
Masking first, prevents the extend from being combine with loads. Its also interfering with some vXi1 extraction code.

Differential Revision: https://reviews.llvm.org/D42679

llvm-svn: 326500
2018-03-01 22:32:25 +00:00
Amaury Sechet 893a6b89ff [DAGCOmbine] Ensure that (brcond (setcc ...)) is handled in a canonical manner.
Summary:
There are transformation that change setcc into other constructs, and transform that try to reconstruct a setcc from the brcond condition. Depending on what order these transform are done, the end result differs.

Most of the time, it is preferable to get a setcc as a brcond argument (and this is why brcond try to recreate the setcc in the first place) so we ensure this is done every time by also doing it at the setcc level when the only user is a brcond.

Reviewers: spatel, hfinkel, niravd, craig.topper

Subscribers: nhaehnle, llvm-commits

Differential Revision: https://reviews.llvm.org/D41235

llvm-svn: 325892
2018-02-23 11:50:42 +00:00
Simon Pilgrim be72fe1fda [SelectionDAG] Move matchUnaryPredicate/matchBinaryPredicate into SelectionDAGNodes.h
This allows us to improve vector constant matching in more DAG code (backends, TargetLowering etc.).

Differential Revision: https://reviews.llvm.org/D43466

llvm-svn: 325815
2018-02-22 18:45:13 +00:00
Craig Topper 1d104b996a [DAGCombiner] Add two calls to isVector before making calls to getVectorElementType/getVectorNumElements to avoid an assert.
We looked through a BITCAST, but the bitcast might be a from a scalar type rather than a vector.

I don't have a test case. I stumbled onto it while prototyping another change that isn't ready yet.

llvm-svn: 325750
2018-02-22 07:05:27 +00:00
Craig Topper 35801fa5ce [SelectionDAG] Add LegalTypes flag to getShiftAmountTy. Use it to unify and simplify DAGCombiner and simplifySetCC code and fix a bug.
DAGCombiner and SimplifySetCC both use getPointerTy for shift amounts pre-legalization. DAGCombiner uses a single helper function to hide this. SimplifySetCC does it in multiple places.

This patch adds a defaulted parameter to getShiftAmountTy that can make it return getPointerTy for scalar types. Use this parameter to simplify the SimplifySetCC and DAGCombiner.

Additionally, there were two places in SimplifySetCC that were creating shifts using the target's preferred shift amount pre-legalization. If the target uses a narrow type and the type is illegal, this can cause SimplfiySetCC to create a shift with an amount that can't represent all possible shift values for the type. To fix this we should use pointer type there too.

Alternatively we could make getScalarShiftAmountTy for each target return a safe value for large types as proposed in D43445. And maybe we should still do that, but fixing the SimplifySetCC code keeps other targets from tripping over this in the future.

Fixes PR36250.

Differential Revision: https://reviews.llvm.org/D43449

llvm-svn: 325602
2018-02-20 17:41:05 +00:00
Simon Pilgrim d6beac3b76 [DAGCombiner] Remove simplifyShuffleMask - now handled more generally by SimplifyDemandedVectorElts.
llvm-svn: 325429
2018-02-17 12:36:56 +00:00
Craig Topper dac3c1f5c8 [DAGCombiner] Call ExtendUsesToFormExtLoad in (zext (and (load)))->(and (zextload)) even when the and does not have multiple uses
Same for the sign extend case.

Currently we check for multiple uses on the binop. Then we call ExtendUsesToFormExtLoad to capture SetCCs that use the load. So we only end up finding any setccs when the and has additional uses and the load is used by a setcc. I don't think the and having multiple uses is relevant here. I think we should only be checking for the load having multiple uses.

This changes an NVPTX test because we now find that the load has a second use by a truncate, but ExtendUsesToFormExtLoad only looks at setccs it can extend. All other operations just check isTruncateFree. Maybe we should allow widening of an existing truncate even if its not free?

Differential Revision: https://reviews.llvm.org/D43063

llvm-svn: 325289
2018-02-15 20:20:32 +00:00
Simon Pilgrim 80663ee986 [SelectionDAG] Add initial implementation of TargetLowering::SimplifyDemandedVectorElts
This is mainly a move of simplifyShuffleOperands from DAGCombiner::visitVECTOR_SHUFFLE to create a more general purpose TargetLowering::SimplifyDemandedVectorElts implementation.

Further features can be moved/added in future patches.

Differential Revision: https://reviews.llvm.org/D42896

llvm-svn: 325232
2018-02-15 12:14:15 +00:00
Alexander Ivchenko 7e5d525bd5 [SelectionDAG][X86] Fix incorrect offset generated for VMASKMOV
When creating high MachineMemOperand for MSTORE/MLOAD we supply
it with the original PointerInfo, while the pointer itself had been incremented.
The patch adds the proper offset to the PointerInfo.

llvm-svn: 325135
2018-02-14 15:55:24 +00:00
Craig Topper f73ff612ca [DAGCombiner] Add one use check to fold (not (and x, y)) -> (or (not x), (not y))
Summary:
If the and has an additional use we shouldn't invert it. That creates an additional instruction.

While there add a one use check to the transform above that looked similar.

Reviewers: spatel, RKSimon

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D43225

llvm-svn: 325019
2018-02-13 16:25:27 +00:00
Simon Pilgrim 0be5567a89 [X86][SSE] Enable SMIN/SMAX/UMIN/UMAX custom lowering for all legal types
This allows us to recognise more saturation patterns and also simplify some MINMAX codegen that was failing to combine CMPGE comparisons to a legal CMPGT.

Differential Revision: https://reviews.llvm.org/D43014

llvm-svn: 324837
2018-02-11 10:52:37 +00:00
Craig Topper 36f913ee80 [SelectionDAG] Remove TargetLowering::getConstTrueVal. Use SelectionDAG::getBoolConstant in the one place it was used.
SelectionDAG::getBoolConstant was recently introduced. At the time I didn't know getConstTrueVal existed, but I think getBoolConstant is better as it will use the source VT to make sure it can properly detect floating point if it is configured differently.

llvm-svn: 324832
2018-02-11 04:58:58 +00:00
Vedant Kumar 7fd9a58d8c Revert "WIP: [DAGCombiner] Assert that debug info is preserved"
This reverts commit r324648. It was committed accidentally.

llvm-svn: 324650
2018-02-08 20:27:35 +00:00
Vedant Kumar 28323ff5a3 WIP: [DAGCombiner] Assert that debug info is preserved
llvm-svn: 324648
2018-02-08 20:27:09 +00:00
Craig Topper c19aed963e [DAGCombiner] Fix a couple mistakes from r324311 by really passing the original load to ExtendSetCCUses.
We're passing the binary op that uses the load instead of the load.

Noticed by inspection. Not sure how to test this because this just prevents the introduction of an extend that will later be truncated and will probably be combined out.

llvm-svn: 324568
2018-02-08 06:27:18 +00:00
Craig Topper 9b9d527427 [DAGCombiner] Don't create truncate nodes in (aext (zextload x)) -> (zextload x) and similar folds. NFCI
The truncate is being used to replace other users of of the load, but we checked that the load only has one use so there are no other uses to replace.

llvm-svn: 324567
2018-02-08 06:04:18 +00:00
Craig Topper cbfe41ac2f [DAGCombiner] Avoid creating truncate nodes in (zext (and (load)))->(and (zextload)) fold until we know for sure we're going to need it. NFCI
The truncate is only needed if the load has additional users. It used to get passed to extendSetCCUses so was created early, but that's no longer the case.

llvm-svn: 324562
2018-02-08 04:38:04 +00:00
Craig Topper bf4ed42606 [DAGCombiner] Rename variable to be slightly better. NFC
We were calling a load LN0 but it came from N0.getOperand(0) so its really more like LN00 if we follow the name used in other places.

llvm-svn: 324561
2018-02-08 04:38:02 +00:00
Craig Topper 58ecffd857 [DAGCombiner][AMDGPU][X86] Turn cttz/ctlz into cttz_zero_undef/ctlz_zero_undef if we can prove the input is never zero
X86 currently has a late DAG combine after cttz/ctlz are turned into BSR+BSF+CMOV to detect this and remove the CMOV. But we should be able to do this much earlier and avoid creating the cmov all together.

For the changed AMDGPU test case it appears that previously the i8 cttz was type legalized to i16 which introduced an OR with 256 in order to limit the result to 8 on the widened type. At this point the result is known to never be zero, but nothing checked that. Then operation legalization is told to promote all i16 cttz to i32. This introduces an extend and a truncate and another OR with 65536 to limit the result to 16. With the DAG combiner change we are able to prevent the creation of the second OR since the opcode will have been changed to cttz_zero_undef after the first OR. I the lack of the OR caused the instruction to change to v_ffbl_b32_sdwa

Differential Revision: https://reviews.llvm.org/D42985

llvm-svn: 324427
2018-02-06 23:54:37 +00:00
Craig Topper ee1f34eb9a [DAGCombiner] Pass the original load to ExtendSetCCUses not the turncate.
Summary:
This method is trying to use the truncate node to find which SETCC operand should be replaced directly with the extended load.

This used to work correctly because all uses of the original load were replaced by the truncate before this function was called. So this was used to effectively bypass the truncate and find the load under it.

All but one of the callers now call this before the truncate has replaced the laod so the setcc doesn't yet use the truncate. To account for this we should pass the original load instead.

I changed the order of that one caller to make this work there too.

I don't have a test case because this is probably hidden by later DAG combines causing the extend and truncate to cancel out. I assume this way is a little more efficient and matches what was originally intended.

Reviewers: RKSimon, spatel, niravd

Reviewed By: niravd

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D42878

llvm-svn: 324311
2018-02-06 03:23:27 +00:00
Craig Topper fc5bd023dd [DAGCombiner] When folding fold (sext/zext (and/or/xor (sextload/zextload x), cst)) -> (and/or/xor (sextload/zextload x), (sext/zext cst)) make sure we check the legality of the full extended load.
Summary:
If the load is already an extended load we should be using the memory VT for the legality check, not just the VT of the current extension.

I don't have a test case, just noticed it while investigating some load extension improvements.

Reviewers: RKSimon, spatel, niravd

Reviewed By: niravd

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D42783

llvm-svn: 324181
2018-02-03 23:00:31 +00:00
Craig Topper a5944aade1 [DAGCombiner] When folding (insert_subvector undef, (bitcast (extract_subvector N1, Idx)), Idx) -> (bitcast N1) make sure that N1 has the same total size as the original output
We were only checking the element count, but not the total width. This could cause illegal bitcasts to be created if for example the output was 512-bits, but N1 is 256 bits, and the extraction size was 128-bits.

Fixes PR36199

Differential Revision: https://reviews.llvm.org/D42809

llvm-svn: 324002
2018-02-01 20:48:50 +00:00
Sanjay Patel 657e5d8d41 [DAGCombiner] filter out denorm inputs when calculating sqrt estimate (PR34994)
As shown in the example in PR34994:
https://bugs.llvm.org/show_bug.cgi?id=34994
...we can return a very wrong answer (inf instead of 0.0) for square root when 
using a reciprocal square root estimate instruction.

Here, I've conditionalized the filtering out of denorms based on the function 
having "denormal-fp-math"="ieee" in its attributes. The other options for this 
attribute are 'preserve-sign' and 'positive-zero'.

So we don't generate this extra code by default with just '-ffast-math' (because 
then there's no denormal attribute string at all), but it works if you specify 
'-ffast-math -fdenormal-fp-math=ieee' from clang. 

As noted in the review, there may be other problems in clang that affect the 
results depending on platform (Linux x86 at least), but this should allow 
creating the desired codegen.

Differential Revision: https://reviews.llvm.org/D42323

llvm-svn: 323981
2018-02-01 16:57:18 +00:00
Craig Topper 62b62356fa [X86] Make foldLogicOfSetCCs work better for vectors pre legal types/operations
Summary:
There's a check in the code to only check getSetCCResultType after LegalOperations or if the type is MVT::i1. But the i1 check is only allowing scalar types through. I think it should check that the scalar type is MVT::i1 so that it will work for vectors.

The changed test already does this combine with AVX512VL where getSetCCResultType returns vXi1. But with avx512f and no VLX getSetCCResultType returns a type matching the width of the input type.

Reviewers: spatel, RKSimon

Reviewed By: spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D42619

llvm-svn: 323631
2018-01-29 07:52:55 +00:00
Simon Pilgrim f531cf8964 [DAGCombine] reduceBuildVecToShuffle - ensure EXTRACT_VECTOR_ELT index is in range
From OSS Fuzz Test Case #5688

llvm-svn: 323535
2018-01-26 15:50:20 +00:00
Sven van Haastregt e8404780c3 [DAGCombiner] Bail out if vector size is not a multiple
For the included test case, the DAG transformation
  concat_vectors(scalar, undef) -> scalar_to_vector(sclr)
would attempt to create a v2i32 vector for a v9i8
concat_vector.  Bail out to avoid creating a bitcast with
mismatching sizes later on.

Differential Revision: https://reviews.llvm.org/D42379

llvm-svn: 323312
2018-01-24 09:53:47 +00:00
Craig Topper 7f0d85ec1e [DAGCombiner] Add a DAG combine to turn a splat build_vector where the splat elemnt is a bitcast from a vector type into a concat_vector
For example, a build_vector of i64 bitcasted from v2i32 can be turned into a concat_vectors of the v2i32 vectors with a bitcast to a vXi64 type

Differential Revision: https://reviews.llvm.org/D42090

llvm-svn: 322811
2018-01-18 04:17:06 +00:00
Benjamin Kramer 736a343e97 Revert "[DAG] Elide overlapping stores"
This reverts commit r322085. Internal PPC testing is still showing the
same symptoms as when this patch landed the last time.

llvm-svn: 322474
2018-01-15 10:57:24 +00:00
Adrian Prantl 29102b3002 dag-combine: Transfer debug information when folding (zext (truncate x))
-> (zext (truncate x))

This patch adds debug info support to the dagcombine rule (zext
(truncate x)) -> (zext (truncate x)).

Differential Revision: https://reviews.llvm.org/D41924

llvm-svn: 322304
2018-01-11 18:35:12 +00:00
Zvi Rackover 999e6c2967 DAGCombine: Let truncates negate extension through extract-subvector
Summary:
Fold cases such as:
(v8i8 truncate (v8i32 extract_subvector (v16i32 sext (v16i8 V), Idx)))
->
(v8i8 extract_subvector (v16i8 V), Idx)

This can be generalized to cases where the truncate and extend do not
fully cancel each other out, but it may require querying the target
about profitability.

Reviewers: RKSimon, craig.topper, spatel, efriedma

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D41927

llvm-svn: 322300
2018-01-11 18:02:33 +00:00
Craig Topper af4eb17223 [SelectionDAG][X86] Explicitly store the scale in the gather/scatter ISD nodes
Currently we infer the scale at isel time by analyzing whether the base is a constant 0 or not. If it is we assume scale is 1, else we take it from the element size of the pass thru or stored value. This seems a little weird and I think it makes more sense to make it explicit in the DAG rather than doing tricky things in the backend.

Most of this patch is just making sure we copy the scale around everywhere.

Differential Revision: https://reviews.llvm.org/D40055

llvm-svn: 322210
2018-01-10 19:16:05 +00:00
Nirav Dave 30304a3bd7 [DAG] Elide overlapping stores
Relanding after fixing handling of pre-indexed memory operations in
BaseIndexOffset analysis (r322003).

Extend overlapping store elision to handle overwrites of stores by
larger stores.

Reviewers: craig.topper, rnk, t.p.northover

Subscribers: javed.absar, hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D40969

llvm-svn: 322085
2018-01-09 15:23:12 +00:00
Nirav Dave 6e2d03d410 [DAG] Teach BaseIndexOffset to correctly handle with indexed operations
BaseIndexOffset address analysis incorrectly ignores offsets folded
into indexed memory operations causing potential errors in alias
analysis of pre-indexed operations.

Reviewers: efriedma, RKSimon, hfinkel, jyknight

Subscribers: hiraditya, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D41701

llvm-svn: 322003
2018-01-08 16:21:35 +00:00
Sam Parker 3800f0f11d [DAGCombine] Fix for PR35761
I had falsely assumed that constant operands would be operand(1) of
the bin ops that may need their constant operand to be masked.

Bugzilla: https://bugs.llvm.org/show_bug.cgi?id=35761

Differential Revision: https://reviews.llvm.org/D41667

llvm-svn: 321991
2018-01-08 13:21:24 +00:00
Sam Parker 1ad085b808 [DAGCombine] Fix for PR37563
While searching for loads to be narrowed, equal sized loads were not
added to the list, resulting in anyext loads not being converted to
zext loads.

https://bugs.llvm.org/show_bug.cgi?id=35763

Differential Revision: https://reviews.llvm.org/D41628

llvm-svn: 321862
2018-01-05 08:47:23 +00:00
Amara Emerson 8a100dd2a6 [DAGCombine] Ensure SDNode use iterator is incremented properly.
Fixes an ASAN bug found by oss-fuzz.

llvm-svn: 321813
2018-01-04 18:38:45 +00:00
Simon Pilgrim ec0a2fb703 [DAGCombine] Handle out of range EXTRACT_VECTOR_ELT indices
Handle this in DAGCombiner::visitEXTRACT_VECTOR_ELT the same as we already do in SelectionDAG::getNode and use APInt instead of getZExtValue.

This should also fix oss-fuzz #4910

llvm-svn: 321767
2018-01-03 22:42:33 +00:00
Daniel Jasper cc4903e2ba Revert r321089: "[DAG] Elide overlapping store" (and subsequent fix in r321204)
Our internal testing has revealed has discovered bugs in PPC builds.
I have forward reproduction instructions to the original author (Nirav).

llvm-svn: 321649
2018-01-02 14:38:52 +00:00
Sam Parker 3570c554b5 [DAGCombine] Fix for PR35765
Remove the acceptance of ANY_EXTEND nodes while trying to move and
nodes back to loads.

Bugzilla: https://bugs.llvm.org/show_bug.cgi?id=35765

Differential Revision: https://reviews.llvm.org/D41625

llvm-svn: 321641
2018-01-02 10:19:01 +00:00
Simon Pilgrim 6fad3cbc66 [DAGCombine] foldBinOpIntoSelect can fail to constant fold in some cases.
For example, float operations may fail to constant fold under certain circumstances (inf/nan/denormal creation etc.)

Reduced from oss-fuzz #4802 test case

llvm-svn: 321488
2017-12-27 11:36:18 +00:00
Simon Pilgrim b17c204cc0 [DAGCombine] visitANDLike - ensure APInt is is in range for getSExtValue/getZExtValue
Reduced from oss-fuzz #4782 test case

llvm-svn: 321464
2017-12-26 23:27:44 +00:00
Simon Pilgrim 628f63e5fd [DAGCombine] Don't combine (and (setne X, 0), (setne X, -1)) --> (setuge (add X, 1), 2) for i1
Reduced from oss-fuzz #4773 test case

llvm-svn: 321455
2017-12-26 14:48:28 +00:00
Craig Topper f65a6e4ed8 [DAGCombiners] Don't turn ANDs to shuffles with zero so early. Give some other combines a chance to run.
This moves the combine for turning ANDs into shuffle with zero out of SimplifyVBinOps and places it only in visitAND below the reassociate handling. This fixes the specific case I noticed where we failed to combine two ands with constants.

llvm-svn: 321417
2017-12-24 02:05:18 +00:00
Craig Topper d6a8f2e67d [SelectionDAG][X86] Don't use ->getValueType(0) after a call to getOperand to get the type of the operand.
getOperand returns an SDValue that contains the node and the result number. There is no guarantee that the result number if 0. By using the -> operator we are calling SDNode::getValueType rather than SDValue::getValueType. This requires supplying a result number and we shouldn't assume it was 0.

I don't have a test case. Just noticed while cleaning up some other code and saw that it occurred in other places.

llvm-svn: 321397
2017-12-23 02:54:50 +00:00
Nirav Dave afeae77058 [DAG] Add missing case check from findbaseoffset merge from r321389.
llvm-svn: 321391
2017-12-22 22:06:56 +00:00
Nirav Dave d8e3633db4 Integrate findBaseOffset address analyses to BaseIndexOffset. NFCI.
BaseIndexOffset supercedes findBaseOffset analysis save only Constant
Pool addresses. Migrate analysis to BaseIndexOffset.

Relanding after correcting base address matching check.

llvm-svn: 321389
2017-12-22 21:20:55 +00:00
Nirav Dave 9c26e92aab Revert "[DAG] Integrate findBaseOffset address analyses to BaseIndexOffset. NFCI."
which was causing miscompilations in for some test-suite components.

This reverts commit 3e9de9ff0f3162920a2a3cba51c7dc14b54b4d16.

llvm-svn: 321380
2017-12-22 19:33:56 +00:00
Nirav Dave 4ce902657a [DAG] Integrate findBaseOffset address analyses to BaseIndexOffset. NFCI.
BaseIndexOffset supercedes findBaseOffset analysis save only Constant
Pool addresses. Migrate analysis to BaseIndexOffset.

llvm-svn: 321364
2017-12-22 16:59:09 +00:00
Sam Parker cf426fccd4 [DAGCombine] Revert r321259
Improve ReduceLoadWidth for SRL Patch is causing an issue on the
PPC64 BE santizer.

llvm-svn: 321349
2017-12-22 08:36:25 +00:00
Simon Pilgrim 695c7da3d5 [DAGCombiner] Remove (xor (xor x, c1), c2) -> (xor x, (xor c1, c2)) fold. NFCI.
More general cases are already handled by constant canonicalization and then the ReassociateOps call at line 5327

llvm-svn: 321280
2017-12-21 16:54:03 +00:00
Simon Pilgrim 6b915d3353 [DAGCombiner] Generalize (or (and X, c1), c2) -> (and (or X, c2), c1|c2) combine to work on non-splat vectors
The knownbits_mask_or_shuffle_uitofp change is interesting - shuffle combines manage to kick in, removing the AND constant mask load. For targets with fast-variable-shuffle this should reduce further to VPOR+VPSHUFB+VCVTDQ2PS.

llvm-svn: 321279
2017-12-21 16:34:46 +00:00
Simon Pilgrim 4dd03ed7e3 [DAGCombiner] Generalize (and (or x, C), D) -> D iff (C & D) == D combine to work on non-splat vectors
llvm-svn: 321275
2017-12-21 15:17:29 +00:00
Sam Parker 59efb8cb5b [DAGCombine] Improve ReduceLoadWidth for SRL
If the SRL node is only used by an AND, we may be able to set the
ExtVT to the width of the mask, making the AND redundant. To support
this, another check has been added in isLegalNarrowLoad which queries
whether the load is valid.

Differential Revision: https://reviews.llvm.org/D41350

llvm-svn: 321259
2017-12-21 12:55:04 +00:00
Nirav Dave a869856c60 [DAG] Fix condition on overlapping store check.
Prevent overlapping store elision when overlapping store is
pre-inc/dec as analysis is wrong in these cases.

llvm-svn: 321204
2017-12-20 19:06:47 +00:00
Adrian Prantl 0e6694d111 Silence a bunch of implicit fallthrough warnings
llvm-svn: 321114
2017-12-19 22:05:25 +00:00
Nirav Dave 51425fa5ba [DAG] Elide overlapping store
Summary:
Extend overlapping store elision to handle overwrites of stores by
larger stores.

Nontemporal tests have been modified to add memory dependencies to
prevent store elision.

Reviewers: craig.topper, rnk, t.p.northover

Subscribers: javed.absar, hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D40969

llvm-svn: 321089
2017-12-19 17:10:56 +00:00
Sam Parker 00804efd72 [DAGCombine] Move AND nodes to multiple load leaves
Search from AND nodes to find whether they can be propagated back to
loads, so that the AND and load can be combined into a narrow load.
We search through OR, XOR and other AND nodes and all bar one of the
leaves are required to be loads or constants. The exception node then
needs to be masked off meaning that the 'and' isn't removed, but the
loads(s) are narrowed still.

Differential Revision: https://reviews.llvm.org/D41177

llvm-svn: 320962
2017-12-18 10:04:27 +00:00
Matthias Braun f1caa2833f MachineFunction: Return reference from getFunction(); NFC
The Function can never be nullptr so we can return a reference.

llvm-svn: 320884
2017-12-15 22:22:58 +00:00
Benjamin Kramer a85822cb1e Revert "[DAGCombine] Move AND nodes to multiple load leaves"
This reverts commit r320679. Causes miscompiles.

llvm-svn: 320698
2017-12-14 14:03:07 +00:00
Sam Parker ef12b41ef7 [DAGCombine] Move AND nodes to multiple load leaves
Recommitting rL319773, which was reverted due to a recursive issue
causing timeouts. This happened because I failed to check whether
the discovered loads could be narrowed further. In the case of a tree
with one or more narrow loads, that could not be further narrowed, as
well as a node that would need masking, an AND could be introduced
which could then be visited and recombined again with the same load.
This could again create the masking load, with would be combined
again... We now check that the load can be narrowed so that this
process stops.

Original commit message:
Search from AND nodes to find whether they can be propagated back to
loads, so that the AND and load can be combined into a narrow load.
We search through OR, XOR and other AND nodes and all bar one of the
leaves are required to be loads or constants. The exception node then
needs to be masked off meaning that the 'and' isn't removed, but the
loads(s) are narrowed still.

Differential Revision: https://reviews.llvm.org/D41177

llvm-svn: 320679
2017-12-14 09:31:01 +00:00
Sanjay Patel f3436d7dab [DAGCombiner] protect against an infinite loop between shl <--> mul (PR35579)
At first, I tried to thread the x86 needle and use a target hook (isVectorShiftByScalarCheap())
to disable the transform only for non-splat pow-of-2 constants, but not AVX2, but only some
element types, but...it's difficult.

Here we just avoid the loop with the x86 vector transform that conflicts with the general DAG
combine and preserve all of the existing behavior AFAICT otherwise.

Some tests that will probably fail if someone does try to restrict this in a more targeted way
for x86-only may be found in:

test/CodeGen/X86/combine-mul.ll
test/CodeGen/X86/vector-mul.ll
test/CodeGen/X86/widen_arith-5.ll

This should prevent the infinite looping seen with:
https://bugs.llvm.org/show_bug.cgi?id=35579

Differential Revision: https://reviews.llvm.org/D41040

llvm-svn: 320374
2017-12-11 15:19:31 +00:00
Nemanja Ivanovic 25d9af0cb5 [DAGCombiner] Add combined indexed load to the work list
This commit is the first part of https://reviews.llvm.org/D40348.
In order to allow target combines to be performed on newly combined
indexed loads, add them back to the worklist. The remainder of the
above patch will be committed in subsequent revisions and will use
this. Test cases will be included with those follow-up commits.

llvm-svn: 320365
2017-12-11 14:16:02 +00:00
Roger Ferrer Ibanez 5ea0f2501f [ARM] Use ADDCARRY / SUBCARRY
This is a preparatory step for D34515.

This change:
 - makes nodes ISD::ADDCARRY and ISD::SUBCARRY legal for i32
 - lowering is done by first converting the boolean value into the carry flag
   using (_, C) ← (ARMISD::ADDC R, -1) and converted back to an integer value
   using (R, _) ← (ARMISD::ADDE 0, 0, C). An ARMISD::ADDE between the two
   operations does the actual addition.
 - for subtraction, given that ISD::SUBCARRY second result is actually a
   borrow, we need to invert the value of the second operand and result before
   and after using ARMISD::SUBE. We need to invert the carry result of
   ARMISD::SUBE to preserve the semantics.
 - given that the generic combiner may lower ISD::ADDCARRY and
   ISD::SUBCARRYinto ISD::UADDO and ISD::USUBO we need to update their lowering
   as well otherwise i64 operations now would require branches. This implies
   updating the corresponding test for unsigned.
 - add new combiner to remove the redundant conversions from/to carry flags
   to/from boolean values (ARMISD::ADDC (ARMISD::ADDE 0, 0, C), -1) → C
 - fixes PR34045
 - fixes PR34564
 - fixes PR35103

Differential Revision: https://reviews.llvm.org/D35192

llvm-svn: 320355
2017-12-11 12:13:45 +00:00
Craig Topper ad45bf5895 [DAGCombiner] Support folding (mulhs/u X, 0)->0 for vectors.
We should probably also fold (mulhs/u X, 1) for vectors, but that's harder.

llvm-svn: 320344
2017-12-11 08:33:20 +00:00
Craig Topper 65ed4d4492 [DAGCombiner] Reuse existing SDLoc variable instead of creating a new one. NFC
llvm-svn: 320343
2017-12-11 08:33:19 +00:00
Sanjay Patel 9012391af1 [DAGCombiner] eliminate shuffle of insert element
I noticed this pattern in D38316 / D38388. We failed to combine a shuffle that is either 
repeating a scalar insertion at the same position in a vector or translated to a different 
element index.

Like the earlier patch, this could be an instcombine too, but since we opted to make this 
a DAG transform earlier, I've made this one a DAG patch too.

We do not need any legality checking because the new insert is identical to the existing 
insert except that it may have a different constant insertion operand.

The constant insertion test in test/CodeGen/X86/vector-shuffle-combining.ll was the 
motivation for D38756.

Differential Revision: https://reviews.llvm.org/D40209

llvm-svn: 320050
2017-12-07 15:17:58 +00:00
Nirav Dave 7d8f3e0c93 [ARM][AArch64][DAG] Reenable post-legalize store merge
Reenable post-legalize stores with constant merging computation and
corresponding test case.

 * Properly truncate store merge constants
 * Disable merging of truncated stores floating points
 * Ensure merges of constant stores into a single vector are
   constructed from legal elements.

Reviewers: eastig, efriedma

Reviewed By: eastig

Subscribers: spatel, rengolin, aemerson, javed.absar, kristof.beyls, hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D40701

llvm-svn: 319899
2017-12-06 15:30:13 +00:00
Vlad Tsyrklevich 0b40f21134 Revert "[DAGCombine] Move AND nodes to multiple load leaves"
This reverts commit r319773. It was causing some buildbots to hang, e.g.
http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-android/builds/5589

llvm-svn: 319867
2017-12-06 01:16:08 +00:00
Sam Parker 0a436a9d62 [DAGCombine] Move AND nodes to multiple load leaves
Search from AND nodes to find whether they can be propagated back to
loads, so that the AND and load can be combined into a narrow load.
We search through OR, XOR and other AND nodes and all bar one of the
leaves are required to be loads or constants. The exception node then
needs to be masked off meaning that the 'and' isn't removed, but the
loads(s) are narrowed still.

Differential Revision: https://reviews.llvm.org/D39604

llvm-svn: 319773
2017-12-05 15:13:47 +00:00
Bjorn Pettersson 823b299fbc [DAGCombine] Handle big endian correctly in CombineConsecutiveLoads
Summary:
Found out, at code inspection, that there was a fault in
DAGCombiner::CombineConsecutiveLoads for big-endian targets.

A BUILD_PAIR is always having the least significant bits of
the composite value in element 0. So when we are doing the checks
for consecutive loads, for big endian targets, we should check
if the load to elt 1 is at the lower address and the load
to elt 0 is at the higher address.

Normally this bug only resulted in missed oppurtunities for
doing the load combine. I guess that in some rare situation it
could lead to faulty combines, but I've not seen that happen.

Note that this patch actually will trigger load combine for
some big endian regression tests.
One example is test/CodeGen/PowerPC/anon_aggr.ll where we now get
  t76: i64,ch = load<LD8[FixedStack-9]
instead of
  t37: i32,ch = load<LD4[FixedStack-10]>
  t35: i32,ch = load<LD4[FixedStack-9]>
  t41: i64 = build_pair t37, t35
before legalization. Then the legalization will split the LD8
into two loads, so the end result is the same. That should
verify that the transfomation is correct now.

Reviewers: niravd, hfinkel

Reviewed By: niravd

Subscribers: nemanjai, llvm-commits

Differential Revision: https://reviews.llvm.org/D40444

llvm-svn: 319771
2017-12-05 14:50:05 +00:00
Sam Parker 8b73630c32 [DAGCombine] isLegalNarrowLoad function (NFC)
Pull the checks upon the load out from ReduceLoadWidth into their own
function.

Differential Revision: https://reviews.llvm.org/D40833

llvm-svn: 319766
2017-12-05 14:03:51 +00:00
Hans Wennborg e117129ef7 DAG: Follow-up to r319692 check the truncates inputs have the same type
MatchRotate assumes the types of the types of LHS and RHS are equal,
which is always the case then they come from an OR node, but here
we're getting them from two different TRUNC nodes, so we have to check
the types.

llvm-svn: 319695
2017-12-04 20:48:50 +00:00
Hans Wennborg 7e61f24962 DAG: Match truncated rotation (PR35487)
If the truncation has been pushed past the or-node, look through it and
truncate afterwards.

Differential revision: https://reviews.llvm.org/D40792

llvm-svn: 319692
2017-12-04 20:39:57 +00:00
Sam Parker 1e26d986aa [DAGCombine] Remove isAndLoadExtLoad arguments
Both LoadedVT and NarrowLoad are passed as references and neither
of them are used by any of its callers.

Differential Revision: https://reviews.llvm.org/D40713

llvm-svn: 319645
2017-12-04 09:48:26 +00:00
Nirav Dave 3e76e1e89e [DAG][ARM] Revert "Reenable post-legalize store merge"
due to failures in AArch and ARM code gen.

llvm-svn: 319587
2017-12-01 21:55:47 +00:00
Eli Friedman b34a8198a9 [DAGCombine] Simplify ISD::AND handling in ReduceLoadWidth
Followup to D39595. Removes a bunch of redundant checks.

Differential Revision: https://reviews.llvm.org/D40667

llvm-svn: 319573
2017-12-01 19:33:56 +00:00
Nirav Dave eb2b24fded [ARM][DAG] Reenable post-legalize store merge
Summary: Reenable post-legalize stores with constant merging computation and cofrresponding test case.

Reviewers: eastig, efriedma

Subscribers: aemerson, javed.absar, kristof.beyls, hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D40701

llvm-svn: 319547
2017-12-01 14:49:26 +00:00
Sam Parker 4bd776e001 [DAGCombine] Refactor ReduceLoadWidth
visitAND attempts to narrow the width of extending loads that are
then masked off. ReduceLoadWidth already exists for a similar purpose
and handles shifts, so I've moved the code to handle AND nodes there.

Differential Revision: https://reviews.llvm.org/D39595

llvm-svn: 319421
2017-11-30 11:49:11 +00:00
Jonas Paulsson f0ff20f1f0 Use getStoreSize() in various places instead of 'BitSize >> 3'.
This is needed for cases when the memory access is not as big as the width of
the data type. For instance, storing i1 (1 bit) would be done in a byte (8
bits).

Using 'BitSize >> 3' (or '/ 8') would e.g. give the memory access of an i1 a
size of 0, which for instance makes alias analysis return NoAlias even when
it shouldn't.

There are no tests as this was done as a follow-up to the bugfix for the case
where this was discovered (r318824). This handles more similar cases.

Review: Björn Petterson
https://reviews.llvm.org/D40339

llvm-svn: 319173
2017-11-28 14:44:32 +00:00
Simon Dardis 3aeb1a5404 [DAGCombine] Disable finding better chains for stores at O0
Unoptimized IR can have linear sequences of stores to an array, where the
initial GEP for the first store is formed from the pointer to the array, and the
GEP for each store after the first is formed from the previous GEP with some
offset in an inductive fashion.

The (large) resulting DAG when analyzed by DAGCombine undergoes an excessive
number of combines as each store node is examined every time its' offset node
is combined with any child of the offset. One of the transformations is
findBetterNeighborChains which assists MergeConsecutiveStores. The former
relies on repeated chain walking to do its' work, however MergeConsecutiveStores
is disabled at O0 which makes the transformation redundant.

Any optimization level other than O0 would invoke InstCombine which would
resolve the chain of GEPs into flat base + offset GEP for each store which
does not exhibit the repeated examination of each store to the array.

Disabling this optimization fixes an excessive compile time issue (30~ minutes
for the test case provided) at O0.

Reviewers: niravd, craig.topper, t.p.northover

Differential Revision: https://reviews.llvm.org/D40193

llvm-svn: 319142
2017-11-28 04:07:59 +00:00
Craig Topper dbd4a7fecc [DAGCombiner] Don't combine aext(setcc) if the setcc is already using the target's preferred result type.
With AVX512 vXi1 types are legal so we shouldn't be extending them.

This change is similar to existing code in the zext(setcc) combine.

llvm-svn: 319120
2017-11-27 23:51:40 +00:00
Craig Topper 57c02d18b9 [DAGCombiner] Use EVT::changeVectorElementTypeToInteger() instead of implementing manually.
llvm-svn: 319119
2017-11-27 23:51:31 +00:00
Jonas Paulsson 181e260e32 [DAGCombiner] Bugfix in isAlias().
Since i1 is a legal type, this:

  NumBytes = Op1->getMemoryVT().getSizeInBits() >> 3;

is wrong and should be instead

  NumBytes = Op0->getMemoryVT().getStoreSize();

There seems to be more places where this should be fixed outside DAGCombiner.

Review: Hal Finkel
https://bugs.llvm.org/show_bug.cgi?id=35366

llvm-svn: 318824
2017-11-22 08:58:30 +00:00
David Blaikie b3bde2ea50 Fix a bunch more layering of CodeGen headers that are in Target
All these headers already depend on CodeGen headers so moving them into
CodeGen fixes the layering (since CodeGen depends on Target, not the
other way around).

llvm-svn: 318490
2017-11-17 01:07:10 +00:00
Craig Topper b6b61dfb15 [DAGCombiner] Use cast instead of an unchecked dyn_cast.
llvm-svn: 318450
2017-11-16 20:23:12 +00:00
Sam Parker 43fa5911a1 [DAGCombine] Enable more srl -> load combines
Change the calculation for the desired ValueType for non-sign
extending loads, as in those cases we don't care about the
higher bits. This creates a smaller ExtVT and allows for such
combinations as:
(srl (zextload i16, [addr]), 8) -> (zextload i8, [addr + 1])

Differential Revision: https://reviews.llvm.org/D40034

llvm-svn: 318390
2017-11-16 11:28:26 +00:00
Amaury Sechet 3f0f650f49 [DAGcombine] Do not replace truncate node by itself when doing constant folding, this trigger needless extra rounds of combine for nothing. NFC
llvm-svn: 317926
2017-11-10 20:59:53 +00:00
Adrian Prantl 1c8c544946 Preserve debug info when DAG-combinging (zext (truncate x)) -> (and x, mask).
rdar://problem/27139077

llvm-svn: 317825
2017-11-09 19:50:20 +00:00
Craig Topper c51aac675d [DAGCombiner] Fix typos in comments. NFC
llvm-svn: 317072
2017-11-01 03:30:52 +00:00
Guozhi Wei 7c67009fe5 [DAGCombine] Don't combine sext with extload if sextload is not supported and extload has multi users
In function DAGCombiner::visitSIGN_EXTEND_INREG, sext can be combined with extload even if sextload is not supported by target, then

  if sext is the only user of extload, there is no big difference, no harm no benefit.
  if extload has more than one user, the combined sextload may block extload from combining with other zext, causes extra zext instructions generated. As demonstrated by the attached test case.

This patch add the constraint that when sextload is not supported by target, sext can only be combined with extload if it is the only user of extload.

Differential Revision: https://reviews.llvm.org/D39108

llvm-svn: 316802
2017-10-27 21:54:24 +00:00
Matt Arsenault 878827d93a DAG: Fold fma (fneg x), K, y -> fma x, -K, y
llvm-svn: 316753
2017-10-27 09:06:07 +00:00
Simon Pilgrim 32da2f9245 [DAGCombine] Permit combining of shuffles of equivalent splat BUILD_VECTORs
combineShuffleOfScalars is very conservative about shuffled BUILD_VECTORs that can be combined together.

This patch adds one additional case - if both BUILD_VECTORs represent splats of the same scalar value but with different UNDEF elements, then we should create a single splat BUILD_VECTOR, sharing only the UNDEF elements defined by the shuffle mask.

Differential Revision: https://reviews.llvm.org/D38696

llvm-svn: 316331
2017-10-23 15:48:08 +00:00
Simon Pilgrim 654bbb3b05 [DAGCombine] Add SCALAR_TO_VECTOR undef handling to simplifyShuffleMask.
This allows us to simplify later visitVECTOR_SHUFFLE optimizations such as combineShuffleOfScalars.

Noticed whilst working on D38696

llvm-svn: 316017
2017-10-17 18:14:48 +00:00
Matt Arsenault f2db97d8fa DAG: Add opcode and source type to isFPExtFree
This is only currently used for mad/fma transforms.
This is the only case where it should be used for AMDGPU,
so add an opcode to be sure.

llvm-svn: 315740
2017-10-13 19:55:45 +00:00
Wei Mi 1736efd16a Revert r307036 because of PR34919.
llvm-svn: 315540
2017-10-12 00:24:52 +00:00
Sanjay Patel 34fd5eaaf0 [DAGCombiner] convert insertelement of bitcasted vector into shuffle
Eg:
insert v4i32 V, (v2i16 X), 2 --> shuffle v8i16 V', X', {0,1,2,3,8,9,6,7}

This is a generalization of the IR fold in D38316 to handle insertion into a non-undef vector. 
We may want to abandon that one if we can't find value in squashing the more specific pattern sooner.

We're using the existing legal shuffle target hook to avoid AVX512 horror with vXi1 shuffles.

There may be room for improvement in the shuffle lowering here, but that would be follow-up work.

Differential Revision: https://reviews.llvm.org/D38388

llvm-svn: 315460
2017-10-11 14:12:16 +00:00
David Stuttard 51c1b22806 [DAGCombine] Fix for shuffle to vector extend for non power 2 vectors
Summary:
See https://llvm.org/PR33743 for more details

It seems that for non-power of 2 vector sizes, the algorithm can produce
non-matching sizes for input and result causing an assert.

This usually isn't a problem as the isAnyExtend check will weed these out, but
in some cases (most often with lots of undefined values for the mask indices) it
can pass this check for non power of 2 vectors.

Adding in an extra check that ensures that bit size will match for the result
and input (as required)

Subscribers: nhaehnle

Differential Revision: https://reviews.llvm.org/D35241

llvm-svn: 315307
2017-10-10 12:45:45 +00:00
Sanjay Patel 2a61a821a0 [DAG] combine assertsexts around a trunc
This was a suggested follow-up to:
D37017 / https://reviews.llvm.org/rL313577

llvm-svn: 315206
2017-10-09 15:22:20 +00:00
Stanislav Mekhanoshin 6deb212522 Eliminate ftrunc if source is know to be rounded
Differential Revision: https://reviews.llvm.org/D38421

llvm-svn: 314688
2017-10-02 16:57:07 +00:00
George Burgess IV f8e11b803d [DAGCombiner] Fix an off-by-one error in vector logic
Without this, we could end up trying to get the Nth (0-indexed) element
from a subvector of size N.

Differential Revision: https://reviews.llvm.org/D37880

llvm-svn: 314380
2017-09-28 06:17:19 +00:00
Eugene Zelenko fb7f792f55 [CodeGen] Fix some Clang-tidy modernize-use-bool-literals and Include What You Use warnings; other minor fixes (NFC).
llvm-svn: 313941
2017-09-21 23:20:16 +00:00
Craig Topper de4379251e [DAGCombiner] Slightly simplify some code by using APInt::isMask() and countTrailingOnes instead of getting active bits and checking if all the bits below that make a mask.
At least for the 64-bit and less case, we should be able to determine if we even have a mask without counting any bits. This also removes the need to explicitly check for 0 active bits, isMask will return false for 0.

llvm-svn: 313908
2017-09-21 20:12:19 +00:00
Craig Topper 280f133773 [DAGCombiner] Remove duplicate code from visitZERO_EXTEND
This exact block of code exists right below.

Differential Revision: https://reviews.llvm.org/D38122

llvm-svn: 313891
2017-09-21 17:30:02 +00:00
Sanjay Patel f31b1a00ea [DAGCombiner] fold assertzexts separated by trunc
If we have an AssertZext of a truncated value that has already been AssertZext'ed, 
we can assert on the wider source op to improve the zext-y knowledge:
 assert (trunc (assert X, i8) to iN), i1 --> trunc (assert X, i1) to iN

This moves a fold from being Mips-specific to general combining, and x86 shows
improvements.

Differential Revision: https://reviews.llvm.org/D37017

llvm-svn: 313577
2017-09-18 22:05:35 +00:00
Sanjay Patel 7765c93be2 [DAG, x86] allow store merging before and after legalization (PR34217)
rL310710 allowed store merging to occur after legalization to catch stores that are created late,
but this exposes a logic hole seen in PR34217:
https://bugs.llvm.org/show_bug.cgi?id=34217

We will miss merging stores if the target lowers vector extracts into target-specific operations.
This patch allows store merging to occur both before and after legalization if the target chooses
to get maximum merging.

I don't think the potential regressions in the other tests are relevant. The tests are for
correctness of weird IR constructs rather than perf tests, and I think those are still correct.

Differential Revision: https://reviews.llvm.org/D37987

llvm-svn: 313564
2017-09-18 20:54:26 +00:00
Simon Pilgrim 8bd2d8780a [DAGCombine] (shl (or x, c1), c2) -> (or (shl x, c2), c1 << c2)
We already have a combine for this pattern when the input to shl is add, so we just need to enable the transformation when the input is or.

Original patch by @tstellar

Differential Revision: https://reviews.llvm.org/D19325

llvm-svn: 313251
2017-09-14 10:38:30 +00:00
Matt Arsenault 61ec738b60 DAG: Allow creating extract_vector_elt post-legalize
Fixes some combine issues for AMDGPU where we weren't
getting the many extract_vector_elt combines expected
in a future patch.

This should really be checking isOperationLegalOrCustom on
the extract. That improves a number of x86 lit tests, but
a few get stuck in an infinite loop from one place
where a similar looking extract is created. I have a
different workaround in the backend for that which
keeps many of those improvements, but also adds a few
regressions.

llvm-svn: 312730
2017-09-07 17:24:43 +00:00
Craig Topper 761bb1b53d [DAGCombiner] When combining EXTRACT_SUBVECTOR of a BUILD_VECTOR, make sure we don't create a BUILD_VECTOR with an illegal type after type legalization.
llvm-svn: 312621
2017-09-06 06:50:03 +00:00
Ayman Musa 44cde94935 [X86] Fix crash on assert of non-simple type after type-legalization
The function combineShuffleToVectorExtend in DAGCombine might generate an illegal typed node after "legalize types" phase, causing assertion on non-simple type to fail afterwards.

Adding a type check in case the combine is running after the type legalize pass.

Differential Revision: https://reviews.llvm.org/D37330

llvm-svn: 312438
2017-09-03 09:09:16 +00:00
Craig Topper 7081f029f3 [DAGCombiner] Do a better job of ensuring we don't split elements when combining an extract_subvector of a bitcasted build_vector.
llvm-svn: 312253
2017-08-31 17:02:22 +00:00
Hans Wennborg e7becd7e85 [DAG] Bound loop dependence check in merge optimization.
The loop dependence check looks for dependencies between store merge
candidates not captured by the chain sub-DAG doing a check of
predecessors which may be very large. Conservatively bound number of
nodes checked for compilation time. (Resolves PR34326).

Landing on behalf of Nirav Dave to unblock the 5.0.0 release.

Differential Revision: https://reviews.llvm.org/D37220

llvm-svn: 312022
2017-08-29 18:41:00 +00:00
Craig Topper 029a21dfdc [DAGCombiner] Teach visitEXTRACT_SUBVECTOR to turn extracts of BUILD_VECTOR into smaller BUILD_VECTORs
Only do this before operations are legalized of BUILD_VECTOR is Legal for the target.

Differential Revision: https://reviews.llvm.org/D37186

llvm-svn: 311892
2017-08-28 15:28:33 +00:00
Sanjay Patel a7a61d9768 [DAGCombiner] allow undef shuffle operands when eliminating bitcasts (PR34111)
As noted in the FIXME, this could be improved more, but this is the smallest fix
that helps:
https://bugs.llvm.org/show_bug.cgi?id=34111

llvm-svn: 311853
2017-08-27 17:29:30 +00:00
Jatin Bhateja e4ca95d6aa [DAGCombiner] Extending pattern detection for vector shuffle.
Summary:
If all the operands of a BUILD_VECTOR extract elements from same vector then split the
vector efficiently based on the maximum vector access index.

This will also fix PR 33784

Reviewers: zvi, delena, RKSimon, thakis

Reviewed By: RKSimon

Subscribers: chandlerc, eladcohen, llvm-commits

Differential Revision: https://reviews.llvm.org/D35788

llvm-svn: 311833
2017-08-26 19:02:36 +00:00
Jatin Bhateja b60cfbefac Revert rL311247 : To rectify commit message.
Summary: This reverts commit rL311247.

Differential Revision: https://reviews.llvm.org/D36927

llvm-svn: 311832
2017-08-26 19:02:17 +00:00
Sanjay Patel e404cbff66 [DAG] convert vector select-of-constants to logic/math
This goes back to a discussion about IR canonicalization. We'd like to preserve and convert
more IR to 'select' than we currently do because that's likely the best choice in IR:
http://lists.llvm.org/pipermail/llvm-dev/2016-September/105335.html
...but that's often not true for codegen, so we need to account for this pattern coming in
to the backend and transform it to better DAG ops.

Steps in this patch:

  1. Add an EVT param to the existing convertSelectOfConstantsToMath() TLI hook to more finely
     enable this transform. Other targets will probably want that anyway to distinguish scalars
     from vectors. We're using that here to exclude AVX512 targets, but it may not be necessary.

  2. Convert a vselect to ext+add. This eliminates a constant load/materialization, and the
     vector ext is often free.

Implementing a more general fold using xor+and can be a follow-up for targets that don't have
a legal vselect. It's also possible that we can remove the TLI hook for the special case fold
implemented here because we're eliminating a constant, but it needs to be tested on other
targets.

Differential Revision: https://reviews.llvm.org/D36840

llvm-svn: 311731
2017-08-24 23:24:43 +00:00
Hans Wennborg c39ec95d88 [DAG] Fix Node Replacement in PromoteIntBinOp
When one operand is a user of another in a promoted binary operation
we may replace and delete the returned value before returning
triggering an assertion. Reorder node replacements to prevent this.

Fixes PR34137.

Landing on behalf of Nirav.

Differential Revision: https://reviews.llvm.org/D36581

llvm-svn: 311623
2017-08-24 01:08:27 +00:00
Craig Topper 35189d5221 [SelectionDAG] Make ISD::isConstantSplatVector always return an element sized APInt.
This partially reverts r311429 in favor of making ISD::isConstantSplatVector do something not confusing. Turns out the only other user of it was also having to deal with the weird property of it returning a smaller size.

So rather than continue to deal with this quirk everywhere, just make the interface do something sane.

Differential Revision: https://reviews.llvm.org/D37039

llvm-svn: 311510
2017-08-22 23:54:13 +00:00
Jatin Bhateja 6b4c205685 [DAGCombiner] Extending pattern detection for vector shuffle.
Summary:
    If all the operands of a BUILD_VECTOR extract elements from same vector then split the
    vector efficiently based on the maximum vector access index.

    Reviewers: zvi, delena, RKSimon, thakis

    Reviewed By: RKSimon

    Subscribers: chandlerc, eladcohen, llvm-commits

    Differential Revision: https://reviews.llvm.org/D35788

llvm-svn: 311255
2017-08-19 18:08:59 +00:00
Jatin Bhateja 66f7958e91 Revert rL311247 : To rectify commit message.
Summary: This reverts commit rL311247.

Differential Revision: https://reviews.llvm.org/D36927

llvm-svn: 311252
2017-08-19 17:59:58 +00:00
Jatin Bhateja 6f0d0d23b0 Merge branch 'arcpatch-D35788'
llvm-svn: 311247
2017-08-19 17:00:04 +00:00
Jatin Bhateja 1c56863739 Revert rL311242 "Extension of shuffle vector pattern detection, updating post rebase."
Summary:

This reverts commit rL311242.

Differential Revision: https://reviews.llvm.org/D36924

llvm-svn: 311246
2017-08-19 16:40:06 +00:00
Jatin Bhateja 313f97dd84 Extension of shuffle vector pattern detection, updating post rebase.
llvm-svn: 311242
2017-08-19 15:58:36 +00:00
Craig Topper e3edd9c9be [DAGCombiner] Fix bad comment that had immediate values swapped from the code and what they need to be to make sense. NFC
llvm-svn: 311144
2017-08-18 04:52:46 +00:00
Simon Pilgrim 8be9f4af4f [DAGCombiner] Add support for non-uniform constant vectors to (mul x, (1 << c)) -> x << c
llvm-svn: 311083
2017-08-17 13:03:34 +00:00
Amaury Sechet 9c529b6be3 [DAGCombine] Do not try to deduplicate commutative operations if both operand are the same.
Summary: It is creating useless work as the commuted nodes is the same as the node we are working on in that case.

Reviewers: jyknight, nemanjai, mkuper, spatel, RKSimon, zvi, bkramer

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D33840

llvm-svn: 310832
2017-08-14 11:44:03 +00:00
Elad Cohen 3a90a0c10d Revert "[DAGCombiner] Extending pattern detection for vector shuffle (REAPPLIED)"
This reverts commit r310782.

llvm-svn: 310822
2017-08-14 09:06:00 +00:00
Craig Topper 2251ef95a3 [X86][ARM][TargetLowering] Add SrcVT to isExtractSubvectorCheap
Summary:
Without the SrcVT its hard to know what is really being asked for. For example if your target has 128, 256, and 512 bit vectors. Maybe extracting 128 from 256 is cheap, but maybe extracting 128 from 512 is not.

For x86 we do support extracting a quarter of a 512-bit register. But for i1 vectors we don't have isel patterns for extracting arbitrary pieces. So we need this to have a correct implementation of isExtractSubvectorCheap for mask vectors.

Reviewers: RKSimon, zvi, efriedma

Reviewed By: RKSimon

Subscribers: aemerson, javed.absar, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D36649

llvm-svn: 310793
2017-08-13 17:29:07 +00:00
Simon Pilgrim 5a86f0e717 [DAGCombiner] Extending pattern detection for vector shuffle (REAPPLIED)
If all the operands of a BUILD_VECTOR extract elements from same vector then split the vector efficiently based on the maximum vector access index.

Reapplied with fix to only work with simple value types.

Committed on behalf of @jbhateja (Jatin Bhateja)

Differential Revision: https://reviews.llvm.org/D35788

llvm-svn: 310782
2017-08-12 17:43:25 +00:00
Sanjay Patel 169dae70a6 [x86] use more shift or LEA for select-of-constants (2nd try)
The previous rev (r310208) failed to account for overflow when subtracting the
constants to see if they're suitable for shift/lea. This version add a check
for that and more test were added in r310490.

We can convert any select-of-constants to math ops:
http://rise4fun.com/Alive/d7d

For this patch, I'm enhancing an existing x86 transform that uses fake multiplies
(they always become shl/lea) to avoid cmov or branching. The current code misses
cases where we have a negative constant and a positive constant, so this is just
trying to plug that hole.

The DAGCombiner diff prevents us from hitting a terrible inefficiency: we can start
with a select in IR, create a select DAG node, convert it into a sext, convert it
back into a select, and then lower it to sext machine code.

Some notes about the test diffs:

1. 2010-08-04-MaskedSignedCompare.ll - We were creating control flow that didn't exist in the IR.
2. memcmp.ll - Choose -1 or 1 is the case that got me looking at this again. We could avoid the 
   push/pop in some cases if we used 'movzbl %al' instead of an xor on a different reg? That's a 
   post-DAG problem though.
3. mul-constant-result.ll - The trade-off between sbb+not vs. setne+neg could be addressed if
   that's a regression, but those would always be nearly equivalent.
4. pr22338.ll and sext-i1.ll - These tests have undef operands, so we don't actually care about these diffs.
5. sbb.ll - This shows a win for what is likely a common case: choose -1 or 0.
6. select.ll - There's another borderline case here: cmp+sbb+or vs. test+set+lea? Also, sbb+not vs. setae+neg shows up again.
7. select_const.ll - These are motivating cases for the enhancement; replace cmov with cheaper ops.

Assembly differences between movzbl and xor to avoid a partial reg stall are caused later by the X86 Fixup SetCC pass.

Differential Revision: https://reviews.llvm.org/D35340

llvm-svn: 310717
2017-08-11 15:44:14 +00:00
Nirav Dave 0a48e5d506 Improve handling of insert_subvector of bitcast values
Fix insert_subvector / extract_subvector merges of bitcast values.

Reviewers: efriedma, craig.topper, RKSimon

Subscribers: RKSimon, llvm-commits

Differential Revision: https://reviews.llvm.org/D34571

llvm-svn: 310711
2017-08-11 13:21:41 +00:00
Simon Pilgrim 83cf3a29b5 [DAGCombiner] Remove shuffle support from simplifyShuffleMask
rL310372 enabled simplifyShuffleMask to support undef shuffle mask inputs, but its causing hangs.

Removing support until I can triage the problem

llvm-svn: 310699
2017-08-11 08:37:00 +00:00
Nirav Dave f8556ad48f Revert "[DAG] Cleanup unused nodes after store merge. NFCI."
This reverts commit r310648 which causes an unexpected assertion failure

llvm-svn: 310659
2017-08-10 21:03:36 +00:00
Nirav Dave 4d28c0ff4f [DAG] Relax type restriction for store merge
Summary: Allow stores of bitcastable types to be merged by peeking through BITCAST nodes and recasting stored values constant and vector extract nodes as necessary.

Reviewers: jyknight, hfinkel, efriedma, RKSimon, spatel

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D34569

llvm-svn: 310655
2017-08-10 19:52:45 +00:00
Nirav Dave 99d9d24553 [DAG] Cleanup unused nodes after store merge. NFCI.
llvm-svn: 310648
2017-08-10 18:53:14 +00:00
Nirav Dave 06242a99ce [DAG] Rewrite expression. NFC.
llvm-svn: 310608
2017-08-10 15:29:33 +00:00
Nirav Dave 6110d3ad00 [DAG] Explicitly cleanup merged load values during store merge. NFCI.
llvm-svn: 310474
2017-08-09 13:37:07 +00:00
Nirav Dave 8a813cf646 [DAG] Introduce peekThroughBitcast function. NFCI.
llvm-svn: 310405
2017-08-08 20:01:18 +00:00
Nirav Dave 515116d7c2 [DAG] Update comments. NFC.
llvm-svn: 310404
2017-08-08 19:52:19 +00:00
Simon Pilgrim 91b7b991d4 [DAGCombiner] simplifyShuffleMask - handle UNDEF inputs from shuffles as well as BUILD_VECTOR
Minor extension to D36393

llvm-svn: 310372
2017-08-08 16:10:33 +00:00
Simon Pilgrim ef44228acb [DAGCombiner] Simplify shuffle mask index if the referenced input element is UNDEF
Fixes one of the cases in PR34041.

Differential Revision: https://reviews.llvm.org/D36393

llvm-svn: 310344
2017-08-08 11:03:30 +00:00
Sanjay Patel 807f92b8ff [x86] revert r310208 to investigate test-suite failures (PR34105 / PR34097)
llvm-svn: 310264
2017-08-07 15:47:48 +00:00
Nirav Dave 3d3bde7682 [DAG] Extend visitSCALAR_TO_VECTOR optimization to truncated vector.
Relanding after case to insert explicit truncation as necessary.

Allow SCALAR_TO_VECTOR of EXTRACT_VECTOR_ELT to reduce to
EXTRACT_SUBVECTOR of vector shuffle when output is smaller. Marginally
improves vector shuffle computations.

Reviewers: efriedma, RKSimon, spatel

Subscribers: javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D35566

llvm-svn: 310256
2017-08-07 14:07:49 +00:00
Sanjay Patel a923c2ee95 [x86] use more shift or LEA for select-of-constants
We can convert any select-of-constants to math ops:
http://rise4fun.com/Alive/d7d

For this patch, I'm enhancing an existing x86 transform that uses fake multiplies 
(they always become shl/lea) to avoid cmov or branching. The current code misses 
cases where we have a negative constant and a positive constant, so this is just 
trying to plug that hole.

The DAGCombiner diff prevents us from hitting a terrible inefficiency: we can start 
with a select in IR, create a select DAG node, convert it into a sext, convert it 
back into a select, and then lower it to sext machine code.

Some notes about the test diffs:

1. 2010-08-04-MaskedSignedCompare.ll - We were creating control flow that didn't exist in the IR.
2. memcmp.ll - Choose -1 or 1 is the case that got me looking at this again. I 
   think we could avoid the push/pop in some cases if we used 'movzbl %al' instead of an xor on 
   a different reg? That's a post-DAG problem though.
3. mul-constant-result.ll - The trade-off between sbb+not vs. setne+neg could be addressed if 
   that's a regression, but I think those would always be nearly equivalent.
4. pr22338.ll and sext-i1.ll - These tests have undef operands, so I don't think we actually care about these diffs.
5. sbb.ll - This shows a win for what I think is a common case: choose -1 or 0.
6. select.ll - There's another borderline case here: cmp+sbb+or vs. test+set+lea? Also, sbb+not vs. setae+neg shows up again.
7. select_const.ll - These are motivating cases for the enhancement; replace cmov with cheaper ops.

Assembly differences between movzbl and xor to avoid a partial reg stall are caused later by the X86 Fixup SetCC pass.

Differential Revision: https://reviews.llvm.org/D35340

llvm-svn: 310208
2017-08-06 16:27:07 +00:00
Nico Weber b24df62bb6 Revert r310058, it caused PR34073.
llvm-svn: 310118
2017-08-04 20:24:13 +00:00
Simon Pilgrim 5c63586489 [DAGCombiner] Extending pattern detection for vector shuffle.
If all the operands of a BUILD_VECTOR extract elements from same vector then split the vector efficiently based on the maximum vector access index.

Committed on behalf of @jbhateja (Jatin Bhateja)

Differential Revision: https://reviews.llvm.org/D35788

llvm-svn: 310058
2017-08-04 12:46:35 +00:00
Nirav Dave 3fc1c2365c [DAG] Allow merging of stores of vector loads
Remove restriction disallowing merging of stores vector loads into
larger store of larger vector load.

Reviewers: RKSimon, efriedma, spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D36158

llvm-svn: 309951
2017-08-03 15:51:20 +00:00
Nirav Dave e44032f7e7 [DAG] Improve candidate pruning in store merge failure case. NFCI
During store merge we construct a sorted list of consecutive store
candidates and consider subsequences for merging into a single
store. For each subsequence we check if the stored value type is legal
the merged store would have valid and fast and if the constructed
value to be stored is valid. The only properties that affect this
check between subsequences is the size of the subsequence, the
alignment of the first store, the alignment of the stored load value
(when merging stores-of-loads), and whether the merged value is a
constant zero.

If we do not find a viable mergeable subsequence starting from the
first store of length N, we know that a subsequence starting at a
later store of length N will also fail unless the new store's
alignment, the new load's alignment (if we're merging store-of-loads),
or we've dropped stores of nonzero value and could construct a merged
stores of zero (for merging constants).

As a result if we fail to find a valid subsequence starting from the
first store we can safely skip considering subsequences that start
with subsequent stores unless one of the above properties is
true. This significantly (2x) improves compile time in some
pathological cases.

Reviewers: RKSimon, efriedma, zvi, spatel, waltl

Subscribers: grandinj, llvm-commits

Differential Revision: https://reviews.llvm.org/D35901

llvm-svn: 309830
2017-08-02 16:35:58 +00:00
Nirav Dave 15894f8dec [DAG] Refactor store merge subexpressions. NFC.
Distribute various expressions across ifs.

llvm-svn: 309777
2017-08-02 01:08:38 +00:00
Matt Arsenault acc5e82b0e DAG: Undo and->or combine with FrameIndexes
This pattern shows up when lowering byval copies on AMDGPU.

The byval object access is split into 4-byte chunks, adding a
constant offset to the FixedStack base. When some of the offsets
turn into ors, this prevents combining the constant offsets.

This makes it not apparent that the object is there when matching
addressing modes, so it ends up using a scratch wave offset
relative access and the lengthy frame index expansion for that.

llvm-svn: 309775
2017-08-02 00:43:42 +00:00
Nirav Dave dcc5afaad9 [DAG] Factor out common expressions. NFC.
llvm-svn: 309740
2017-08-01 20:30:52 +00:00
Nirav Dave 35dd1ac29c Pull out VectorNumElements value. NFC.
llvm-svn: 309719
2017-08-01 18:19:56 +00:00
Nirav Dave 27a6605bdc Revert "[DAG] Extend visitSCALAR_TO_VECTOR optimization to truncated vector."
This reverts commit r309680 which appears to be raising an assertion
in the test-suite.

llvm-svn: 309717
2017-08-01 18:09:25 +00:00
Nirav Dave f54c8370e5 [DAG] Convert extload check to equivalent type check. NFC.
Replace check with check that consuming store has the same type.

llvm-svn: 309708
2017-08-01 17:19:41 +00:00
Nirav Dave b5a0af6b6e [DAG] Move extload check in store merge. NFC.
Move candidate check from later check to initial candidate check.

llvm-svn: 309698
2017-08-01 16:00:47 +00:00
Nirav Dave b5cb48c6ae [DAG] Extend visitSCALAR_TO_VECTOR optimization to truncated vector.
Summary:
Allow SCALAR_TO_VECTOR of EXTRACT_VECTOR_ELT to reduce to
EXTRACT_SUBVECTOR of vector shuffle when output is smaller. Marginally
improves vector shuffle computations.

Reviewers: efriedma, RKSimon, spatel

Subscribers: javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D35566

llvm-svn: 309680
2017-08-01 13:45:35 +00:00
Zvi Rackover 092f199188 DAGCombiner: Extend reduceBuildVecToTrunc to handle non-zero offset
Summary:
Adding support for combining power2-strided build_vector's where the
first build_vectori's operand is extracted from a non-zero index.

Example:

 v4i32 build_vector((extract_elt V, 1),
                    (extract_elt V, 3),
                    (extract_elt V, 5),
                    (extract_elt V, 7))
 -->
 v4i32 truncate (bitcast (shuffle<1,u,3,u,5,u,7,u> V, u) to v4i64)

Reviewers: delena, RKSimon, guyblank

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D35700

llvm-svn: 309108
2017-07-26 12:57:03 +00:00
Simon Pilgrim 6d59933175 [DAG] Move DAGCombiner::GetDemandedBits to SelectionDAG
This patch moves the DAGCombiner::GetDemandedBits function to SelectionDAG::GetDemandedBits as a first step towards making it easier for targets to get to the source of any demanded bits without the limitations of SimplifyDemandedBits.

Differential Revision: https://reviews.llvm.org/D35841

llvm-svn: 308983
2017-07-25 16:36:44 +00:00
Francois Pichet 82bf3de606 Fix endianness bug in DAGCombiner::visitTRUNCATE and visitEXTRACT_VECTOR_ELT
Summary:
Do not assume little endian architecture in DAGCombiner::visitTRUNCATE and DAGCombiner::visitEXTRACT_VECTOR_ELT.
PR33682

Reviewers: hfinkel, sdardis, RKSimon

Reviewed By: sdardis, RKSimon

Subscribers: uabelho, RKSimon, sdardis, llvm-commits

Differential Revision: https://reviews.llvm.org/D34990

llvm-svn: 308960
2017-07-25 09:40:35 +00:00
Nirav Dave 4e6dcf73f9 [DAG] Fix typo preventing some stores merges to truncated stores.
Check the actual memory type stored and not the extended value size
when considering if truncated store merge is worthwhile.

Reviewers: efriedma, RKSimon, spatel, jyknight

Reviewed By: efriedma

Subscribers: llvm-commits, nhaehnle

Differential Revision: https://reviews.llvm.org/D35623

llvm-svn: 308833
2017-07-23 02:06:28 +00:00
Xin Tong 495a3022da [DAGCombiner] Update comment. NFC
llvm-svn: 308772
2017-07-21 19:10:19 +00:00
Nirav Dave 4aa51c3af1 [DAG] Commit missed nit cleanup from r308617. NFC.
llvm-svn: 308645
2017-07-20 18:07:57 +00:00
Nirav Dave df86d2d008 [DAG] Handle missing transform in fold of value extension case.
Summary:
When pushing an extension of a constant bitwise operator on a load
into the load, change other uses of the load value if they exist to
prevent the old load from persisting.

Reviewers: spatel, RKSimon, efriedma

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D35030

llvm-svn: 308618
2017-07-20 13:57:32 +00:00
Nirav Dave 77cc6f23b9 [DAG] Optimize away degenerate INSERT_VECTOR_ELT nodes.
Summary:
Add missing vector write of vector read reduction, i.e.:

(insert_vector_elt x (extract_vector_elt x idx) idx) to x

Reviewers: spatel, RKSimon, efriedma

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D35563

llvm-svn: 308617
2017-07-20 13:48:17 +00:00
Simon Pilgrim 2911296f10 [DAGCombiner] Match ISD::SRL non-uniform constant vectors patterns using predicates.
Use predicate matchers introduced in D35492 to match more ISD::SRL constant folds

llvm-svn: 308602
2017-07-20 11:03:30 +00:00
Simon Pilgrim b9ff25df59 Remove trailing whitespace. NFCI.
llvm-svn: 308601
2017-07-20 10:43:52 +00:00
Simon Pilgrim 7ff0e49d8c [DAGCombiner] Match ISD::SRA non-uniform constant vectors patterns using predicates.
Use predicate matchers introduced in D35492 to match more ISD::SRA constant folds

llvm-svn: 308600
2017-07-20 10:43:05 +00:00
Simon Pilgrim 9d7863b935 [DAGCombiner] Match non-uniform constant vectors using predicates.
Most combines currently recognise scalar and splat-vector constants, but not non-uniform vector constants.

This patch introduces a matching mechanism that uses predicates to check against BUILD_VECTOR of ConstantSDNode, as well as scalar ConstantSDNode cases.

I've changed a couple of predicates to demonstrate - the combine-shl changes add currently unsupported cases, while the MatchRotate replaces an existing mechanism.

Differential Revision: https://reviews.llvm.org/D35492

llvm-svn: 308598
2017-07-20 10:13:40 +00:00
Simon Pilgrim c77e262260 {DAGCombine] Convert (Val & Mask) == Mask to Mask.isSubsetof(Val). NFCI.
llvm-svn: 308460
2017-07-19 13:39:58 +00:00
Nirav Dave d839749ae8 [DAG] Improve Aliasing of operations to static alloca
Re-recommiting after landing DAG extension-crash fix.

Recommiting after adding check to avoid miscomputing alias information
on addresses of the same base but different subindices.

Memory accesses offset from frame indices may alias, e.g., we
may merge write from function arguments passed on the stack when they
are contiguous. As a result, when checking aliasing, we consider the
underlying frame index's offset from the stack pointer.

Static allocs are realized as stack objects in SelectionDAG, but its
offset is not set until post-DAG causing DAGCombiner's alias check to
consider access to static allocas to frequently alias. Modify isAlias
to consider access between static allocas and access from other frame
objects to be considered aliasing.

Many test changes are included here. Most are fixes for tests which
indirectly relied on our aliasing ability and needed to be modified to
preserve their original intent.

The remaining tests have minor improvements due to relaxed
ordering. The exception is CodeGen/X86/2011-10-19-widen_vselect.ll
which has a minor degradation dispite though the pre-legalized DAG is
improved.

Reviewers: rnk, mkuper, jonpa, hfinkel, uweigand

Reviewed By: rnk

Subscribers: sdardis, nemanjai, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D33345

llvm-svn: 308350
2017-07-18 20:06:24 +00:00
Nirav Dave 041b87758a [DAG] Reverse node replacement in extension operation. NFCI.
Reorder replacements to be user first in preparation for multi-level
folding to premptively avoid inadvertantly deleting later nodes from
sharing found from replacement.

llvm-svn: 308348
2017-07-18 19:49:20 +00:00
Nirav Dave 07871007aa [DAG] Avoid deleting nodes before combining them.
When replacing a node and it's operand, replacing the operand node may
cause the deletion of the original node leading to an assertion
failure. Case around these replacements to avoid this without relying
on inspecting the DELETED_NODE opcode in various extend
dagcombiner cases.

Fixes PR32515.

Reviewers: dbabokin, RKSimon, davide, chandlerc

Subscribers: chandlerc, llvm-commits

Differential Revision: https://reviews.llvm.org/D34095

llvm-svn: 308330
2017-07-18 17:39:15 +00:00
Nirav Dave f87c8e82f6 [DAG] Allow base element type of store merge type to also be a vector.
Correctly calculate merged vector size if MemVT is already a vector.

llvm-svn: 308312
2017-07-18 14:39:09 +00:00
Simon Pilgrim 4793a11df9 [DAGCombine] Fix issue with out of bound constant rotation (PR33828)
Take the modulo of rotations by a constant greater than or equal to the bit-width

llvm-svn: 308302
2017-07-18 12:31:46 +00:00
Chandler Carruth a15e080b05 Revert r308025 due to uncovering a crash in SelectionDAG. This is filed
with a minimal test case in http://llvm.org/PR33833.

Original commit message:
  Improve Aliasing of operations to static alloca

llvm-svn: 308271
2017-07-18 07:53:47 +00:00
Andrew Zhogin 67a64041b9 [DAGCombiner] Recognise vector rotations with non-splat constants
Fixes PR33691.

Differential revision: https://reviews.llvm.org/D35381

llvm-svn: 308150
2017-07-16 23:11:45 +00:00
Simon Pilgrim e7a2e6bdf1 Strip trailing whitespace. NFCI
llvm-svn: 308108
2017-07-15 19:29:19 +00:00
Nirav Dave a8f63af9d1 Improve Aliasing of operations to static alloca
Recommiting after adding check to avoid miscomputing alias information
on addresses of the same base but different subindices.

Memory accesses offset from frame indices may alias, e.g., we
may merge write from function arguments passed on the stack when they
are contiguous. As a result, when checking aliasing, we consider the
underlying frame index's offset from the stack pointer.

Static allocs are realized as stack objects in SelectionDAG, but its
offset is not set until post-DAG causing DAGCombiner's alias check to
consider access to static allocas to frequently alias. Modify isAlias
to consider access between static allocas and access from other frame
objects to be considered aliasing.

Many test changes are included here. Most are fixes for tests which
indirectly relied on our aliasing ability and needed to be modified to
preserve their original intent.

The remaining tests have minor improvements due to relaxed
ordering. The exception is CodeGen/X86/2011-10-19-widen_vselect.ll
which has a minor degradation dispite though the pre-legalized DAG is
improved.

Reviewers: rnk, mkuper, jonpa, hfinkel, uweigand

Reviewed By: rnk

Subscribers: sdardis, nemanjai, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D33345

llvm-svn: 308025
2017-07-14 13:56:21 +00:00
Simon Pilgrim bb85cb16e3 [DAGCombiner] Fix issue with rotate combines asserting if the constant value types differ from the result type.
llvm-svn: 307900
2017-07-13 10:41:49 +00:00
Simon Pilgrim 2dc42b7202 Use isNullConstantOrNullSplatConstant helper. NFCI.
llvm-svn: 307895
2017-07-13 09:39:00 +00:00
Matthias Braun b38736706e Revert "[DAG] Improve Aliasing of operations to static alloca"
Reverting as it breaks tramp3d-v4 in the llvm test-suite. I added some
comments to https://reviews.llvm.org/D33345 about it.

This reverts commit r307546.

llvm-svn: 307589
2017-07-10 20:51:30 +00:00
Nirav Dave 4dcad5dc6b Add DAG argument to canMergeStoresTo NFC.
llvm-svn: 307583
2017-07-10 20:25:54 +00:00
Nirav Dave 163e1ad9dc [DAG] Improve Aliasing of operations to static alloca
Memory accesses offset from frame indices may alias, e.g., we
may merge write from function arguments passed on the stack when they
are contiguous. As a result, when checking aliasing, we consider the
underlying frame index's offset from the stack pointer.

Static allocs are realized as stack objects in SelectionDAG, but its
offset is not set until post-DAG causing DAGCombiner's alias check to
consider access to static allocas to frequently alias. Modify isAlias
to consider access between static allocas and access from other frame
objects to be considered aliasing.

Many test changes are included here. Most are fixes for tests which
indirectly relied on our aliasing ability and needed to be modified to
preserve their original intent.

The remaining tests have minor improvements due to relaxed
ordering. The exception is CodeGen/X86/2011-10-19-widen_vselect.ll
which has a minor degradation dispite though the pre-legalized DAG is
improved.

Reviewers: rnk, mkuper, jonpa, hfinkel, uweigand

Reviewed By: rnk

Subscribers: sdardis, nemanjai, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D33345

llvm-svn: 307546
2017-07-10 15:39:41 +00:00
Hiroshi Inoue a86c920b1e fix typos in comments and error messages; NFC
llvm-svn: 307533
2017-07-10 12:44:25 +00:00
Sanjay Patel 4cea2ec254 [DAGCombiner] use local variable to shorten code; NFCI
llvm-svn: 307429
2017-07-07 19:34:42 +00:00
Simon Pilgrim ac78daf517 {DAGCombiner] Fold (rot x, 0) -> x
llvm-svn: 307184
2017-07-05 18:27:11 +00:00
Andrew Zhogin 45d192823e [DAGCombiner] visitRotate patch to optimize pair of ROTR/ROTL instructions into one with combined shift operand.
For two ROTR operations with shifts C1, C2; combined shift operand will be (C1 + C2) % bitsize.

Differential revision: https://reviews.llvm.org/D12833

llvm-svn: 307179
2017-07-05 17:55:42 +00:00
Hiroshi Inoue 79f8933f23 fix trivial typos in comments; NFC
llvm-svn: 307094
2017-07-04 16:35:26 +00:00
Andrew Zhogin de5d250a0b [DAGCombiner] Intermediate variables in visitRotate promoted to the function's begin. NFC precommit for D12833.
llvm-svn: 307091
2017-07-04 15:57:39 +00:00
Zvi Rackover d7a1c334ce DAGCombine: Combine BUILD_VECTOR to TRUNCATE
Summary:
Add a combine for creating a truncate to replace a build_vector composed of extracts with
indices that form a stride-2^N series.

Example:
v8i32 V = ...

v4i32 build_vector((extract_elt V, 0), (extract_elt V, 2), (extract_elt V, 4), (extract_elt V, 6))
-->
v4i32 truncate (bitcast V to v4i64)

Related discussion in llvm-dev about canonicalizing shuffles to
truncates in LLVM IR:
http://lists.llvm.org/pipermail/llvm-dev/2017-January/108936.html.

Reviewers: spatel, RKSimon, efriedma, igorb, craig.topper, wolfgangp, delena

Reviewed By: delena

Subscribers: guyblank, delena, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D34077

llvm-svn: 307036
2017-07-03 15:47:40 +00:00
Nirav Dave 168c5a6a40 [DAG] Fold FrameIndex offset into BaseIndexOffset analysis. NFCI.
Relanding after restricting equalBaseIndex to not erroneuosly consider
a FrameIndices stemming from alloca from being comparable as its
offset is set post-selectionDAG.

Pull FrameIndex comparision reasoning from DAGCombiner::isAlias to
general BaseIndexOffset.

llvm-svn: 306688
2017-06-29 15:48:11 +00:00
Stanislav Mekhanoshin a45584bebe Fold fneg and fabs like multiplications
Given no NaNs and no signed zeroes it folds:

(fmul X, (select (fcmp X > 0.0), -1.0, 1.0)) -> (fneg (fabs X))
(fmul X, (select (fcmp X > 0.0), 1.0, -1.0)) -> (fabs X)

Differential Revision: https://reviews.llvm.org/D34579

llvm-svn: 306592
2017-06-28 20:25:50 +00:00
Nirav Dave c4ce2293b0 Revert "[DAG] Fold FrameIndex offset into BaseIndexOffset analysis. NFCI."
This reverts commit r306498 which appears to cause a compilrt-rt test failures

llvm-svn: 306501
2017-06-28 03:20:04 +00:00
Stanislav Mekhanoshin eb40733bf0 Allow to truncate left shift with non-constant shift amount
That is pretty common for clang to produce code like
(shl %x, (and %amt, 31)). In this situation we can still perform
trunc (shl) into shl (trunc) conversion given the known value
range of shift amount.

Differential Revision: https://reviews.llvm.org/D34723

llvm-svn: 306499
2017-06-28 02:37:11 +00:00
Nirav Dave 8ef03802f1 [DAG] Fold FrameIndex offset into BaseIndexOffset analysis. NFCI.
Pull FrameIndex comparision reasoning from DAGCombiner::isAlias to
general BaseIndexOffset.

llvm-svn: 306498
2017-06-28 02:09:50 +00:00
Hiroshi Inoue 84aafee4fb [SelectionDAG] set dereferenceable flag in MergeConsecutiveStores to fix assetion failure
When SelectionDAG merges consecutive stores and loads in MergeConsecutiveStores, it does not set dereferenceable flag for a created load instruction. This results in an assertion failure if SelectionDAG commonizes this load instruction with other load instructions, as well as it may miss optimization opportunities.

This patch sat dereferenceable flag for the newly created load instruction if all the load instructions to be merged are dereferenceable.

Differential Revision: https://reviews.llvm.org/D34679

llvm-svn: 306404
2017-06-27 12:43:08 +00:00
Wolfgang Pieb 9f65858235 DAGCombine: Make sure we only eliminate trunc/extend when the scales of truncation and extension match.
This fixes PR33368.

Reviewer: rksimon

Differential Revision:  https://reviews.llvm.org/D34069

llvm-svn: 306345
2017-06-26 23:05:51 +00:00
Nirav Dave f2c349ccec [DAG] Add Target Store Merge pass ordering function
Allow targets to specify if they should merge stores before or after
legalization.

llvm-svn: 306006
2017-06-22 15:07:49 +00:00
Nirav Dave c1b6aa77bb [DAG] Move BaseIndexOffset into separate Libarary. NFC.
Move BaseIndexOffset analysis out of DAGCombiner for use in other
files.

llvm-svn: 305921
2017-06-21 15:40:43 +00:00
Nirav Dave 9a69d444a3 [DAG] Remove Node csonstruction from BaseIndexOffset match. NFCI.
Move GlobalAddress Offset decomposition from initial match into
comparision check and removing the possibility of constructing a new
offseted global address when examining addresses.

llvm-svn: 305917
2017-06-21 15:07:30 +00:00
Guy Blank 52d73fce85 [DAGCombiner] Add another combine from build vector to shuffle
Add support for combining a build vector to a shuffle.
When the build vector is of extracted elements from 2 vectors (vec1, vec2) where vec2 is 2 times smaller than vec1.

llvm-svn: 305883
2017-06-21 07:38:41 +00:00
Nirav Dave 47a78a2502 [DAG] Simplify BaseIndexOffset. NFCI.
Remove tail calls and cleanup codeflow.

llvm-svn: 305768
2017-06-20 02:48:39 +00:00
Nirav Dave 8dcd008d18 Allow truncated and extend memory operations in Store Merge. NFCI.
As all store merges checks are based on the memory operation
performed, allow use of truncated stores and extended loads as valid
input candidates for merging.

Relanding after fixing selection between truncated and normal store.

llvm-svn: 305701
2017-06-19 15:32:28 +00:00
Craig Topper 288b3c9e69 [SelectionDAG] Use APInt::isSubsetOf. NFC
llvm-svn: 305606
2017-06-16 23:19:14 +00:00
Craig Topper ea5b8bc9ef [SelectionDAG] Use APInt::isNullValue/isOneValue. NFC
llvm-svn: 305605
2017-06-16 23:19:12 +00:00
Ahmed Bougacha d41a9c4e66 Revert "[DAG] Allow truncated and extend memory operations in Store Merge. NFCI."
This reverts commit r305468, as it caused PR33475.

llvm-svn: 305527
2017-06-15 23:29:47 +00:00
Nirav Dave 9d79cade42 [DAG] As StoreMerge now generates only legal nodes remove unecessary guard when run post-legalization NFCI.
llvm-svn: 305477
2017-06-15 16:27:49 +00:00
Nirav Dave be2674a598 [DAG] Defer Pre/Post IndexStore merge to after mergestore. NFCI.
In preparation for doing storemerge post-legalization, reorder
visitSTORE passes to move pre/post-index combining after store
merge. Reordered passes other than store merge are unaffected.

llvm-svn: 305473
2017-06-15 15:05:48 +00:00
Nirav Dave 9464c72850 [DAG] Allow truncated and extend memory operations in Store Merge. NFCI.
As all store merges checks are based on the memory operation
performed, allow use of truncated stores and extended loads as valid
input candidates for merging.

llvm-svn: 305468
2017-06-15 14:04:07 +00:00
Nirav Dave 6a41822ba7 [DAG] Make MergeStores generate legalized stores. NFCI.
Realized merged stores as truncstores if store will be realized as
such by legalization.

llvm-svn: 305467
2017-06-15 13:34:54 +00:00
Nirav Dave 9a4998980d [DAG] Use correct size for truncated store merge of load. NFCI.
Avoid non-legal memory ops by checking correct size when merging
stores of loads into a extload-truncstore pair.

llvm-svn: 305466
2017-06-15 13:28:06 +00:00
Sanjay Patel d4765a38b4 [DAG] add helper to bind memop chains; NFCI
This step is just intended to reduce code duplication rather than change any functionality.

A follow-up would be to replace PPCTargetLowering::spliceIntoChain() usage with this new helper.

Differential Revision: https://reviews.llvm.org/D33649

llvm-svn: 305192
2017-06-12 14:41:48 +00:00
Amaury Sechet 2127452ff7 [DAGCombine] Make sure we check the ResNo from UADDO before combining
Summary: UADDO has 2 result, and one must check the result no before doing any kind of combine. Without it, the transform is invalid.

Reviewers: joerg

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D34088

llvm-svn: 305162
2017-06-11 11:36:38 +00:00
Nirav Dave 772ea3ae1a [DAG] Improve Store Merge candidate pruning. NFC.
When considering merging stores values are the results of loads only
consider stores whose values come from loads from the same base.

This fixes much of the longer compile times in PR33330.

llvm-svn: 304934
2017-06-07 18:51:56 +00:00
Simon Pilgrim be8866f691 [DAG] Move SelectionDAG::isCommutativeBinOp to TargetLowering.
This will allow commutation of target-specific DAG nodes in future patches

Differential Revision: https://reviews.llvm.org/D33882

llvm-svn: 304911
2017-06-07 14:05:04 +00:00
Sanjay Patel 6350de76fa [DAGCombine] Fix unchecked calls to DAGCombiner::*ExtPromoteOperand
Other calls to DAGCombiner::*PromoteOperand check the result, but here it could cause an assertion in getNode. 
Falling back to any extend in this case instead of failing outright seems correct to me.

No test case because:
The failure was triggered by an out of tree backend. In order to trigger it, a backend would need to overload 
TargetLowering::IsDesirableToPromoteOp to return true for a type for which ISD::SIGN_EXTEND_INREG is marked 
illegal. In tree, only X86 overloads and sometimes returns true for MVT::i16 yet it marks 
setOperationAction(ISD::SIGN_EXTEND_INREG, MVT::i16  , Legal);.

Patch by Jacob Young!

Differential Revision: https://reviews.llvm.org/D33633

llvm-svn: 304723
2017-06-05 17:01:10 +00:00
Nirav Dave 4952871630 [SDAG] Fix CombineTo ordering in visitZERO_EXTEND and visitSIGN_EXTEND
Reorder CombineTo Calls to prevent references to stale/deleted SDNodes which caused undue assertions.

Reviewers: dbabokin

Subscribers: aemerson, rengolin, llvm-commits

Differential Revision: https://reviews.llvm.org/D31625

llvm-svn: 304460
2017-06-01 19:33:50 +00:00
Matt Arsenault b083570532 DAG: Remove pointless type check
These are only integer operations.

llvm-svn: 304417
2017-06-01 14:49:46 +00:00
Amaury Sechet c84cc230b3 Only generate addcarry node when it is legal.
Summary:
This is a problem uncovered by stage2 testing. ADDCARRY end up being generated on target that do not support it.

The patch that introduced the problem has other patches layed on top of it, so we want to fix the issue rather than revert it to avoid creating a lor of churn.

A regression test will be added shortly, but this is committed as this in order to get the build back to green promptly.

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D33770

llvm-svn: 304409
2017-06-01 12:03:16 +00:00
Amaury Sechet 251ea8a4f8 Do not legalize large setcc with setcce, introduce setcccarry and do it with usubo/setcccarry.
Summary:
This is a continuation of the work started in D29872 . Passing the carry down as a value rather than as a glue allows for further optimizations. Introducing setcccarry makes the use of addc/subc unecessary and we can start the removal process.

This patch only introduce the optimization strictly required to get the same level of optimization as was available before nothing more.

Reviewers: jyknight, nemanjai, mkuper, spatel, RKSimon, zvi, bkramer

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D33374

llvm-svn: 304404
2017-06-01 11:14:17 +00:00
Amaury Sechet 9c5d1e966b [DAGCombine] Refactor common addcarry pattern.
Summary: This pattern is no very useful per se, but it exposes optimization for toehr patterns that wouldn't kick in otherwize. It's very common and worth optimizing for.

Reviewers: jyknight, nemanjai, mkuper, spatel, RKSimon, zvi, bkramer

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D32756

llvm-svn: 304402
2017-06-01 10:48:04 +00:00
Amaury Sechet 2e43cb6d03 [DAGCombine] (add/uaddo X, Carry) -> (addcarry X, 0, Carry)
Summary:
This enables further transforms.

Depends on D32916

Reviewers: jyknight, nemanjai, mkuper, spatel, RKSimon, zvi, bkramer

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D32925

llvm-svn: 304401
2017-06-01 10:42:39 +00:00
Nirav Dave 7c70fddba6 [DAG] Avoid use of stale store.
Correct references to alignment of store which may be deleted in a
previous iteration of merge. Instead use first store that would be
merged.

Corrects pr33172's use-after-poison caught by ASan.

Reviewers: spatel, hfinkel, RKSimon

Reviewed By: RKSimon

Subscribers: thegameg, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D33686

llvm-svn: 304299
2017-05-31 13:36:17 +00:00
Sanjay Patel 51152a3727 [DAGCombiner] fix load narrowing transform to exclude loads with extension
The extending load possibility was missed in:
https://reviews.llvm.org/rL304072

We might want to handle this cases as a follow-up, but bailing out for now
to avoid miscompiling.

llvm-svn: 304153
2017-05-29 13:24:58 +00:00
Sanjay Patel 33f4a97287 [DAGCombiner] use narrow load to avoid vector extract
If we have (extract_subvector(load wide vector)) with no other users, 
that can just be (load narrow vector). This is intentionally conservative.
Follow-ups may loosen the one-use constraint to account for the extract cost
or just remove the one-use check.

The memop chain updating is based on code that already exists multiple times
in x86 lowering, so that should be pulled into a helper function as a follow-up.

Background: this is a potential improvement noticed via regressions caused by
making x86's peekThroughBitcasts() not loop on consecutive bitcasts (see 
comments in D33137).

Differential Revision: https://reviews.llvm.org/D33578

llvm-svn: 304072
2017-05-27 14:07:03 +00:00
Benjamin Kramer debb3c35e0 Make helper functions static. NFC.
llvm-svn: 304029
2017-05-26 20:09:00 +00:00
Sanjay Patel ec13ebf2c8 [DAGCombiner] use narrow vector ops to eliminate concat/extract (PR32790)
In the best case:
extract (binop (concat X1, X2), (concat Y1, Y2)), N --> binop XN, YN
...we kill all of the extract/concat and just have narrow binops remaining.

If only one of the binop operands is amenable, this transform is still
worthwhile because we kill some of the extract/concat.

Optional bitcasting makes the code more complicated, but there doesn't
seem to be a way to avoid that.

The TODO about extending to more than bitwise logic is there because we really
will regress several x86 tests including madd, psad, and even a plain
integer-multiply-by-2 or shift-left-by-1. I don't think there's anything
fundamentally wrong with this patch that would cause those regressions; those
folds are just missing or brittle.

If we extend to more binops, I found that this patch will fire on at least one
non-x86 regression test. There's an ARM NEON test in
test/CodeGen/ARM/coalesce-subregs.ll with a pattern like:

            t5: v2f32 = vector_shuffle<0,3> t2, t4
          t6: v1i64 = bitcast t5
          t8: v1i64 = BUILD_VECTOR Constant:i64<0>
        t9: v2i64 = concat_vectors t6, t8
      t10: v4f32 = bitcast t9
    t12: v4f32 = fmul t11, t10
  t13: v2i64 = bitcast t12
t16: v1i64 = extract_subvector t13, Constant:i32<0>

There was no functional change in the codegen from this transform from what I
could see though.

For the x86 test changes:

1. PR32790() is the closest call. We don't reduce the AVX1 instruction count in that case,
   but we improve throughput. Also, on a core like Jaguar that double-pumps 256-bit ops,
   there's an unseen win because two 128-bit ops have the same cost as the wider 256-bit op.
   SSE/AVX2/AXV512 are not affected which is expected because only AVX1 has the extract/concat
   ops to match the pattern.
2. do_not_use_256bit_op() is the best case. Everyone wins by avoiding the concat/extract.
   Related bug for IR filed as: https://bugs.llvm.org/show_bug.cgi?id=33026
3. The SSE diffs in vector-trunc-math.ll are just scheduling/RA, so nothing real AFAICT.
4. The AVX1 diffs in vector-tzcnt-256.ll are all the same pattern: we reduced the instruction
   count by one in each case by eliminating two insert/extract while adding one narrower logic op.

https://bugs.llvm.org/show_bug.cgi?id=32790

Differential Revision: https://reviews.llvm.org/D33137

llvm-svn: 303997
2017-05-26 15:33:18 +00:00
Nirav Dave 689709c928 [DAG] Move legal type checks in store merge to be checked only
on non-legal cases. NFC.

llvm-svn: 303994
2017-05-26 14:37:27 +00:00
Nirav Dave 7a8717d216 [DAG] Prevent crashes when merging constant stores with high-bit set. NFC.
llvm-svn: 303802
2017-05-24 19:56:39 +00:00
Nirav Dave 6c910c0dd8 [DAG] Add AddressSpace parameter to canMergeStoresTo. NFC.
llvm-svn: 303673
2017-05-23 18:53:02 +00:00
Nirav Dave 3b4f7cc0b3 [DAG] Add canMergeStoresTo predicate checks. NFCI.
Propagate canMergeStoresTo checks to missing cases in StoreMerge.

llvm-svn: 303668
2017-05-23 18:33:09 +00:00
Nirav Dave e00da22ef3 [DAG] Rework store merge to loop on load candidates. NFCI.
Continue to consider remaining candidate merges until all possible
merges have been considered.

llvm-svn: 303560
2017-05-22 15:33:47 +00:00
Amaury Sechet 77cfb4a85f [DAGCombine] (addcarry 0, 0, X) -> (ext/trunc X)
Summary:
While this makes some case better and some case worse - so it's unclear if it is a worthy combine just by itself - this is a useful canonicalisation.

As per discussion in D32756 .

Reviewers: jyknight, nemanjai, mkuper, spatel, RKSimon, zvi, bkramer

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D32916

llvm-svn: 303441
2017-05-19 18:20:44 +00:00
Nirav Dave da8f221273 Elide stores which are overwritten without being observed.
Summary:
In SelectionDAG, when a store is immediately chained to another store
to the same address, elide the first store as it has no observable
effects. This is causes small improvements dealing with intrinsics
lowered to stores.

Test notes:

* Many testcases overwrite store addresses multiple times and needed
  minor changes, mainly making stores volatile to prevent the
  optimization from optimizing the test away.

* Many X86 test cases optimized out instructions associated with
  associated with va_start.

* Note that test_splat in CodeGen/AArch64/misched-stp.ll no longer has
  dependencies to check and can probably be removed and potentially
  replaced with another test.

Reviewers: rnk, john.brawn

Subscribers: aemerson, rengolin, qcolombet, jyknight, nemanjai, nhaehnle, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D33206

llvm-svn: 303198
2017-05-16 19:43:56 +00:00
Nirav Dave cfd357a61a [DAG] Prune deleted nodes in TokenFactor
Fix visitTokenFactor to correctly remove deleted nodes. NFC.

llvm-svn: 303181
2017-05-16 15:49:02 +00:00
Tim Shen 10c64e6aea [PPC] Move the combine "a << (b % (sizeof(a) * 8)) -> (PPCshl a, b)" to the backend. NFC.
Summary:
Eli pointed out that it's unsafe to combine the shifts to ISD::SHL etc.,
because those are not defined for b > sizeof(a) * 8, even after some of
the combiners run.

However, PPCISD::SHL defines that behavior (as the instructions themselves).
Move the combination to the backend.

The tests in shift_mask.ll still pass.

Reviewers: echristo, hfinkel, efriedma, iteratee

Subscribers: nemanjai, llvm-commits

Differential Revision: https://reviews.llvm.org/D33076

llvm-svn: 302937
2017-05-12 19:25:37 +00:00
Simon Pilgrim eabf6fc4b5 [DAGCombine] Use SelectionDAG::getAnyExtOrTrunc helper. NFCI.
llvm-svn: 302907
2017-05-12 15:26:50 +00:00
Simon Pilgrim f01c301f72 [DAGCombine] Use SelectionDAG::getZExtOrTrunc helper. NFCI.
llvm-svn: 302897
2017-05-12 13:22:12 +00:00