Commit Graph

16916 Commits

Author SHA1 Message Date
Wolfgang Pieb ce13e716c5 [DWARF] Null out the debug locs of load instructions that have been moved by GVN
performing partial redundancy elimination (PRE). Not doing so can cause jumpy line
tables and confusing (though correct) source attributions.

Differential Revision: https://reviews.llvm.org/D27857

llvm-svn: 291037
2017-01-04 23:58:26 +00:00
Mehdi Amini 19ef4fad91 Use lazy-loading of Metadata in MetadataLoader when importing is enabled (NFC)
Summary:
This is a relatively simple scheme: we use the index emitted in the
bitcode to avoid loading all the global metadata. Instead we load
the index with their position in the bitcode so that we can load each
of them individually. Materializing the global metadata block in this
condition only triggers loading the named metadata, and the ones
referenced from there (transitively). When materializing a function,
metadata from the global block are loaded lazily as they are
referenced.

Two main current limitations are:

1) Global values other than functions are not materialized on demand,
so we need to eagerly load METADATA_GLOBAL_DECL_ATTACHMENT records
(and their transitive dependencies).
2) When we load a single metadata, we don't recurse on the operands,
instead we use a placeholder or a temporary metadata. Unfortunately
tepmorary nodes are very expensive. This is why we don't have it
always enabled and only for importing.

These two limitations can be lifted in a subsequent improvement if
needed.

With this change, the total link time of opt with ThinLTO and Debug
Info enabled is going down from 282s to 224s (~20%).

Reviewers: pcc, tejohnson, dexonsmith

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D28113

llvm-svn: 291027
2017-01-04 22:54:33 +00:00
Matt Arsenault 3bdd75d01e InstCombine: Fold cos(-x) -> cos(x)
Also cos(fabs(x)) -> cos(x)

llvm-svn: 291022
2017-01-04 22:49:03 +00:00
Daniel Berlin 6cc5e44068 NewGVN: Track the maximum number of iterations GVN takes on any function, so we can pinpoint performance issues.
llvm-svn: 291002
2017-01-04 21:01:02 +00:00
Robert Lougher 5bf0416f45 Reapply "[SimplifyCFG] In sinkLastInstruction correctly set debugloc of common inst"
This reapplies r289828 (reverted in r289833 as it broke the address sanitizer).  The
debugloc is now only set when the instruction is not a call, as this causes the
verifier to assert (the inliner requires an inlinable callsite to have a debug loc
if the caller and callee have debug info).

Original commit message:

Simplify CFG will try to sink the last instruction in a series of basic blocks,
creating a "common" instruction in the successor block (sinkLastInstruction).
When it does this, the debug location of the single instruction should be the
merged debug locations of the commoned instructions.

Original review: https://reviews.llvm.org/D27590

llvm-svn: 290973
2017-01-04 17:40:32 +00:00
David Majnemer cb892e9066 [InstCombine] Move casts around shift operations
It is possible to perform a left shift before zero extending if the
shift would only shift out zeros.

llvm-svn: 290928
2017-01-04 02:21:34 +00:00
David Majnemer 022d2a563b [InstCombine] Combine adds across a zext
We can perform the following:
(add (zext (add nuw X, C1)), C2) -> (zext (add nuw X, C1+C2))

This is only possible if C2 is negative and C2 is greater than or equal to negative C1.

llvm-svn: 290927
2017-01-04 02:21:31 +00:00
Matt Arsenault 56ff4839ae InstCombine: Fold fabs on select of constants
llvm-svn: 290913
2017-01-03 22:40:34 +00:00
Sanjay Patel f0d1e77373 [InstCombine] use 'match' to reduce code bloat; NFCI
I wrote this patch before seeing the comment in:
https://reviews.llvm.org/D27114
...that suggests we should actually be canonicalizing the other way.

So just in case we decide this is the right way, we might as well
have a cleaner implementation.

llvm-svn: 290912
2017-01-03 22:25:31 +00:00
Matt Arsenault b264c94963 InstCombine: Add fma with constant transforms
DAGCombine already does these.

llvm-svn: 290860
2017-01-03 04:32:35 +00:00
Matt Arsenault 1cc294c85d InstCombine: Add fma + fabs/fneg transforms
fma (fneg x), (fneg y), z -> fma x, y, z
fma (fabs x), (fabs x), z -> fma x, x, z

llvm-svn: 290859
2017-01-03 04:32:31 +00:00
Sanjay Patel 1c9867d009 [EarlyCSE] less else, more auto; NFC
llvm-svn: 290848
2017-01-03 00:16:24 +00:00
Sanjay Patel b38ad88e9f [InstCombine] use combineMetadataForCSE instead of copying it; NFCI
llvm-svn: 290844
2017-01-02 23:25:28 +00:00
Xin Tong 2940231ff0 Make sure total loop body weight is preserved in loop peeling
Summary:
Regardless how the loop body weight is distributed, we should preserve
total loop body weight. i.e. we should have same weight reaching the body of the loop
or its duplicates in peeled and unpeeled case.

Reviewers: mkuper, davidxl, anemet

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D28179

llvm-svn: 290833
2017-01-02 20:27:23 +00:00
Daniel Berlin de43ef9601 NewGVN: Clean up after removing possibility of null expressions.
llvm-svn: 290828
2017-01-02 19:49:17 +00:00
Sanjay Patel 65d533ca42 fix typo; NFC
llvm-svn: 290827
2017-01-02 19:05:11 +00:00
Davide Italiano 67ada75d84 [NewGVN] Fold single-use variable inside the assertion.
It placates some bots which complain because they compile the
assertion out and think the variable is unused.

llvm-svn: 290825
2017-01-02 19:03:16 +00:00
Davide Italiano 841261624d [NewGVN] Restore old code to placate buildbots.
Apparently my suggestion of using ternary doesn't really work
as clang complains about incompatible types on LHS and RHS. Some
GCC versions happen to accept the code but clang behaviour is
correct here.

llvm-svn: 290822
2017-01-02 18:41:34 +00:00
Daniel Berlin 25f05b0ab7 NewGVN: Fix some formatting and comment issues
llvm-svn: 290820
2017-01-02 18:22:38 +00:00
Daniel Berlin 02c6b176e7 NewGVN: Add UnknownExpression and create them for things we can't symbolize. Kill fragile machinery for handling null expressions.
Summary:
This avoids the very fragile code for null expressions. We could also use a denseset that tracks which things have null expressions instead, but that seems pretty fragile and premature optimization.

This resolves a number of infinite loop cases, test reductions coming.

Reviewers: davide

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D28193

llvm-svn: 290816
2017-01-02 18:00:53 +00:00
Daniel Berlin 589cecc6e9 NewGVN: Fix PR31480, PR31483, PR31499, by rewriting how memory congruence handling works.
Summary: Previously, we tried to fix up the equivalences during symbolic evaluation.  This does not work. Now, we change the equivalences during congruence finding, where it belongs.  We also initialize the equivalence table to give a maximal answer.

Reviewers: davide

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D28192

llvm-svn: 290815
2017-01-02 18:00:46 +00:00
Davide Italiano b672537cbf [PMBuilder] Remove RunFloat2Int cl::opt.
The pass has been on by default for a long time without problems.

llvm-svn: 290814
2017-01-02 17:49:18 +00:00
Sanjay Patel aea60846c4 [Inliner] remove unnecessary null checks from AddAlignmentAssumptions(); NFCI
We bail out on the 1st line if the assumption cache is not set, so there's
no need to check it after that.

llvm-svn: 290787
2016-12-31 17:54:05 +00:00
Craig Topper d00db69227 [InstCombine][AVX-512] Teach InstCombine that llvm.x86.avx512.vcomi.sd and llvm.x86.avx512.vcomi.ss don't use the upper elements of their input.
This was already done for the SSE/SSE2 version of the intrinsics.

llvm-svn: 290776
2016-12-31 00:45:06 +00:00
Craig Topper 991636312b [InstCombine][AVX-512] When turning intrinsics with masking into native IR, don't emit a select if the mask is known to be all ones.
This saves InstCombine the burden of having to optimize the select later.

llvm-svn: 290774
2016-12-30 23:06:28 +00:00
Philip Reames fac031a178 Add a comment for a todo in LoopUnroll post cleanup
llvm-svn: 290769
2016-12-30 22:10:19 +00:00
Philip Reames a570a2303c [CVP] Adjust iteration order to reduce the amount of work required
CVP doesn't care about the order of blocks visited, but by using a pre-order traversal over the graph we can a) not visit unreachable blocks and b) optimize as we go so that analysis of later blocks produce slightly more precise results.

I noticed this via inspection and don't have a concrete example which points to the issue.  

llvm-svn: 290760
2016-12-30 18:00:55 +00:00
Davide Italiano 75e39f9790 [NewGVN] Remove unneeded newline from assertion message.
llvm-svn: 290755
2016-12-30 15:01:17 +00:00
David Majnemer 5ec5f278c9 [InstCombine] Address post-commit feedback
llvm-svn: 290741
2016-12-30 03:36:17 +00:00
Michael Kuperstein 76e06c8858 [LICM] When promoting scalars, allow inserting stores to thread-local allocas.
This is similar to the allocfn case - if an alloca is not captured, then it's
necessarily thread-local.

Differential Revision: https://reviews.llvm.org/D28170

llvm-svn: 290738
2016-12-30 01:03:17 +00:00
Dehao Chen cc76344ef5 Use continuous boosting factor for complete unroll.
Summary:
The current loop complete unroll algorithm checks if unrolling complete will reduce the runtime by a certain percentage. If yes, it will apply a fixed boosting factor to the threshold (by discounting cost). The problem for this approach is that the threshold abruptly. This patch makes the boosting factor a function of runtime reduction percentage, capped by a fixed threshold. In this way, the threshold changes continuously.

The patch also simplified the code by reducing one parameter in UP.

The patch only affects code-gen of two speccpu2006 benchmark:

445.gobmk binary size decreases 0.08%, no performance change.
464.h264ref binary size increases 0.24%, no performance change.

Reviewers: mzolotukhin, chandlerc

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D26989

llvm-svn: 290737
2016-12-30 00:50:28 +00:00
Michael Kuperstein 4a86a1921a [LICM] Remove unneeded tracking of whether changes were made. NFC.
"Changed" doesn't actually change within the loop, so there's
no reason to keep track of it - we always return false during
analysis and true after the transformation is made.

llvm-svn: 290735
2016-12-30 00:43:22 +00:00
Michael Kuperstein 62b98c3977 [LICM] Make logic in promoteLoopAccessesToScalars easier to follow. NFC.
llvm-svn: 290734
2016-12-30 00:39:00 +00:00
David Majnemer a1cfd7c5f8 [InstCombine] More thoroughly canonicalize the position of zexts
We correctly canonicalized (add (sext x), (sext y)) to (sext (add x, y))
where possible.  However, we didn't perform the same canonicalization
for zexts or for muls.

llvm-svn: 290733
2016-12-30 00:28:58 +00:00
Michael Kuperstein ff36baefe7 [LICM] Compute exit blocks for promotion eagerly. NFC.
This moves the exit block and insertion point computation to be eager,
instead of after seeing the first scalar we can promote.

The cost is relatively small (the computation happens anyway, see discussion
on D28147), and the code is easier to follow, and can bail out earlier
if there's a catchswitch present.

llvm-svn: 290729
2016-12-29 23:11:19 +00:00
Michael Kuperstein 5566092963 [LICM] Don't try to promote in loops where we have no chance to promote. NFC.
We would check whether we have a prehader *or* dedicated exit blocks,
and go into the promotion loop. Then, for each alias set we'd check
if we have a preheader *and* dedicated exit blocks, and bail if not.

Instead, bail immediately if we don't have both.

llvm-svn: 290728
2016-12-29 22:51:22 +00:00
Michael Kuperstein b6da9cf3b7 [LICM] Only recompute LCSSA when we actually promoted something.
We want to recompute LCSSA only when we actually promoted a value.
This means we only need to look at changes made by promotion when
deciding whether to recompute it or not, not at regular sinking/hoisting.

(This was what the code was documented as doing, just not what it did)

Hopefully NFC.

llvm-svn: 290726
2016-12-29 22:37:13 +00:00
Daniel Berlin e0bd37e78f NewGVN: Fix PR 31491 by ensuring that we touch the right instructions. Change to one based numbering so we can assert we don't cause the same bug again.
llvm-svn: 290724
2016-12-29 22:15:12 +00:00
Craig Topper 17b5568bc7 [InstCombine] Use getVectorNumElements instead of explicitly casting to VectorType and calling getNumElements. NFC
llvm-svn: 290707
2016-12-29 07:03:18 +00:00
Craig Topper 62f06e241b [InstCombine] Fix typo in comment. NFC
llvm-svn: 290706
2016-12-29 05:38:31 +00:00
Craig Topper 2e18bcfc60 [InstCombine] Use a 32-bits instead of 64-bits for storing the number of elements in VectorType for a ShuffleVector. While there getVectorNumElements to avoid an explicit cast. NFC
llvm-svn: 290705
2016-12-29 04:24:32 +00:00
Craig Topper 1a8a3377cc [InstCombine][X86] If the lowest element of a scalar intrinsic isn't used make sure we add it to the worklist so we can DCE it sooner.
We bypassed the intrinsic and returned the passthru operand, but we should also add the intrinsic to the worklist since its now dead. This can allow DCE to find it sooner and remove it. Similar was done for InsertElement when the inserted element isn't demanded.

llvm-svn: 290704
2016-12-29 03:30:17 +00:00
Daniel Berlin 6658cc9ead NewGVN: Sort Dominator Tree in RPO order, and use that for generating order.
Summary:
The optimal iteration order for this problem is RPO order. We want to
process as many preds of a backedge as we can before we process the
backedge.

At the same time, as we add predicate handling, we want to be able to
touch instructions that are dominated by a given block by
ranges (because a change in value numbering a predicate possibly
affects all users we dominate that are using that predicate).
If we don't do it this way, we can't do value inference over
backedges (the paper covers this in depth).

The newgvn branch currently overshoots the last part, and guarantees
that it will touch *at least* the right set of instructions, but it
does touch more.  This is because the bitvector instruction ranges are
currently generated in RPO order (so we take the max and the min of
the ranges of dominated blocks, which means there are some in the
middle we didn't have to touch that we did).

We can do better by sorting the dominator tree, and then just using
dominator tree order.

As a preliminary, the dominator tree has some RPO guarantees, but not
enough. It guarantees that for a given node, your idom must come
before you in the RPO ordering. It guarantees no relative RPO ordering
for siblings.  We add siblings in whatever order they appear in the module.

So that is what we fix.

We sort the children array of the domtree into RPO order, and then use
the dominator tree for ordering, instead of RPO, since the dominator
tree is now a valid RPO ordering.

Note: This would help any other pass that iterates a forward problem
in dominator tree order.  Most of them are single pass.  It will still
maximize whatever result they compute.  We could also build the
dominator tree in this order, but our incremental updates would still
put it out of sort order, and recomputing the sort order is almost as
hard as general incremental updates of the domtree.

Also note that the sorting does not affect any tests, etc. Nothing
depends on domtree order, including the verifier, the equals
functions for domtree nodes, etc.

How much could this matter, you ask?
Here are the current numbers.
This is generated by running NewGVN over all files in LLVM.

Note that once we propagate equalities, the differences go up by an
order of magnitude or two (IE instead of 29, the max ends up in the
thousands, since the worst case we add a factor of N, where N is the
number of branch predicates).  So while it doesn't look that stark for
the default ordering, it gets *much much* worse.  There are also
programs in the wild where the difference is already pretty stark
(2 iterations vs hundreds).

RPO ordering:
759040 Number of iterations is 1
112908 Number of iterations is 2

Default dominator tree ordering:
755081 Number of iterations is 1
116234 Number of iterations is 2
   603 Number of iterations is 3
    27 Number of iterations is 4
     2 Number of iterations is 5
     1 Number of iterations is 7

Dominator tree sorted:
759040 Number of iterations is 1
112908 Number of iterations is 2
<yay!>

Really bad ordering (sort domtree siblings in postorder. not quite the
worst possible, but yeah):
754008 Number of iterations is 1
    21 Number of iterations is 10
     8 Number of iterations is 11
     6 Number of iterations is 12
     5 Number of iterations is 13
     2 Number of iterations is 14
     2 Number of iterations is 15
     3 Number of iterations is 16
     1 Number of iterations is 17
     2 Number of iterations is 18
 96642 Number of iterations is 2
     1 Number of iterations is 20
     2 Number of iterations is 21
     1 Number of iterations is 22
     1 Number of iterations is 29
 17266 Number of iterations is 3
  2598 Number of iterations is 4
   798 Number of iterations is 5
   273 Number of iterations is 6
   186 Number of iterations is 7
    80 Number of iterations is 8
    42 Number of iterations is 9

Reviewers: chandlerc, davide

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D28129

llvm-svn: 290699
2016-12-29 01:12:36 +00:00
Daniel Berlin 7ad1ea0984 Update equalsStoreHelper for the fact that only one branch can be true
llvm-svn: 290697
2016-12-29 00:49:32 +00:00
Piotr Padlewski 6c37d298d9 Revert "[NewGVN] replace emplace_back with push_back"
llvm-svn: 290692
2016-12-28 23:24:02 +00:00
Piotr Padlewski 629a7f2cc0 [NewGVN] replace emplace_back with push_back
emplace_back is not faster if it is equivalent to push_back. In this cases emplaced value had the
same type that the one stored in container. It is ugly and it might be even slower (see
Scott Meyers presentation about emplacement).

llvm-svn: 290685
2016-12-28 20:36:08 +00:00
Piotr Padlewski 26dada79ff [NewGVN] Simplyfy loop NFC
llvm-svn: 290683
2016-12-28 19:42:49 +00:00
Piotr Padlewski e4047b89ad [NewGVN] replace typedefs with usings
llvm-svn: 290680
2016-12-28 19:29:26 +00:00
Piotr Padlewski fc5727b2a2 [NewGVN] NFC fixes
llvm-svn: 290679
2016-12-28 19:17:17 +00:00
Davide Italiano 0e71480523 [NewGVN] Global sweep replacing NULL with nullptr. NFCI.
llvm-svn: 290670
2016-12-28 14:00:11 +00:00
Davide Italiano 0fb3c7cde5 [NewGVN] Remove redundant code. NFCI.
llvm-svn: 290669
2016-12-28 13:54:16 +00:00
Davide Italiano b111409015 [NewGVN] equals() for loads/stores is the same. Unify.
Differential Revision:  https://reviews.llvm.org/D28116

llvm-svn: 290667
2016-12-28 13:37:17 +00:00
Chandler Carruth 9900d18bab [PM] Teach the inliner's call graph update to handle inserting new edges
when they are call edges at the leaf but may (transitively) be reached
via ref edges.

It turns out there is a simple rule: insert everything as a ref edge
which is a safe conservative default. Then we let the existing update
logic handle promoting some of those to call edges.

Note that it would be fairly cheap to make these call edges right away
if that is desirable by testing whether there is some existing call path
from the source to the target. It just seemed like slightly more
complexity in this code path that isn't strictly necessary. If anyone
feels strongly about handling this differently I'm happy to change it.

llvm-svn: 290649
2016-12-28 03:13:12 +00:00
Craig Topper 28ec3460e4 [InstCombine] Remove a piece of a comment that said that InstCombiner contains pass infrastructure. That hasn't been true since r226618. NFC
llvm-svn: 290648
2016-12-28 03:12:42 +00:00
Michael Kuperstein cd7ad7130f [InstCombine] Canonicalize insert splat sequences into an insert + shuffle
This adds a combine that canonicalizes a chain of inserts which broadcasts
a value into a single insert + a splat shufflevector.

This fixes PR31286.

Differential Revision: https://reviews.llvm.org/D27992

llvm-svn: 290641
2016-12-28 00:18:08 +00:00
Kostya Serebryany f24e52c0c2 [sanitizer-coverage] sort the switch cases
llvm-svn: 290628
2016-12-27 21:20:06 +00:00
Davide Italiano b222549dc5 [NewGVN] Simplify a bit removing else after return. NFCI.
llvm-svn: 290615
2016-12-27 18:15:39 +00:00
Bryant Wong 7cb744621b [MemCpyOpt] Don't sink LoadInst below possible clobber.
Differential Revision: https://reviews.llvm.org/D26811

llvm-svn: 290611
2016-12-27 17:58:12 +00:00
Daniel Berlin 1f31fe529e Change a std::vector to SmallVector in NewGVN
llvm-svn: 290596
2016-12-27 09:20:36 +00:00
Chandler Carruth 141bf5d14d [PM] Add one of the features left out of the initial inliner patch:
skipping indirectly recursive inline chains.

To do this, we implicitly build an inline stack for each callsite and
check prior to inlining that doing so would not form a cycle. This uses
the exact same technique and even shares some code with the legacy PM
inliner.

This solution remains deeply unsatisfying to me because it means we
cannot actually iterate the inliner externally. Doing so would not be
able to easily detect and avoid such cycles. Some day I would very much
like to have a solution that works without this internal state to detect
cycles, but this is not that day.

llvm-svn: 290590
2016-12-27 06:46:20 +00:00
Craig Topper 72f2d4e8d6 [InstCombine][X86] Add DemandedElts support for 512-bit PMULDQ/PMULUDQ instructions
PMULDQ/PMULUDQ vXi64 instructions only use the even numbered v2Xi32 input elements which SimplifyDemandedVectorElts should try and use.

This builds on r290554 which added supported for 128 and 256-bit.

llvm-svn: 290582
2016-12-27 05:30:09 +00:00
Chandler Carruth 03130d981c [PM] Teach the inliner in the new PM to merge attributes after inlining.
Also enable the new PM in the attributes test case which caught this
issue.

llvm-svn: 290572
2016-12-27 03:39:54 +00:00
Craig Topper 7f8540b5e7 [AVX-512][InstCombine] Teach InstCombine to turn masked scalar add/sub/mul/div with rounding intrinsics into normal IR operations if the rounding mode is CUR_DIRECTION.
An earlier commit added support for unmasked scalar operations. At that time isel wouldn't generate an optimal sequence for masked operations, but that has now been fixed.

llvm-svn: 290566
2016-12-27 01:56:30 +00:00
Chandler Carruth 0ee8bb11c3 [PM] Move the collection of call sites to a more appropriate place
inside of `InlineFunction`. Prior to this, call instructions are
specifically being rewritten and replaced within the inlined region,
invalidating some of the call sites.

Several of these regions are using the same technique to walk the
inlined region so this seems clearly safe up to this point.

I've also added a short circuit to the scan for call sites based on what
other code is doing.

With this, the most common crash I've found in the new inliner code is
fixed. I've turned it on for another test case that covers this
scenario.

I'll make my way through most of the other inliner test cases
just to get some easy coverage next.

llvm-svn: 290562
2016-12-27 01:24:50 +00:00
Craig Topper 020b228155 [AVX-512][InstCombine] Teach InstCombine to turn packed add/sub/mul/div with rounding intrinsics into normal IR operations if the rounding mode is CUR_DIRECTION.
llvm-svn: 290559
2016-12-27 00:23:16 +00:00
Chandler Carruth 6e9bb7e064 [PM] Teach the always inliner in the new pass manager to support
removing fully-dead comdats without removing dead entries in comdats
with live members.

This factors the core logic out of the current inliner's internals to
a reusable utility and leverages that in both places. The factored out
code should also be (minorly) more efficient in cases where we have very
few dead functions or dead comdats to consider.

I've added a test case to cover this behavior of the always inliner.
This is the last significant bug in the new PM's always inliner I've
found (so far).

llvm-svn: 290557
2016-12-26 23:43:27 +00:00
Simon Pilgrim c9cf7fc7a4 [InstCombine][X86] Add DemandedElts support for PMULDQ/PMULUDQ instructions
PMULDQ/PMULUDQ vXi64 instructions only use the even numbered v2Xi32 input elements which SimplifyDemandedVectorElts should try and use.

Differential Revision: https://reviews.llvm.org/D28119

llvm-svn: 290554
2016-12-26 23:28:17 +00:00
Daniel Berlin 85f91b0ec3 clang-format NewGVN files
llvm-svn: 290551
2016-12-26 20:06:58 +00:00
Daniel Berlin 85cbc8c097 Misc cleanups and simplifications for NewGVN.
Mostly use a bit more idiomatic C++ where we can,
so we can combine some things later.

Reviewers: davide

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D28111

llvm-svn: 290550
2016-12-26 19:57:25 +00:00
Daniel Berlin d59e8010c5 Don't use our own incorrect version of isTriviallyDeadInstruction in NewGVN. Fixes PR/31472
llvm-svn: 290549
2016-12-26 18:44:36 +00:00
Davide Italiano fe7a3ee51e [NewGVN] Add a flag to enable the pass via `-mllvm`.
NewGVN can be tested passing `-mllvm -enable-newgvn` to clang.

Differential Revision:  https://reviews.llvm.org/D28059

llvm-svn: 290548
2016-12-26 18:26:19 +00:00
Davide Italiano a312ca845c [NewGVN] Fold lookupOperandLeader() when there's only one use. NFCI.
llvm-svn: 290543
2016-12-26 16:19:34 +00:00
Craig Topper 7b788ada2d [AVX-512][InstCombine] Teach InstCombine to turn scalar add/sub/mul/div with rounding intrinsics into normal IR operations if the rounding mode is CUR_DIRECTION.
Summary:
I only do this for unmasked cases for now because isel is failing to fold the mask. I'll try to fix that soon.

I'll do the same thing for packed add/sub/mul/div in a future patch.

Reviewers: delena, RKSimon, zvi, craig.topper

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D27879

llvm-svn: 290535
2016-12-26 06:33:19 +00:00
Craig Topper e328045711 [AVX-512][InstCombine] Teach InstCombine to converted masked vpermv intrinsics into shufflevector instructions
Summary:
This patch adds support for converting the masked vpermv intrinsics into shufflevector instructions if the indices are constants.

We also need to wrap a select instruction around the shuffle to take care of the masking part. InstCombine will take care of optimizing the select if the mask is constant so I didn't bother checking for that.

Reviewers: zvi, delena, spatel, RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D27825

llvm-svn: 290530
2016-12-25 23:58:57 +00:00
Bryant Wong 4213d94142 [MemorySSA] Define a restricted upward AccessList splice.
Differential Revision: https://reviews.llvm.org/D26661

llvm-svn: 290527
2016-12-25 23:34:07 +00:00
Daniel Berlin d7c12ee54c Value number stores and memory states so we can detect when memory states are equivalent (IE store of same value to memory).
Reviewers: davide

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D28084

llvm-svn: 290525
2016-12-25 22:23:49 +00:00
Daniel Berlin 65f5f0d728 Rename GVNExpression *ops_ members to *op_* to match conventions in the rest of LLVM
llvm-svn: 290524
2016-12-25 22:10:37 +00:00
Davide Italiano 463c32eaf6 [NewGVN] Prefer `auto` to explicit type when the latter is obvious.
llvm-svn: 290499
2016-12-24 17:17:21 +00:00
Daniel Berlin 8a6a86146c Mark isOnlyReachableViaThisEdge as const
llvm-svn: 290468
2016-12-24 00:04:07 +00:00
Chandler Carruth 4eaff12ba2 [PM] Teach the always inlining test case to be much more strict about
whether functions are removed, and fix the new PM's always inliner to
actually pass this test.

Without this, the new PM's always inliner leaves all the functions
kicking around which won't work out very well given the semantics of
always inline.

Doing this really highlights how frustrating the current alwaysinline
semantic contract is though -- why can we put it on *external*
functions, etc?

Also I've added a number of tricky and interesting test cases for
removing functions with the always inliner. There is one remaining case
not handled -- fully removing comdats -- and I've left a FIXME about
this.

llvm-svn: 290457
2016-12-23 23:33:35 +00:00
Mehdi Amini 94f86ad4e0 Function-import: Disable IRVerifier on lazy-loaded modules: the ODR TypeUniquing generates invalid debug info.
llvm-svn: 290442
2016-12-23 19:19:44 +00:00
Mehdi Amini fc06b83ee7 Fix build after r290437 (missing include)
llvm-svn: 290438
2016-12-23 18:04:51 +00:00
Mehdi Amini 9a9077fdad FunctionImport: fix typo '#ifndef NDEBUG' instead of '#ifndef DEBUG'
llvm-svn: 290437
2016-12-23 17:59:24 +00:00
Davide Italiano b9ff23a402 [LICM] Plug a leak freeing the ASTs before clearing the map.
llvm-svn: 290433
2016-12-23 15:02:35 +00:00
Davide Italiano 34f94384a5 [LICM] Work around LICM needs to maintain state across loops.
The pass creates some state which expects to be cleaned up by
a later instance of the same pass. opt-bisect happens to expose
this not ideal design because calling skipLoop() will result in
this state not being cleaned up at times and an assertion firing
in `doFinalization()`. Chandler tells me the new pass manager will
give us options to avoid these design traps, but until it's not ready,
we need a workaround for the current pass infrastructure. Fix provided
by Andy Kaylor, see the review for a complete discussion.

Differential Revision:  https://reviews.llvm.org/D25848

llvm-svn: 290427
2016-12-23 13:12:50 +00:00
Davide Italiano 0ff941620c [NewGVN] Remove (for now) unused code. NFCI.
llvm-svn: 290420
2016-12-23 10:28:30 +00:00
Mehdi Amini 96cdc49305 [ThinLTO] Verify lazy-loaded source module for function importing when assertions are enabled (NFC)
llvm-svn: 290416
2016-12-23 05:16:19 +00:00
Chandler Carruth ee08676102 Enable '-Wstring-conversion' and fix some bad asserts that it helped
find.

Notable is the assert in NewGVN which had no effect because of the bug.

llvm-svn: 290400
2016-12-23 01:38:06 +00:00
Evgeniy Stepanov 27d4c9b71b [cfi] Emit jump tables as a function-level inline asm.
Use a dummy private function with inline asm calls instead of module
level asm blocks for CFI jumptables.

The main advantage is that now jumptable codegen can be affected by
the function attributes (like target_cpu on ARM). Module level asm
gets the default subtarget based on the target triple, which is often
not good enough.

This change also uses asm constraints/arguments to reference
jumptable targets and aliases directly. We no longer do asm name
mangling in an IR pass.

Differential Revision: https://reviews.llvm.org/D28012

llvm-svn: 290384
2016-12-22 22:22:35 +00:00
Davide Italiano 7e274e02ae [GVN] Initial check-in of a new global value numbering algorithm.
The code have been developed by Daniel Berlin over the years, and
the new implementation goal is that of addressing shortcomings of
the current GVN infrastructure, i.e. long compile time for large
testcases, lack of phi predication, no load/store value numbering
etc...

The current code just implements the "core" GVN algorithm, although
other pieces (load coercion, phi handling, predicate system) are
already implemented in a branch out of tree. Once the core is stable,
we'll start adding pieces on top of the base framework.
The test currently living in test/Transform/NewGVN are a copy
of the ones in GVN, with proper `XFAIL` (missing features in NewGVN).
A flag will be added in a future commit to enable NewGVN, so that
interested parties can exercise this code easily.

Differential Revision:  https://reviews.llvm.org/D26224

llvm-svn: 290346
2016-12-22 16:03:48 +00:00
Chandler Carruth e3f5064b72 [PM] Introduce a reasonable port of the main per-module pass pipeline
from the old pass manager in the new one.

I'm not trying to support (initially) the numerous options that are
currently available to customize the pass pipeline. If we end up really
wanting them, we can add them later, but I suspect many are no longer
interesting. The simplicity of omitting them will help a lot as we sort
out what the pipeline should look like in the new PM.

I've also documented to the best of my ability *why* each pass or group
of passes is used so that reading the pipeline is more helpful. In many
cases I think we have some questionable choices of ordering and I've
left FIXME comments in place so we know what to come back and revisit
going forward. But for now, I've left it as similar to the current
pipeline as I could.

Lastly, I've had to comment out several places where passes are not
ported to the new pass manager or where the loop pass infrastructure is
not yet ready. I did at least fix a few bugs in the loop pass
infrastructure uncovered by running the full pipeline, but I didn't want
to go too far in this patch -- I'll come back and re-enable these as the
infrastructure comes online. But I'd like to keep the comments in place
because I don't want to lose track of which passes need to be enabled
and where they go.

One thing that seemed like a significant API improvement was to require
that we don't build pipelines for O0. It seems to have no real benefit.

I've also switched back to returning pass managers by value as at this
API layer it feels much more natural to me for composition. But if
others disagree, I'm happy to go back to an output parameter.

I'm not 100% happy with the testing strategy currently, but it seems at
least OK. I may come back and try to refactor or otherwise improve this
in subsequent patches but I wanted to at least get a good starting point
in place.

Differential Revision: https://reviews.llvm.org/D28042

llvm-svn: 290325
2016-12-22 06:59:15 +00:00
Adrian Prantl 49797ca6be Refactor the DIExpression fragment query interface (NFC)
... so it becomes available to DIExpressionCursor.

llvm-svn: 290322
2016-12-22 05:27:12 +00:00
Easwaran Raman 180bd9f6b3 Pass GetAssumptionCache to InlineFunctionInfo constructor
Differential revision: https://reviews.llvm.org/D28038

llvm-svn: 290295
2016-12-22 01:07:01 +00:00
David Majnemer b0761a0c1b Revert "[InstCombine] New opportunities for FoldAndOfICmp and FoldXorOfICmp"
This reverts commit r289813, it caused PR31449.

llvm-svn: 290266
2016-12-21 19:21:59 +00:00
Adam Nemet 32e6a34c02 [LDist] Match behavior between invoking via optimization pipeline or opt -loop-distribute
In r267672, where the loop distribution pragma was introduced, I tried
it hard to keep the old behavior for opt: when opt is invoked
with -loop-distribute, it should distribute the loop (it's off by
default when ran via the optimization pipeline).

As MichaelZ has discovered this has the unintended consequence of
breaking a very common developer work-flow to reproduce compilations
using opt: First you print the pass pipeline of clang
with -debug-pass=Arguments and then invoking opt with the returned
arguments.

clang -debug-pass will include -loop-distribute but the pass is invoked
with default=off so nothing happens unless the loop carries the pragma.
While through opt (default=on) we will try to distribute all loops.

This changes opt's default to off as well to match clang.  The tests are
modified to explicitly enable the transformation.

llvm-svn: 290235
2016-12-21 04:07:40 +00:00
Peter Collingbourne 598bd2a262 IPO: Remove the ModuleSummary argument to the FunctionImport pass. NFCI.
No existing client is passing a non-null value here. This will come back
in a slightly different form as part of the type identifier summary work.

Differential Revision: https://reviews.llvm.org/D28006

llvm-svn: 290222
2016-12-21 00:50:12 +00:00
George Burgess IV 3f08914e7e [Analysis] Centralize objectsize lowering logic.
We're currently doing nearly the same thing for @llvm.objectsize in
three different places: two of them are missing checks for overflow,
and one of them could subtly break if InstCombine gets much smarter
about removing alloc sites. Seems like a good idea to not do that.

llvm-svn: 290214
2016-12-20 23:46:36 +00:00
Haicheng Wu b29dd0107c [LoopUnroll] Modify a comment to clarify the usage of TripCount. NFC.
Make it clear that TripCount is the upper bound of the iteration on which
control exits LatchBlock.

Differential Revision: https://reviews.llvm.org/D26675

llvm-svn: 290199
2016-12-20 20:23:48 +00:00
Chandler Carruth 1d96311447 [PM] Provide an initial, minimal port of the inliner to the new pass manager.
This doesn't implement *every* feature of the existing inliner, but
tries to implement the most important ones for building a functional
optimization pipeline and beginning to sort out bugs, regressions, and
other problems.

Notable, but intentional omissions:
- No alloca merging support. Why? Because it isn't clear we want to do
  this at all. Active discussion and investigation is going on to remove
  it, so for simplicity I omitted it.
- No support for trying to iterate on "internally" devirtualized calls.
  Why? Because it adds what I suspect is inappropriate coupling for
  little or no benefit. We will have an outer iteration system that
  tracks devirtualization including that from function passes and
  iterates already. We should improve that rather than approximate it
  here.
- Optimization remarks. Why? Purely to make the patch smaller, no other
  reason at all.

The last one I'll probably work on almost immediately. But I wanted to
skip it in the initial patch to try to focus the change as much as
possible as there is already a lot of code moving around and both of
these *could* be skipped without really disrupting the core logic.

A summary of the different things happening here:

1) Adding the usual new PM class and rigging.

2) Fixing minor underlying assumptions in the inline cost analysis or
   inline logic that don't generally hold in the new PM world.

3) Adding the core pass logic which is in essence a loop over the calls
   in the nodes in the call graph. This is a bit duplicated from the old
   inliner, but only a handful of lines could realistically be shared.
   (I tried at first, and it really didn't help anything.) All told,
   this is only about 100 lines of code, and most of that is the
   mechanics of wiring up analyses from the new PM world.

4) Updating the LazyCallGraph (in the new PM) based on the *newly
   inlined* calls and references. This is very minimal because we cannot
   form cycles.

5) When inlining removes the last use of a function, eagerly nuking the
   body of the function so that any "one use remaining" inline cost
   heuristics are immediately refined, and queuing these functions to be
   completely deleted once inlining is complete and the call graph
   updated to reflect that they have become dead.

6) After all the inlining for a particular function, updating the
   LazyCallGraph and the CGSCC pass manager to reflect the
   function-local simplifications that are done immediately and
   internally by the inline utilties. These are the exact same
   fundamental set of CG updates done by arbitrary function passes.

7) Adding a bunch of test cases to specifically target CGSCC and other
   subtle aspects in the new PM world.

Many thanks to the careful review from Easwaran and Sanjoy and others!

Differential Revision: https://reviews.llvm.org/D24226

llvm-svn: 290161
2016-12-20 03:15:32 +00:00
Adrian Prantl bceaaa9643 [IR] Remove the DIExpression field from DIGlobalVariable.
This patch implements PR31013 by introducing a
DIGlobalVariableExpression that holds a pair of DIGlobalVariable and
DIExpression.

Currently, DIGlobalVariables holds a DIExpression. This is not the
best way to model this:

(1) The DIGlobalVariable should describe the source level variable,
    not how to get to its location.

(2) It makes it unsafe/hard to update the expressions when we call
    replaceExpression on the DIGLobalVariable.

(3) It makes it impossible to represent a global variable that is in
    more than one location (e.g., a variable with multiple
    DW_OP_LLVM_fragment-s).  We also moved away from attaching the
    DIExpression to DILocalVariable for the same reasons.

This reapplies r289902 with additional testcase upgrades and a change
to the Bitcode record for DIGlobalVariable, that makes upgrading the
old format unambiguous also for variables without DIExpressions.

<rdar://problem/29250149>
https://llvm.org/bugs/show_bug.cgi?id=31013
Differential Revision: https://reviews.llvm.org/D26769

llvm-svn: 290153
2016-12-20 02:09:43 +00:00
Michael Kuperstein fb7dd86fd6 [LV] Sink tripcount query to where it's actually used. NFC.
llvm-svn: 290142
2016-12-19 22:47:52 +00:00
Sanjay Patel 5a443ac000 [InstCombine] use commutative matcher for pattern with commutative operators
This is a case that was missed in:
https://reviews.llvm.org/rL290067
...and it would regress if we fix operand complexity (PR28296).

llvm-svn: 290127
2016-12-19 18:35:37 +00:00
Sanjay Patel dd46b52942 [InstCombine] add folds for icmp (umin|umax X, Y), X
This is a follow-up to:
https://reviews.llvm.org/rL289855 (https://reviews.llvm.org/D27531)
https://reviews.llvm.org/rL290111

llvm-svn: 290118
2016-12-19 17:32:37 +00:00
Florian Hahn 2e03213f90 [LoopVersioning] Require loop-simplify form for loop versioning.
Summary:
Requiring loop-simplify form for loop versioning ensures that the
runtime check block always dominates the exit block.
    
This patch closes #30958 (https://llvm.org/bugs/show_bug.cgi?id=30958).

Reviewers: silviu.baranga, hfinkel, anemet, ashutosh.nema

Subscribers: ashutosh.nema, mzolotukhin, efriedma, hfinkel, llvm-commits

Differential Revision: https://reviews.llvm.org/D27469

llvm-svn: 290116
2016-12-19 17:13:37 +00:00
Sanjay Patel 8296c6c96f [InstCombine] add folds for icmp (smax X, Y), X
This is a follow-up to:
https://reviews.llvm.org/rL289855 (D27531)

llvm-svn: 290111
2016-12-19 16:28:53 +00:00
Daniel Jasper aec2fa352f Revert @llvm.assume with operator bundles (r289755-r289757)
This creates non-linear behavior in the inliner (see more details in
r289755's commit thread).

llvm-svn: 290086
2016-12-19 08:22:17 +00:00
Sanjay Patel 2b9d4b4daf [InstCombine] use commutative matchers for patterns with commutative operators
Background/motivation - I was circling back around to:
https://llvm.org/bugs/show_bug.cgi?id=28296

I made a simple patch for that and noticed some regressions, so added test cases for
those with rL281055, and this is hopefully the minimal fix for just those cases.

But as you can see from the surrounding untouched folds, we are missing commuted patterns
all over the place, and of course there are no regression tests to cover any of those cases.

We could sprinkle "m_c_" dust all over this file and catch most of the missing folds, but 
then we still wouldn't have test coverage, and we'd still miss some fraction of commuted 
patterns because they require adjustments to the match order.

I'm aware of the concern about the potential compile-time performance impact of adding 
matches like this (currently being discussed on llvm-dev), but I don't think there's any
evidence yet to suggest that handling commutative pattern matching more thoroughly is not
a worthwhile goal of InstCombine.

Differential Revision: https://reviews.llvm.org/D24419

llvm-svn: 290067
2016-12-18 18:49:48 +00:00
Craig Topper e32b5fd7f9 [InstCombine] Simplify code slightly. NFC
llvm-svn: 290046
2016-12-17 18:10:04 +00:00
Evgeniy Stepanov 95294127d0 Revert "[GVNHoist] Move GVNHoist to function simplification part of pipeline."
This reverts r289696, which caused TSan perf regression.

See PR31382.

llvm-svn: 290030
2016-12-17 01:53:15 +00:00
Michael Kuperstein 3ca147ea3d Preserve loop metadata when folding branches to a common destination.
Differential Revision: https://reviews.llvm.org/D27830

llvm-svn: 289992
2016-12-16 21:23:59 +00:00
Adrian Prantl 73ec065604 Revert "[IR] Remove the DIExpression field from DIGlobalVariable."
This reverts commit 289920 (again).
I forgot to implement a Bitcode upgrade for the case where a DIGlobalVariable
has not DIExpression. Unfortunately it is not possible to safely upgrade
these variables without adding a flag to the bitcode record indicating which
version they are.
My plan of record is to roll the planned follow-up patch that adds a
unit: field to DIGlobalVariable into this patch before recomitting.
This way we only need one Bitcode upgrade for both changes (with a
version flag in the bitcode record to safely distinguish the record
formats).

Sorry for the churn!

llvm-svn: 289982
2016-12-16 19:39:01 +00:00
Matthew Simpson a4964f291a Reapply "[LV] Enable vectorization of loops with conditional stores by default"
This patch reapplies r289863. The original patch was reverted because it
exposed a bug causing the loop vectorizer to crash in the Python runtime on
PPC. The underlying issue was fixed with r289958.

llvm-svn: 289975
2016-12-16 19:12:02 +00:00
Matthew Simpson 099af810de [LV] Don't attempt to type-shrink scalarized instructions
After r288909, instructions feeding predicated instructions may be scalarized
if profitable. Since these instructions will remain scalar, we shouldn't
attempt to type-shrink them. We should only truncate vector types to their
minimal bit widths. This bug was exposed by enabling the vectorization of loops
containing conditional stores by default.

llvm-svn: 289958
2016-12-16 16:52:35 +00:00
Chandler Carruth 48b4e614d8 Revert r289863: [LV] Enable vectorization of loops with conditional
stores by default

This uncovers a crasher in the loop vectorizer on PPC when building the
Python runtime. I'll send the testcase to the review thread for the
original commit.

llvm-svn: 289934
2016-12-16 11:31:39 +00:00
Adrian Prantl 74a835cda0 [IR] Remove the DIExpression field from DIGlobalVariable.
This patch implements PR31013 by introducing a
DIGlobalVariableExpression that holds a pair of DIGlobalVariable and
DIExpression.

Currently, DIGlobalVariables holds a DIExpression. This is not the
best way to model this:

(1) The DIGlobalVariable should describe the source level variable,
    not how to get to its location.

(2) It makes it unsafe/hard to update the expressions when we call
    replaceExpression on the DIGLobalVariable.

(3) It makes it impossible to represent a global variable that is in
    more than one location (e.g., a variable with multiple
    DW_OP_LLVM_fragment-s).  We also moved away from attaching the
    DIExpression to DILocalVariable for the same reasons.

This reapplies r289902 with additional testcase upgrades.

<rdar://problem/29250149>
https://llvm.org/bugs/show_bug.cgi?id=31013
Differential Revision: https://reviews.llvm.org/D26769

llvm-svn: 289920
2016-12-16 04:25:54 +00:00
Teresa Johnson edddca22c9 [ThinLTO] Thin link efficiency: More efficient export list computation
Summary:
Instead of checking whether a global referenced by a function being
imported is defined in the same module, speculatively always add the
referenced globals to the module's export list. After all imports are
computed, for each module prune any not in its defined set from its
export list.

For a huge C++ app with aggressive importing thresholds, even with
D27687 we spent a lot of time invoking modulePath() from
exportGlobalInModule (modulePath() was still the 2nd hottest routine in
profile). The reason is that with comdat/linkonce the summary lists for
each GUID can be long. For the app in question, for example, we were
invoking exportGlobalInModule almost 2 million times, and we traversed
an average of 63 entries in the summary list each time.

This patch reduced the thin link time for the app by about 10% (on top
of D27687) when using aggressive importing thresholds, and about 3.5% on
average with default importing thresholds.

Reviewers: mehdi_amini

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D27755

llvm-svn: 289918
2016-12-16 04:11:51 +00:00
Davide Italiano f024a56cb8 [SimplifyLibCalls] Use a lambda. NFCI.
llvm-svn: 289911
2016-12-16 02:28:38 +00:00
Adrian Prantl 03c6d31a3b Revert "[IR] Remove the DIExpression field from DIGlobalVariable."
This reverts commit 289902 while investigating bot berakage.

llvm-svn: 289906
2016-12-16 01:00:30 +00:00
Peter Collingbourne 7a4be21d46 Add missing library dep.
llvm-svn: 289903
2016-12-16 00:43:00 +00:00
Adrian Prantl ce13935776 [IR] Remove the DIExpression field from DIGlobalVariable.
This patch implements PR31013 by introducing a
DIGlobalVariableExpression that holds a pair of DIGlobalVariable and
DIExpression.

Currently, DIGlobalVariables holds a DIExpression. This is not the
best way to model this:

(1) The DIGlobalVariable should describe the source level variable,
    not how to get to its location.

(2) It makes it unsafe/hard to update the expressions when we call
    replaceExpression on the DIGLobalVariable.

(3) It makes it impossible to represent a global variable that is in
    more than one location (e.g., a variable with multiple
    DW_OP_LLVM_fragment-s).  We also moved away from attaching the
    DIExpression to DILocalVariable for the same reasons.

<rdar://problem/29250149>
https://llvm.org/bugs/show_bug.cgi?id=31013
Differential Revision: https://reviews.llvm.org/D26769

llvm-svn: 289902
2016-12-16 00:36:43 +00:00
Peter Collingbourne 1398a32e28 IPO: Introduce ThinLTOBitcodeWriter pass.
This pass prepares a module containing type metadata for ThinLTO by splitting
it into regular and thin LTO parts if possible, and writing both parts to
a multi-module bitcode file. Modules that do not contain type metadata are
written unmodified as a single module.

All globals with type metadata are added to the regular LTO module, and
the rest are added to the thin LTO module.

Differential Revision: https://reviews.llvm.org/D27324

llvm-svn: 289899
2016-12-16 00:26:30 +00:00
Teresa Johnson 19f2aa7891 [ThinLTO] Thin link efficiency improvement: don't re-export globals (NFC)
Summary:
We were reinvoking exportGlobalInModule numerous times redundantly.
No need to re-export globals referenced by a global that was already
imported from its module. This resulted in a large speedup in the thin
link for a big application, particularly when importing aggressiveness
was cranked up.

Reviewers: mehdi_amini

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D27687

llvm-svn: 289896
2016-12-15 23:50:06 +00:00
Davide Italiano 85ad36b0e0 [SimplifyLibCalls] Lower fls() to llvm.ctlz().
Differential Revision:  https://reviews.llvm.org/D14590

llvm-svn: 289894
2016-12-15 23:45:11 +00:00
Davide Italiano 890e850348 [SimplifyLibCalls] Remove redundant folding logic for ffs().
Lowering to llvm.cttz() will result in constant folding anyway
if the argument to ffs is a constant. Pointed out by Eli for
fls() in D14590.

llvm-svn: 289888
2016-12-15 23:11:00 +00:00
Teresa Johnson eb0ac24172 [ThinLTO] Revert part of r289843 that belonged to another patch.
The code change for D27687 accidentally got committed along with the
main change in r289843. Revert it temporarily, so that I can recommit it
along with its test as intended.

llvm-svn: 289875
2016-12-15 21:39:42 +00:00
Teresa Johnson 0c3f57b133 [ThinLTO] Remove stale comment (NFC)
This should have been removed with r288446.

llvm-svn: 289871
2016-12-15 20:53:31 +00:00
Teresa Johnson 475b51a700 [ThinLTO] Thin link efficiency: skip candidate added later with higher threshold (NFC)
Summary:
Thin link efficiency improvement. After adding an importing candidate to
the worklist we might have later added it again with a higher threshold.
Skip it when popped from the worklist if we recorded a higher threshold
than the current worklist entry, it will get processed again at the
higher threshold when that entry is popped.

This required adding the summary's GUID to the worklist, so that it can
be used to query the recorded highest threshold for it when we pop from the
worklist.

Reviewers: mehdi_amini

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D27696

llvm-svn: 289867
2016-12-15 20:48:19 +00:00
Matthew Simpson 6a98bcfe33 [LV] Enable vectorization of loops with conditional stores by default
This patch sets the default value of the "-enable-cond-stores-vec" command line
option to "true".

Differential Revision: https://reviews.llvm.org/D27814

llvm-svn: 289863
2016-12-15 20:11:05 +00:00
Andrea Di Biagio f20c57eca9 [SimplifyCFG] Merge debug locations when hoisting an instruction from a then/else branch. NFC.
Now that a new API to merge debug locations has been committed at r289661 (see
review D26256 for more details), we can use it to "improve" the code added by
revision r280995.

Instead of nulling the debugloc of a commoned instruction, we use the 'merged'
debug location. At the moment, this is just a no functional change since
function `DILocation::getMergedLocation()` is just a stub and would always
return a null location.

Differential Revision: https://reviews.llvm.org/D27804

llvm-svn: 289862
2016-12-15 20:01:26 +00:00
Sanjay Patel d640641a61 [InstCombine] add folds for icmp (smin X, Y), X
Min/max canonicalization (r287585) exposes the fact that we're missing combines for min/max patterns. 
This patch won't solve the example that was attached to that thread, so something else still needs fixing.

The line between InstCombine and InstSimplify gets blurry here because sometimes the icmp instruction that
we want to fold to already exists, but sometimes it's the swapped form of what we want.

Corresponding changes for smax/umin/umax to follow.

Differential Revision: https://reviews.llvm.org/D27531

llvm-svn: 289855
2016-12-15 19:13:37 +00:00
Teresa Johnson 1b859a2306 [ThinLTO] Ensure callees get hot threshold when first seen on cold path
This is split out from D27696, since it turned out to be a bug fix and
not part of the NFC efficiency change.

Keep the same adjusted (possibly decayed) threshold in both the worklist
and the ImportList. Otherwise if we encountered it first along a cold
path, the callee would be added to the worklist with a lower decayed
threshold than when it is later encountered along a hot path. But the
logic uses the threshold recorded in the ImportList entry to check if
we should re-add it, and without this patch the threshold recorded there
is the same along both paths so we don't re-add it. Using the
same possibly decayed threshold in the ImportList ensures we re-add it
later with the higher non-decayed hot path threshold.

llvm-svn: 289843
2016-12-15 18:21:01 +00:00
Robert Lougher 6ea759a83e Revert "[SimplifyCFG] In sinkLastInstruction correctly set debugloc of common inst"
Reverting as it is causing buildbot failures (address sanitizer).

llvm-svn: 289833
2016-12-15 16:59:13 +00:00
Robert Lougher cf17674211 [SimplifyCFG] In sinkLastInstruction correctly set debugloc of "common" inst
Simplify CFG will try to sink the last instruction in a series of basic blocks,
creating a "common" instruction in the successor block (sinkLastInstruction).
When it does this, the debug location of the single instruction should be the
merged debug locations of the commoned instructions.

Differential Revision: https://reviews.llvm.org/D27590

llvm-svn: 289828
2016-12-15 16:17:53 +00:00
Ehsan Amiri 795b0671c5 [InstCombine] New opportunities for FoldAndOfICmp and FoldXorOfICmp
A number of new patterns for simplifying and/xor of icmp:

(icmp ne %x, 0) ^ (icmp ne %y, 0) => icmp ne %x, %y if the following is true:
1- (%x = and %a, %mask) and (%y = and %b, %mask)
2- %mask is a power of 2.

(icmp eq %x, 0) & (icmp ne %y, 0) => icmp ult %x, %y if the following is true:
1- (%x = and %a, %mask1) and (%y = and %b, %mask2)
2- Let %t be the smallest power of 2 where %mask1 & %t != 0. Then for any
   %s that is a power of 2 and %s & %mask2 != 0, we must have %s <= %t.
For example if %mask1 = 24 and %mask2 = 16, setting %s = 16 and %t = 8
violates condition (2) above. So this optimization cannot be applied.

llvm-svn: 289813
2016-12-15 12:25:13 +00:00
Craig Topper ab5f355d8c [AVX-512][InstCombine] Add masked scalar FMA intrinsics to SimplifyDemandedVectorElts.
llvm-svn: 289759
2016-12-15 03:49:45 +00:00
Hal Finkel 3ca4a6bcf1 Remove the AssumptionCache
After r289755, the AssumptionCache is no longer needed. Variables affected by
assumptions are now found by using the new operand-bundle-based scheme. This
new scheme is more computationally efficient, and also we need much less
code...

llvm-svn: 289756
2016-12-15 03:02:15 +00:00
Hal Finkel cb9f78e1c3 Make processing @llvm.assume more efficient by using operand bundles
There was an efficiency problem with how we processed @llvm.assume in
ValueTracking (and other places). The AssumptionCache tracked all of the
assumptions in a given function. In order to find assumptions relevant to
computing known bits, etc. we searched every assumption in the function. For
ValueTracking, that means that we did O(#assumes * #values) work in InstCombine
and other passes (with a constant factor that can be quite large because we'd
repeat this search at every level of recursion of the analysis).

Several of us discussed this situation at the last developers' meeting, and
this implements the discussed solution: Make the values that an assume might
affect operands of the assume itself. To avoid exposing this detail to
frontends and passes that need not worry about it, I've used the new
operand-bundle feature to add these extra call "operands" in a way that does
not affect the intrinsic's signature. I think this solution is relatively
clean. InstCombine adds these extra operands based on what ValueTracking, LVI,
etc. will need and then those passes need only search the users of the values
under consideration. This should fix the computational-complexity problem.

At this point, no passes depend on the AssumptionCache, and so I'll remove
that as a follow-up change.

Differential Revision: https://reviews.llvm.org/D27259

llvm-svn: 289755
2016-12-15 02:53:42 +00:00
Dehao Chen 40dd8c5109 Only sets profile summary when it was not preset.
Summary: SampleProfileLoader pass may be invoked twice by LTO. The 2nd pass should not append more summary info as it is already preset by the 1st pass.

Reviewers: eraman, davidxl

Subscribers: mehdi_amini, llvm-commits

Differential Revision: https://reviews.llvm.org/D27733

llvm-svn: 289725
2016-12-14 22:06:49 +00:00
Dehao Chen fb699619a0 Fix the bug in r289714 (NFC).
llvm-svn: 289724
2016-12-14 22:03:08 +00:00
Filipe Cabecinhas dd9688703c [asan] Don't skip instrumentation of masked load/store unless we've seen a full load/store on that pointer.
Reviewers: kcc, RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D27625

llvm-svn: 289718
2016-12-14 21:57:04 +00:00
Filipe Cabecinhas 1e69017a6d [asan] Hook ClInstrumentWrites and ClInstrumentReads to masked operation instrumentation.
Reviewers: kcc

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D27548

llvm-svn: 289717
2016-12-14 21:56:59 +00:00
Dehao Chen a99e082e15 Create SampleProfileLoader pass in llvm instead of clang
Summary: We used to create SampleProfileLoader pass in clang. This makes LTO/ThinLTO unable to add this pass in the linker plugin. This patch moves the SampleProfileLoader pass creation from clang to llvm pass manager builder.

Reviewers: tejohnson, davidxl, dnovillo

Subscribers: llvm-commits, mehdi_amini

Differential Revision: https://reviews.llvm.org/D27743

llvm-svn: 289714
2016-12-14 21:40:47 +00:00
Robert Lougher cfd7198698 [InstCombine] Folding of a compare with RHS const should merge debug locations
If all the operands to a phi node are compares that have a RHS constant,
instcombine will try to pull them through the phi node, combining them into
a single operation. When it does this, the debug location of the new op
should be the merged debug locations of the phi node arguments.

Patch 8 of 8 for D26256.  Folding of a compare that has a RHS constant.

Differential Revision: https://reviews.llvm.org/D26256

llvm-svn: 289704
2016-12-14 20:27:22 +00:00
Robert Lougher c9f7354776 [InstCombine] Folding of a binop with RHS const should merge the debug locations
If all the operands to a phi node are a binop with a RHS constant, instcombine
will try to pull them through the phi node, combining them into a single
operation. When it does this, the debug location of the new op should be the
merged debug locations of the phi node arguments.

Patch 7 of 8 for D26256.  Folding of a binop with RHS constant.

Differential Revision: https://reviews.llvm.org/D26256

llvm-svn: 289699
2016-12-14 20:07:49 +00:00
Geoff Berry ca11a1e147 [GVNHoist] Move GVNHoist to function simplification part of pipeline.
Summary:
Move GVNHoist to later in the optimization pipeline, specifically, to
the function simplification part of the pipeline.  The new pipeline
location allows GVNHoist to run on a function after its callees have
been inlined but before the function has been considered for inlining
into its callers, exposing more opportunities for hoisting.

Performance results on AArch64 kryo:
Improvements:
  Benchmarks/CoyoteBench/fftbench  -24.952%
  spec2006/bzip2                    -4.071%
  internal bmark                    -3.177%
  Benchmarks/PAQ8p/paq8p            -1.754%
  spec2000/perlbmk                  -1.328%
  spec2006/h264ref                  -1.140%

Regressions:
  internal bmark                    +1.818%
  Benchmarks/mafft/pairlocalalign   +1.084%

Reviewers: sebpop, dberlin, hiraditya

Subscribers: aemerson, mehdi_amini, mcrosier, llvm-commits

Differential Revision: https://reviews.llvm.org/D27722

llvm-svn: 289696
2016-12-14 19:38:22 +00:00
Robert Lougher f02d9b8325 [InstCombine] When folding casts through a phi node merge the debug locations
If all the operands to a phi node are a cast, instcombine will try to pull
them through the phi node, combining them into a single cast. When it does
this, the debug location of the new cast should be the merged debug locations
of the phi node arguments.

Patch 6 of 8 for D26256.  Folding of a cast operation.

Differential Revision: https://reviews.llvm.org/D26256

llvm-svn: 289693
2016-12-14 19:24:01 +00:00
Robert Lougher 373e36a410 [InstCombine] Folding loads through a phi node should merge the debug locations
If all the operands to a phi node are a load, instcombine will try to pull
them through the phi node, combining them into a single load. When it does
this, the debug location of the new load should be the merged debug locations
of the phi node arguments.

Patch 5 of 8 for D26256.  Folding of a load operation.

Differential Revision: https://reviews.llvm.org/D26256

llvm-svn: 289688
2016-12-14 19:02:14 +00:00
Robert Lougher 8fc1e89bbb [InstCombine] When folding GEP through a phi node merge the debug locations
If all the operands to a phi node are getelementptr, instcombine
will try to pull them through the phi node, combining them into a single
operation.  When it does this, the debug location of the new getelementptr
should be the merged debug locations of the phi node arguments.

Patch 4 of 8 for D26256.  Folding of a getelementptr operation.

Differential Revision: https://reviews.llvm.org/D26256

llvm-svn: 289684
2016-12-14 18:37:50 +00:00
Robert Lougher 4b0790d488 [InstCombine] Merge debug locations when folding through a phi node
If all the operands to a phi node are of the same operation, instcombine
will try to pull them through the phi node, combining them into a single
operation.  When it does this, the debug location of the operation should
be the merged debug locations of the phi node arguments.

Patch 3 of 8 for D26256.  Folding of a compare operation.

Differential Revision: https://reviews.llvm.org/D26256

llvm-svn: 289681
2016-12-14 18:14:57 +00:00
Robert Lougher 2428a4050f [InstCombine] Merge debug locations when folding through a phi node
If all the operands to a phi node are of the same operation, instcombine
will try to pull them through the phi node, combining them into a single
operation.  When it does this, the debug location of the operation should
be the merged debug locations of the phi node arguments.

Patch 2 of 8 for D26256.  Folding of a binary operation.

Differential Revision: https://reviews.llvm.org/D26256

llvm-svn: 289679
2016-12-14 17:49:19 +00:00