Commit Graph

16280 Commits

Author SHA1 Message Date
Joseph Tremoulet e92e0a9042 Fix inliner funclet unwind memoization
Summary:
The inliner may need to determine where a given funclet unwinds to,
and this determination may depend on other funclets throughout the
funclet tree.  The code that performs this walk in getUnwindDestToken
memoizes results to avoid redundant computations.  In the case that
a funclet's unwind destination is derived from its ancestor, there's
code to walk back down the tree from the ancestor updating the memo
map of its descendants to record the unwind destination.  This change
fixes that code to account for the case that some descendant has a
different unwind destination, which can happen if that unwind dest
is a descendant of the EHPad being queried and thus didn't determine
its unwind destination.

Also update test inline-funclets.ll, which is supposed to cover such
scenarios, to include a case that fails an assertion without this fix
but passes with it.

Fixes PR29151.


Reviewers: majnemer

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D24117

llvm-svn: 280610
2016-09-04 01:23:20 +00:00
Xinliang David Li 7a28a7fbd8 Cleanup : Use metadata preserving API for branch creation
Use the wrapper API in IRBuilder that does meta data copy
to create new branch in LoopUnswitch.

llvm-svn: 280602
2016-09-03 22:26:11 +00:00
Matt Arsenault 46a0382ab2 AMDGPU: Do basic folding of class intrinsic
This allows more of the OCML builtin library to be
constant folded.

llvm-svn: 280586
2016-09-03 07:06:58 +00:00
Duncan P. N. Exon Smith 4e229a7c0a ADT: Do not inherit from std::iterator in ilist_iterator
Inheriting from std::iterator uses more boiler-plate than manual
typedefs.  Avoid that in both ilist_iterator and
MachineInstrBundleIterator.

This has the side effect of removing ilist_iterator from certain ADL
lookups in namespace std; calls to std::next need to be qualified by
"std::" that didn't have to before.  The one case of this in-tree was
operating on a temporary, so I used the more compact operator++.

llvm-svn: 280570
2016-09-03 02:27:35 +00:00
Xinliang David Li 40afd5c9e4 [Profile] handle select instruction in 'expect' lowering
Builtin expect lowering currently ignores select. This patch
fixes the issue

Differential Revision: http://reviews.llvm.org/D24166

llvm-svn: 280547
2016-09-02 22:03:40 +00:00
Chad Rosier c50dfe38ac [SLP] Don't pass a global CL option as an argument. NFC.
Differential Revision: https://reviews.llvm.org/D24199

llvm-svn: 280527
2016-09-02 19:09:50 +00:00
Sanjay Patel 521f19f249 [InsttCombine] fold insertelement of constant into shuffle with constant operand (PR29126)
The motivating case occurs with SSE/AVX scalar intrinsics, so this is a first step towards
shrinking that to a single shufflevector.

Note that the transform is intentionally limited to shuffles that are equivalent to vector
selects to avoid creating arbitrary shuffle masks that may not lower well.

This should solve PR29126:
https://llvm.org/bugs/show_bug.cgi?id=29126

Differential Revision: https://reviews.llvm.org/D23886

llvm-svn: 280504
2016-09-02 17:05:43 +00:00
Matthew Simpson b65c230eab [LV] Ensure reverse interleaved group GEPs remain uniform
For uniform instructions, we're only required to generate a scalar value for
the first vector lane of each unroll iteration. Thus, if we have a reverse
interleaved group, computing the member index off the scalar GEP corresponding
to the last vector lane of its pointer operand technically makes the GEP
non-uniform. We should compute the member index off the first scalar GEP
instead.

I've added the updated member index computation to the existing reverse
interleaved group test.

llvm-svn: 280497
2016-09-02 16:19:22 +00:00
James Molloy f3cf2a494b [SimplifyCFG] Add a workaround to fix PR30188
We're sinking stores, which is a good thing, but in the process creating selects for the store address operand, which SROA/Mem2Reg can't look through, which caused serious regressions.

The real fix is in SROA, which I'll be looking into.

llvm-svn: 280470
2016-09-02 07:29:00 +00:00
Dehao Chen 4b5e7f750c revert r280429 and r280425:
r280425 | dehao | 2016-09-01 16:15:50 -0700 (Thu, 01 Sep 2016) | 9 lines

Refactor LICM pass in preparation for LoopSink pass.

Summary: LoopSink pass uses some common function in LICM. This patch refactor the LICM code to make it usable by LoopSink pass (https://reviews.llvm.org/D22778).

r280429 | dehao | 2016-09-01 16:31:25 -0700 (Thu, 01 Sep 2016) | 9 lines

Refactor LICM to expose canSinkOrHoistInst to LoopSink pass.

Summary: LoopSink pass shares the same canSinkOrHoistInst functionality with LICM pass. This patch exposes this function in preparation of https://reviews.llvm.org/D22778
llvm-svn: 280453
2016-09-02 01:59:27 +00:00
Dehao Chen 820372c0ed revert r280432:
r280432 | dehao | 2016-09-01 16:51:37 -0700 (Thu, 01 Sep 2016) | 9 lines

Explicitly require DominatorTreeAnalysis pass for instsimplify pass.

Summary: DominatorTreeAnalysis is always required by instsimplify.
llvm-svn: 280452
2016-09-02 01:47:13 +00:00
Dehao Chen e573b77772 Explicitly require DominatorTreeAnalysis pass for instsimplify pass.
Summary: DominatorTreeAnalysis is always required by instsimplify.

Reviewers: davidxl, danielcdh

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D24173

llvm-svn: 280432
2016-09-01 23:51:37 +00:00
Dehao Chen e81d50b3b9 Refactor LICM to expose canSinkOrHoistInst to LoopSink pass.
Summary: LoopSink pass shares the same canSinkOrHoistInst functionality with LICM pass. This patch exposes this function in preparation of https://reviews.llvm.org/D22778

Reviewers: chandlerc, davidxl, danielcdh

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D24171

llvm-svn: 280429
2016-09-01 23:31:25 +00:00
Dehao Chen ddd0c125e3 Refactor replaceDominatedUsesWith to have a flag to control whether to replace uses in BB itself.
Summary: This is in preparation for LoopSink pass which calls replaceDominatedUsesWith to update after sinking.

Reviewers: chandlerc, davidxl, danielcdh

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D24170

llvm-svn: 280427
2016-09-01 23:26:48 +00:00
Dehao Chen bc4e5bba0e Refactor LICM pass in preparation for LoopSink pass.
Summary: LoopSink pass uses some common function in LICM. This patch refactor the LICM code to make it usable by LoopSink pass (https://reviews.llvm.org/D22778).

Reviewers: chandlerc, davidxl, danielcdh

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D24168

llvm-svn: 280425
2016-09-01 23:15:50 +00:00
Matthew Simpson 9e7851ed31 [LV] Use ScalarParts for ad-hoc pointer IV scalarization (NFCI)
We can now maintain scalar values in VectorLoopValueMap. Thus, we no longer
have to create temporary vectors with insertelement instructions when handling
pointer induction variables. This case was mistakenly missed from r279649 when
refactoring the other scalarization code.

llvm-svn: 280405
2016-09-01 19:40:19 +00:00
Matthew Simpson 922af076c7 [LV] Move VectorParts allocation and mapping into PHI widening (NFC)
This patch moves the allocation of VectorParts for PHI nodes into the actual
PHI widening code. Previously, we allocated these VectorParts in
vectorizeBlockInLoop, and passed them by reference to widenPHIInstruction. Upon
returning, we would then map the VectorParts in VectorLoopValueMap. This
behavior is problematic for the cases where we only want to generate a scalar
version of a PHI node. For example, if in the future we only generate a scalar
version of an induction variable, we would end up inserting an empty vector
entry into the map once we return to vectorizeBlockInLoop. We now no longer
need to pass VectorParts to the various PHI widening functions, and we can keep
VectorParts allocation as close as possible to the point at which they are
actually mapped in VectorLoopValueMap.

llvm-svn: 280390
2016-09-01 18:14:27 +00:00
Geoff Berry fcb186ca9d [EarlyCSE] Change C API pass interface for EarlyCSE w/ MemorySSA
Previous change broke the C API for creating an EarlyCSE pass w/
MemorySSA by adding a bool parameter to control whether MemorySSA was
used or not.  This broke the OCaml bindings.  Instead, change the old C
API entry point back and add a new one to request an EarlyCSE pass with
MemorySSA.

llvm-svn: 280379
2016-09-01 15:07:46 +00:00
Sanjay Patel dd861964d1 [InstCombine] remove fold of an icmp pattern that should never happen
While removing a scalar shackle from an icmp fold, I noticed that I couldn't find any tests to trigger
this code path.

The 'and' shrinking transform should be handled by InstCombiner::foldCastedBitwiseLogic()
or eliminated with InstSimplify. The icmp narrowing is part of InstCombiner::foldICmpWithCastAndCast().

Differential Revision: https://reviews.llvm.org/D24031 

llvm-svn: 280370
2016-09-01 14:20:43 +00:00
James Molloy 88cad7e5cf [SimplifyCFG] Handle tail-sinking of more than 2 incoming branches
This was a real restriction in the original version of SinkIfThenCodeToEnd. Now it's been rewritten, the restriction can be lifted.

As part of this, we handle a very common and useful case where one of the incoming branches is actually conditional. Consider:

   if (a)
     x(1);
   else if (b)
     x(2);

This produces the following CFG:

         [if]
        /    \
      [x(1)] [if]
        |     | \
        |     |  \
        |  [x(2)] |
         \    |  /
          [ end ]

[end] has two unconditional predecessor arcs and one conditional. The conditional refers to the implicit empty 'else' arc. This same pattern can also be caused by an empty default block in a switch.

We can't sink the call to x() down to end because no call to x() happens on the third incoming arc (assume that x() has sideeffects for the sake of argument; if something is safe to speculate we could indeed sink nevertheless but this cannot happen in the general case and causes many extra selects).

We are now able to detect this case and split off the unconditional arcs to a common successor:

         [if]
        /    \
      [x(1)] [if]
        |     | \
        |     |  \
        |  [x(2)] |
         \   /    |
     [sink.split] |
           \     /
           [ end ]

Now we can sink the call to x() into %sink.split. This can cause significant code simplification in many testcases.

llvm-svn: 280364
2016-09-01 12:58:13 +00:00
James Molloy eec6df3193 [SimplifyCFG] Change the algorithm in SinkThenElseCodeToEnd
r279460 rewrote this function to be able to handle more than two incoming edges and took pains to ensure this didn't regress anything.

This time we change the logic for determining if an instruction should be sunk. Previously we used a single pass greedy algorithm - sink instructions until one requires more than one PHI node or we run out of instructions to sink.

This had the problem that sinking instructions that had non-identical but trivially the same operands needed extra logic so we sunk them aggressively. For example:

    %a = load i32* %b          %d = load i32* %b
    %c = gep i32* %a, i32 0    %e = gep i32* %d, i32 1

Sinking %c and %e would naively require two PHI merges as %a != %d. But the loads are obviously equivalent (and maybe can't be hoisted because there is no common predecessor).

This is why we implemented the fairly complex function areValuesTriviallySame(), to look through trivial differences like this. However it's just not clever enough.

Instead, throw areValuesTriviallySame away, use pointer equality to check equivalence of operands and switch to a two-stage algorithm.

In the "scan" stage, we look at every sinkable instruction in isolation from end of block to front. If it's sinkable, we keep track of all operands that required PHI merging.

In the "sink" stage, we iteratively sink the last non-terminator in the source blocks. But when calculating how many PHIs are actually required to be inserted (to work out if we should stop or not) we remove any values that have already been sunk from the set of PHI-merges required, which allows us to be more aggressive.

This turns an algorithm with potentially recursive lookahead (looking through GEPs, casts, loads and any other instruction potentially not CSE'd) to two linear scans.

llvm-svn: 280351
2016-09-01 10:44:35 +00:00
James Molloy 21744689b9 [SimplifyCFG] Fix nondeterministic iteration order
We iterate over the result from SafeToMergeTerminators, so make it a SmallSetVector instead of a SmallPtrSet.

Should fix stage3 convergence builds.

llvm-svn: 280342
2016-09-01 09:01:34 +00:00
James Molloy e656642295 [SimplifyCFG] Improve FoldValueComparisonIntoPredecessors to handle more cases
A very important case is not handled here: multiple arcs to a single block with a PHI. Consider:

    a:
      %1 = icmp %b, 1
      br %1, label %c, label %e
    c:
      %2 = icmp %b, 2
      br %2, label %d, label %e
    d:
      br %e
    e:
      phi [0, %a], [1, %c], [2, %d]

FoldValueComparisonIntoPredecessors will refuse to fold this, as it doesn't know how to deal with two arcs to a common destination with different PHI values. The answer is obvious - just split all conflicting arcs.

llvm-svn: 280338
2016-09-01 07:45:25 +00:00
Nick Lewycky 8dd4dad08b Add cast to appease windows builder. Fixes build break introduced in r280306.
llvm-svn: 280311
2016-08-31 23:24:43 +00:00
Nick Lewycky 97e49ac59e Add -fprofile-dir= to clang.
-fprofile-dir=path allows the user to specify where .gcda files should be
emitted when the program is run. In particular, this is the first flag that
causes the .gcno and .o files to have different paths, LLVM is extended to
support this. -fprofile-dir= does not change the file name in the .gcno (and
thus where lcov looks for the source) but it does change the name in the .gcda
(and thus where the runtime library writes the .gcda file). It's different from
a GCOV_PREFIX because a user can observe that the GCOV_PREFIX_STRIP will strip
paths off of -fprofile-dir= but not off of a supplied GCOV_PREFIX.

To implement this we split -coverage-file into -coverage-data-file and
-coverage-notes-file to specify the two different names. The !llvm.gcov
metadata node grows from a 2-element form {string coverage-file, node dbg.cu}
to 3-elements, {string coverage-notes-file, string coverage-data-file, node
dbg.cu}. In the 3-element form, the file name is already "mangled" with
.gcno/.gcda suffixes, while the 2-element form left that to the middle end
pass.

llvm-svn: 280306
2016-08-31 23:04:32 +00:00
Sanjay Patel 0d70831d73 [InstCombine] allow icmp (shr exact X, C2), C fold for splat constant vectors
The enhancement to foldICmpDivConstant ( http://llvm.org/viewvc/llvm-project?view=revision&revision=280299 )
allows us to remove the ConstantInt check; no other changes needed.

llvm-svn: 280300
2016-08-31 22:18:43 +00:00
Sanjay Patel 541aef4661 [InstCombine] allow icmp (div X, Y), C folds for splat constant vectors
Converting all of the overflow ops to APInt looked risky, so I've left that as a TODO.

llvm-svn: 280299
2016-08-31 21:57:21 +00:00
Sanjay Patel 85d79744df [InstCombine] change insertRangeTest() to use APInt instead of Constant; NFCI
This is prep work before changing the callers to also use APInt which will
allow folds for splat vectors. Currently, the callers have ConstantInt
guards in place, so no functional change intended with this commit.

llvm-svn: 280282
2016-08-31 19:49:56 +00:00
Michael Zolotukhin e0b2d97b52 [LoopInfo] Add verification by recomputation.
Summary:
Current implementation of LI verifier isn't ideal and fails to detect
some cases when LI is incorrect. For instance, it checks that all
recorded loops are in a correct form, but it has no way to check if
there are no more other (unrecorded in LI) loops in the function. This
patch adds a way to detect such bugs.

Reviewers: chandlerc, sanjoy, hfinkel

Subscribers: llvm-commits, silvas, mzolotukhin

Differential Revision: https://reviews.llvm.org/D23437

llvm-svn: 280280
2016-08-31 19:26:19 +00:00
Geoff Berry 8d84605f25 [EarlyCSE] Optionally use MemorySSA. NFC.
Summary:
Use MemorySSA, if requested, to do less conservative memory dependency
checking.

This change doesn't enable the MemorySSA enhanced EarlyCSE in the
default pipelines, so should be NFC.

Reviewers: dberlin, sanjoy, reames, majnemer

Subscribers: mcrosier, llvm-commits

Differential Revision: http://reviews.llvm.org/D19821

llvm-svn: 280279
2016-08-31 19:24:10 +00:00
Geoff Berry 64f5ed172a [EarlyCSE] Allow forwarding a non-invariant load into an invariant load.
Reviewers: sanjoy

Subscribers: mcrosier, llvm-commits

Differential Revision: https://reviews.llvm.org/D23935

llvm-svn: 280265
2016-08-31 17:45:31 +00:00
Chad Rosier 0de580aaab [SLP] Update the debug based on Michael's suggestion.
Passing the types/opcode check still doesn't guarantee we'll actually vectorize.
Therefore, just make it clear we're attempting to vectorize.

llvm-svn: 280263
2016-08-31 17:41:12 +00:00
Chad Rosier 54807a9b9d [SLP] Sink debug after checking for matching types/opcode.
Differential Revision: https://reviews.llvm.org/D24090

llvm-svn: 280260
2016-08-31 17:31:09 +00:00
Tim Shen 48f814e8a3 s/static inline/static/ for headers I have changed in r279475. NFC.
llvm-svn: 280257
2016-08-31 16:48:13 +00:00
Philip Reames 2b1084ac93 [statepoints][experimental] Add support for live-in semantics of values in deopt bundles
This is a first step towards supporting deopt value lowering and reporting entirely with the register allocator. I hope to build on this in the near future to support live-on-return semantics, but I have a use case which allows me to test and investigate code quality with just the live-in semantics so I've chosen to start there. For those curious, my use cases is our implementation of the "__llvm_deoptimize" function we bind to @llvm.deoptimize. I'm choosing not to hard code that fact in the patch and instead make it configurable via function attributes.

The basic approach here is modelled on what is done for the "Live In" values on stackmaps and patchpoints. (A secondary goal here is to remove one of the last barriers to merging the pseudo instructions.) We start by adding the operands directly to the STATEPOINT SDNode. Once we've lowered to MI, we extend the remat logic used by the register allocator to fold virtual register uses into StackMap::Indirect entries as needed. This does rely on the fact that the register allocator rematerializes. If it didn't along some code path, we could end up with more vregs than physical registers and fail to allocate.

Today, we *only* fold in the register allocator. This can create some weird effects when combined with arguments passed on the stack because we don't fold them appropriately. I have an idea how to fix that, but it needs this patch in place to work on that effectively. (There's some weird interaction with the scheduler as well, more investigation needed.)

My near term plan is to land this patch off-by-default, experiment in my local tree to identify any correctness issues and then start fixing codegen problems one by one as I find them. Once I have the live-in lowering fully working (both correctness and code quality), I'm hoping to move on to the live-on-return semantics. Note: I don't have any *known* miscompiles with this patch enabled, but I'm pretty sure I'll find at least a couple. Thus, the "experimental" tag and the fact it's off by default.

Differential Revision: https://reviews.llvm.org/D24000

llvm-svn: 280250
2016-08-31 15:12:17 +00:00
Chad Rosier 669130ffdf [SLP] Arguments should be camel case, and start with an upper case letter. NFC.
llvm-svn: 280248
2016-08-31 15:06:58 +00:00
James Molloy 3c1137c639 Revert "[SimplifyCFG] Improve FoldValueComparisonIntoPredecessors to handle more cases"
This reverts commit r280218. This *also* causes buildbot errors. Sigh. Not a successful day all around!

llvm-svn: 280239
2016-08-31 13:32:28 +00:00
James Molloy cacfc16109 Revert "[SimplifyCFG] Change the algorithm in SinkThenElseCodeToEnd"
This reverts commit r280216 - it caused buildbot failures.

llvm-svn: 280234
2016-08-31 13:16:52 +00:00
James Molloy 76c9d423a7 Revert "[SimplifyCFG] Handle tail-sinking of more than 2 incoming branches"
This reverts commit r280217. r280216 caused buildbot failures - backing out the entire chain.

llvm-svn: 280233
2016-08-31 13:16:45 +00:00
James Molloy 06a45483a1 Revert "[SimplifyCFG] Add a workaround to fix PR30188"
This reverts commit r280219. r280216 caused buildbot failures - backing out the entire chain.

llvm-svn: 280232
2016-08-31 13:16:36 +00:00
James Molloy 8a66a39cbf Revert "[SimplifyCFG] Fix bootstrap failure after r280220"
This reverts commit r280228. r280216 caused buildbot failures - backing out the entire sequence.

llvm-svn: 280231
2016-08-31 13:16:30 +00:00
James Molloy b7efa6c227 [SimplifyCFG] Fix bootstrap failure after r280220
We check that a sinking candidate is used by only one PHI node during our legality checks. However for instructions that are used by other sinking candidates our heuristic is less conservative. This can result in a candidate actually being illegal when we come to sink it because of how we sunk a predecessor. Do the used-by-only-one-PHI checks again during sinking to ensure we don't crash.

llvm-svn: 280228
2016-08-31 12:33:48 +00:00
James Molloy 171fdac7ce [SimplifyCFG] Add a workaround to fix PR30188
We're sinking stores, which is a good thing, but in the process creating selects for the store address operand, which SROA/Mem2Reg can't look through, which caused serious regressions.

The real fix is in SROA, which I'll be looking into.

llvm-svn: 280219
2016-08-31 10:46:45 +00:00
James Molloy 8e69b032e5 [SimplifyCFG] Improve FoldValueComparisonIntoPredecessors to handle more cases
A very important case is not handled here: multiple arcs to a single block with a PHI. Consider:

    a:
      %1 = icmp %b, 1
      br %1, label %c, label %e
    c:
      %2 = icmp %b, 2
      br %2, label %d, label %e
    d:
      br %e
    e:
      phi [0, %a], [1, %c], [2, %d]

FoldValueComparisonIntoPredecessors will refuse to fold this, as it doesn't know how to deal with two arcs to a common destination with different PHI values. The answer is obvious - just split all conflicting arcs.

llvm-svn: 280218
2016-08-31 10:46:39 +00:00
James Molloy c53b40b509 [SimplifyCFG] Handle tail-sinking of more than 2 incoming branches
This was a real restriction in the original version of SinkIfThenCodeToEnd. Now it's been rewritten, the restriction can be lifted.

As part of this, we handle a very common and useful case where one of the incoming branches is actually conditional. Consider:

   if (a)
     x(1);
   else if (b)
     x(2);

This produces the following CFG:

         [if]
        /    \
      [x(1)] [if]
        |     | \
        |     |  \
        |  [x(2)] |
         \    |  /
          [ end ]

[end] has two unconditional predecessor arcs and one conditional. The conditional refers to the implicit empty 'else' arc. This same pattern can also be caused by an empty default block in a switch.

We can't sink the call to x() down to end because no call to x() happens on the third incoming arc (assume that x() has sideeffects for the sake of argument; if something is safe to speculate we could indeed sink nevertheless but this cannot happen in the general case and causes many extra selects).

We are now able to detect this case and split off the unconditional arcs to a common successor:

         [if]
        /    \
      [x(1)] [if]
        |     | \
        |     |  \
        |  [x(2)] |
         \   /    |
     [sink.split] |
           \     /
           [ end ]

Now we can sink the call to x() into %sink.split. This can cause significant code simplification in many testcases.

llvm-svn: 280217
2016-08-31 10:46:33 +00:00
James Molloy 55bd04cd20 [SimplifyCFG] Change the algorithm in SinkThenElseCodeToEnd
r279460 rewrote this function to be able to handle more than two incoming edges and took pains to ensure this didn't regress anything.

This time we change the logic for determining if an instruction should be sunk. Previously we used a single pass greedy algorithm - sink instructions until one requires more than one PHI node or we run out of instructions to sink.

This had the problem that sinking instructions that had non-identical but trivially the same operands needed extra logic so we sunk them aggressively. For example:

    %a = load i32* %b          %d = load i32* %b
    %c = gep i32* %a, i32 0    %e = gep i32* %d, i32 1

Sinking %c and %e would naively require two PHI merges as %a != %d. But the loads are obviously equivalent (and maybe can't be hoisted because there is no common predecessor).

This is why we implemented the fairly complex function areValuesTriviallySame(), to look through trivial differences like this. However it's just not clever enough.

Instead, throw areValuesTriviallySame away, use pointer equality to check equivalence of operands and switch to a two-stage algorithm.

In the "scan" stage, we look at every sinkable instruction in isolation from end of block to front. If it's sinkable, we keep track of all operands that required PHI merging.

In the "sink" stage, we iteratively sink the last non-terminator in the source blocks. But when calculating how many PHIs are actually required to be inserted (to work out if we should stop or not) we remove any values that have already been sunk from the set of PHI-merges required, which allows us to be more aggressive.

This turns an algorithm with potentially recursive lookahead (looking through GEPs, casts, loads and any other instruction potentially not CSE'd) to two linear scans.

llvm-svn: 280216
2016-08-31 10:46:23 +00:00
James Molloy 923e98c232 [SimplifyCFG] Tail-merge calls with sideeffects
This was deliberately disabled during my rewrite of SinkIfThenToEnd to keep behaviour
at least vaguely consistent with the previous version and keep it as close to NFC as
I could.

There's no real reason not to merge sideeffect calls though, so let's do it! Small fixup
along the way to ensure we don't create indirect calls.

Should fix PR28964.

llvm-svn: 280215
2016-08-31 10:46:16 +00:00
Gor Nishanov 50d7fb974f [Coroutines] Part 10: Add coroutine promise support.
Summary:
1) CoroEarly now lowers llvm.coro.promise intrinsic that allows to obtain
a coroutine promise pointer from a coroutine frame and vice versa.

2) CoroFrame now interprets Promise argument of llvm.coro.begin to
place CoroutinPromise alloca at a deterministic offset from the coroutine frame.

Now, the coroutine promise example from docs\Coroutines.rst compiles and produces expected result (see test/Transform/Coroutines/ex4.ll).

Reviewers: majnemer

Subscribers: llvm-commits, mehdi_amini

Differential Revision: https://reviews.llvm.org/D23993

llvm-svn: 280184
2016-08-31 00:35:41 +00:00
Sanjay Patel 7d9ebaf337 [InstCombine] clean up InsertRangeTest; NFCI
It's much less code and easier to read if we don't duplicate
everything between the 'Inside' and not 'Inside' cases.

As noted with the FIXME, the goal is to make this vector-friendly
in a follow-up patch.

llvm-svn: 280183
2016-08-31 00:19:35 +00:00
Alina Sbirlea 3f8f7840bf [LoadStoreVectorizer] Change VectorSet to Vector to match head and tail positions. Resolves PR29148.
Summary:
LSV was using two vector sets (heads and tails) to track pairs of adjiacent position to vectorize.
A recent optimization is trying to obtain the longest chain to vectorize and assumes the positions
in heads(H) and tails(T) match, which is not the case is there are multiple tails for the same head.

e.g.:
i1: store a[0]
i2: store a[1]
i3: store a[1]
Leads to:
H: i1
T: i2 i3
Instead of:
H: i1 i1
T: i2 i3
So the positions for instructions that follow i3 will have different indexes in H/T.
This patch resolves PR29148.

This issue also surfaced the fact that if the chain is too long, and TLI
returns a "not-fast" answer, the whole chain will be abandoned for
vectorization, even though a smaller one would be beneficial.
Added a testcase and FIXME for this.

Reviewers: tstellarAMD, arsenm, jlebar

Subscribers: mzolotukhin, wdng, llvm-commits

Differential Revision: https://reviews.llvm.org/D24057

llvm-svn: 280179
2016-08-30 23:53:59 +00:00
Michael Kuperstein 2954d1db77 [LoopVectorizer] Predicate instructions in blocks with several incoming edges
We don't need to limit predication to blocks that have a single incoming
edge, we just need to use the right mask.
This fixes PR30172.

Differential Revision: https://reviews.llvm.org/D24009

llvm-svn: 280148
2016-08-30 20:22:21 +00:00
Sanjay Patel b37145712e [InstCombine] replace divide-by-constant checks with asserts; NFC
These folds already have tests for scalar and vector types, except 
for the vector div-by-0 case, so I'm adding tests for that.

llvm-svn: 280115
2016-08-30 17:31:34 +00:00
Sanjay Patel a7cb477277 [InstCombine] clean up foldICmpDivConstant; NFCI
1. Fix comments to match variable names
2. Remove redundant CmpRHS variable
3. Add FIXME to replace some checks with asserts

llvm-svn: 280112
2016-08-30 17:10:49 +00:00
Chad Rosier 27ac0d8670 [Reassociate] Add additional debug output. NFC.
llvm-svn: 280090
2016-08-30 13:58:35 +00:00
James Molloy d13b1239e4 [SimplifyCFG] Properly CSE metadata in SinkThenElseCodeToEnd
This was missing, meaning the metadata in sunk instructions was potentially bogus and could cause miscompiles.

llvm-svn: 280072
2016-08-30 10:56:08 +00:00
Anna Thomas 1aea6da564 [RewriteStatepointsForGC] Update comment for same PHI node check. NFC
llvm-svn: 280052
2016-08-30 02:36:48 +00:00
Kostya Serebryany 5ac427b8e4 [sanitizer-coverage] add two more modes of instrumentation: trace-div and trace-gep, mostly usaful for value-profile-based fuzzing; llvm part
llvm-svn: 280043
2016-08-30 01:12:10 +00:00
Duncan P. N. Exon Smith 5c001c367f ADT: Give ilist<T>::reverse_iterator a handle to the current node
Reverse iterators to doubly-linked lists can be simpler (and cheaper)
than std::reverse_iterator.  Make it so.

In particular, change ilist<T>::reverse_iterator so that it is *never*
invalidated unless the node it references is deleted.  This matches the
guarantees of ilist<T>::iterator.

(Note: MachineBasicBlock::iterator is *not* an ilist iterator, but a
MachineInstrBundleIterator<MachineInstr>.  This commit does not change
MachineBasicBlock::reverse_iterator, but it does update
MachineBasicBlock::reverse_instr_iterator.  See note at end of commit
message for details on bundle iterators.)

Given the list (with the Sentinel showing twice for simplicity):

     [Sentinel] <-> A <-> B <-> [Sentinel]

the following is now true:
 1. begin() represents A.
 2. begin() holds the pointer for A.
 3. end() represents [Sentinel].
 4. end() holds the poitner for [Sentinel].
 5. rbegin() represents B.
 6. rbegin() holds the pointer for B.
 7. rend() represents [Sentinel].
 8. rend() holds the pointer for [Sentinel].

The changes are #6 and #8.  Here are some properties from the old
scheme (which used std::reverse_iterator):
- rbegin() held the pointer for [Sentinel] and rend() held the pointer
  for A;
- operator*() cost two dereferences instead of one;
- converting from a valid iterator to its valid reverse_iterator
  involved a confusing increment; and
- "RI++->erase()" left RI invalid.  The unintuitive replacement was
  "RI->erase(), RE = end()".

With vector-like data structures these properties are hard to avoid
(since past-the-beginning is not a valid pointer), and don't impose a
real cost (since there's still only one dereference, and all iterators
are invalidated on erase).  But with lists, this was a poor design.

Specifically, the following code (which obviously works with normal
iterators) now works with ilist::reverse_iterator as well:

    for (auto RI = L.rbegin(), RE = L.rend(); RI != RE;)
      fooThatMightRemoveArgFromList(*RI++);

Converting between iterator and reverse_iterator for the same node uses
the getReverse() function.

    reverse_iterator iterator::getReverse();
    iterator reverse_iterator::getReverse();

Why doesn't iterator <=> reverse_iterator conversion use constructors?

In order to catch and update old code, reverse_iterator does not even
have an explicit conversion from iterator.  It wouldn't be safe because
there would be no reasonable way to catch all the bugs from the changed
semantic (see the changes at call sites that are part of this patch).

Old code used this API:

    std::reverse_iterator::reverse_iterator(iterator);
    iterator std::reverse_iterator::base();

Here's how to update from old code to new (that incorporates the
semantic change), assuming I is an ilist<>::iterator and RI is an
ilist<>::reverse_iterator:

            [Old]         ==>          [New]
    reverse_iterator(I)       (--I).getReverse()
    reverse_iterator(I)         ++I.getReverse()
  --reverse_iterator(I)           I.getReverse()
    reverse_iterator(++I)         I.getReverse()
          RI.base()          (--RI).getReverse()
          RI.base()            ++RI.getReverse()
        --RI.base()              RI.getReverse()
      (++RI).base()              RI.getReverse()
  delete &*RI, RE = end()         delete &*RI++
  RI->erase(), RE = end()         RI++->erase()

=======================================
Note: bundle iterators are out of scope
=======================================

MachineBasicBlock::iterator, also known as
MachineInstrBundleIterator<MachineInstr>, is a wrapper to represent
MachineInstr bundles.  The idea is that each operator++ takes you to the
beginning of the next bundle.  Implementing a sane reverse iterator for
this is harder than ilist.  Here are the options:
- Use std::reverse_iterator<MBB::i>.  Store a handle to the beginning of
  the next bundle.  A call to operator*() runs a loop (usually
  operator--() will be called 1 time, for unbundled instructions).
  Increment/decrement just works.  This is the status quo.
- Store a handle to the final node in the bundle.  A call to operator*()
  still runs a loop, but it iterates one time fewer (usually
  operator--() will be called 0 times, for unbundled instructions).
  Increment/decrement just works.
- Make the ilist_sentinel<MachineInstr> *always* store that it's the
  sentinel (instead of just in asserts mode).  Then the bundle iterator
  can sniff the sentinel bit in operator++().

I initially tried implementing the end() option as part of this commit,
but updating iterator/reverse_iterator conversion call sites was
error-prone.  I have a WIP series of patches that implements the final
option.

llvm-svn: 280032
2016-08-30 00:13:12 +00:00
Teresa Johnson 8c1bc986b5 [ThinLTO] Indirect call promotion fixes for promoted local functions
Summary:
Fix a couple issues limiting the application of indirect call promotion
in ThinLTO mode:
- Invoke indirect call promotion before globalopt, since it may
  eliminate imported functions which appear unreferenced.
- Invoke indirect call promotion with InLTO=true so that the PGOFuncName
  metadata is used to get the name for locals which would have been
  renamed during promotion.

Reviewers: davidxl, mehdi_amini

Subscribers: Prazek, llvm-commits, mehdi_amini

Differential Revision: https://reviews.llvm.org/D24004

llvm-svn: 280024
2016-08-29 22:46:56 +00:00
Chad Rosier 6e1eaac62b [SLP] Return a boolean value for these static helpers. NFC.
Differential Revision: https://reviews.llvm.org/D24008

llvm-svn: 280020
2016-08-29 22:09:51 +00:00
Matthew Simpson df19502b16 [LV] Move insertelement sequence after scalar definitions
After r279649 when getting a vector value from VectorLoopValueMap, we create an
insertelement sequence on-demand if the value has been scalarized instead of
vectorized. We previously inserted this insertelement sequence before the
value's first vector user. However, this insert location is problematic if that
user is the phi node of a first-order recurrence. With this patch, we move the
insertelement sequence after the last scalar instruction we created when
scalarizing the value. Thus, the value's vector definition in the new loop will
immediately follow its scalar definitions. This should fix PR30183.

Reference: https://llvm.org/bugs/show_bug.cgi?id=30183
llvm-svn: 280001
2016-08-29 20:14:04 +00:00
Vitaly Buka 3c4f6bf654 [asan] Enable new stack poisoning with store instruction by default
Reviewers: eugenis

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D23968

llvm-svn: 279993
2016-08-29 19:28:34 +00:00
Tim Northover c10c33444e ASan: remove variable only used in assertions build
llvm-svn: 279990
2016-08-29 19:12:20 +00:00
Vitaly Buka 793913c7eb Use store operation to poison allocas for lifetime analysis.
Summary:
Calling __asan_poison_stack_memory and __asan_unpoison_stack_memory for small
variables is too expensive.

Code is disabled by default and can be enabled by -asan-experimental-poisoning.

PR27453

Reviewers: eugenis

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D23947

llvm-svn: 279984
2016-08-29 18:17:21 +00:00
Vitaly Buka db331d8be7 [asan] Separate calculation of ShadowBytes from calculating ASanStackFrameLayout
Summary: No functional changes, just refactoring to make D23947 simpler.

Reviewers: eugenis

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D23954

llvm-svn: 279982
2016-08-29 17:41:29 +00:00
David Majnemer e8fd5f9ffd [SimplifyCFG] Hoisting invalidates metadata
We forgot to remove optimization metadata when performing hosting during
FoldTwoEntryPHINode.

This fixes PR29163.

llvm-svn: 279980
2016-08-29 17:14:08 +00:00
Anna Thomas 2bc129c5fd [StatepointsForGC] Rematerialize in the presence of PHIs
Summary:
While walking the use chain for identifying rematerializable values in RS4GC,
add the case where the current value and base value are the same PHI nodes.

This will aid rematerialization of geps and casts instead of relocating.

Reviewers: sanjoy, reames, igor

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D23920

llvm-svn: 279975
2016-08-29 15:41:59 +00:00
Gor Nishanov dce9b02677 [Coroutines] Part 9: Add cleanup subfunction.
Summary:
[Coroutines] Part 9: Add cleanup subfunction.

This patch completes coroutine heap allocation elision. Now, the heap elision example from docs\Coroutines.rst compiles and produces expected result (see test/Transform/Coroutines/ex3.ll)

Intrinsic Changes:
* coro.free gets a token parameter tying it to coro.id to allow reliably discovering all coro.frees associated with a particular coroutine.
* coro.id gets an extra parameter that points back to a coroutine function. This allows to check whether a coro.id describes the enclosing function or it belongs to a different function that was later inlined.

CoroSplit now creates three subfunctions:
# f$resume - resume logic
# f$destroy - cleanup logic, followed by a deallocation code
# f$cleanup - just the cleanup code

CoroElide pass during devirtualization replaces coro.destroy with either f$destroy or f$cleanup depending whether heap elision is performed or not.

Other fixes, improvements:
* Fixed buglet in Shape::buildFrame that was not creating coro.save properly if coroutine has more than one suspend point.

* Switched to using variable width suspend index field (no longer limited to 32 bit index field can be as little as i1 or as large as i<whatever-size_t-is>)

Reviewers: majnemer

Subscribers: llvm-commits, mehdi_amini

Differential Revision: https://reviews.llvm.org/D23844

llvm-svn: 279971
2016-08-29 14:34:12 +00:00
Sanjay Patel 5c5311f4e5 [InstCombine] use m_APInt to allow icmp (and X, Y), C folds for splat constant vectors
llvm-svn: 279937
2016-08-28 18:18:00 +00:00
Sebastian Pop 4660199a33 GVN-hoist: invalidate MD cache (PR29144)
Without invalidating the entries in the MD cache we would try to access instructions
that were removed in previous iterations of hoisting.

Differential Revision: https://reviews.llvm.org/D23927

llvm-svn: 279907
2016-08-27 02:48:41 +00:00
Adam Nemet cef3314156 [Inliner] Report when inlining fails because callee's def is unavailable
Summary:
This is obviously an interesting case because it may motivate code
restructuring or LTO.

Reporting this requires instantiation of ORE in the loop where the call
sites are first gathered.  I've checked compile-time
overhead *with* -Rpass-with-hotness and the worst slow-down was 6% in
mcf and quickly tailing off.  As before without -Rpass-with-hotness
there is no overhead.

Because this could be a pretty noisy diagnostics, it is currently
qualified as 'verbose'.  As of this patch, 'verbose' diagnostics are
only emitted with -Rpass-with-hotness, i.e. when the output is expected
to be filtered.

Reviewers: eraman, chandlerc, davidxl, hfinkel

Subscribers: tejohnson, Prazek, davide, llvm-commits

Differential Revision: https://reviews.llvm.org/D23415

llvm-svn: 279860
2016-08-26 20:21:05 +00:00
Sanjay Patel 14e0e18d76 [InstCombine] add helper function for icmp (and (sh X, Y), C2), C1 ; NFC
Like other recent changes near here, the goal is to allow vector types for
all of these folds. Splitting things up makes it easier to incrementally 
enhance the code and easier to read.

llvm-svn: 279851
2016-08-26 18:28:46 +00:00
Sanjay Patel da9c56299b [InstCombine] clean up foldICmpAndConstConst(); NFC
1. Early exit to reduce indent
2. Fix comments and variable names to match
3. Reformat comments / clang-format code

llvm-svn: 279837
2016-08-26 17:15:22 +00:00
Sanjay Patel d3c7bb28be [InstCombine] add helper function for folding of icmp (and X, C2), C; NFC
llvm-svn: 279834
2016-08-26 16:42:33 +00:00
Bob Haarman 3db176410a limit the number of instructions per block examined by dead store elimination
Summary: Dead store elimination gets very expensive when large numbers of instructions need to be analyzed. This patch limits the number of instructions analyzed per store to the value of the memdep-block-scan-limit parameter (which defaults to 100). This resulted in no observed difference in performance of the generated code, and no change in the statistics for the dead store elimination pass, but improved compilation time on some files by more than an order of magnitude.

Reviewers: dexonsmith, bruno, george.burgess.iv, dberlin, reames, davidxl

Subscribers: davide, chandlerc, dberlin, davidxl, eraman, tejohnson, mbodart, llvm-commits

Differential Revision: https://reviews.llvm.org/D15537

llvm-svn: 279833
2016-08-26 16:34:27 +00:00
Sanjay Patel 311e0fabb1 [InstCombine] rename variables in foldICmpAndConstant(); NFC
llvm-svn: 279831
2016-08-26 16:14:06 +00:00
Bob Haarman 244ed8b574 test commit
llvm-svn: 279830
2016-08-26 16:00:04 +00:00
Adam Nemet 4f155b6e91 [LoopUnroll] Use OptimizationRemarkEmitter directly not via the analysis pass
We can't mark ORE (a function pass) preserved as required by the loop
passes because that is how we ensure that the required passes like
LazyBFI are all available any time ORE is used.  See the new comments in
the patch.

Instead we use it directly just like the inliner does in D22694.

As expected there is some additional overhead after removing the caching
provided by analysis passes.  The worst case, I measured was
LNT/CINT2006_ref/401.bzip2 which regresses by 12%.  As before, this only
affects -Rpass-with-hotness and not default compilation.

llvm-svn: 279829
2016-08-26 15:58:34 +00:00
Sanjay Patel f7ba0891ce [InstCombine] rename variables in foldICmpDivConstant(); NFC
Removing the redundant 'CmpRHSV' local variable exposes a bug in the caller
foldICmpShrConstant() - it was sending in the div constant instead of the
cmp constant. But I have not been able to expose this in a regression test
yet - the affected folds all appear to be handled before we ever reach this
code. I'll keep trying to find a case as I make changes to allow vector folds
in both functions.

llvm-svn: 279828
2016-08-26 15:53:01 +00:00
Tim Shen 3ad8b43cc2 [MemCpy] Add comments for r279769
Differential Revision: https://reviews.llvm.org/D23846

llvm-svn: 279778
2016-08-25 21:03:46 +00:00
Tim Shen a3dbead2d6 [MemCpy] Check for alias in performMemCpyToMemSetOptzn, instead of the identity of two operands
Summary:
This fixes pr29105. The reason is that lifetime marks creates new
aliasing pointers the original ones, but before this patch aliases
were not checked in performMemCpyToMemSetOptzn.

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D23846

llvm-svn: 279769
2016-08-25 19:27:26 +00:00
Wei Mi 59ca96636d [UNROLL] Postpone ScalarEvolution::forgetLoop after TripCountSC is expanded
when unroll runtime iteration loop.

In llvm::UnrollRuntimeLoopRemainder, if the loop to be unrolled is the inner
loop inside a loop nest, the scalar evolution needs to be dropped for its
parent loop which is done by ScalarEvolution::forgetLoop. However, we can
postpone forgetLoop to the end of UnrollRuntimeLoopRemainder so TripCountSC
expansion can still reuse existing value.

Differential Revision: https://reviews.llvm.org/D23572

llvm-svn: 279748
2016-08-25 16:17:18 +00:00
Sebastian Pop 5f0d0e60d1 GVN-hoist: fix hoistingFromAllPaths for loops (PR29034)
It is invalid to hoist stores or loads if they are not executed on all paths
from the hoisting point to the exit of the function. In the testcase, there are
paths in the loop that do not execute the stores or the loads, and so hoisting
them within the loop is unsafe.

The problem is that the current implementation of hoistingFromAllPaths is
incomplete: it walks all blocks dominated by the hoisting point, and does not
return false when the loop contains a path on which the hoisted ld/st is
not executed.

Differential Revision: https://reviews.llvm.org/D23843

llvm-svn: 279732
2016-08-25 11:55:47 +00:00
Xinliang David Li cad3a995a4 [Profile] Propagate branch metadata properly in instcombine
Differential Revision: http://reviews.llvm.org/D23590

llvm-svn: 279693
2016-08-25 00:26:32 +00:00
Sanjay Patel 1655414903 [InstCombine] move foldICmpDivConstConst() contents to foldICmpDivConstant(); NFCI
There was no logic in foldICmpDivConstant, so no need for a separate function.
The code is directly copy/pasted, so further cleanups to follow.

llvm-svn: 279685
2016-08-24 23:03:36 +00:00
Sanjay Patel d398d4a39e [InstCombine] use m_APInt to allow icmp eq/ne (shr X, C2), C folds for splat constant vectors
llvm-svn: 279677
2016-08-24 22:22:06 +00:00
Matthew Simpson abd2be1e2e [LV] Unify vector and scalar maps
This patch unifies the data structures we use for mapping instructions from the
original loop to their corresponding instructions in the new loop. Previously,
we maintained two distinct maps for this purpose: WidenMap and ScalarIVMap.
WidenMap maintained the vector values each instruction from the old loop was
represented with, and ScalarIVMap maintained the scalar values each scalarized
induction variable was represented with. With this patch, all values created
for the new loop are maintained in VectorLoopValueMap.

The change allows for several simplifications. Previously, when an instruction
was scalarized, we had to insert the scalar values into vectors in order to
maintain the mapping in WidenMap. Then, if a user of the scalarized value was
also scalar, we had to extract the scalar values from the temporary vector we
created. We now aovid these unnecessary scalar-to-vector-to-scalar conversions.
If a scalarized value is used by a scalar instruction, the scalar value is used
directly. However, if the scalarized value is needed by a vector instruction,
we generate the needed insertelement instructions on-demand.

A common idiom in several locations in the code (including the scalarization
code), is to first get the vector values an instruction from the original loop
maps to, and then extract a particular scalar value. This patch adds
getScalarValue for this purpose along side getVectorValue as an interface into
VectorLoopValueMap. These functions work together to return the requested
values if they're available or to produce them if they're not.

The mapping has also be made less permissive. Entries can be added to
VectorLoopValue map with the new initVector and initScalar functions.
getVectorValue has been modified to return a constant reference to the mapped
entries.

There's no real functional change with this patch; however, in some cases we
will generate slightly different code. For example, instead of an insertelement
sequence following the definition of an instruction, it will now precede the
first use of that instruction. This can be seen in the test case changes.

Differential Revision: https://reviews.llvm.org/D23169

llvm-svn: 279649
2016-08-24 18:23:17 +00:00
Sanjoy Das ff855b6020 [SCCP] Don't delete side-effecting instructions
I'm not sure if the `!isa<CallInst>(Inst) &&
!isa<TerminatorInst>(Inst))` bit is correct either, but this fixes the
case we know is broken.

llvm-svn: 279647
2016-08-24 18:10:21 +00:00
Sanjay Patel 8e297749c1 [InstCombine] add assert and explanatory comment for fold removed in r279568; NFC
I deleted a fold from InstCombine at:
https://reviews.llvm.org/rL279568

because it (like any InstCombine to a constant?) should always happen in InstSimplify,
however, it's not obvious what the assumptions are in the remaining code.

Add a comment and assert to make it clearer.

Differential Revision: https://reviews.llvm.org/D23819

llvm-svn: 279626
2016-08-24 13:55:55 +00:00
Gil Rapaport 550148b2f6 [Loop Vectorizer] Support predication of div/rem
div/rem instructions in basic blocks that require predication currently prevent
vectorization. This patch extends the existing mechanism for predicating stores
to handle other instructions and leverages it to predicate divs and rems.

Differential Revision: https://reviews.llvm.org/D22918

llvm-svn: 279620
2016-08-24 11:37:57 +00:00
Chandler Carruth 8882346842 [PM] Introduce basic update capabilities to the new PM's CGSCC pass
manager, including both plumbing and logic to handle function pass
updates.

There are three fundamentally tied changes here:
1) Plumbing *some* mechanism for updating the CGSCC pass manager as the
   CG changes while passes are running.
2) Changing the CGSCC pass manager infrastructure to have support for
   the underlying graph to mutate mid-pass run.
3) Actually updating the CG after function passes run.

I can separate them if necessary, but I think its really useful to have
them together as the needs of #3 drove #2, and that in turn drove #1.

The plumbing technique is to extend the "run" method signature with
extra arguments. We provide the call graph that intrinsically is
available as it is the basis of the pass manager's IR units, and an
output parameter that records the results of updating the call graph
during an SCC passes's run. Note that "...UpdateResult" isn't a *great*
name here... suggestions very welcome.

I tried a pretty frustrating number of different data structures and such
for the innards of the update result. Every other one failed for one
reason or another. Sometimes I just couldn't keep the layers of
complexity right in my head. The thing that really worked was to just
directly provide access to the underlying structures used to walk the
call graph so that their updates could be informed by the *particular*
nature of the change to the graph.

The technique for how to make the pass management infrastructure cope
with mutating graphs was also something that took a really, really large
number of iterations to get to a place where I was happy. Here are some
of the considerations that drove the design:

- We operate at three levels within the infrastructure: RefSCC, SCC, and
  Node. In each case, we are working bottom up and so we want to
  continue to iterate on the "lowest" node as the graph changes. Look at
  how we iterate over nodes in an SCC running function passes as those
  function passes mutate the CG. We continue to iterate on the "lowest"
  SCC, which is the one that continues to contain the function just
  processed.

- The call graph structure re-uses SCCs (and RefSCCs) during mutation
  events for the *highest* entry in the resulting new subgraph, not the
  lowest. This means that it is necessary to continually update the
  current SCC or RefSCC as it shifts. This is really surprising and
  subtle, and took a long time for me to work out. I actually tried
  changing the call graph to provide the opposite behavior, and it
  breaks *EVERYTHING*. The graph update algorithms are really deeply
  tied to this particualr pattern.

- When SCCs or RefSCCs are split apart and refined and we continually
  re-pin our processing to the bottom one in the subgraph, we need to
  enqueue the newly formed SCCs and RefSCCs for subsequent processing.
  Queuing them presents a few challenges:
  1) SCCs and RefSCCs use wildly different iteration strategies at
     a high level. We end up needing to converge them on worklist
     approaches that can be extended in order to be able to handle the
     mutations.
  2) The order of the enqueuing need to remain bottom-up post-order so
     that we don't get surprising order of visitation for things like
     the inliner.
  3) We need the worklists to have set semantics so we don't duplicate
     things endlessly. We don't need a *persistent* set though because
     we always keep processing the bottom node!!!! This is super, super
     surprising to me and took a long time to convince myself this is
     correct, but I'm pretty sure it is... Once we sink down to the
     bottom node, we can't re-split out the same node in any way, and
     the postorder of the current queue is fixed and unchanging.
  4) We need to make sure that the "current" SCC or RefSCC actually gets
     enqueued here such that we re-visit it because we continue
     processing a *new*, *bottom* SCC/RefSCC.

- We also need the ability to *skip* SCCs and RefSCCs that get merged
  into a larger component. We even need the ability to skip *nodes* from
  an SCC that are no longer part of that SCC.

This led to the design you see in the patch which uses SetVector-based
worklists. The RefSCC worklist is always empty until an update occurs
and is just used to handle those RefSCCs created by updates as the
others don't even exist yet and are formed on-demand during the
bottom-up walk. The SCC worklist is pre-populated from the RefSCC, and
we push new SCCs onto it and blacklist existing SCCs on it to get the
desired processing.

We then *directly* update these when updating the call graph as I was
never able to find a satisfactory abstraction around the update
strategy.

Finally, we need to compute the updates for function passes. This is
mostly used as an initial customer of all the update mechanisms to drive
their design to at least cover some real set of use cases. There are
a bunch of interesting things that came out of doing this:

- It is really nice to do this a function at a time because that
  function is likely hot in the cache. This means we want even the
  function pass adaptor to support online updates to the call graph!

- To update the call graph after arbitrary function pass mutations is
  quite hard. We have to build a fairly comprehensive set of
  data structures and then process them. Fortunately, some of this code
  is related to the code for building the cal graph in the first place.
  Unfortunately, very little of it makes any sense to share because the
  nature of what we're doing is so very different. I've factored out the
  one part that made sense at least.

- We need to transfer these updates into the various structures for the
  CGSCC pass manager. Once those were more sanely worked out, this
  became relatively easier. But some of those needs necessitated changes
  to the LazyCallGraph interface to make it significantly easier to
  extract the changed SCCs from an update operation.

- We also need to update the CGSCC analysis manager as the shape of the
  graph changes. When an SCC is merged away we need to clear analyses
  associated with it from the analysis manager which we didn't have
  support for in the analysis manager infrsatructure. New SCCs are easy!
  But then we have the case that the original SCC has its shape changed
  but remains in the call graph. There we need to *invalidate* the
  analyses associated with it.

- We also need to invalidate analyses after we *finish* processing an
  SCC. But the analyses we need to invalidate here are *only those for
  the newly updated SCC*!!! Because we only continue processing the
  bottom SCC, if we split SCCs apart the original one gets invalidated
  once when its shape changes and is not processed farther so its
  analyses will be correct. It is the bottom SCC which continues being
  processed and needs to have the "normal" invalidation done based on
  the preserved analyses set.

All of this is mostly background and context for the changes here.

Many thanks to all the reviewers who helped here. Especially Sanjoy who
caught several interesting bugs in the graph algorithms, David, Sean,
and others who all helped with feedback.

Differential Revision: http://reviews.llvm.org/D21464

llvm-svn: 279618
2016-08-24 09:37:14 +00:00
Gor Nishanov 4570e26e68 [Coroutines] Fix unused var warning in release build
llvm-svn: 279610
2016-08-24 05:20:30 +00:00
Gor Nishanov 241b041fba [Coroutines] Part 8: Coroutine Frame Building algorithm
Summary:
This patch adds coroutine frame building algorithm. Now, simple coroutines such as ex0.ll and ex1.ll (first examples from docs\Coroutines.rst can be compiled).

Documentation and overview is here: http://llvm.org/docs/Coroutines.html.

Upstreaming sequence (rough plan)
1.Add documentation. (https://reviews.llvm.org/D22603)
2.Add coroutine intrinsics. (https://reviews.llvm.org/D22659)
...

7. Split coroutine into subfunctions. (https://reviews.llvm.org/D23461)
8. Coroutine Frame Building algorithm  <= we are here
9. Add f.cleanup subfunction.
10+. The rest of the logic

Reviewers: majnemer

Subscribers: mehdi_amini, llvm-commits

Differential Revision: https://reviews.llvm.org/D23586

llvm-svn: 279609
2016-08-24 04:44:35 +00:00
David Callahan 012d1c0766 [ADCE] Add control dependence computation
Summary:
This is part of a serious of patches to evolve ADCE.cpp to support
removing of unnecessary control flow.

This patch adds the ability to compute control dependences using
the iterated dominance frontier. We extend the liveness propagation
to alternate between data and control dependences until convergences.

Modify the pass manager intergation to compute the post-dominator tree
needed for iterator dominance frontier.

We still force all terminators live for now until we add code to
handlinge removing control flow in a later patch.

No changes to effective behavior with this patch

Previous patches:

D23225 [ADCE] Modify data structures to support removing control flow
D23065 [ADCE] Refactor anticipating new functionality (NFC)
D23102 [ADCE] Refactoring for new functionality (NFC)

Reviewers: nadav, majnemer, mehdi_amini

Subscribers: twoh, freik, llvm-commits

Differential Revision: https://reviews.llvm.org/D23559

llvm-svn: 279594
2016-08-24 00:10:06 +00:00
Michael Zolotukhin bd63d436c1 [LoopUnroll] By default disable unrolling when optimizing for size.
Summary:
In clang commit r268509 we started to invoke loop-unroll pass from the
driver even under -Os. However, we happen to not initialize optsize
thresholds properly, which si fixed with this change.

r268509 led to some big compile time regressions, because we started to
unroll some loops that we didn't unroll before. With this change I hope
to recover most of the regressions. We still are slightly slower than
before, because we do some checks here and there in loop-unrolling
before we bail out, but at least the slowdown is not that huge now.

Reviewers: hfinkel, chandlerc

Subscribers: mzolotukhin, llvm-commits

Differential Revision: https://reviews.llvm.org/D23388

llvm-svn: 279585
2016-08-23 23:13:15 +00:00
Sanjay Patel d64e988701 [InstCombine] use local variables for repeated values; NFCI
llvm-svn: 279578
2016-08-23 22:05:55 +00:00
Sanjay Patel dcac0dfca9 [InstCombine] move foldICmpShrConstConst() contents to foldICmpShrConst(); NFCI
There will only be 3 lines of code in foldICmpShrConst() when the cleanup is done,
so it doesn't make much sense to have a separate function for a single fold.

llvm-svn: 279575
2016-08-23 21:25:13 +00:00
Sanjay Patel 6ef22da9ec [InstCombine] remove icmp shr folds that are already handled by InstSimplify
AFAICT, these already worked in all cases for scalar types, and I enhanced
the code to work for vector types in:
https://reviews.llvm.org/rL279543

llvm-svn: 279568
2016-08-23 21:01:35 +00:00
Matthew Simpson df2ab917ad [SLP] Avoid signed integer overflow
The test case included with r279125 exposed an existing signed integer
overflow. Since getTreeCost can return INT_MAX, we can't sum this cost together
with other costs, such as getReductionCost.

This patch removes the possibility of assigning a cost of INT_MAX. Since we
were previously using INT_MAX as an indicator for "should not vectorize", we
now explicitly check this condition with "isTreeTinyAndNotFullyVectorizable"
before computing a cost.

This patch adds a run-line to the test case used for r279125 that ensures we
don't vectorize. Previously, this line would vectorize the test case by chance
due to undefined behavior in the cost calculation.

Differential Revision: https://reviews.llvm.org/D23723

llvm-svn: 279562
2016-08-23 20:48:50 +00:00
Xinliang David Li 812288a5b4 Possible fix of test failures on win bots
llvm-svn: 279542
2016-08-23 18:00:41 +00:00
Xinliang David Li dc49140b44 [Profile] refactor meta data copying/swapping code
Differential Revision: http://reviews.llvm.org/D23619

llvm-svn: 279523
2016-08-23 15:39:03 +00:00
Daniel Berlin ea02eee18f GVNHoist: Use the pass version of MemorySSA and preserve it.
Summary: GVNHoist: Use the pass version of MemorySSA and preserve it.

Reviewers: sebpop, george.burgess.iv

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D23782

llvm-svn: 279504
2016-08-23 05:42:41 +00:00
George Burgess IV 7f414b90ab [MemorySSA] Remove unused field. NFC.
Given that we're not currently using blocker info, and whether or not we
will end up using it it is unclear, don't waste 8 (or 4) bytes of memory
per path node.

llvm-svn: 279493
2016-08-22 23:40:01 +00:00
Sanjay Patel c9196c4488 [InstCombine] change param type from Instruction to BinaryOperator for icmp helpers; NFCI
This saves some casting in the helper functions and eases some further refactoring.

llvm-svn: 279478
2016-08-22 21:24:29 +00:00
Tim Shen f2187ed321 [GraphTraits] Replace all NodeType usage with NodeRef
This should finish the GraphTraits migration.

Differential Revision: http://reviews.llvm.org/D23730

llvm-svn: 279475
2016-08-22 21:09:30 +00:00
Sanjay Patel a392049419 [InstCombine] use m_APInt to allow icmp (shr exact X, Y), 0 folds for splat constant vectors
llvm-svn: 279472
2016-08-22 20:45:06 +00:00
Daniel Berlin 3d512a2dc2 MSSA: Factor out phi node placement
llvm-svn: 279462
2016-08-22 19:14:30 +00:00
Daniel Berlin 868381bff6 MSSA: Only rename accesses whose defining access is nullptr
llvm-svn: 279461
2016-08-22 19:14:16 +00:00
James Molloy 5bf2114265 [SimplifyCFG] Rewrite SinkThenElseCodeToEnd
[Recommitting now an unrelated assertion in SROA is sorted out]

The new version has several advantages:
  1) IMSHO it's more readable and neater
  2) It handles loads and stores properly
  3) It can handle any number of incoming blocks rather than just two. I'll be taking advantage of this in a followup patch.

With this change we can now finally sink load-modify-store idioms such as:

    if (a)
      return *b += 3;
    else
      return *b += 4;

    =>

    %z = load i32, i32* %y
    %.sink = select i1 %a, i32 5, i32 7
    %b = add i32 %z, %.sink
    store i32 %b, i32* %y
    ret i32 %b

When this works for switches it'll be even more powerful.

Round 4. This time we should handle all instructions correctly, and not replace any operands that need to be constant with variables.

This was really hard to determine safely, so the helper function should be put into the Instruction API. I'll do that as a followup.

llvm-svn: 279460
2016-08-22 19:07:15 +00:00
James Molloy 0fee97f8ba [SROA] Remove incorrect assertion
Confirmed with aprantl, this assertion is incorrect - code can get here (for example 80-bit FP types) and if it does it's benign. This is exposed by a completely unrelated patch of mine, so stop the compiler falling over.

Original differential: http://reviews.llvm.org/D16187
aprantl's advice to remove assertion: http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20160815/382129.html

llvm-svn: 279454
2016-08-22 18:49:42 +00:00
Jun Bum Lim ec8b8cc595 [InstCombine] Allow sinking from unique predecessor with multiple edges
Summary: We can allow sinking if the single user block has only one unique predecessor, regardless of the number of edges. Note that a switch statement with multiple cases can have the same destination.

Reviewers: mcrosier, majnemer, spatel, reames

Subscribers: reames, mcrosier, llvm-commits

Differential Revision: https://reviews.llvm.org/D23722

llvm-svn: 279448
2016-08-22 18:21:56 +00:00
James Molloy 475f4a763f Revert "[SimplifyCFG] Rewrite SinkThenElseCodeToEnd"
This reverts commit r279443. It caused buildbot failures.

llvm-svn: 279447
2016-08-22 18:13:12 +00:00
James Molloy 353052698a [SimplifyCFG] Rewrite SinkThenElseCodeToEnd
The new version has several advantages:
  1) IMSHO it's more readable and neater
  2) It handles loads and stores properly
  3) It can handle any number of incoming blocks rather than just two. I'll be taking advantage of this in a followup patch.

With this change we can now finally sink load-modify-store idioms such as:

    if (a)
      return *b += 3;
    else
      return *b += 4;

    =>

    %z = load i32, i32* %y
    %.sink = select i1 %a, i32 5, i32 7
    %b = add i32 %z, %.sink
    store i32 %b, i32* %y
    ret i32 %b

When this works for switches it'll be even more powerful.

Round 4. This time we should handle all instructions correctly, and not replace any operands that need to be constant with variables.

This was really hard to determine safely, so the helper function should be put into the Instruction API. I'll do that as a followup.

llvm-svn: 279443
2016-08-22 17:40:23 +00:00
Artur Pilipenko b78ad9d41f Revert -r278269 [IndVarSimplify] Eliminate zext of a signed IV when the IV is known to be non-negative
This change needs to be reverted in order to revert -r278267 which cause performance regression on MultiSource/Benchmarks/TSVC/Symbolics-flt/Symbolics-flt from LNT and some other bechmarks.

See comments on https://reviews.llvm.org/D18777 for details.

llvm-svn: 279432
2016-08-22 13:12:07 +00:00
Vitaly Buka 0672a27bb5 [asan] Use 1 byte aligned stores to poison shadow memory
Summary: r279379 introduced crash on arm 32bit bot. I suspect this is alignment issue.

Reviewers: eugenis

Subscribers: llvm-commits, aemerson

Differential Revision: https://reviews.llvm.org/D23762

llvm-svn: 279413
2016-08-22 04:16:14 +00:00
Sanjay Patel 643d21a62c [InstCombine] use m_APInt to allow icmp (shl X, Y), C folds for splat constant vectors, part 4
This concludes the fixes for icmp+shl in this series:
https://reviews.llvm.org/rL279339
https://reviews.llvm.org/rL279398
https://reviews.llvm.org/rL279399

llvm-svn: 279401
2016-08-21 17:10:07 +00:00
Sanjay Patel 7ffcde7422 [InstCombine] use m_APInt to allow icmp (shl X, Y), C folds for splat constant vectors, part 3
This is a partial enablement (move the ConstantInt guard down).

llvm-svn: 279399
2016-08-21 16:35:34 +00:00
Sanjay Patel 7e09f13fed [InstCombine] use m_APInt to allow icmp (shl X, Y), C folds for splat constant vectors, part 2
This is a partial enablement (move the ConstantInt guard down).

llvm-svn: 279398
2016-08-21 16:28:22 +00:00
Sanjay Patel 792636603f [InstCombine] use APInt instead of ConstantInt in isSignBitCheck(); NFCI
The callers still have ConstantInt guards, so there is no functional change
intended from this change. But relaxing the callers will allow more folds
for vector types.

llvm-svn: 279396
2016-08-21 15:07:45 +00:00
Vitaly Buka 1f9e135023 [asan] Minimize code size by using __asan_set_shadow_* for large blocks
Summary:
We can insert function call instead of multiple store operation.
Current default is blocks larger than 64 bytes.
Changes are hidden behind -asan-experimental-poisoning flag.

PR27453

Differential Revision: https://reviews.llvm.org/D23711

llvm-svn: 279383
2016-08-20 20:23:50 +00:00
Vitaly Buka 3455b9b8bc [asan] Initialize __asan_set_shadow_* callbacks
Summary:
Callbacks are not being used yet.

PR27453

Reviewers: kcc, eugenis

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D23634

llvm-svn: 279380
2016-08-20 18:34:39 +00:00
Vitaly Buka 186280daa5 [asan] Optimize store size in FunctionStackPoisoner::poisonRedZones
Summary: Reduce store size to avoid leading and trailing zeros.

Reviewers: kcc, eugenis

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D23648

llvm-svn: 279379
2016-08-20 18:34:36 +00:00
Vitaly Buka 5b4f12176c [asan] Cleanup instrumentation of dynamic allocas
Summary:
Extract instrumenting dynamic allocas into separate method.
Rename asan-instrument-allocas -> asan-instrument-dynamic-allocas

Differential Revision: https://reviews.llvm.org/D23707

llvm-svn: 279376
2016-08-20 17:22:27 +00:00
Vitaly Buka f9fd63ad39 [asan] Add support of lifetime poisoning into ComputeASanStackFrameLayout
Summary:
We are going to combine poisoning of red zones and scope poisoning.

PR27453

Reviewers: kcc, eugenis

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D23623

llvm-svn: 279373
2016-08-20 16:48:24 +00:00
Matthew Simpson 235e479984 Reapply "[SLP] Initialize VectorizedValue when gathering"
The test case included in r279125 exposed existing undefined behavior in the
SLP vectorizer that it did not introduce. This patch reapplies the original
patch, but modifies the test case to avoid hitting the undefined behavior. This
allows us to close PR28330 while keeping the UBSan bot happy. The undefined
behavior the original test uncovered will be addressed in a follow-on patch.

Reference: https://llvm.org/bugs/show_bug.cgi?id=28330
llvm-svn: 279370
2016-08-20 14:49:02 +00:00
Matthew Simpson 2429656aa9 [SLP] Add command line option for minimum tree size (NFC)
llvm-svn: 279369
2016-08-20 14:10:06 +00:00
Vitaly Buka cc7db13bf0 Revert "[SLP] Initialize VectorizedValue when gathering" to fix ubsan bot.
This reverts commit r279125.

https://reviews.llvm.org/D23410

llvm-svn: 279363
2016-08-20 07:09:39 +00:00
Sanjay Patel fa7de606c4 [InstCombine] use m_APInt to allow icmp (shl X, Y), C folds for splat constant vectors, part 1
This is a partial enablement (move the ConstantInt guard down) because there are many
different folds here and one of the later ones will require reworking 'isSignBitCheck'.

llvm-svn: 279339
2016-08-19 22:33:26 +00:00
Daniel Berlin 11da66fc10 Partially revert 279331, as we modify this instruction in the loop
llvm-svn: 279335
2016-08-19 22:18:38 +00:00
Vitaly Buka e149b392a8 Revert "[asan] Add support of lifetime poisoning into ComputeASanStackFrameLayout"
This reverts commit r279020.

Speculative revert in hope to fix asan test on arm.

llvm-svn: 279332
2016-08-19 22:12:58 +00:00
Daniel Berlin a36f46363f Convert some depth first traversals to depth_first
llvm-svn: 279331
2016-08-19 22:06:23 +00:00
Tim Shen b5e0f5ac95 [GraphTraits] Make nodes_iterator dereference to NodeType*/NodeRef
Currently nodes_iterator may dereference to a NodeType* or a NodeType&. Make them all dereference to NodeType*, which is NodeRef later.

Differential Revision: https://reviews.llvm.org/D23704
Differential Revision: https://reviews.llvm.org/D23705

llvm-svn: 279326
2016-08-19 21:20:13 +00:00
Reid Kleckner 98a48afa5d Revert "[SimplifyCFG] Rewrite SinkThenElseCodeToEnd"
This reverts commit r279229. It breaks intrinsic function calls in
diamonds.

llvm-svn: 279313
2016-08-19 20:22:39 +00:00
Sanjay Patel 7a104615c5 [InstCombine] remove an icmp fold that is already handled by InstSimplify
Specifically, this is done near the end of "SimplifyICmpInst" using 
computeKnownBits() as the broader solution. There are even vector
tests (yay!) for this in test/Transforms/InstSimplify/compare.ll.

I considered putting an assert here instead of just deleting, but
then we could assert every possible fold in InstSimplify in 
InstCombine, so...less is more?

llvm-svn: 279300
2016-08-19 19:03:07 +00:00
Sanjay Patel e38e79c3e6 [InstCombine] use local variables to reduce code in foldICmpShlConstant; NFC
llvm-svn: 279282
2016-08-19 17:34:05 +00:00
Sanjay Patel 38b7506f75 [InstCombine] rename variables in foldICmpShlConstant(); NFC
llvm-svn: 279279
2016-08-19 17:20:37 +00:00
Vitaly Buka 170dede75d Revert "[asan] Optimize store size in FunctionStackPoisoner::poisonRedZones"
This reverts commit r279178.

Speculative revert in hope to fix asan crash on arm.

llvm-svn: 279277
2016-08-19 17:15:38 +00:00
Vitaly Buka c8f4d69c82 Revert "[asan] Fix size of shadow incorrectly calculated in r279178"
This reverts commit r279222.

Speculative revert in hope to fix asan crash on arm.

llvm-svn: 279276
2016-08-19 17:15:33 +00:00
Reid Kleckner a871d3872a Fix regression in InstCombine introduced by r278944
The intended transform is:
  // Simplify icmp eq (or (ptrtoint P), (ptrtoint Q)), 0
  // -> and (icmp eq P, null), (icmp eq Q, null).

P and Q are both pointer types, but may have different types. We need
two calls to getNullValue() to make the icmps.

llvm-svn: 279271
2016-08-19 16:53:18 +00:00
David Majnemer 5554edabef [CloneFunction] Don't remove unrelated nodes from the CGSSC
CGSCC use a WeakVH to track call sites.  RAUW a call within a function
can result in that WeakVH getting confused about whether or not the call
site is still around.

llvm-svn: 279268
2016-08-19 16:37:40 +00:00
Sanjay Patel a867afe094 [InstCombine] use m_APInt to allow icmp (shl 1, Y), C folds for splat constant vectors
llvm-svn: 279266
2016-08-19 16:12:16 +00:00
Sanjay Patel 57b12d3876 [InstCombine] use m_APInt to allow icmp X, C folds for splat constant vectors
Of course, we really need to refactor and fix all of the cmp predicates, 
but this one is interesting because without it, we later perform an 
information-losing transform of icmp (shl 1, Y), C, and we can't recover
the better fold.

llvm-svn: 279263
2016-08-19 15:40:44 +00:00
Benjamin Kramer 96fcf5df03 [LoopVectorize] Don't copy std::vector in for-range loop.
llvm-svn: 279233
2016-08-19 12:44:24 +00:00
James Molloy 11a1936b70 [SimplifyCFG] Rewrite SinkThenElseCodeToEnd
The new version has several advantages:
  1) IMSHO it's more readable and neater
  2) It handles loads and stores properly
  3) It can handle any number of incoming blocks rather than just two. I'll be taking advantage of this in a followup patch.

With this change we can now finally sink load-modify-store idioms such as:

    if (a)
      return *b += 3;
    else
      return *b += 4;

    =>

    %z = load i32, i32* %y
    %.sink = select i1 %a, i32 5, i32 7
    %b = add i32 %z, %.sink
    store i32 %b, i32* %y
    ret i32 %b

When this works for switches it'll be even more powerful.

llvm-svn: 279229
2016-08-19 10:10:27 +00:00
Vitaly Buka b81960a6c8 [asan] Fix size of shadow incorrectly calculated in r279178
Summary: r279178 generates 8 times more stores than necessary.

Reviewers: eugenis

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D23708

llvm-svn: 279222
2016-08-19 08:33:53 +00:00
Xinliang David Li 63248ab888 [Profile] Fix edge count read bug
Use uint64_t to avoid value truncation before scaling.

llvm-svn: 279213
2016-08-19 06:31:45 +00:00
Xinliang David Li 2c9336823c [Profile] Simple code refactoring for reuse /NFC
llvm-svn: 279209
2016-08-19 05:31:33 +00:00
Vitaly Buka aa654292bd [asan] Optimize store size in FunctionStackPoisoner::poisonRedZones
Summary: Reduce store size to avoid leading and trailing zeros.

Reviewers: kcc, eugenis

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D23648

llvm-svn: 279178
2016-08-18 23:51:15 +00:00
Sanjay Patel 98cd99dfc6 [InstCombine] add helper function for folds of icmp (shl 1, Y), C; NFCI
Clean up the existing code by:
1. Renaming variables
2. Adding local variables
3. Making it vector-safe

This is still guarded by a ConstantInt check, so no functional change is intended.
But this should be ready to go: if we move the ConstantInt check down, all of
these folds should do the right thing for vector types.

llvm-svn: 279150
2016-08-18 21:28:30 +00:00
Amaury Sechet 763c59dc9a Make cltz and cttz zero undef when the operand cannot be zero in InstCombine
Summary: Also add popcount(n) == bitsize(n)  -> n == -1 transformation.

Reviewers: majnemer, spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D23134

llvm-svn: 279141
2016-08-18 20:43:50 +00:00
Sanjay Patel 40e8ca46ad [InstCombine] use m_APInt to allow icmp (trunc X, Y), C folds for splat constant vectors
This is a sibling of:
https://reviews.llvm.org/rL278859
https://reviews.llvm.org/rL278935
https://reviews.llvm.org/rL278945
https://reviews.llvm.org/rL279066
https://reviews.llvm.org/rL279077
https://reviews.llvm.org/rL279101

llvm-svn: 279133
2016-08-18 20:28:54 +00:00
Sanjay Patel 5f4ce4e23d [InstCombine] clean up foldICmpTruncConstant(); NFCI
1. Fix variable names
2. Add local variables to reduce code

llvm-svn: 279132
2016-08-18 20:25:16 +00:00
Matthew Simpson 11db6b6b8c [SLP] Initialize VectorizedValue when gathering
We abort building vectorizable trees in some cases (e.g., if the maximum
recursion depth is reached, if the region size is too large, etc.). If this
happens for a reduction, we can be left with a root entry that needs to be
gathered. For these cases, we need make sure we actually set VectorizedValue to
the resulting vector.

This patch ensures we properly set VectorizedValue, and it also ensures the
insertelement sequence generated for the gathers is inserted at the correct
location.

Reference: https://llvm.org/bugs/show_bug.cgi?id=28330
Differential Revison: https://reviews.llvm.org/D23410

llvm-svn: 279125
2016-08-18 19:50:32 +00:00
Sanjay Patel fa5ca2bf46 [InstCombine] use m_APInt to allow icmp (udiv X, Y), C folds for splat constant vectors
This is a sibling of:
https://reviews.llvm.org/rL278859
https://reviews.llvm.org/rL278935
https://reviews.llvm.org/rL278945
https://reviews.llvm.org/rL279066
https://reviews.llvm.org/rL279077

llvm-svn: 279101
2016-08-18 17:55:59 +00:00
Sanjay Patel 12a4105647 [InstCombine] clean up foldICmpUDivConstant; NFC
1. Better variable names
2. Remove unnecessary check of ConstantInt

llvm-svn: 279094
2016-08-18 17:37:26 +00:00
Artur Pilipenko 615b820af6 CVP. Turn marking adds as no wrap (introduced by r278107) off by default
It causes a regression on our internal benchmark. Introduce cvp-dont-process flag and set it off by default while investigating the regression.

llvm-svn: 279082
2016-08-18 16:08:35 +00:00
Davide Italiano d1279df752 [IRCE] Switch over to LLVM_DUMP_METHOD. NFCI.
llvm-svn: 279079
2016-08-18 15:55:49 +00:00
Sanjay Patel 6347807f87 [InstCombine] use m_APInt to allow icmp (mul X, Y), C folds for splat constant vectors
This is a sibling of:
https://reviews.llvm.org/rL278859
https://reviews.llvm.org/rL278935
https://reviews.llvm.org/rL278945
https://reviews.llvm.org/rL279066

llvm-svn: 279077
2016-08-18 15:44:44 +00:00
Sanjay Patel 5b112845da [InstCombine] use APInt in isSignTest instead of ConstantInt; NFC
This will enable vector splat folding, but NFC until the callers
have their ConstantInt restrictions removed.

llvm-svn: 279072
2016-08-18 14:59:14 +00:00
Sanjay Patel 4c5e60d95c [InstCombine] use m_APInt to allow icmp (xor X, Y), C folds for splat constant vectors
This is a sibling of:
https://reviews.llvm.org/rL278859
https://reviews.llvm.org/rL278935
https://reviews.llvm.org/rL278945

llvm-svn: 279066
2016-08-18 14:10:48 +00:00
Kostya Serebryany 524c3f32e7 [sanitizer-coverage/libFuzzer] instrument comparisons with __sanitizer_cov_trace_cmp[1248] instead of __sanitizer_cov_trace_cmp, don't pass the comparison type to save a bit performance. Use these new callbacks in libFuzzer
llvm-svn: 279027
2016-08-18 01:25:28 +00:00
Vitaly Buka d5ec14989d [asan] Add support of lifetime poisoning into ComputeASanStackFrameLayout
Summary:
We are going to combine poisoning of red zones and scope poisoning.

PR27453

Reviewers: kcc, eugenis

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D23623

llvm-svn: 279020
2016-08-18 00:56:58 +00:00
Haicheng Wu e787763275 [LoopUnroll] Move a simple check earlier. NFC.
Move the check of CallInst earlier to skip expensive recursive operations.

Differential Revision: https://reviews.llvm.org/D23611

llvm-svn: 278998
2016-08-17 22:42:58 +00:00
Tim Shen 5c0c063ad5 [LV] Move LoopBodyTraits to a better place, and add comment for simplifying LoopBlocksTraversal. NFC.
Summary: I later (after r278573) found that LoopIterator.h has some overlapping with LoopBodyTraits. It's good to use LoopBodyTraits because a *Traits struct is algorithm independent.

Reviewers: anemet, nadav, mkuper

Subscribers: mzolotukhin, llvm-commits

Differential Revision: https://reviews.llvm.org/D23529

llvm-svn: 278996
2016-08-17 22:20:07 +00:00
Justin Bogner cd1d5aaf2e Replace a few more "fall through" comments with LLVM_FALLTHROUGH
Follow up to r278902. I had missed "fall through", with a space.

llvm-svn: 278970
2016-08-17 20:30:52 +00:00
Sanjay Patel daffec91ef [InstCombine] more clean up of foldICmpXorConstant(); NFCI
Use m_APInt for the xor constant, but this is all still guarded by the initial
ConstantInt check, so no vector types should make it in here.

llvm-svn: 278957
2016-08-17 19:45:18 +00:00
Sanjay Patel 6d5f448746 [InstCombine] clean up foldICmpXorConstant(); NFCI
1. Change variable names
2. Use local variables to reduce code
3. Early exit to reduce indent

llvm-svn: 278955
2016-08-17 19:23:42 +00:00
Sanjay Patel 63e14a07e8 [InstCombine] use m_APInt to allow icmp (or X, Y), C folds for splat constant vectors
This is a sibling of:
https://reviews.llvm.org/rL278859
https://reviews.llvm.org/rL278935

llvm-svn: 278945
2016-08-17 16:38:57 +00:00
Sanjay Patel 943e92efde [InstCombine] clean up foldICmpOrConstant(); NFCI
1. Change variable names
2. Use local variables to reduce code
3. Use ? instead of if/else
4. Use the APInt variable instead of 'RHS' so the removal of the FIXME code will be direct

llvm-svn: 278944
2016-08-17 16:30:43 +00:00
Chad Rosier ea7e4647db Revert "Reassociate: Reprocess RedoInsts after each inst".
This reverts commit r258830, which introduced a bug described in PR28367.

PR28367

llvm-svn: 278938
2016-08-17 15:54:39 +00:00
Sanjay Patel 4f7eb2aa95 [InstCombine] use m_APInt to allow icmp (add X, Y), C folds for splat constant vectors
This is a sibling of:
https://reviews.llvm.org/rL278859

llvm-svn: 278935
2016-08-17 15:24:30 +00:00
Chad Rosier a6822f64f3 Revert "[Reassociate] Avoid iterator invalidation when negating value."
This reverts commit r278928 due to lit test failures.

llvm-svn: 278929
2016-08-17 14:31:34 +00:00
Chad Rosier cf3e8121a6 [Reassociate] Avoid iterator invalidation when negating value.
Differential Revision: https://reviews.llvm.org/D23464
PR28367

llvm-svn: 278928
2016-08-17 14:16:45 +00:00
Jonas Paulsson 7a79422536 [LoopStrenghtReduce] Refactoring and addition of a new target cost function.
Refactored so that a LSRUse owns its fixups, as oppsed to letting the
LSRInstance own them. This makes it easier to rate formulas for
LSRUses, since the fixups are available directly. The Offsets vector
has been removed since it was no longer necessary.

New target hook isFoldableMemAccessOffset(), which is used during formula
rating.

For SystemZ, this is useful to express that loads and stores with
float or vector types with a big/negative offset should be avoided in
loops. Without this, LSR will generate a lot of negative offsets that
would require extra instructions for loading the address.

Updated tests:
test/CodeGen/SystemZ/loop-01.ll

Reviewed by: Quentin Colombet and Ulrich Weigand.
https://reviews.llvm.org/D19152

llvm-svn: 278927
2016-08-17 13:24:19 +00:00
Justin Bogner b03fd12cef Replace "fallthrough" comments with LLVM_FALLTHROUGH
This is a mechanical change of comments in switches like fallthrough,
fall-through, or fall-thru to use the LLVM_FALLTHROUGH macro instead.

llvm-svn: 278902
2016-08-17 05:10:15 +00:00
Chandler Carruth 67fc52f067 [PM] Port the always inliner to the new pass manager in a much more
minimal and boring form than the old pass manager's version.

This pass does the very minimal amount of work necessary to inline
functions declared as always-inline. It doesn't support a wide array of
things that the legacy pass manager did support, but is alse ... about
20 lines of code. So it has that going for it. Notably things this
doesn't support:

- Array alloca merging
  - To support the above, bottom-up inlining with careful history
    tracking and call graph updates
- DCE of the functions that become dead after this inlining.
- Inlining through call instructions with the always_inline attribute.
  Instead, it focuses on inlining functions with that attribute.

The first I've omitted because I'm hoping to just turn it off for the
primary pass manager. If that doesn't pan out, I can add it here but it
will be reasonably expensive to do so.

The second should really be handled by running global-dce after the
inliner. I don't want to re-implement the non-trivial logic necessary to
do comdat-correct DCE of functions. This means the -O0 pipeline will
have to be at least 'always-inline,global-dce', but that seems
reasonable to me. If others are seriously worried about this I'd like to
hear about it and understand why. Again, this is all solveable by
factoring that logic into a utility and calling it here, but I'd like to
wait to do that until there is a clear reason why the existing
pass-based factoring won't work.

The final point is a serious one. I can fairly easily add support for
this, but it seems both costly and a confusing construct for the use
case of the always inliner running at -O0. This attribute can of course
still impact the normal inliner easily (although I find that
a questionable re-use of the same attribute). I've started a discussion
to sort out what semantics we want here and based on that can figure out
if it makes sense ta have this complexity at O0 or not.

One other advantage of this design is that it should be quite a bit
faster due to checking for whether the function is a viable candidate
for inlining exactly once per function instead of doing it for each call
site.

Anyways, hopefully a reasonable starting point for this pass.

Differential Revision: https://reviews.llvm.org/D23299

llvm-svn: 278896
2016-08-17 02:56:20 +00:00
Chandler Carruth f702d8ecb6 [Inliner] Add a flag to disable manual alloca merging in the Inliner.
This is off for now while testing can take place to make sure that in
fact we do sufficient stack coloring to fully obviate the manual alloca
array merging.

Some context on why we should be using stack coloring rather than
merging allocas in this way:

LLVM relies very heavily on analyzing pointers as coming from different
allocas in order to make aliasing decisions. These are some of the most
powerful aliasing signals available in LLVM. So merging allocas is an
extremely destructive operation on the LLVM IR -- it takes away highly
valuable and hard to reconstruct information.

As a consequence, inlined functions which happen to have array allocas
that this pattern matches will fail to be properly interleaved unless
SROA manages to hoist everything to an SSA register. Instead, the
inliner will have added an unnecessary dependence that one inlined
function execute after the other because they will have been rewritten
to refer to the same memory.

All that said, folks will reasonably want some time to experiment here
and make sure there are no significant regressions. A flag should give
us an easy knob to test.

For more context, see the thread here:
http://lists.llvm.org/pipermail/llvm-dev/2016-July/103277.html
http://lists.llvm.org/pipermail/llvm-dev/2016-August/103285.html

Differential Revision: https://reviews.llvm.org/D23052

llvm-svn: 278892
2016-08-17 02:40:23 +00:00
Duncan P. N. Exon Smith 362d120488 Scalar: Avoid dereferencing end() in IndVarSimplify
IndVarSimplify::sinkUnusedInvariants calls
BasicBlock::getFirstInsertionPt on the ExitBlock and moves instructions
before it.  This can return end(), so it's not safe to dereference.  Add
an iterator-based overload to Instruction::moveBefore to avoid the UB.

llvm-svn: 278886
2016-08-17 01:54:41 +00:00
Duncan P. N. Exon Smith 9e3edad932 IPO: Swap || operands to avoid dereferencing end()
IsOperandBundleUse conveniently indicates  whether
std::next(F->arg_begin(),UseIndex) will get to (or past) end().  Check
it first to avoid dereferencing end().

llvm-svn: 278884
2016-08-17 01:23:58 +00:00
Duncan P. N. Exon Smith 3bcaa81204 Scalar: Avoid dereferencing end() in InductiveRangeCheckElimination
BasicBlock::Create isn't designed to take iterators (which might be
end()), but pointers (which might be nullptr).  Fix the UB that was
converting end() to a BasicBlock* by calling BasicBlock::getNextNode()
in the first place.

llvm-svn: 278883
2016-08-17 01:16:17 +00:00
Duncan P. N. Exon Smith 0a12729f99 SimplifyCFG: Avoid dereferencing end()
When comparing a User* to a BasicBlock::iterator in
passingValueIsAlwaysUndefined, don't dereference the iterator in case it
is end().

llvm-svn: 278872
2016-08-16 23:57:56 +00:00
Sanjay Patel 60ea1b43d6 [InstCombine] clean up foldICmpAddConstant(); NFCI
1. Fix variable names
2. Add local variables to reduce code
3. Fix code comments
4. Add early exit to reduce indentation
5. Remove 'else' after if -> return
6. Hoist common predicate

llvm-svn: 278864
2016-08-16 22:34:42 +00:00
David Majnemer 744a8753db Preserve the assumption cache more often
We were clearing it out in LoopUnswitch and InlineFunction instead of
attempting to preserve it.

llvm-svn: 278860
2016-08-16 22:07:32 +00:00
Sanjay Patel e47df1ac62 [InstCombine] use m_APInt to allow icmp (sub X, Y), C folds for splat constant vectors
llvm-svn: 278859
2016-08-16 21:53:19 +00:00
Sanjay Patel b9aa67bfcf [InstCombine] fix variable names to match formula comments; NFC
llvm-svn: 278855
2016-08-16 21:26:10 +00:00
David Majnemer 110522bc0f [LoopUnroll] Don't clear out the AssumptionCache on each loop
Clearing out the AssumptionCache can cause us to rescan the entire
function for assumes.  If there are many loops, then we are scanning
over the entire function many times.

Instead of clearing out the AssumptionCache, register all cloned
assumes.

llvm-svn: 278854
2016-08-16 21:09:46 +00:00
Gor Nishanov 74309fa014 [Coroutines] Part 7: Split coroutine into subfunctions
Summary:
This patch adds simple coroutine splitting logic to CoroSplit pass.

Documentation and overview is here: http://llvm.org/docs/Coroutines.html.

Upstreaming sequence (rough plan)
1.Add documentation. (https://reviews.llvm.org/D22603)
2.Add coroutine intrinsics. (https://reviews.llvm.org/D22659)
...
7. Split coroutine into subfunctions <= we are here
8. Coroutine Frame Building algorithm
9. Handle coroutine with unwinds
10+. The rest of the logic

Reviewers: majnemer

Subscribers: llvm-commits, mehdi_amini

Differential Revision: https://reviews.llvm.org/D23461

llvm-svn: 278830
2016-08-16 18:04:14 +00:00
Sanjay Patel a3f4f0828b [InstCombine] add helper functions for foldICmpWithConstant; NFCI
Besides breaking up a 700 line function to improve readability,
this sinks the 'FIXME: ConstantInt' check into each helper. So 
now we can independently break that restriction within any of the
helper functions.

As much as possible, the code was only {cut/paste/clang-format}'ed 
to minimize risk (no functional changes intended), so several more
readability improvements are still possible. 

llvm-svn: 278828
2016-08-16 17:54:36 +00:00
Vitaly Buka 1ce73ef11c [Asan] Unpoison red zones even if use-after-scope was disabled with runtime flag
Summary: PR27453

Reviewers: eugenis

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D23481

llvm-svn: 278818
2016-08-16 16:24:10 +00:00
Sanjay Patel 1e5b2d1611 [InstCombine] use m_APInt in foldICmpWithConstant; NFCI
There's some formatting and pointer deref ugliness here that I intend to fix in
subsequent patches. The overall goal is to refactor the obnoxiously long switch
and incrementally remove the restriction to scalar types (allow folds for vector
splats). This patch introduces the use of m_APInt which means the RHSV reference
is now a pointer (and may have matched a vector splat), but the check of 'RHS' 
remains, so vector folds are disallowed and no functional change is intended.

llvm-svn: 278816
2016-08-16 16:08:11 +00:00
David Callahan 947be0fa66 [ADCE] Modify data structures to support removing control flow
Summary:
This is part of a serious of patches to evolve ADCE.cpp to support
removing of unnecessary control flow.

This patch changes the data structures to hold liveness information to
support the additional information we will eventually need. In
particular we now have a notion of basic blocks being live because
they contain a live operations. This will eventually feed into control
dependence analysis of which branches are live. We cater to getting
from instructions to associated block information and from blocks to
information about their terminators.

This patch also changes the structure of the main loop of the
algorithm so that it alternates propagating liveness between
instructions and usign control dependence information to mark branches
live.

We force all terminators live for now until we add code to handlinge
removing control flow in a later patch.

No changes to effective behavior with this patch

Previous patches:

D23065 [ADCE] Refactor anticipating new functionality (NFC)
D23102 [ADCE] Refactoring for new functionality (NFC)

Reviewers: nadav, majnemer, mehdi_amini

Subscribers: freik, twoh, llvm-commits

Differential Revision: https://reviews.llvm.org/D23225

llvm-svn: 278807
2016-08-16 14:31:51 +00:00
Sagar Thakur e311740bde [MemorySanitizer] [MIPS] Changed memory mapping to support pie executable.
Reviewed by eugenis
Differential: D22994

llvm-svn: 278795
2016-08-16 12:55:38 +00:00
Mehdi Amini 88c491ddec FunctionImport: missed one occurence of ImportListForModule to rename (NFC)
llvm-svn: 278778
2016-08-16 05:49:12 +00:00
Mehdi Amini 9b490f10e1 FunctionImport: rename ImportsForModule to ImportList for consistency (NFC)
llvm-svn: 278777
2016-08-16 05:47:12 +00:00
Mehdi Amini cdbcbf7477 [LTO] Simplify APIs and constify (NFC)
Summary:
Multiple APIs were taking a StringMap for the ImportLists containing
the entries for for all the modules while operating on a single entry
for the current module. Instead we can pass the desired ModuleImport
directly. Also some of the APIs were not const, I believe just to be
able to use operator[] on the StringMap.

Reviewers: tejohnson

Subscribers: llvm-commits, mehdi_amini

Differential Revision: https://reviews.llvm.org/D23537

llvm-svn: 278776
2016-08-16 05:46:05 +00:00
Teresa Johnson 6107a4195d [ThinLTO] Remove functions resolved to available_externally from comdats
Summary:
thinLTOResolveWeakForLinkerModule needs to drop any preempted weak symbols
that were converted to available_externally from comdats, otherwise we
will get a verification failure (since available_externally is a
declaration for the linker, and no declarations can be in a comdat).

Reviewers: mehdi_amini

Subscribers: llvm-commits, mehdi_amini

Differential Revision: https://reviews.llvm.org/D23015

llvm-svn: 278739
2016-08-15 21:00:04 +00:00
Reid Kleckner 70a600b8bb Revert "[SimplifyCFG] Rewrite SinkThenElseCodeToEnd"
This reverts commit r278660.

It causes downstream assertion failure in InstCombine on shuffle
instructions. Comes up in __mm_swizzle_epi32.

llvm-svn: 278672
2016-08-15 15:42:31 +00:00
James Molloy 9a3c82f5cf [SimplifyCFG] Rewrite SinkThenElseCodeToEnd
The new version has several advantages:
  1) IMSHO it's more readable and neater
  2) It handles loads and stores properly
  3) It can handle any number of incoming blocks rather than just two. I'll be taking advantage of this in a followup patch.

With this change we can now finally sink load-modify-store idioms such as:

    if (a)
      return *b += 3;
    else
      return *b += 4;

    =>

    %z = load i32, i32* %y
    %.sink = select i1 %a, i32 5, i32 7
    %b = add i32 %z, %.sink
    store i32 %b, i32* %y
    ret i32 %b

When this works for switches it'll be even more powerful.

llvm-svn: 278660
2016-08-15 08:04:56 +00:00
James Molloy 196ad0823e [LSR] Don't try and create post-inc expressions on non-rotated loops
If a loop is not rotated (for example when optimizing for size), the latch is not the backedge. If we promote an expression to post-inc form, we not only increase register pressure and add a COPY for that IV expression but for all IVs!

Motivating testcase:

    void f(float *a, float *b, float *c, int n) {
      while (n-- > 0)
        *c++ = *a++ + *b++;
    }

It's imperative that the pointer increments be located in the latch block and not the header block; if not, we cannot use post-increment loads and stores and we have to keep both the post-inc and pre-inc values around until the end of the latch which bloats register usage.

llvm-svn: 278658
2016-08-15 07:53:03 +00:00
Sanjoy Das 35459f0e34 [IRCE] Change variable grouping; NFC
llvm-svn: 278619
2016-08-14 01:04:50 +00:00