Commit Graph

7662 Commits

Author SHA1 Message Date
Mehdi Amini 55b06538b5 Fix test after renaming -name-anon-functions pass to -name-anon-globals
llvm-svn: 281752
2016-09-16 17:18:16 +00:00
Mehdi Amini 2cac787919 Fix NameAnonFunctions pass: for ThinLTO we need to rename global variables as well
A follow-up patch will rename this pass and the source file accordingly,
but I figured the non-NFC change will be easier to spot in isolation.

Differential Revision: https://reviews.llvm.org/D24641

llvm-svn: 281744
2016-09-16 16:56:25 +00:00
David Majnemer 08566532ae Add a test for r280191
llvm-svn: 281694
2016-09-16 02:43:36 +00:00
Sanjay Patel af91d1f81e [InstCombine] allow icmp (shr/shl) folds for vectors
These 2 helper functions were already using APInt internally, so just
change the API and caller to allow folds for splats. The scalar
regression tests look quite thorough, so I just added a couple of
tests to prove that vectors are handled too.

These folds should be grouped with the other cmp+shift folds though.
That can be an NFC follow-up.

llvm-svn: 281663
2016-09-15 21:35:30 +00:00
Sanjay Patel d590db6eac regenerate checks
llvm-svn: 281655
2016-09-15 20:39:01 +00:00
Mehdi Amini d880309835 [GlobalOpt] Dead Eliminate declarations
GlobalOpt is already dead-code-eliminating global definitions. With
this change it also takes care of declarations.
Hopefully this should make it now a strict superset of GlobalDCE.
This is important for LTO/ThinLTO as we don't want the linker to see
"undefined reference" when it processes the input files: it could
prevent proper internalization (or even load an extra file from a
static archive, changing the behavior of the program!).

llvm-svn: 281653
2016-09-15 20:26:27 +00:00
David Majnemer 8b16da8744 [InstCombine] Do not RAUW a constant GEP
canRewriteGEPAsOffset expects to process instructions, not constants.

This fixes PR30342.

llvm-svn: 281650
2016-09-15 20:10:09 +00:00
Sanjay Patel 886a542e23 [InstCombine] allow icmp (sub nsw) folds for vectors
Also, clean up the code and comments for the existing folds in foldICmpSubConstant().

llvm-svn: 281631
2016-09-15 18:05:17 +00:00
Sanjay Patel 514068397e [InstCombine] add vector tests for icmp (sub nsw)
llvm-svn: 281630
2016-09-15 17:54:47 +00:00
Sanjay Patel 40c53ea933 [InstCombine] allow (icmp sgt smin(PosA, B), 0) fold for vectors
llvm-svn: 281624
2016-09-15 16:23:20 +00:00
Sanjay Patel 78a7c7d26a [InstCombine] add vector tests for icmp sgt smin
llvm-svn: 281623
2016-09-15 16:13:41 +00:00
Sanjay Patel e957f1fe2c [InstCombine] auto-generate checks
llvm-svn: 281621
2016-09-15 15:48:53 +00:00
Sanjay Patel 7577a3d799 [InstCombine] use m_APInt to allow icmp folds using known bits for splat constant vectors
llvm-svn: 281613
2016-09-15 14:15:47 +00:00
NAKAMURA Takumi 6a5bac48cf llvm/test/Transforms/CorrelatedValuePropagation/alloca.ll REQUIRES +Asserts.
llvm-svn: 281598
2016-09-15 09:45:31 +00:00
Wei Mi f160e345be Add some shortcuts in LazyValueInfo to reduce compile time of Correlated Value Propagation.
The patch is to partially fix PR10584. Correlated Value Propagation queries LVI
to check non-null for pointer params of each callsite. If we know the def of
param is an alloca instruction, we know it is non-null and can return early from
LVI. Similarly, CVP queries LVI to check whether pointer for each mem access is
constant. If the def of the pointer is an alloca instruction, we know it is not
a constant pointer. These shortcuts can reduce the cost of CVP significantly.

Differential Revision: https://reviews.llvm.org/D18066

llvm-svn: 281586
2016-09-15 06:28:34 +00:00
Sanjay Patel 9f036b5a97 [InstCombine] add vector tests for foldICmpUsingKnownBits()
llvm-svn: 281559
2016-09-14 23:15:11 +00:00
Matthew Simpson b25e87fca5 [LV] Process pointer IVs with PHINodes in collectLoopUniforms
This patch moves the processing of pointer induction variables in
collectLoopUniforms from the consecutive pointer phase of the analysis to the
phi node phase. Previously, if a pointer induction variable was used by both a
scalarized non-memory instruction as well as a vectorized memory instruction,
we would incorrectly identify the pointer as uniform. Pointer induction
variables should be treated the same as other phi nodes. That is, they are
uniform if all users of the induction variable and induction variable update
are uniform.

Differential Revision: https://reviews.llvm.org/D24511

llvm-svn: 281485
2016-09-14 14:47:40 +00:00
Andrea Di Biagio e8e1af3649 [InstCombine] Merged two test files and regenerated checks using update_test_checks.py. NFC.
llvm-svn: 281478
2016-09-14 14:18:21 +00:00
Akira Hatanaka dea090e6b2 [ObjCARC] Traverse chain downwards to replace uses of argument passed to
ObjC library call with call return.

ARC contraction tries to replace uses of an argument passed to an
objective-c library call with the call return value. For example, in the
following IR, it replaces uses of argument %9 and uses of the values
discovered traversing the chain upwards (%7 and %8) with the call return
%10, if they are dominated by the call to @objc_autoreleaseReturnValue.
This transformation enables code-gen to tail-call the call to
@objc_autoreleaseReturnValue, which is necessary to enable auto release
return value optimization.

%7 = tail call i8* @objc_loadWeakRetained(i8** %6)
%8 = bitcast i8* %7 to %0*
%9 = bitcast %0* %8 to i8*
%10 = tail call i8* @objc_autoreleaseReturnValue(i8* %9)
ret %0* %8

Since r276727, llvm started removing redundant bitcasts and as a result
started feeding the following IR to ARC contraction:

%7 = tail call i8* @objc_loadWeakRetained(i8** %6)
%8 = bitcast i8* %7 to %0*
%9 = tail call i8* @objc_autoreleaseReturnValue(i8* %7)
ret %0* %8

ARC contraction no longer does the optimization described above since it
only traverses the chain upwards and fails to recognize that the
function return can be replaced by the call return. This commit changes
ARC contraction to traverse the chain downwards too and replace uses of
bitcasts with the call return.

rdar://problem/28011339

Differential Revision: https://reviews.llvm.org/D24523

llvm-svn: 281419
2016-09-13 23:43:11 +00:00
Sanjay Patel ab40f9d0ef add tests for PR28672
I'm not sure if we actually want to transform all of these in InstCombine yet, 
so I'm not labeling these with FIXME.  

llvm-svn: 281386
2016-09-13 20:36:13 +00:00
Matt Arsenault e2e6cfee61 Reapply "InstCombine: Reduce trunc (shl x, K) width."
This reapplies r272987 with a fix for infinitely looping
when the truncated value is another shift of a constant.

llvm-svn: 281379
2016-09-13 19:43:57 +00:00
Andrea Di Biagio 7277afeec1 [ConstantFold] Improve the bitcast folding logic for constant vectors.
The constant folder didn't know how to always fold bitcasts of constant integer
vectors. In particular, it was unable to handle the case where a constant vector
had some undef elements, and the resulting (i.e. bitcasted) vector type had more
elements than the original vector type.

Example:
  %cast = bitcast <2 x i64><i64 undef, i64 2> to <4 x i32>

On a little endian target, %cast could have been folded to:
  <4 x i32><i32 undef, i32 undef, i32 2, i32 0>

This patch improves the folding logic by teaching how to correctly propagate
undef elements in the folded vector.

Differential Revision: https://reviews.llvm.org/D24301

llvm-svn: 281343
2016-09-13 14:50:47 +00:00
Andrea Di Biagio 3647a96a44 [InstSimplify] Add tests to show missed bitcast folding opportunities.
InstSimplify doesn't always know how to fold a bitcast of a constant vector.
In particular, the logic in InstSimplify doesn't know how to handle the case
where the constant vector in input contains some undef elements, and the
number of elements is smaller than the number of elements of the bitcast
vector type.

llvm-svn: 281332
2016-09-13 13:17:42 +00:00
Sam Parker 64781ed4bb Remove InstCombine test file
My previous commit should of removed a test file but I missed it.

llvm-svn: 281326
2016-09-13 12:33:06 +00:00
Sam Parker 214f7bf5cc Enable simplify libcalls for ARM PCS
Teach SimplifyLibcalls that in can treat functions annotated with
apcs, aapcs or aapcs_vfp like normal C functions if they only take
and return integer or pointer values, and the target is not iOS.

Differential Revision: https://reviews.llvm.org/D24453

llvm-svn: 281322
2016-09-13 12:10:14 +00:00
Peter Collingbourne d4135bbc30 DebugInfo: New metadata representation for global variables.
This patch reverses the edge from DIGlobalVariable to GlobalVariable.
This will allow us to more easily preserve debug info metadata when
manipulating global variables.

Fixes PR30362. A program for upgrading test cases is attached to that
bug.

Differential Revision: http://reviews.llvm.org/D20147

llvm-svn: 281284
2016-09-13 01:12:59 +00:00
Sanjay Patel ff00fae8e6 add more tests for PR30273
llvm-svn: 281270
2016-09-12 22:28:29 +00:00
Sanjay Patel dea26950a0 [InstCombine] add test for PR30327
llvm-svn: 281248
2016-09-12 19:50:08 +00:00
Sanjay Patel 2eea9a1d58 [InstCombine] regenerate checks
llvm-svn: 281247
2016-09-12 19:29:26 +00:00
Sanjay Patel f5887f1fbd [InstCombine] use m_APInt to allow icmp X, C folds for splat constant vectors
isSignBitCheck could be changed to take a pointer param to avoid the 'UnusedBit' ugliness.

llvm-svn: 281231
2016-09-12 16:25:41 +00:00
David Majnemer c83044d9bb [FunctionAttrs] Don't try to infer returned if it is already on an argument
Trying to infer the 'returned' attribute if an argument is already
'returned' can lead to verification failure: inference might determine
that a different argument is passed through which would result in two
different arguments marked as 'returned'.

This fixes PR30350.

llvm-svn: 281221
2016-09-12 16:04:59 +00:00
Sanjay Patel db400baa80 [InstCombine] add tests to show missing vector folds
llvm-svn: 281219
2016-09-12 15:51:42 +00:00
Sanjay Patel f9ca770225 [InstCombine] regenerate checks
llvm-svn: 281186
2016-09-12 00:12:56 +00:00
Sanjay Patel a2aabfcc17 [InstCombine] regenerate checks
llvm-svn: 281185
2016-09-12 00:08:33 +00:00
James Molloy 104370ab37 [SimplifyCFG] Be even more conservative in SinkThenElseCodeToEnd
This should *actually* fix PR30244. This cranks up the workaround for PR30188 so that we never sink loads or stores of allocas.

The idea is that these should be removed by SROA/Mem2Reg, and any movement of them may well confuse SROA or just cause unwanted code churn. It's not ideal that the midend should be crippled like this, but that unwanted churn can really cause significant regressions in important workloads (tsan).

llvm-svn: 281162
2016-09-11 09:00:03 +00:00
James Molloy 18d96e8fa5 [SimplifyCFG] Harden up the profitability heuristic for block splitting during sinking
Exposed by PR30244, we will split a block currently if we think we can sink at least one instruction. However this isn't right - the reason we split predecessors is so that we can sink instructions that otherwise couldn't be sunk because it isn't safe to do so - stores, for example.

So, change the heuristic to only split if it thinks it can sink at least one non-speculatable instruction.

Should fix PR30244.

llvm-svn: 281160
2016-09-11 08:07:30 +00:00
Justin Lebar 11a3204355 Add handling of !invariant.load to PropagateMetadata.
Summary:
This will let e.g. the load/store vectorizer propagate this metadata
appropriately.

Reviewers: arsenm

Subscribers: tra, jholewinski, hfinkel, mzolotukhin

Differential Revision: https://reviews.llvm.org/D23479

llvm-svn: 281153
2016-09-11 01:39:08 +00:00
Arnold Schwaighofer 5d335559b9 InstCombine: Don't combine loads/stores from swifterror to a new type
This generates invalid IR: the only users of swifterror can be call
arguments, loads, and stores.

rdar://28242257

llvm-svn: 281144
2016-09-10 18:14:57 +00:00
Arnold Schwaighofer c9277f40fd Inliner: Don't mark swifterror allocas with lifetime markers
This would create a bitcast use which fails the verifier: swifterror values may
only be used by loads, stores, and as function arguments.

rdar://28233244

llvm-svn: 281114
2016-09-09 22:40:27 +00:00
Matt Arsenault 950a82047b LSV: Fix incorrectly increasing alignment
If the unaligned access has a dynamic offset, it may be odd which
would make the adjusted alignment incorrect to use.

llvm-svn: 281110
2016-09-09 22:20:14 +00:00
Sanjay Patel 58109abe91 [InstCombine] use m_APInt to allow icmp ult X, C folds for splat constant vectors
llvm-svn: 281107
2016-09-09 21:59:37 +00:00
Dehao Chen 22ce5eb051 Do not widen load for different variable in GVN.
Summary:
Widening load in GVN is too early because it will block other optimizations like PRE, LICM.

https://llvm.org/bugs/show_bug.cgi?id=29110

The SPECCPU2006 benchmark impact of this patch:

Reference: o2_nopatch
(1): o2_patched

           Benchmark             Base:Reference   (1)  
-------------------------------------------------------
spec/2006/fp/C++/444.namd                  25.2  -0.08%
spec/2006/fp/C++/447.dealII               45.92  +1.05%
spec/2006/fp/C++/450.soplex                41.7  -0.26%
spec/2006/fp/C++/453.povray               35.65  +1.68%
spec/2006/fp/C/433.milc                   23.79  +0.42%
spec/2006/fp/C/470.lbm                    41.88  -1.12%
spec/2006/fp/C/482.sphinx3                47.94  +1.67%
spec/2006/int/C++/471.omnetpp             22.46  -0.36%
spec/2006/int/C++/473.astar               21.19  +0.24%
spec/2006/int/C++/483.xalancbmk           36.09  -0.11%
spec/2006/int/C/400.perlbench             33.28  +1.35%
spec/2006/int/C/401.bzip2                 22.76  -0.04%
spec/2006/int/C/403.gcc                   32.36  +0.12%
spec/2006/int/C/429.mcf                   41.04  -0.41%
spec/2006/int/C/445.gobmk                 26.94  +0.04%
spec/2006/int/C/456.hmmer                  24.5  -0.20%
spec/2006/int/C/458.sjeng                    28  -0.46%
spec/2006/int/C/462.libquantum            55.25  +0.27%
spec/2006/int/C/464.h264ref               45.87  +0.72%

geometric mean                                   +0.23%

For most benchmarks, it's a wash, but we do see stable improvements on some benchmarks, e.g. 447,453,482,400.

Reviewers: davidxl, hfinkel, dberlin, sanjoy, reames

Subscribers: gberry, junbuml

Differential Revision: https://reviews.llvm.org/D24096

llvm-svn: 281074
2016-09-09 18:42:35 +00:00
Sanjay Patel 6da0fb8c74 [InstCombine] add tests to show pattern matching failures due to commutation
I was looking to fix a bug in getComplexity(), and these cases showed up as 
obvious failures. I'm not sure how to find these in general though.

llvm-svn: 281055
2016-09-09 16:35:20 +00:00
Gor Nishanov faf36c2e0b [Coroutines] Part13: Handle single edge PHINodes across suspends
Summary:
If one of the uses of the value is a single edge PHINode, handle it.

Original:

    %val = something
    <suspend>
    %p = PHINode [%val]

After Spill + Part13:

    %val = something
    %slot = gep val.spill.slot
    store %val, %slot
    <suspend>
    %p = load %slot

Plus tiny fixes/changes:
   * use correct index for coro.free in CoroCleanup
   * fixup id parameter in coro.free to allow authoring coroutine in plain C with __builtins

Reviewers: majnemer

Subscribers: mehdi_amini, llvm-commits

Differential Revision: https://reviews.llvm.org/D24242

llvm-svn: 281020
2016-09-09 05:39:00 +00:00
Dehao Chen 87823f8e4d Remove debug info when hoisting instruction from then/else branch.
Summary: The hoisted instruction is executed speculatively. It could affect the debugging experience as user would see gdb go into code that may not be expected to execute. It will also affect sample profile accuracy by assigning incorrect frequency to source within then/else branch.

Reviewers: davidxl, dblaikie, chandlerc, kcc, echristo

Subscribers: mehdi_amini, probinson, eric_niebler, andreadb, llvm-commits

Differential Revision: https://reviews.llvm.org/D24164

llvm-svn: 280995
2016-09-08 21:53:33 +00:00
Sanjay Patel a4c6223319 [InstCombine] regenerate checks
llvm-svn: 280993
2016-09-08 21:40:21 +00:00
Matthew Simpson bfe5e1817b [LV] Ensure proper handling of multi-use case when collecting uniforms
The test case included in r280979 wasn't checking what it was supposed to be
checking for the predicated store case. Fixing the test revealed that the
multi-use case (when a pointer is used by both vectorized and scalarized memory
accesses) wasn't being handled properly. We can't skip over
non-consecutive-like pointers since they may have looked consecutive-like with
a different memory access.

llvm-svn: 280992
2016-09-08 21:38:26 +00:00
Sanjay Patel ed9fda01a3 [InstCombine] regenerate checks
llvm-svn: 280991
2016-09-08 21:32:21 +00:00
Matthew Simpson 408a3abcfe [LV] Don't mark pointers used by scalarized memory accesses uniform
Previously, all consecutive pointers were marked uniform after vectorization.
However, if a consecutive pointer is used by a memory access that is eventually
scalarized, the pointer won't remain uniform after all. An example is
predicated stores. Even though a predicated store may be consecutive, it will
still be scalarized, making it's pointer operand non-uniform.

This patch updates the logic in collectLoopUniforms to consider the cases where
a memory access may be scalarized. If a memory access may be scalarized, its
pointer operand is not marked uniform. The determination of whether a given
memory instruction will be scalarized or not has been moved into a common
function that is used by the vectorizer, cost model, and legality analysis.

Differential Revision: https://reviews.llvm.org/D24271

llvm-svn: 280979
2016-09-08 19:11:07 +00:00
Dehao Chen ebb715b119 Add unittest for r280760
llvm-svn: 280963
2016-09-08 16:53:40 +00:00
Simon Pilgrim 073472a2bf [InstCombine][X86] Regenerate masked memory op combine tests
llvm-svn: 280960
2016-09-08 16:32:37 +00:00
Simon Pilgrim cd7b2830b9 [InstCombine][X86] Regenerate vperm2f128/vperm2i128 combine tests
llvm-svn: 280959
2016-09-08 16:30:46 +00:00
Simon Pilgrim 3b1ecbe66c [InstCombine][X86] Regenerate insertps combine tests
llvm-svn: 280957
2016-09-08 16:15:21 +00:00
Michael Zolotukhin e72997a524 Revert "[LoopUnroll] Properly update loop-info when cloning prologues and epilogues."
This reverts commit r280901.

This caused a bunch of failures, reverting it until I investigate them.

llvm-svn: 280905
2016-09-08 03:51:30 +00:00
Michael Zolotukhin 5e0a20697e [LoopUnroll] Properly update loop-info when cloning prologues and epilogues.
Summary:
When cloning blocks for prologue/epilogue we need to replicate the loop
structure from the original loop. It wasn't a problem for the innermost
loops, but it led to an incorrect loop info when we unrolled a loop with
a child loop - in this case created prologue-loop had a child loop, but
loop info didn't reflect that.

This fixes PR28888.

Reviewers: chandlerc, sanjoy, hfinkel

Subscribers: llvm-commits, silvas

Differential Revision: https://reviews.llvm.org/D24203

llvm-svn: 280901
2016-09-08 01:52:26 +00:00
Sanjay Patel 9b40f98357 [InstCombine] use m_APInt to allow icmp (and (sh X, Y), C2), C1 folds for splat constant vectors
llvm-svn: 280873
2016-09-07 22:33:03 +00:00
Hal Finkel ac5803ba91 [SimplifyCFG] Don't try to create metadata-valued PHIs
We can't create metadata-valued PHIs; don't try to do so when sinking.

I created a test case for this using the @llvm.type.test intrinsic, because it
takes a metadata parameter and does not have severe side effects (thus
SimplifyCFG is willing to otherwise sink it).

Previously, running the test case would crash with:

  Invalid use of metadata!
    %.sink = select i1 %flag, metadata <...>, metadata <0x4e45dc0>
  LLVM ERROR: Broken function found, compilation aborted!

llvm-svn: 280866
2016-09-07 21:38:22 +00:00
Sanjay Patel def931e76a [InstCombine] allow icmp (and X, C2), C1 folds for splat constant vectors
This is a revert of r280676 which was a revert of r280637;
ie, this is r280637 again. It was speculatively reverted to
help debug buildbot failures.

llvm-svn: 280861
2016-09-07 20:50:44 +00:00
Justin Lebar 3a5f40c191 [LSV] Use the original loads' names for the extractelement instructions.
Summary:
LSV replaces multiple adjacent loads with one vectorized load and a
bunch of extractelement instructions.  This patch makes the
extractelement instructions' names match those of the original loads,
for (hopefully) improved readability.

Reviewers: asbirlea, tstellarAMD

Subscribers: arsenm, mzolotukhin

Differential Revision: https://reviews.llvm.org/D23748

llvm-svn: 280818
2016-09-07 15:49:48 +00:00
Andrea Di Biagio bdd576dbb0 Regenerate vector bitcast folding tests using update_test_checks.py.
Two tests have been merged together, regenerated and then moved to
a more appropriate directory. No functional change.

llvm-svn: 280814
2016-09-07 14:50:07 +00:00
Andrea Di Biagio f3fd316223 [InstCombine][SSE4a] Fix assertion failure in the insertq/insertqi combining logic.
This fixes a similar issue to the one already fixed by r280804
(revieved in D24256). Revision 280804 fixed the problem with unsafe dyn_casts
in the extrq/extrqi combining logic. However, it turns out that even the
insertq/insertqi logic was affected by the same problem.

llvm-svn: 280807
2016-09-07 12:47:53 +00:00
Andrea Di Biagio 8df5b9cf48 [InstCombine][SSE4a] Fix assertion failure caused by unsafe dyn_casts on the operands of extrq/extrqi intrinsic calls.
This patch fixes an assertion failure caused by unsafe dynamic casts on the
constant operands of sse4a intrinsic calls to extrq/extrqi

The combine logic that simplifies sse4a extrq/extrqi intrinsic calls currently
checks if the input operands are constants. Internally, that logic relies on
dyn_casts of values returned by calls to method Constant::getAggregateElement.
However, method getAggregateElemet may return nullptr if the constant element
cannot be retrieved. So, all the dyn_casts can potentially fail. This is what
happens for example if a constexpr value is passed in input to an extrq/extrqi
intrinsic call.

This patch fixes the problem by using a dyn_cast_or_null (instead of a simple
dyn_cast) on the result of each call to Constant::getAggregateElement.

Added reproducible test cases to x86-sse4a.ll.

Differential Revision: https://reviews.llvm.org/D24256

llvm-svn: 280804
2016-09-07 12:03:03 +00:00
James Molloy ec905a62ae [SimplifyCFG] Update workaround for PR30188 to also include loads
I should have realised this the first time around, but if we're avoiding sinking stores where the operands come from allocas so they don't create selects, we also have to do the same for loads because SROA will be just as defective looking at loads of selected addresses as stores.

Fixes PR30188 (again).

llvm-svn: 280792
2016-09-07 08:40:20 +00:00
James Molloy bf1837d9c9 [SimplifyCFG] Check PHI uses more accurately
PR30292 showed a case where our PHI checking wasn't correct. We were checking that all values were used by the same PHI before deciding to sink, but we weren't checking that the incoming values for that PHI were what we expected. As a result, we had to bail out after block splitting which caused us to never reach a steady state in SimplifyCFG.

Fixes PR30292.

llvm-svn: 280790
2016-09-07 08:15:54 +00:00
Adam Nemet c520822dbf [JumpThreading] Only write back branch-weight MDs for blocks that originally had PGO info
Currently the pass updates branch weights in the IR if the function has
any PGO info (entry frequency is set).  However we could still have
regions of the CFG that does not have branch weights collected (e.g. a
cold region).  In this case we'd use static estimates.  Since static
estimates for branches are determined independently, they are
inconsistent.  Updating them can "randomly" inflate block frequencies.

I've run into this in a completely cold loop of h264ref from
SPEC.  -Rpass-with-hotness showed the loop to be completely cold during
inlining (before JT) but completely hot during vectorization (after JT).

The new testcase demonstrate the problem.  We check array elements
against 1, 2 and 3 in a loop.  The check against 3 is the loop-exiting
check.  The block names should be self-explanatory.

In this example, jump threading incorrectly updates the weight of the
loop-exiting branch to 0, drastically inflating the frequency of the
loop (in the range of billions).

There is no run-time profile info for edges inside the loop, so branch
probabilities are estimated.  These are the resulting branch and block
frequencies for the loop body:

                check_1 (16)
            (8) /  |
            eq_1   | (8)
                \  |
                check_2 (16)
            (8) /  |
            eq_2   | (8)
                \  |
                check_3 (16)
            (1) /  |
       (loop exit) | (15)
                   |
              (back edge)

First we thread eq_1 -> check_2 to check_3.  Frequencies are updated to
remove the frequency of eq_1 from check_2 and then from the false edge
leaving check_2.  Changed frequencies are highlighted with * *:

                check_1 (16)
            (8) /  |
           eq_1~   | (8)
           /       |
          /     check_2 (*8*)
         /  (8) /  |
         \  eq_2   | (*0*)
          \     \  |
           ` --- check_3 (16)
            (1) /  |
       (loop exit) | (15)
                   |
              (back edge)

Next we thread eq_1 -> check_3 and eq_2 -> check_3 to check_1 as new
back edges.  Frequencies are updated to remove the frequency of eq_1 and
eq_3 from check_3 and then the false edge leaving check_3 (changed
frequencies are highlighted with * *):

                  check_1 (16)
              (8) /  |
             eq_1~   | (8)
             /       |
            /     check_2 (*8*)
           /  (8) /  |
          /-- eq_2~  | (*0*)
  (back edge)        |
                  check_3 (*0*)
            (*0*) /  |
         (loop exit) | (*0*)
                     |
                (back edge)

As a result, the loop exit edge ends up with 0 frequency which in turn makes
the loop header to have maximum frequency.

There are a few potential problems here:

1. The profile data seems odd.  There is a single profile sample of the
loop being entered.  On the other hand, there are no weights inside the
loop.

2. Based on static estimation we shouldn't set edges to "extreme"
values, i.e. extremely likely or unlikely.

3. We shouldn't create profile metadata that is calculated from static
estimation.  I am not sure what policy is but it seems to make sense to
treat profile metadata as something that is known to originate from
profiling.  Estimated probabilities should only be reflected in BPI/BFI.

Any one of these would probably fix the immediate problem.  I went for 3
because I think it's a good policy to have and added a FIXME about 2.

Differential Revision: https://reviews.llvm.org/D24118

llvm-svn: 280713
2016-09-06 16:08:33 +00:00
Sanjay Patel e341c919b0 fix FileCheck variables for test added with r280677
The script (utils/update_test_checks.py) seems to have problems 
with variable names that start with the same string. 

llvm-svn: 280679
2016-09-05 23:49:32 +00:00
Gor Nishanov ccabaca273 [Coroutines] Part12: Handle alloca address-taken
Summary:
Move early uses of spilled variables after CoroBegin.

For example, if a parameter had address taken, we may end up with the code
like:
        define @f(i32 %n) {
          %n.addr = alloca i32
          store %n, %n.addr
          ...
          call @coro.begin

This patch fixes the problem by moving uses of spilled variables after CoroBegin.

Reviewers: majnemer

Subscribers: mehdi_amini, llvm-commits

Differential Revision: https://reviews.llvm.org/D24234

llvm-svn: 280678
2016-09-05 23:45:45 +00:00
Sanjay Patel eea2ef7862 [InstCombine] don't assert that division-by-constant has been folded (PR30281)
This is effectively a revert of:
https://reviews.llvm.org/rL280115

And this should fix
https://llvm.org/bugs/show_bug.cgi?id=30281:

llvm-svn: 280677
2016-09-05 23:38:22 +00:00
Sanjay Patel 46f9df5b71 [InstCombine] revert r280637 because it causes test failures on an ARM bot
http://lab.llvm.org:8011/builders/clang-cmake-armv7-a15/builds/14952/steps/ninja%20check%201/logs/FAIL%3A%20LLVM%3A%3Aicmp.ll

llvm-svn: 280676
2016-09-05 22:36:32 +00:00
Oliver Stannard ef38d53a7e [SimplifyCFG] Add test for sinking inline asm in if/else
This test code previously caused a failure in the module verifier,
because SimplifyCFG created this invalid instruction, which tries to
take the address of inline asm:
  %.sink = select i1 %1, i64 ()* asm "mov $0, #1", "=r", i64 ()* asm %"mov $0, #2", "=r"

This has been fixed recently, presumably by James Molloy's patches that
re-wrote and changed parts of SimplifyCFG, so this patch just adds a
regression test for it.

Differential Revision: https://reviews.llvm.org/D24231

llvm-svn: 280660
2016-09-05 13:49:26 +00:00
Gor Nishanov 0e18f75a92 [Coroutines] Part11: Add final suspend handling.
Summary:
A frontend may designate a particular suspend to be final, by setting the second argument of the coro.suspend intrinsic to true. Such a suspend point has two properties:

* it is possible to check whether a suspended coroutine is at the final suspend point via coro.done intrinsic;
* a resumption of a coroutine stopped at the final suspend point leads to undefined behavior. The only possible action for a coroutine at a final suspend point is destroying it via coro.destroy intrinsic.

This patch adds final suspend handling logic to CoroEarly and CoroSplit passes.
Now, the final suspend point example from docs\Coroutines.rst compiles and produces expected result (see test/Transform/Coroutines/ex5.ll).

Reviewers: majnemer

Subscribers: mehdi_amini, llvm-commits

Differential Revision: https://reviews.llvm.org/D24068

llvm-svn: 280646
2016-09-05 04:44:30 +00:00
Sanjay Patel c641e9d6ff [InstCombine] allow icmp (and X, C2), C1 folds for splat constant vectors
The code to calculate 'UsesRemoved' could be simplified.
As-is, that code is a victim of PR30273:
https://llvm.org/bugs/show_bug.cgi?id=30273

llvm-svn: 280637
2016-09-04 20:58:27 +00:00
Dorit Nuzman abd15f69b2 [InstCombine] Preserve llvm.mem.parallel_loop_access metadata when replacing
memcpy with ld/st.

When InstCombine replaces a memcpy with loads+stores it does not copy over the
llvm.mem.parallel_loop_access from the memcpy instruction. This patch fixes
that.

Differential Revision: https://reviews.llvm.org/D23499

llvm-svn: 280617
2016-09-04 07:49:39 +00:00
Joseph Tremoulet e92e0a9042 Fix inliner funclet unwind memoization
Summary:
The inliner may need to determine where a given funclet unwinds to,
and this determination may depend on other funclets throughout the
funclet tree.  The code that performs this walk in getUnwindDestToken
memoizes results to avoid redundant computations.  In the case that
a funclet's unwind destination is derived from its ancestor, there's
code to walk back down the tree from the ancestor updating the memo
map of its descendants to record the unwind destination.  This change
fixes that code to account for the case that some descendant has a
different unwind destination, which can happen if that unwind dest
is a descendant of the EHPad being queried and thus didn't determine
its unwind destination.

Also update test inline-funclets.ll, which is supposed to cover such
scenarios, to include a case that fails an assertion without this fix
but passes with it.

Fixes PR29151.


Reviewers: majnemer

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D24117

llvm-svn: 280610
2016-09-04 01:23:20 +00:00
Matt Arsenault 46a0382ab2 AMDGPU: Do basic folding of class intrinsic
This allows more of the OCML builtin library to be
constant folded.

llvm-svn: 280586
2016-09-03 07:06:58 +00:00
Wei Mi c37307a5f4 Fix buildbot error.
Add -mtriple=x86_64-unknown-linux-gnu for the test and move it to CodeGen/X86.

llvm-svn: 280568
2016-09-03 01:43:28 +00:00
Xinliang David Li 40afd5c9e4 [Profile] handle select instruction in 'expect' lowering
Builtin expect lowering currently ignores select. This patch
fixes the issue

Differential Revision: http://reviews.llvm.org/D24166

llvm-svn: 280547
2016-09-02 22:03:40 +00:00
Sanjay Patel 70277411d3 [InstCombine] auto-generate assertions for tighter checking
llvm-svn: 280531
2016-09-02 19:38:37 +00:00
Wei Mi c54d1298f5 Split the store of a wide value merged from an int-fp pair into multiple stores.
For the store of a wide value merged from a pair of values, especially int-fp pair,
sometimes it is more efficent to split it into separate narrow stores, which can
remove the bitwise instructions or sink them to colder places.

Now the feature is only enabled on x86 target, and only store of int-fp pair is
splitted. It is possible that the application scope gets extended with perf evidence
support in the future.

Differential Revision: https://reviews.llvm.org/D22840

llvm-svn: 280505
2016-09-02 17:17:04 +00:00
Sanjay Patel 521f19f249 [InsttCombine] fold insertelement of constant into shuffle with constant operand (PR29126)
The motivating case occurs with SSE/AVX scalar intrinsics, so this is a first step towards
shrinking that to a single shufflevector.

Note that the transform is intentionally limited to shuffles that are equivalent to vector
selects to avoid creating arbitrary shuffle masks that may not lower well.

This should solve PR29126:
https://llvm.org/bugs/show_bug.cgi?id=29126

Differential Revision: https://reviews.llvm.org/D23886

llvm-svn: 280504
2016-09-02 17:05:43 +00:00
Matthew Simpson b65c230eab [LV] Ensure reverse interleaved group GEPs remain uniform
For uniform instructions, we're only required to generate a scalar value for
the first vector lane of each unroll iteration. Thus, if we have a reverse
interleaved group, computing the member index off the scalar GEP corresponding
to the last vector lane of its pointer operand technically makes the GEP
non-uniform. We should compute the member index off the first scalar GEP
instead.

I've added the updated member index computation to the existing reverse
interleaved group test.

llvm-svn: 280497
2016-09-02 16:19:22 +00:00
Andrea Di Biagio 805815f407 [instsimplify] Fix incorrect folding of an ordered fcmp with a vector of all NaN.
This patch fixes a crash caused by an incorrect folding of an ordered comparison
between a packed floating point vector and a splat vector of NaN.

An ordered comparison between a vector and a constant vector of NaN, should
always be folded into a constant vector where each element is i1 false.

Since revision 266175, SimplifyFCmpInst folds the ordered fcmp into a scalar
'false'. Later on, this would cause an assertion failure, since the value type
of the folded value doesn't match the expected value type of the uses of the
original instruction: "Assertion failed: New->getType() == getType() &&
"replaceAllUses of value with new value of different type!".

This patch fixes the issue and adds a test case to the already existing test
InstSimplify/floating-point-compares.ll.

Differential Revision: https://reviews.llvm.org/D24143

llvm-svn: 280488
2016-09-02 14:47:43 +00:00
Alexey Bataev ed27d69037 [InstCombine] Add test for insertelementinsts with constants.
Added a tests that shows that several insertelementinsts with constant
indexes/data are not folded into a single shuffleinst.

llvm-svn: 280474
2016-09-02 09:00:53 +00:00
James Molloy f3cf2a494b [SimplifyCFG] Add a workaround to fix PR30188
We're sinking stores, which is a good thing, but in the process creating selects for the store address operand, which SROA/Mem2Reg can't look through, which caused serious regressions.

The real fix is in SROA, which I'll be looking into.

llvm-svn: 280470
2016-09-02 07:29:00 +00:00
NAKAMURA Takumi 9aec5c3797 llvm/test/Transforms/GCOVProfiling/three-element-mdnode.ll: Use %/T instead of %T, not to emit backslashes.
llvm-svn: 280451
2016-09-02 01:33:00 +00:00
Sanjay Patel 199a8e6a92 [InstCombine] add tests to show potential shuffle+insert folds
llvm-svn: 280403
2016-09-01 19:14:19 +00:00
Sanjay Patel dd861964d1 [InstCombine] remove fold of an icmp pattern that should never happen
While removing a scalar shackle from an icmp fold, I noticed that I couldn't find any tests to trigger
this code path.

The 'and' shrinking transform should be handled by InstCombiner::foldCastedBitwiseLogic()
or eliminated with InstSimplify. The icmp narrowing is part of InstCombiner::foldICmpWithCastAndCast().

Differential Revision: https://reviews.llvm.org/D24031 

llvm-svn: 280370
2016-09-01 14:20:43 +00:00
James Molloy 88cad7e5cf [SimplifyCFG] Handle tail-sinking of more than 2 incoming branches
This was a real restriction in the original version of SinkIfThenCodeToEnd. Now it's been rewritten, the restriction can be lifted.

As part of this, we handle a very common and useful case where one of the incoming branches is actually conditional. Consider:

   if (a)
     x(1);
   else if (b)
     x(2);

This produces the following CFG:

         [if]
        /    \
      [x(1)] [if]
        |     | \
        |     |  \
        |  [x(2)] |
         \    |  /
          [ end ]

[end] has two unconditional predecessor arcs and one conditional. The conditional refers to the implicit empty 'else' arc. This same pattern can also be caused by an empty default block in a switch.

We can't sink the call to x() down to end because no call to x() happens on the third incoming arc (assume that x() has sideeffects for the sake of argument; if something is safe to speculate we could indeed sink nevertheless but this cannot happen in the general case and causes many extra selects).

We are now able to detect this case and split off the unconditional arcs to a common successor:

         [if]
        /    \
      [x(1)] [if]
        |     | \
        |     |  \
        |  [x(2)] |
         \   /    |
     [sink.split] |
           \     /
           [ end ]

Now we can sink the call to x() into %sink.split. This can cause significant code simplification in many testcases.

llvm-svn: 280364
2016-09-01 12:58:13 +00:00
James Molloy eec6df3193 [SimplifyCFG] Change the algorithm in SinkThenElseCodeToEnd
r279460 rewrote this function to be able to handle more than two incoming edges and took pains to ensure this didn't regress anything.

This time we change the logic for determining if an instruction should be sunk. Previously we used a single pass greedy algorithm - sink instructions until one requires more than one PHI node or we run out of instructions to sink.

This had the problem that sinking instructions that had non-identical but trivially the same operands needed extra logic so we sunk them aggressively. For example:

    %a = load i32* %b          %d = load i32* %b
    %c = gep i32* %a, i32 0    %e = gep i32* %d, i32 1

Sinking %c and %e would naively require two PHI merges as %a != %d. But the loads are obviously equivalent (and maybe can't be hoisted because there is no common predecessor).

This is why we implemented the fairly complex function areValuesTriviallySame(), to look through trivial differences like this. However it's just not clever enough.

Instead, throw areValuesTriviallySame away, use pointer equality to check equivalence of operands and switch to a two-stage algorithm.

In the "scan" stage, we look at every sinkable instruction in isolation from end of block to front. If it's sinkable, we keep track of all operands that required PHI merging.

In the "sink" stage, we iteratively sink the last non-terminator in the source blocks. But when calculating how many PHIs are actually required to be inserted (to work out if we should stop or not) we remove any values that have already been sunk from the set of PHI-merges required, which allows us to be more aggressive.

This turns an algorithm with potentially recursive lookahead (looking through GEPs, casts, loads and any other instruction potentially not CSE'd) to two linear scans.

llvm-svn: 280351
2016-09-01 10:44:35 +00:00
Hal Finkel 40d7f5c277 Add a counter-function insertion pass
As discussed in https://reviews.llvm.org/D22666, our current mechanism to
support -pg profiling, where we insert calls to mcount(), or some similar
function, is fundamentally broken. We insert these calls in the frontend, which
means they get duplicated when inlining, and so the accumulated execution
counts for the inlined-into functions are wrong.

Because we don't want the presence of these functions to affect optimizaton,
they should be inserted in the backend. Here's a pass which would do just that.
The knowledge of the name of the counting function lives in the frontend, so
we're passing it here as a function attribute. Clang will be updated to use
this mechanism.

Differential Revision: https://reviews.llvm.org/D22825

llvm-svn: 280347
2016-09-01 09:42:39 +00:00
Nick Lewycky 97e49ac59e Add -fprofile-dir= to clang.
-fprofile-dir=path allows the user to specify where .gcda files should be
emitted when the program is run. In particular, this is the first flag that
causes the .gcno and .o files to have different paths, LLVM is extended to
support this. -fprofile-dir= does not change the file name in the .gcno (and
thus where lcov looks for the source) but it does change the name in the .gcda
(and thus where the runtime library writes the .gcda file). It's different from
a GCOV_PREFIX because a user can observe that the GCOV_PREFIX_STRIP will strip
paths off of -fprofile-dir= but not off of a supplied GCOV_PREFIX.

To implement this we split -coverage-file into -coverage-data-file and
-coverage-notes-file to specify the two different names. The !llvm.gcov
metadata node grows from a 2-element form {string coverage-file, node dbg.cu}
to 3-elements, {string coverage-notes-file, string coverage-data-file, node
dbg.cu}. In the 3-element form, the file name is already "mangled" with
.gcno/.gcda suffixes, while the 2-element form left that to the middle end
pass.

llvm-svn: 280306
2016-08-31 23:04:32 +00:00
Sanjay Patel 0d70831d73 [InstCombine] allow icmp (shr exact X, C2), C fold for splat constant vectors
The enhancement to foldICmpDivConstant ( http://llvm.org/viewvc/llvm-project?view=revision&revision=280299 )
allows us to remove the ConstantInt check; no other changes needed.

llvm-svn: 280300
2016-08-31 22:18:43 +00:00
Sanjay Patel 541aef4661 [InstCombine] allow icmp (div X, Y), C folds for splat constant vectors
Converting all of the overflow ops to APInt looked risky, so I've left that as a TODO.

llvm-svn: 280299
2016-08-31 21:57:21 +00:00
Geoff Berry 8d84605f25 [EarlyCSE] Optionally use MemorySSA. NFC.
Summary:
Use MemorySSA, if requested, to do less conservative memory dependency
checking.

This change doesn't enable the MemorySSA enhanced EarlyCSE in the
default pipelines, so should be NFC.

Reviewers: dberlin, sanjoy, reames, majnemer

Subscribers: mcrosier, llvm-commits

Differential Revision: http://reviews.llvm.org/D19821

llvm-svn: 280279
2016-08-31 19:24:10 +00:00
Geoff Berry 64f5ed172a [EarlyCSE] Allow forwarding a non-invariant load into an invariant load.
Reviewers: sanjoy

Subscribers: mcrosier, llvm-commits

Differential Revision: https://reviews.llvm.org/D23935

llvm-svn: 280265
2016-08-31 17:45:31 +00:00
Philip Reames 2b1084ac93 [statepoints][experimental] Add support for live-in semantics of values in deopt bundles
This is a first step towards supporting deopt value lowering and reporting entirely with the register allocator. I hope to build on this in the near future to support live-on-return semantics, but I have a use case which allows me to test and investigate code quality with just the live-in semantics so I've chosen to start there. For those curious, my use cases is our implementation of the "__llvm_deoptimize" function we bind to @llvm.deoptimize. I'm choosing not to hard code that fact in the patch and instead make it configurable via function attributes.

The basic approach here is modelled on what is done for the "Live In" values on stackmaps and patchpoints. (A secondary goal here is to remove one of the last barriers to merging the pseudo instructions.) We start by adding the operands directly to the STATEPOINT SDNode. Once we've lowered to MI, we extend the remat logic used by the register allocator to fold virtual register uses into StackMap::Indirect entries as needed. This does rely on the fact that the register allocator rematerializes. If it didn't along some code path, we could end up with more vregs than physical registers and fail to allocate.

Today, we *only* fold in the register allocator. This can create some weird effects when combined with arguments passed on the stack because we don't fold them appropriately. I have an idea how to fix that, but it needs this patch in place to work on that effectively. (There's some weird interaction with the scheduler as well, more investigation needed.)

My near term plan is to land this patch off-by-default, experiment in my local tree to identify any correctness issues and then start fixing codegen problems one by one as I find them. Once I have the live-in lowering fully working (both correctness and code quality), I'm hoping to move on to the live-on-return semantics. Note: I don't have any *known* miscompiles with this patch enabled, but I'm pretty sure I'll find at least a couple. Thus, the "experimental" tag and the fact it's off by default.

Differential Revision: https://reviews.llvm.org/D24000

llvm-svn: 280250
2016-08-31 15:12:17 +00:00
James Molloy cacfc16109 Revert "[SimplifyCFG] Change the algorithm in SinkThenElseCodeToEnd"
This reverts commit r280216 - it caused buildbot failures.

llvm-svn: 280234
2016-08-31 13:16:52 +00:00
James Molloy 76c9d423a7 Revert "[SimplifyCFG] Handle tail-sinking of more than 2 incoming branches"
This reverts commit r280217. r280216 caused buildbot failures - backing out the entire chain.

llvm-svn: 280233
2016-08-31 13:16:45 +00:00
James Molloy 06a45483a1 Revert "[SimplifyCFG] Add a workaround to fix PR30188"
This reverts commit r280219. r280216 caused buildbot failures - backing out the entire chain.

llvm-svn: 280232
2016-08-31 13:16:36 +00:00
James Molloy 8a66a39cbf Revert "[SimplifyCFG] Fix bootstrap failure after r280220"
This reverts commit r280228. r280216 caused buildbot failures - backing out the entire sequence.

llvm-svn: 280231
2016-08-31 13:16:30 +00:00
James Molloy b7efa6c227 [SimplifyCFG] Fix bootstrap failure after r280220
We check that a sinking candidate is used by only one PHI node during our legality checks. However for instructions that are used by other sinking candidates our heuristic is less conservative. This can result in a candidate actually being illegal when we come to sink it because of how we sunk a predecessor. Do the used-by-only-one-PHI checks again during sinking to ensure we don't crash.

llvm-svn: 280228
2016-08-31 12:33:48 +00:00
James Molloy 171fdac7ce [SimplifyCFG] Add a workaround to fix PR30188
We're sinking stores, which is a good thing, but in the process creating selects for the store address operand, which SROA/Mem2Reg can't look through, which caused serious regressions.

The real fix is in SROA, which I'll be looking into.

llvm-svn: 280219
2016-08-31 10:46:45 +00:00
James Molloy c53b40b509 [SimplifyCFG] Handle tail-sinking of more than 2 incoming branches
This was a real restriction in the original version of SinkIfThenCodeToEnd. Now it's been rewritten, the restriction can be lifted.

As part of this, we handle a very common and useful case where one of the incoming branches is actually conditional. Consider:

   if (a)
     x(1);
   else if (b)
     x(2);

This produces the following CFG:

         [if]
        /    \
      [x(1)] [if]
        |     | \
        |     |  \
        |  [x(2)] |
         \    |  /
          [ end ]

[end] has two unconditional predecessor arcs and one conditional. The conditional refers to the implicit empty 'else' arc. This same pattern can also be caused by an empty default block in a switch.

We can't sink the call to x() down to end because no call to x() happens on the third incoming arc (assume that x() has sideeffects for the sake of argument; if something is safe to speculate we could indeed sink nevertheless but this cannot happen in the general case and causes many extra selects).

We are now able to detect this case and split off the unconditional arcs to a common successor:

         [if]
        /    \
      [x(1)] [if]
        |     | \
        |     |  \
        |  [x(2)] |
         \   /    |
     [sink.split] |
           \     /
           [ end ]

Now we can sink the call to x() into %sink.split. This can cause significant code simplification in many testcases.

llvm-svn: 280217
2016-08-31 10:46:33 +00:00
James Molloy 55bd04cd20 [SimplifyCFG] Change the algorithm in SinkThenElseCodeToEnd
r279460 rewrote this function to be able to handle more than two incoming edges and took pains to ensure this didn't regress anything.

This time we change the logic for determining if an instruction should be sunk. Previously we used a single pass greedy algorithm - sink instructions until one requires more than one PHI node or we run out of instructions to sink.

This had the problem that sinking instructions that had non-identical but trivially the same operands needed extra logic so we sunk them aggressively. For example:

    %a = load i32* %b          %d = load i32* %b
    %c = gep i32* %a, i32 0    %e = gep i32* %d, i32 1

Sinking %c and %e would naively require two PHI merges as %a != %d. But the loads are obviously equivalent (and maybe can't be hoisted because there is no common predecessor).

This is why we implemented the fairly complex function areValuesTriviallySame(), to look through trivial differences like this. However it's just not clever enough.

Instead, throw areValuesTriviallySame away, use pointer equality to check equivalence of operands and switch to a two-stage algorithm.

In the "scan" stage, we look at every sinkable instruction in isolation from end of block to front. If it's sinkable, we keep track of all operands that required PHI merging.

In the "sink" stage, we iteratively sink the last non-terminator in the source blocks. But when calculating how many PHIs are actually required to be inserted (to work out if we should stop or not) we remove any values that have already been sunk from the set of PHI-merges required, which allows us to be more aggressive.

This turns an algorithm with potentially recursive lookahead (looking through GEPs, casts, loads and any other instruction potentially not CSE'd) to two linear scans.

llvm-svn: 280216
2016-08-31 10:46:23 +00:00
James Molloy 923e98c232 [SimplifyCFG] Tail-merge calls with sideeffects
This was deliberately disabled during my rewrite of SinkIfThenToEnd to keep behaviour
at least vaguely consistent with the previous version and keep it as close to NFC as
I could.

There's no real reason not to merge sideeffect calls though, so let's do it! Small fixup
along the way to ensure we don't create indirect calls.

Should fix PR28964.

llvm-svn: 280215
2016-08-31 10:46:16 +00:00
Gor Nishanov 50d7fb974f [Coroutines] Part 10: Add coroutine promise support.
Summary:
1) CoroEarly now lowers llvm.coro.promise intrinsic that allows to obtain
a coroutine promise pointer from a coroutine frame and vice versa.

2) CoroFrame now interprets Promise argument of llvm.coro.begin to
place CoroutinPromise alloca at a deterministic offset from the coroutine frame.

Now, the coroutine promise example from docs\Coroutines.rst compiles and produces expected result (see test/Transform/Coroutines/ex4.ll).

Reviewers: majnemer

Subscribers: llvm-commits, mehdi_amini

Differential Revision: https://reviews.llvm.org/D23993

llvm-svn: 280184
2016-08-31 00:35:41 +00:00
Alina Sbirlea 3f8f7840bf [LoadStoreVectorizer] Change VectorSet to Vector to match head and tail positions. Resolves PR29148.
Summary:
LSV was using two vector sets (heads and tails) to track pairs of adjiacent position to vectorize.
A recent optimization is trying to obtain the longest chain to vectorize and assumes the positions
in heads(H) and tails(T) match, which is not the case is there are multiple tails for the same head.

e.g.:
i1: store a[0]
i2: store a[1]
i3: store a[1]
Leads to:
H: i1
T: i2 i3
Instead of:
H: i1 i1
T: i2 i3
So the positions for instructions that follow i3 will have different indexes in H/T.
This patch resolves PR29148.

This issue also surfaced the fact that if the chain is too long, and TLI
returns a "not-fast" answer, the whole chain will be abandoned for
vectorization, even though a smaller one would be beneficial.
Added a testcase and FIXME for this.

Reviewers: tstellarAMD, arsenm, jlebar

Subscribers: mzolotukhin, wdng, llvm-commits

Differential Revision: https://reviews.llvm.org/D24057

llvm-svn: 280179
2016-08-30 23:53:59 +00:00
Sanjay Patel ddb53dd080 [InstCombine] add tests to show type limitations of InsertRangeTest and callers
llvm-svn: 280175
2016-08-30 23:16:59 +00:00
Michael Kuperstein 2954d1db77 [LoopVectorizer] Predicate instructions in blocks with several incoming edges
We don't need to limit predication to blocks that have a single incoming
edge, we just need to use the right mask.
This fixes PR30172.

Differential Revision: https://reviews.llvm.org/D24009

llvm-svn: 280148
2016-08-30 20:22:21 +00:00
Daniel Berlin c943d72d94 IntrArgMemOnly is only defined (and current AA machinery only sanely supports) pointer arguments, and these intrinsics have vector of pointer arguments. Remove ArgMemOnly until we either have the machinery, define a new attribute, or something similar
llvm-svn: 280143
2016-08-30 19:58:48 +00:00
Sanjay Patel b37145712e [InstCombine] replace divide-by-constant checks with asserts; NFC
These folds already have tests for scalar and vector types, except 
for the vector div-by-0 case, so I'm adding tests for that.

llvm-svn: 280115
2016-08-30 17:31:34 +00:00
James Molloy d13b1239e4 [SimplifyCFG] Properly CSE metadata in SinkThenElseCodeToEnd
This was missing, meaning the metadata in sunk instructions was potentially bogus and could cause miscompiles.

llvm-svn: 280072
2016-08-30 10:56:08 +00:00
Teresa Johnson 8c1bc986b5 [ThinLTO] Indirect call promotion fixes for promoted local functions
Summary:
Fix a couple issues limiting the application of indirect call promotion
in ThinLTO mode:
- Invoke indirect call promotion before globalopt, since it may
  eliminate imported functions which appear unreferenced.
- Invoke indirect call promotion with InLTO=true so that the PGOFuncName
  metadata is used to get the name for locals which would have been
  renamed during promotion.

Reviewers: davidxl, mehdi_amini

Subscribers: Prazek, llvm-commits, mehdi_amini

Differential Revision: https://reviews.llvm.org/D24004

llvm-svn: 280024
2016-08-29 22:46:56 +00:00
Matthew Simpson df19502b16 [LV] Move insertelement sequence after scalar definitions
After r279649 when getting a vector value from VectorLoopValueMap, we create an
insertelement sequence on-demand if the value has been scalarized instead of
vectorized. We previously inserted this insertelement sequence before the
value's first vector user. However, this insert location is problematic if that
user is the phi node of a first-order recurrence. With this patch, we move the
insertelement sequence after the last scalar instruction we created when
scalarizing the value. Thus, the value's vector definition in the new loop will
immediately follow its scalar definitions. This should fix PR30183.

Reference: https://llvm.org/bugs/show_bug.cgi?id=30183
llvm-svn: 280001
2016-08-29 20:14:04 +00:00
David Majnemer e8fd5f9ffd [SimplifyCFG] Hoisting invalidates metadata
We forgot to remove optimization metadata when performing hosting during
FoldTwoEntryPHINode.

This fixes PR29163.

llvm-svn: 279980
2016-08-29 17:14:08 +00:00
Anna Thomas 2bc129c5fd [StatepointsForGC] Rematerialize in the presence of PHIs
Summary:
While walking the use chain for identifying rematerializable values in RS4GC,
add the case where the current value and base value are the same PHI nodes.

This will aid rematerialization of geps and casts instead of relocating.

Reviewers: sanjoy, reames, igor

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D23920

llvm-svn: 279975
2016-08-29 15:41:59 +00:00
Sanjay Patel 25475bcc0c [Constant] remove fdiv and frem from canTrap()
Assuming the default FP env, we should not treat fdiv and frem any differently in terms of
trapping behavior than any other FP op. Ie, FP ops do not trap with the default FP env.

This matches how we treat the fdiv/frem in IR with isSafeToSpeculativelyExecute() and in 
the backend after:
https://reviews.llvm.org/rL279970

llvm-svn: 279973
2016-08-29 15:27:17 +00:00
Sanjay Patel 20f02b271b [SimplifyCFG] rename test file, regenerate checks, and add test
The fdiv test shows a problem similar to:
https://reviews.llvm.org/rL279970

llvm-svn: 279972
2016-08-29 14:57:53 +00:00
Gor Nishanov dce9b02677 [Coroutines] Part 9: Add cleanup subfunction.
Summary:
[Coroutines] Part 9: Add cleanup subfunction.

This patch completes coroutine heap allocation elision. Now, the heap elision example from docs\Coroutines.rst compiles and produces expected result (see test/Transform/Coroutines/ex3.ll)

Intrinsic Changes:
* coro.free gets a token parameter tying it to coro.id to allow reliably discovering all coro.frees associated with a particular coroutine.
* coro.id gets an extra parameter that points back to a coroutine function. This allows to check whether a coro.id describes the enclosing function or it belongs to a different function that was later inlined.

CoroSplit now creates three subfunctions:
# f$resume - resume logic
# f$destroy - cleanup logic, followed by a deallocation code
# f$cleanup - just the cleanup code

CoroElide pass during devirtualization replaces coro.destroy with either f$destroy or f$cleanup depending whether heap elision is performed or not.

Other fixes, improvements:
* Fixed buglet in Shape::buildFrame that was not creating coro.save properly if coroutine has more than one suspend point.

* Switched to using variable width suspend index field (no longer limited to 32 bit index field can be as little as i1 or as large as i<whatever-size_t-is>)

Reviewers: majnemer

Subscribers: llvm-commits, mehdi_amini

Differential Revision: https://reviews.llvm.org/D23844

llvm-svn: 279971
2016-08-29 14:34:12 +00:00
Sanjay Patel 5c5311f4e5 [InstCombine] use m_APInt to allow icmp (and X, Y), C folds for splat constant vectors
llvm-svn: 279937
2016-08-28 18:18:00 +00:00
Elena Demikhovsky 3622fbfc68 [Loop Vectorizer] Fixed memory confilict checks.
Fixed a bug in run-time checks for possible memory conflicts inside loop.
The bug is in Low <-> High boundaries calculation. The High boundary should be calculated as "last memory access pointer + element size".

Differential revision: https://reviews.llvm.org/D23176

llvm-svn: 279930
2016-08-28 08:53:53 +00:00
Adam Nemet cef3314156 [Inliner] Report when inlining fails because callee's def is unavailable
Summary:
This is obviously an interesting case because it may motivate code
restructuring or LTO.

Reporting this requires instantiation of ORE in the loop where the call
sites are first gathered.  I've checked compile-time
overhead *with* -Rpass-with-hotness and the worst slow-down was 6% in
mcf and quickly tailing off.  As before without -Rpass-with-hotness
there is no overhead.

Because this could be a pretty noisy diagnostics, it is currently
qualified as 'verbose'.  As of this patch, 'verbose' diagnostics are
only emitted with -Rpass-with-hotness, i.e. when the output is expected
to be filtered.

Reviewers: eraman, chandlerc, davidxl, hfinkel

Subscribers: tejohnson, Prazek, davide, llvm-commits

Differential Revision: https://reviews.llvm.org/D23415

llvm-svn: 279860
2016-08-26 20:21:05 +00:00
Tim Shen 3ad8b43cc2 [MemCpy] Add comments for r279769
Differential Revision: https://reviews.llvm.org/D23846

llvm-svn: 279778
2016-08-25 21:03:46 +00:00
Tim Shen a3dbead2d6 [MemCpy] Check for alias in performMemCpyToMemSetOptzn, instead of the identity of two operands
Summary:
This fixes pr29105. The reason is that lifetime marks creates new
aliasing pointers the original ones, but before this patch aliases
were not checked in performMemCpyToMemSetOptzn.

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D23846

llvm-svn: 279769
2016-08-25 19:27:26 +00:00
Sebastian Pop 5f0d0e60d1 GVN-hoist: fix hoistingFromAllPaths for loops (PR29034)
It is invalid to hoist stores or loads if they are not executed on all paths
from the hoisting point to the exit of the function. In the testcase, there are
paths in the loop that do not execute the stores or the loads, and so hoisting
them within the loop is unsafe.

The problem is that the current implementation of hoistingFromAllPaths is
incomplete: it walks all blocks dominated by the hoisting point, and does not
return false when the loop contains a path on which the hoisted ld/st is
not executed.

Differential Revision: https://reviews.llvm.org/D23843

llvm-svn: 279732
2016-08-25 11:55:47 +00:00
Xinliang David Li cad3a995a4 [Profile] Propagate branch metadata properly in instcombine
Differential Revision: http://reviews.llvm.org/D23590

llvm-svn: 279693
2016-08-25 00:26:32 +00:00
Sanjay Patel d398d4a39e [InstCombine] use m_APInt to allow icmp eq/ne (shr X, C2), C folds for splat constant vectors
llvm-svn: 279677
2016-08-24 22:22:06 +00:00
Matthew Simpson abd2be1e2e [LV] Unify vector and scalar maps
This patch unifies the data structures we use for mapping instructions from the
original loop to their corresponding instructions in the new loop. Previously,
we maintained two distinct maps for this purpose: WidenMap and ScalarIVMap.
WidenMap maintained the vector values each instruction from the old loop was
represented with, and ScalarIVMap maintained the scalar values each scalarized
induction variable was represented with. With this patch, all values created
for the new loop are maintained in VectorLoopValueMap.

The change allows for several simplifications. Previously, when an instruction
was scalarized, we had to insert the scalar values into vectors in order to
maintain the mapping in WidenMap. Then, if a user of the scalarized value was
also scalar, we had to extract the scalar values from the temporary vector we
created. We now aovid these unnecessary scalar-to-vector-to-scalar conversions.
If a scalarized value is used by a scalar instruction, the scalar value is used
directly. However, if the scalarized value is needed by a vector instruction,
we generate the needed insertelement instructions on-demand.

A common idiom in several locations in the code (including the scalarization
code), is to first get the vector values an instruction from the original loop
maps to, and then extract a particular scalar value. This patch adds
getScalarValue for this purpose along side getVectorValue as an interface into
VectorLoopValueMap. These functions work together to return the requested
values if they're available or to produce them if they're not.

The mapping has also be made less permissive. Entries can be added to
VectorLoopValue map with the new initVector and initScalar functions.
getVectorValue has been modified to return a constant reference to the mapped
entries.

There's no real functional change with this patch; however, in some cases we
will generate slightly different code. For example, instead of an insertelement
sequence following the definition of an instruction, it will now precede the
first use of that instruction. This can be seen in the test case changes.

Differential Revision: https://reviews.llvm.org/D23169

llvm-svn: 279649
2016-08-24 18:23:17 +00:00
Sanjoy Das ff855b6020 [SCCP] Don't delete side-effecting instructions
I'm not sure if the `!isa<CallInst>(Inst) &&
!isa<TerminatorInst>(Inst))` bit is correct either, but this fixes the
case we know is broken.

llvm-svn: 279647
2016-08-24 18:10:21 +00:00
Gil Rapaport 550148b2f6 [Loop Vectorizer] Support predication of div/rem
div/rem instructions in basic blocks that require predication currently prevent
vectorization. This patch extends the existing mechanism for predicating stores
to handle other instructions and leverages it to predicate divs and rems.

Differential Revision: https://reviews.llvm.org/D22918

llvm-svn: 279620
2016-08-24 11:37:57 +00:00
Gor Nishanov 241b041fba [Coroutines] Part 8: Coroutine Frame Building algorithm
Summary:
This patch adds coroutine frame building algorithm. Now, simple coroutines such as ex0.ll and ex1.ll (first examples from docs\Coroutines.rst can be compiled).

Documentation and overview is here: http://llvm.org/docs/Coroutines.html.

Upstreaming sequence (rough plan)
1.Add documentation. (https://reviews.llvm.org/D22603)
2.Add coroutine intrinsics. (https://reviews.llvm.org/D22659)
...

7. Split coroutine into subfunctions. (https://reviews.llvm.org/D23461)
8. Coroutine Frame Building algorithm  <= we are here
9. Add f.cleanup subfunction.
10+. The rest of the logic

Reviewers: majnemer

Subscribers: mehdi_amini, llvm-commits

Differential Revision: https://reviews.llvm.org/D23586

llvm-svn: 279609
2016-08-24 04:44:35 +00:00
Matthew Simpson df2ab917ad [SLP] Avoid signed integer overflow
The test case included with r279125 exposed an existing signed integer
overflow. Since getTreeCost can return INT_MAX, we can't sum this cost together
with other costs, such as getReductionCost.

This patch removes the possibility of assigning a cost of INT_MAX. Since we
were previously using INT_MAX as an indicator for "should not vectorize", we
now explicitly check this condition with "isTreeTinyAndNotFullyVectorizable"
before computing a cost.

This patch adds a run-line to the test case used for r279125 that ensures we
don't vectorize. Previously, this line would vectorize the test case by chance
due to undefined behavior in the cost calculation.

Differential Revision: https://reviews.llvm.org/D23723

llvm-svn: 279562
2016-08-23 20:48:50 +00:00
Sanjay Patel 6946e2ade3 [InstSimplify] allow icmp with constant folds for splat vectors, part 2
Completes the m_APInt changes for simplifyICmpWithConstant().

Other commits in this series:
https://reviews.llvm.org/rL279492
https://reviews.llvm.org/rL279530
https://reviews.llvm.org/rL279534
https://reviews.llvm.org/rL279538

llvm-svn: 279543
2016-08-23 18:00:51 +00:00
Sanjay Patel 200e3cbfb0 [InstSimplify] allow icmp with constant folds for splat vectors, part 1
llvm-svn: 279538
2016-08-23 17:30:56 +00:00
Sanjay Patel ada2bb3d5d [InstSimplify] add tests to show missing vector icmp folds
llvm-svn: 279534
2016-08-23 17:13:38 +00:00
Sanjay Patel 5c269d0b7a [InstSimplify] move icmp with constant tests to another file; NFC
...because like the corresponding code, this is just too big to keep adding to.
And the next step is to add a vector version of each of these tests to show
missed folds.

Also, auto-generate CHECK lines and add comments for the tests that correspond to
the source code.

llvm-svn: 279530
2016-08-23 16:46:53 +00:00
Sanjay Patel a392049419 [InstCombine] use m_APInt to allow icmp (shr exact X, Y), 0 folds for splat constant vectors
llvm-svn: 279472
2016-08-22 20:45:06 +00:00
James Molloy 5bf2114265 [SimplifyCFG] Rewrite SinkThenElseCodeToEnd
[Recommitting now an unrelated assertion in SROA is sorted out]

The new version has several advantages:
  1) IMSHO it's more readable and neater
  2) It handles loads and stores properly
  3) It can handle any number of incoming blocks rather than just two. I'll be taking advantage of this in a followup patch.

With this change we can now finally sink load-modify-store idioms such as:

    if (a)
      return *b += 3;
    else
      return *b += 4;

    =>

    %z = load i32, i32* %y
    %.sink = select i1 %a, i32 5, i32 7
    %b = add i32 %z, %.sink
    store i32 %b, i32* %y
    ret i32 %b

When this works for switches it'll be even more powerful.

Round 4. This time we should handle all instructions correctly, and not replace any operands that need to be constant with variables.

This was really hard to determine safely, so the helper function should be put into the Instruction API. I'll do that as a followup.

llvm-svn: 279460
2016-08-22 19:07:15 +00:00
Jun Bum Lim ec8b8cc595 [InstCombine] Allow sinking from unique predecessor with multiple edges
Summary: We can allow sinking if the single user block has only one unique predecessor, regardless of the number of edges. Note that a switch statement with multiple cases can have the same destination.

Reviewers: mcrosier, majnemer, spatel, reames

Subscribers: reames, mcrosier, llvm-commits

Differential Revision: https://reviews.llvm.org/D23722

llvm-svn: 279448
2016-08-22 18:21:56 +00:00
James Molloy 475f4a763f Revert "[SimplifyCFG] Rewrite SinkThenElseCodeToEnd"
This reverts commit r279443. It caused buildbot failures.

llvm-svn: 279447
2016-08-22 18:13:12 +00:00
James Molloy 353052698a [SimplifyCFG] Rewrite SinkThenElseCodeToEnd
The new version has several advantages:
  1) IMSHO it's more readable and neater
  2) It handles loads and stores properly
  3) It can handle any number of incoming blocks rather than just two. I'll be taking advantage of this in a followup patch.

With this change we can now finally sink load-modify-store idioms such as:

    if (a)
      return *b += 3;
    else
      return *b += 4;

    =>

    %z = load i32, i32* %y
    %.sink = select i1 %a, i32 5, i32 7
    %b = add i32 %z, %.sink
    store i32 %b, i32* %y
    ret i32 %b

When this works for switches it'll be even more powerful.

Round 4. This time we should handle all instructions correctly, and not replace any operands that need to be constant with variables.

This was really hard to determine safely, so the helper function should be put into the Instruction API. I'll do that as a followup.

llvm-svn: 279443
2016-08-22 17:40:23 +00:00
Artur Pilipenko bc76ecada0 Revert -r278267 [ValueTracking] An improvement to IR ValueTracking on Non-negative Integers
This change cause performance regression on MultiSource/Benchmarks/TSVC/Symbolics-flt/Symbolics-flt from LNT and some other bechmarks.

See https://reviews.llvm.org/D18777 for details.

llvm-svn: 279433
2016-08-22 13:14:07 +00:00
Artur Pilipenko b78ad9d41f Revert -r278269 [IndVarSimplify] Eliminate zext of a signed IV when the IV is known to be non-negative
This change needs to be reverted in order to revert -r278267 which cause performance regression on MultiSource/Benchmarks/TSVC/Symbolics-flt/Symbolics-flt from LNT and some other bechmarks.

See comments on https://reviews.llvm.org/D18777 for details.

llvm-svn: 279432
2016-08-22 13:12:07 +00:00
Balaram Makam a927aa4ad0 [PM] Port LoopDataPrefetch AArch64 tests to new pass manager
Reviewers: mcrosier, tejohnson

Subscribers: aemerson, rengolin, mcrosier, llvm-commits

Differential Revision: https://reviews.llvm.org/D23724

llvm-svn: 279431
2016-08-22 12:59:58 +00:00
Sanjay Patel 643d21a62c [InstCombine] use m_APInt to allow icmp (shl X, Y), C folds for splat constant vectors, part 4
This concludes the fixes for icmp+shl in this series:
https://reviews.llvm.org/rL279339
https://reviews.llvm.org/rL279398
https://reviews.llvm.org/rL279399

llvm-svn: 279401
2016-08-21 17:10:07 +00:00
Sanjay Patel 163a5ab799 remove FIXME comment; fixed by previous commit
llvm-svn: 279400
2016-08-21 16:40:42 +00:00
Sanjay Patel 7ffcde7422 [InstCombine] use m_APInt to allow icmp (shl X, Y), C folds for splat constant vectors, part 3
This is a partial enablement (move the ConstantInt guard down).

llvm-svn: 279399
2016-08-21 16:35:34 +00:00
Sanjay Patel 7e09f13fed [InstCombine] use m_APInt to allow icmp (shl X, Y), C folds for splat constant vectors, part 2
This is a partial enablement (move the ConstantInt guard down).

llvm-svn: 279398
2016-08-21 16:28:22 +00:00
Matthew Simpson 235e479984 Reapply "[SLP] Initialize VectorizedValue when gathering"
The test case included in r279125 exposed existing undefined behavior in the
SLP vectorizer that it did not introduce. This patch reapplies the original
patch, but modifies the test case to avoid hitting the undefined behavior. This
allows us to close PR28330 while keeping the UBSan bot happy. The undefined
behavior the original test uncovered will be addressed in a follow-on patch.

Reference: https://llvm.org/bugs/show_bug.cgi?id=28330
llvm-svn: 279370
2016-08-20 14:49:02 +00:00
Vitaly Buka cc7db13bf0 Revert "[SLP] Initialize VectorizedValue when gathering" to fix ubsan bot.
This reverts commit r279125.

https://reviews.llvm.org/D23410

llvm-svn: 279363
2016-08-20 07:09:39 +00:00