Commit Graph

7781 Commits

Author SHA1 Message Date
Arnold Schwaighofer f62ba1031f DeadArgElim: Don't mark swifterror arguments as unused
Replacing swifterror arguments with undef creates invalid IR.

rdar://28300490

llvm-svn: 282075
2016-09-21 15:29:08 +00:00
Adam Nemet e3cef93727 [LV] When reporting about a specific instruction without debug location use loop's
This can occur for example if some optimization drops the debug location.

llvm-svn: 282048
2016-09-21 03:14:20 +00:00
Michael Kuperstein 79dcc274b4 [InferAttributes] Don't access parameters that don't exist.
Check for the correct number of parameters before querying their type.
This fixes PR30455.

llvm-svn: 282038
2016-09-20 23:10:31 +00:00
Dorit Nuzman 02efef0525 Reverting revision 281960 due to test failures.
llvm-svn: 281961
2016-09-20 08:27:48 +00:00
Dorit Nuzman d3686e5269 [SROA] Preserve llvm.mem.parallel_loop_access metadata.
SROA doesn't preserve the llvm.mem.parallel_loop_access metadata when it
transforms loads/stores. This patch fixes a couple occurences of this
issue.

(Partially addresses PR28981).

Differential Revision: https://reviews.llvm.org/D23549

llvm-svn: 281960
2016-09-20 07:50:49 +00:00
Dehao Chen 20866ed57e Handle early inline for hot callsites that reside in the same basic block.
Summary: Callsites in the same basic block should share the same hotness. This patch checks for the hottest callsite in the same basic block, and use the hotness for all callsites in that basic block for early inline decisions. It also fixes the test to add "-S" so theat the "CHECK-NOT" is actually checking the content.

Reviewers: dnovillo

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D24734

llvm-svn: 281927
2016-09-19 18:38:14 +00:00
Dehao Chen 82667d04c5 Only set branch weight during sample pgo annotation when max_weight of the branch is non-zero. Otherwise use default static profile to set branch probability.
Summary: It does not make sense to set equal weights for all unkown branches as we have static branch prediction available.

Reviewers: dnovillo

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D24732

llvm-svn: 281912
2016-09-19 16:33:41 +00:00
Dehao Chen 38e3731c47 Use call target count to derive the call instruction weight
Summary: The call target count profile is directly derived from LBR branch->target data. This is more reliable than instruction frequency profiles that could be moved across basic block boundaries. This patches uses call target count profile to annotate call instructions.

Reviewers: davidxl, dnovillo

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D24410

llvm-svn: 281911
2016-09-19 16:06:37 +00:00
Keith Walker c941252374 Add @llvm.dbg.value entries for the phi node created by -mem2reg
When phi nodes are created in the -mem2reg phase, the @llvm.dbg.declare
entries are converted to @llvm.dbg.value entries at the place where the
store instructions existed. However no entry is created to describe
the resulting value of the phi node.

The effect of this is especially noticeable in for loops which have a
constant for the intial value; the loop control variable's location
would be described as the intial constant value in the loop body once
the -mem2reg optimization phase was run.

This change adds the creation of the @llvm.dbg.value entries to describe
variables whose location is the result of a phi node created in -mem2reg.

Also when the phi node is finally lowered to a machine instruction it
is important that the lowered "load" instruction is placed before the
associated DEBUG_VALUE entry describing the value loaded.

Differential Revision: https://reviews.llvm.org/D23715

llvm-svn: 281895
2016-09-19 09:49:30 +00:00
James Molloy 0efb96a8ee [SimplifyCFG] Update (AND) IR flags when CSE'ing instructions
We were updating metadata but not IR flags. Because we pick an arbitrary instruction to be the CSE candidate, it comes down to luck (50% or less chance) if this results in broken codegen or not, which is why PR30373 which is actually not the fault of the commit it was bisected down to.

Fixes PR30373.

llvm-svn: 281889
2016-09-19 08:23:08 +00:00
Dehao Chen 41cde0b986 Handle Invoke during sample profiler annotation: make it inlinable.
Summary: Previously we reline on inst-combine to remove inlinable invoke instructions. This causes trouble because a few extra optimizations are schedule early that could introduce too much CFG change (e.g. simplifycfg removes too much control flow). This patch handles invoke instruction in-place during sample profile annotation, so that we do not rely on instcombine to remove those invoke instructions.

Reviewers: davidxl, dnovillo

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D24409

llvm-svn: 281870
2016-09-18 23:11:37 +00:00
Simon Pilgrim 34032cba14 Rename tests
llvm-svn: 281863
2016-09-18 20:25:41 +00:00
Xinliang David Li 4ca1733a06 [Profile] Implement select instruction instrumentation in IR PGO
Differential Revision: http://reviews.llvm.org/D23727

llvm-svn: 281858
2016-09-18 18:34:07 +00:00
Elena Demikhovsky 5f8cc0c346 [Loop Vectorizer] Consecutive memory access - fixed and simplified
Amended consecutive memory access detection in Loop Vectorizer.
Load/Store were not handled properly without preceding GEP instruction.

Differential Revision: https://reviews.llvm.org/D20789

llvm-svn: 281853
2016-09-18 13:56:08 +00:00
Teresa Johnson fbb431b292 [ThinLTO] Ensure anonymous globals renamed even at -O0
Summary:
This fixes an issue when files are compiled with -flto=thin
at default -O0. We need to rename anonymous globals before attempting
to write the module summary because all values need names for
the summary. This was happening at -O1 and above, but not before
the early exit when constructing the pipeline for -O0.

Also add an internal -prepare-for-thinlto option to enable this
to be tested via opt.

Fixes PR30419.

Reviewers: mehdi_amini

Subscribers: probinson, llvm-commits, mehdi_amini

Differential Revision: https://reviews.llvm.org/D24701

llvm-svn: 281840
2016-09-17 20:40:16 +00:00
Sanjay Patel f26710d97d [InstCombine] canonicalize vector select with constant vector condition to shuffle
As discussed on llvm-dev ( http://lists.llvm.org/pipermail/llvm-dev/2016-August/104210.html ): 
turn a vector select with constant condition operand into a shuffle as a canonicalization step.
Shuffles may be easier to reason about in conjunction with other shuffles and insert/extract.

Possible known (minor?) regressions from this change are filed as:
https://llvm.org/bugs/show_bug.cgi?id=28530 
https://llvm.org/bugs/show_bug.cgi?id=28531 
https://llvm.org/bugs/show_bug.cgi?id=30371

If something terrible happens to perf after this commit, feel free to revert until a backend
fix is in place.

Differential Revision: https://reviews.llvm.org/D24279

llvm-svn: 281787
2016-09-16 22:16:18 +00:00
Evgeniy Stepanov aa84f050fc [safestack] Fix assertion failure in stack coloring.
This is a fix for PR30318.

Clang may generate IR where an alloca is already live when entering a
BB with lifetime.start. In this case, conservatively extend the
alloca lifetime all the way back to the block entry.

llvm-svn: 281784
2016-09-16 22:04:10 +00:00
Sanjay Patel c96f6db246 [InstCombine] allow vector types for constant folding / computeKnownBits (PR24942)
computeKnownBits() already works for integer vectors, so allow vector types when calling that from InstCombine.

I don't think the change to use m_APInt in computeKnownBits is strictly necessary because we do check for 
ConstantVector later, but it's more efficient to handle the splat case without needing to loop on vector elements.

This should work with InstSimplify, but doesn't yet, so I made that a FIXME comment on the test for PR24942:
https://llvm.org/bugs/show_bug.cgi?id=24942

Differential Revision: https://reviews.llvm.org/D24677

llvm-svn: 281777
2016-09-16 21:20:36 +00:00
Sanjay Patel 0417c7c3ef auto-generate checks
llvm-svn: 281755
2016-09-16 17:48:16 +00:00
Mehdi Amini 55b06538b5 Fix test after renaming -name-anon-functions pass to -name-anon-globals
llvm-svn: 281752
2016-09-16 17:18:16 +00:00
Mehdi Amini 2cac787919 Fix NameAnonFunctions pass: for ThinLTO we need to rename global variables as well
A follow-up patch will rename this pass and the source file accordingly,
but I figured the non-NFC change will be easier to spot in isolation.

Differential Revision: https://reviews.llvm.org/D24641

llvm-svn: 281744
2016-09-16 16:56:25 +00:00
David Majnemer 08566532ae Add a test for r280191
llvm-svn: 281694
2016-09-16 02:43:36 +00:00
Sanjay Patel af91d1f81e [InstCombine] allow icmp (shr/shl) folds for vectors
These 2 helper functions were already using APInt internally, so just
change the API and caller to allow folds for splats. The scalar
regression tests look quite thorough, so I just added a couple of
tests to prove that vectors are handled too.

These folds should be grouped with the other cmp+shift folds though.
That can be an NFC follow-up.

llvm-svn: 281663
2016-09-15 21:35:30 +00:00
Sanjay Patel d590db6eac regenerate checks
llvm-svn: 281655
2016-09-15 20:39:01 +00:00
Mehdi Amini d880309835 [GlobalOpt] Dead Eliminate declarations
GlobalOpt is already dead-code-eliminating global definitions. With
this change it also takes care of declarations.
Hopefully this should make it now a strict superset of GlobalDCE.
This is important for LTO/ThinLTO as we don't want the linker to see
"undefined reference" when it processes the input files: it could
prevent proper internalization (or even load an extra file from a
static archive, changing the behavior of the program!).

llvm-svn: 281653
2016-09-15 20:26:27 +00:00
David Majnemer 8b16da8744 [InstCombine] Do not RAUW a constant GEP
canRewriteGEPAsOffset expects to process instructions, not constants.

This fixes PR30342.

llvm-svn: 281650
2016-09-15 20:10:09 +00:00
Sanjay Patel 886a542e23 [InstCombine] allow icmp (sub nsw) folds for vectors
Also, clean up the code and comments for the existing folds in foldICmpSubConstant().

llvm-svn: 281631
2016-09-15 18:05:17 +00:00
Sanjay Patel 514068397e [InstCombine] add vector tests for icmp (sub nsw)
llvm-svn: 281630
2016-09-15 17:54:47 +00:00
Sanjay Patel 40c53ea933 [InstCombine] allow (icmp sgt smin(PosA, B), 0) fold for vectors
llvm-svn: 281624
2016-09-15 16:23:20 +00:00
Sanjay Patel 78a7c7d26a [InstCombine] add vector tests for icmp sgt smin
llvm-svn: 281623
2016-09-15 16:13:41 +00:00
Sanjay Patel e957f1fe2c [InstCombine] auto-generate checks
llvm-svn: 281621
2016-09-15 15:48:53 +00:00
Sanjay Patel 7577a3d799 [InstCombine] use m_APInt to allow icmp folds using known bits for splat constant vectors
llvm-svn: 281613
2016-09-15 14:15:47 +00:00
NAKAMURA Takumi 6a5bac48cf llvm/test/Transforms/CorrelatedValuePropagation/alloca.ll REQUIRES +Asserts.
llvm-svn: 281598
2016-09-15 09:45:31 +00:00
Wei Mi f160e345be Add some shortcuts in LazyValueInfo to reduce compile time of Correlated Value Propagation.
The patch is to partially fix PR10584. Correlated Value Propagation queries LVI
to check non-null for pointer params of each callsite. If we know the def of
param is an alloca instruction, we know it is non-null and can return early from
LVI. Similarly, CVP queries LVI to check whether pointer for each mem access is
constant. If the def of the pointer is an alloca instruction, we know it is not
a constant pointer. These shortcuts can reduce the cost of CVP significantly.

Differential Revision: https://reviews.llvm.org/D18066

llvm-svn: 281586
2016-09-15 06:28:34 +00:00
Sanjay Patel 9f036b5a97 [InstCombine] add vector tests for foldICmpUsingKnownBits()
llvm-svn: 281559
2016-09-14 23:15:11 +00:00
Matthew Simpson b25e87fca5 [LV] Process pointer IVs with PHINodes in collectLoopUniforms
This patch moves the processing of pointer induction variables in
collectLoopUniforms from the consecutive pointer phase of the analysis to the
phi node phase. Previously, if a pointer induction variable was used by both a
scalarized non-memory instruction as well as a vectorized memory instruction,
we would incorrectly identify the pointer as uniform. Pointer induction
variables should be treated the same as other phi nodes. That is, they are
uniform if all users of the induction variable and induction variable update
are uniform.

Differential Revision: https://reviews.llvm.org/D24511

llvm-svn: 281485
2016-09-14 14:47:40 +00:00
Andrea Di Biagio e8e1af3649 [InstCombine] Merged two test files and regenerated checks using update_test_checks.py. NFC.
llvm-svn: 281478
2016-09-14 14:18:21 +00:00
Akira Hatanaka dea090e6b2 [ObjCARC] Traverse chain downwards to replace uses of argument passed to
ObjC library call with call return.

ARC contraction tries to replace uses of an argument passed to an
objective-c library call with the call return value. For example, in the
following IR, it replaces uses of argument %9 and uses of the values
discovered traversing the chain upwards (%7 and %8) with the call return
%10, if they are dominated by the call to @objc_autoreleaseReturnValue.
This transformation enables code-gen to tail-call the call to
@objc_autoreleaseReturnValue, which is necessary to enable auto release
return value optimization.

%7 = tail call i8* @objc_loadWeakRetained(i8** %6)
%8 = bitcast i8* %7 to %0*
%9 = bitcast %0* %8 to i8*
%10 = tail call i8* @objc_autoreleaseReturnValue(i8* %9)
ret %0* %8

Since r276727, llvm started removing redundant bitcasts and as a result
started feeding the following IR to ARC contraction:

%7 = tail call i8* @objc_loadWeakRetained(i8** %6)
%8 = bitcast i8* %7 to %0*
%9 = tail call i8* @objc_autoreleaseReturnValue(i8* %7)
ret %0* %8

ARC contraction no longer does the optimization described above since it
only traverses the chain upwards and fails to recognize that the
function return can be replaced by the call return. This commit changes
ARC contraction to traverse the chain downwards too and replace uses of
bitcasts with the call return.

rdar://problem/28011339

Differential Revision: https://reviews.llvm.org/D24523

llvm-svn: 281419
2016-09-13 23:43:11 +00:00
Sanjay Patel ab40f9d0ef add tests for PR28672
I'm not sure if we actually want to transform all of these in InstCombine yet, 
so I'm not labeling these with FIXME.  

llvm-svn: 281386
2016-09-13 20:36:13 +00:00
Matt Arsenault e2e6cfee61 Reapply "InstCombine: Reduce trunc (shl x, K) width."
This reapplies r272987 with a fix for infinitely looping
when the truncated value is another shift of a constant.

llvm-svn: 281379
2016-09-13 19:43:57 +00:00
Andrea Di Biagio 7277afeec1 [ConstantFold] Improve the bitcast folding logic for constant vectors.
The constant folder didn't know how to always fold bitcasts of constant integer
vectors. In particular, it was unable to handle the case where a constant vector
had some undef elements, and the resulting (i.e. bitcasted) vector type had more
elements than the original vector type.

Example:
  %cast = bitcast <2 x i64><i64 undef, i64 2> to <4 x i32>

On a little endian target, %cast could have been folded to:
  <4 x i32><i32 undef, i32 undef, i32 2, i32 0>

This patch improves the folding logic by teaching how to correctly propagate
undef elements in the folded vector.

Differential Revision: https://reviews.llvm.org/D24301

llvm-svn: 281343
2016-09-13 14:50:47 +00:00
Andrea Di Biagio 3647a96a44 [InstSimplify] Add tests to show missed bitcast folding opportunities.
InstSimplify doesn't always know how to fold a bitcast of a constant vector.
In particular, the logic in InstSimplify doesn't know how to handle the case
where the constant vector in input contains some undef elements, and the
number of elements is smaller than the number of elements of the bitcast
vector type.

llvm-svn: 281332
2016-09-13 13:17:42 +00:00
Sam Parker 64781ed4bb Remove InstCombine test file
My previous commit should of removed a test file but I missed it.

llvm-svn: 281326
2016-09-13 12:33:06 +00:00
Sam Parker 214f7bf5cc Enable simplify libcalls for ARM PCS
Teach SimplifyLibcalls that in can treat functions annotated with
apcs, aapcs or aapcs_vfp like normal C functions if they only take
and return integer or pointer values, and the target is not iOS.

Differential Revision: https://reviews.llvm.org/D24453

llvm-svn: 281322
2016-09-13 12:10:14 +00:00
Peter Collingbourne d4135bbc30 DebugInfo: New metadata representation for global variables.
This patch reverses the edge from DIGlobalVariable to GlobalVariable.
This will allow us to more easily preserve debug info metadata when
manipulating global variables.

Fixes PR30362. A program for upgrading test cases is attached to that
bug.

Differential Revision: http://reviews.llvm.org/D20147

llvm-svn: 281284
2016-09-13 01:12:59 +00:00
Sanjay Patel ff00fae8e6 add more tests for PR30273
llvm-svn: 281270
2016-09-12 22:28:29 +00:00
Sanjay Patel dea26950a0 [InstCombine] add test for PR30327
llvm-svn: 281248
2016-09-12 19:50:08 +00:00
Sanjay Patel 2eea9a1d58 [InstCombine] regenerate checks
llvm-svn: 281247
2016-09-12 19:29:26 +00:00
Sanjay Patel f5887f1fbd [InstCombine] use m_APInt to allow icmp X, C folds for splat constant vectors
isSignBitCheck could be changed to take a pointer param to avoid the 'UnusedBit' ugliness.

llvm-svn: 281231
2016-09-12 16:25:41 +00:00
David Majnemer c83044d9bb [FunctionAttrs] Don't try to infer returned if it is already on an argument
Trying to infer the 'returned' attribute if an argument is already
'returned' can lead to verification failure: inference might determine
that a different argument is passed through which would result in two
different arguments marked as 'returned'.

This fixes PR30350.

llvm-svn: 281221
2016-09-12 16:04:59 +00:00
Sanjay Patel db400baa80 [InstCombine] add tests to show missing vector folds
llvm-svn: 281219
2016-09-12 15:51:42 +00:00
Sanjay Patel f9ca770225 [InstCombine] regenerate checks
llvm-svn: 281186
2016-09-12 00:12:56 +00:00
Sanjay Patel a2aabfcc17 [InstCombine] regenerate checks
llvm-svn: 281185
2016-09-12 00:08:33 +00:00
James Molloy 104370ab37 [SimplifyCFG] Be even more conservative in SinkThenElseCodeToEnd
This should *actually* fix PR30244. This cranks up the workaround for PR30188 so that we never sink loads or stores of allocas.

The idea is that these should be removed by SROA/Mem2Reg, and any movement of them may well confuse SROA or just cause unwanted code churn. It's not ideal that the midend should be crippled like this, but that unwanted churn can really cause significant regressions in important workloads (tsan).

llvm-svn: 281162
2016-09-11 09:00:03 +00:00
James Molloy 18d96e8fa5 [SimplifyCFG] Harden up the profitability heuristic for block splitting during sinking
Exposed by PR30244, we will split a block currently if we think we can sink at least one instruction. However this isn't right - the reason we split predecessors is so that we can sink instructions that otherwise couldn't be sunk because it isn't safe to do so - stores, for example.

So, change the heuristic to only split if it thinks it can sink at least one non-speculatable instruction.

Should fix PR30244.

llvm-svn: 281160
2016-09-11 08:07:30 +00:00
Justin Lebar 11a3204355 Add handling of !invariant.load to PropagateMetadata.
Summary:
This will let e.g. the load/store vectorizer propagate this metadata
appropriately.

Reviewers: arsenm

Subscribers: tra, jholewinski, hfinkel, mzolotukhin

Differential Revision: https://reviews.llvm.org/D23479

llvm-svn: 281153
2016-09-11 01:39:08 +00:00
Arnold Schwaighofer 5d335559b9 InstCombine: Don't combine loads/stores from swifterror to a new type
This generates invalid IR: the only users of swifterror can be call
arguments, loads, and stores.

rdar://28242257

llvm-svn: 281144
2016-09-10 18:14:57 +00:00
Arnold Schwaighofer c9277f40fd Inliner: Don't mark swifterror allocas with lifetime markers
This would create a bitcast use which fails the verifier: swifterror values may
only be used by loads, stores, and as function arguments.

rdar://28233244

llvm-svn: 281114
2016-09-09 22:40:27 +00:00
Matt Arsenault 950a82047b LSV: Fix incorrectly increasing alignment
If the unaligned access has a dynamic offset, it may be odd which
would make the adjusted alignment incorrect to use.

llvm-svn: 281110
2016-09-09 22:20:14 +00:00
Sanjay Patel 58109abe91 [InstCombine] use m_APInt to allow icmp ult X, C folds for splat constant vectors
llvm-svn: 281107
2016-09-09 21:59:37 +00:00
Dehao Chen 22ce5eb051 Do not widen load for different variable in GVN.
Summary:
Widening load in GVN is too early because it will block other optimizations like PRE, LICM.

https://llvm.org/bugs/show_bug.cgi?id=29110

The SPECCPU2006 benchmark impact of this patch:

Reference: o2_nopatch
(1): o2_patched

           Benchmark             Base:Reference   (1)  
-------------------------------------------------------
spec/2006/fp/C++/444.namd                  25.2  -0.08%
spec/2006/fp/C++/447.dealII               45.92  +1.05%
spec/2006/fp/C++/450.soplex                41.7  -0.26%
spec/2006/fp/C++/453.povray               35.65  +1.68%
spec/2006/fp/C/433.milc                   23.79  +0.42%
spec/2006/fp/C/470.lbm                    41.88  -1.12%
spec/2006/fp/C/482.sphinx3                47.94  +1.67%
spec/2006/int/C++/471.omnetpp             22.46  -0.36%
spec/2006/int/C++/473.astar               21.19  +0.24%
spec/2006/int/C++/483.xalancbmk           36.09  -0.11%
spec/2006/int/C/400.perlbench             33.28  +1.35%
spec/2006/int/C/401.bzip2                 22.76  -0.04%
spec/2006/int/C/403.gcc                   32.36  +0.12%
spec/2006/int/C/429.mcf                   41.04  -0.41%
spec/2006/int/C/445.gobmk                 26.94  +0.04%
spec/2006/int/C/456.hmmer                  24.5  -0.20%
spec/2006/int/C/458.sjeng                    28  -0.46%
spec/2006/int/C/462.libquantum            55.25  +0.27%
spec/2006/int/C/464.h264ref               45.87  +0.72%

geometric mean                                   +0.23%

For most benchmarks, it's a wash, but we do see stable improvements on some benchmarks, e.g. 447,453,482,400.

Reviewers: davidxl, hfinkel, dberlin, sanjoy, reames

Subscribers: gberry, junbuml

Differential Revision: https://reviews.llvm.org/D24096

llvm-svn: 281074
2016-09-09 18:42:35 +00:00
Sanjay Patel 6da0fb8c74 [InstCombine] add tests to show pattern matching failures due to commutation
I was looking to fix a bug in getComplexity(), and these cases showed up as 
obvious failures. I'm not sure how to find these in general though.

llvm-svn: 281055
2016-09-09 16:35:20 +00:00
Gor Nishanov faf36c2e0b [Coroutines] Part13: Handle single edge PHINodes across suspends
Summary:
If one of the uses of the value is a single edge PHINode, handle it.

Original:

    %val = something
    <suspend>
    %p = PHINode [%val]

After Spill + Part13:

    %val = something
    %slot = gep val.spill.slot
    store %val, %slot
    <suspend>
    %p = load %slot

Plus tiny fixes/changes:
   * use correct index for coro.free in CoroCleanup
   * fixup id parameter in coro.free to allow authoring coroutine in plain C with __builtins

Reviewers: majnemer

Subscribers: mehdi_amini, llvm-commits

Differential Revision: https://reviews.llvm.org/D24242

llvm-svn: 281020
2016-09-09 05:39:00 +00:00
Dehao Chen 87823f8e4d Remove debug info when hoisting instruction from then/else branch.
Summary: The hoisted instruction is executed speculatively. It could affect the debugging experience as user would see gdb go into code that may not be expected to execute. It will also affect sample profile accuracy by assigning incorrect frequency to source within then/else branch.

Reviewers: davidxl, dblaikie, chandlerc, kcc, echristo

Subscribers: mehdi_amini, probinson, eric_niebler, andreadb, llvm-commits

Differential Revision: https://reviews.llvm.org/D24164

llvm-svn: 280995
2016-09-08 21:53:33 +00:00
Sanjay Patel a4c6223319 [InstCombine] regenerate checks
llvm-svn: 280993
2016-09-08 21:40:21 +00:00
Matthew Simpson bfe5e1817b [LV] Ensure proper handling of multi-use case when collecting uniforms
The test case included in r280979 wasn't checking what it was supposed to be
checking for the predicated store case. Fixing the test revealed that the
multi-use case (when a pointer is used by both vectorized and scalarized memory
accesses) wasn't being handled properly. We can't skip over
non-consecutive-like pointers since they may have looked consecutive-like with
a different memory access.

llvm-svn: 280992
2016-09-08 21:38:26 +00:00
Sanjay Patel ed9fda01a3 [InstCombine] regenerate checks
llvm-svn: 280991
2016-09-08 21:32:21 +00:00
Matthew Simpson 408a3abcfe [LV] Don't mark pointers used by scalarized memory accesses uniform
Previously, all consecutive pointers were marked uniform after vectorization.
However, if a consecutive pointer is used by a memory access that is eventually
scalarized, the pointer won't remain uniform after all. An example is
predicated stores. Even though a predicated store may be consecutive, it will
still be scalarized, making it's pointer operand non-uniform.

This patch updates the logic in collectLoopUniforms to consider the cases where
a memory access may be scalarized. If a memory access may be scalarized, its
pointer operand is not marked uniform. The determination of whether a given
memory instruction will be scalarized or not has been moved into a common
function that is used by the vectorizer, cost model, and legality analysis.

Differential Revision: https://reviews.llvm.org/D24271

llvm-svn: 280979
2016-09-08 19:11:07 +00:00
Dehao Chen ebb715b119 Add unittest for r280760
llvm-svn: 280963
2016-09-08 16:53:40 +00:00
Simon Pilgrim 073472a2bf [InstCombine][X86] Regenerate masked memory op combine tests
llvm-svn: 280960
2016-09-08 16:32:37 +00:00
Simon Pilgrim cd7b2830b9 [InstCombine][X86] Regenerate vperm2f128/vperm2i128 combine tests
llvm-svn: 280959
2016-09-08 16:30:46 +00:00
Simon Pilgrim 3b1ecbe66c [InstCombine][X86] Regenerate insertps combine tests
llvm-svn: 280957
2016-09-08 16:15:21 +00:00
Michael Zolotukhin e72997a524 Revert "[LoopUnroll] Properly update loop-info when cloning prologues and epilogues."
This reverts commit r280901.

This caused a bunch of failures, reverting it until I investigate them.

llvm-svn: 280905
2016-09-08 03:51:30 +00:00
Michael Zolotukhin 5e0a20697e [LoopUnroll] Properly update loop-info when cloning prologues and epilogues.
Summary:
When cloning blocks for prologue/epilogue we need to replicate the loop
structure from the original loop. It wasn't a problem for the innermost
loops, but it led to an incorrect loop info when we unrolled a loop with
a child loop - in this case created prologue-loop had a child loop, but
loop info didn't reflect that.

This fixes PR28888.

Reviewers: chandlerc, sanjoy, hfinkel

Subscribers: llvm-commits, silvas

Differential Revision: https://reviews.llvm.org/D24203

llvm-svn: 280901
2016-09-08 01:52:26 +00:00
Sanjay Patel 9b40f98357 [InstCombine] use m_APInt to allow icmp (and (sh X, Y), C2), C1 folds for splat constant vectors
llvm-svn: 280873
2016-09-07 22:33:03 +00:00
Hal Finkel ac5803ba91 [SimplifyCFG] Don't try to create metadata-valued PHIs
We can't create metadata-valued PHIs; don't try to do so when sinking.

I created a test case for this using the @llvm.type.test intrinsic, because it
takes a metadata parameter and does not have severe side effects (thus
SimplifyCFG is willing to otherwise sink it).

Previously, running the test case would crash with:

  Invalid use of metadata!
    %.sink = select i1 %flag, metadata <...>, metadata <0x4e45dc0>
  LLVM ERROR: Broken function found, compilation aborted!

llvm-svn: 280866
2016-09-07 21:38:22 +00:00
Sanjay Patel def931e76a [InstCombine] allow icmp (and X, C2), C1 folds for splat constant vectors
This is a revert of r280676 which was a revert of r280637;
ie, this is r280637 again. It was speculatively reverted to
help debug buildbot failures.

llvm-svn: 280861
2016-09-07 20:50:44 +00:00
Justin Lebar 3a5f40c191 [LSV] Use the original loads' names for the extractelement instructions.
Summary:
LSV replaces multiple adjacent loads with one vectorized load and a
bunch of extractelement instructions.  This patch makes the
extractelement instructions' names match those of the original loads,
for (hopefully) improved readability.

Reviewers: asbirlea, tstellarAMD

Subscribers: arsenm, mzolotukhin

Differential Revision: https://reviews.llvm.org/D23748

llvm-svn: 280818
2016-09-07 15:49:48 +00:00
Andrea Di Biagio bdd576dbb0 Regenerate vector bitcast folding tests using update_test_checks.py.
Two tests have been merged together, regenerated and then moved to
a more appropriate directory. No functional change.

llvm-svn: 280814
2016-09-07 14:50:07 +00:00
Andrea Di Biagio f3fd316223 [InstCombine][SSE4a] Fix assertion failure in the insertq/insertqi combining logic.
This fixes a similar issue to the one already fixed by r280804
(revieved in D24256). Revision 280804 fixed the problem with unsafe dyn_casts
in the extrq/extrqi combining logic. However, it turns out that even the
insertq/insertqi logic was affected by the same problem.

llvm-svn: 280807
2016-09-07 12:47:53 +00:00
Andrea Di Biagio 8df5b9cf48 [InstCombine][SSE4a] Fix assertion failure caused by unsafe dyn_casts on the operands of extrq/extrqi intrinsic calls.
This patch fixes an assertion failure caused by unsafe dynamic casts on the
constant operands of sse4a intrinsic calls to extrq/extrqi

The combine logic that simplifies sse4a extrq/extrqi intrinsic calls currently
checks if the input operands are constants. Internally, that logic relies on
dyn_casts of values returned by calls to method Constant::getAggregateElement.
However, method getAggregateElemet may return nullptr if the constant element
cannot be retrieved. So, all the dyn_casts can potentially fail. This is what
happens for example if a constexpr value is passed in input to an extrq/extrqi
intrinsic call.

This patch fixes the problem by using a dyn_cast_or_null (instead of a simple
dyn_cast) on the result of each call to Constant::getAggregateElement.

Added reproducible test cases to x86-sse4a.ll.

Differential Revision: https://reviews.llvm.org/D24256

llvm-svn: 280804
2016-09-07 12:03:03 +00:00
James Molloy ec905a62ae [SimplifyCFG] Update workaround for PR30188 to also include loads
I should have realised this the first time around, but if we're avoiding sinking stores where the operands come from allocas so they don't create selects, we also have to do the same for loads because SROA will be just as defective looking at loads of selected addresses as stores.

Fixes PR30188 (again).

llvm-svn: 280792
2016-09-07 08:40:20 +00:00
James Molloy bf1837d9c9 [SimplifyCFG] Check PHI uses more accurately
PR30292 showed a case where our PHI checking wasn't correct. We were checking that all values were used by the same PHI before deciding to sink, but we weren't checking that the incoming values for that PHI were what we expected. As a result, we had to bail out after block splitting which caused us to never reach a steady state in SimplifyCFG.

Fixes PR30292.

llvm-svn: 280790
2016-09-07 08:15:54 +00:00
Adam Nemet c520822dbf [JumpThreading] Only write back branch-weight MDs for blocks that originally had PGO info
Currently the pass updates branch weights in the IR if the function has
any PGO info (entry frequency is set).  However we could still have
regions of the CFG that does not have branch weights collected (e.g. a
cold region).  In this case we'd use static estimates.  Since static
estimates for branches are determined independently, they are
inconsistent.  Updating them can "randomly" inflate block frequencies.

I've run into this in a completely cold loop of h264ref from
SPEC.  -Rpass-with-hotness showed the loop to be completely cold during
inlining (before JT) but completely hot during vectorization (after JT).

The new testcase demonstrate the problem.  We check array elements
against 1, 2 and 3 in a loop.  The check against 3 is the loop-exiting
check.  The block names should be self-explanatory.

In this example, jump threading incorrectly updates the weight of the
loop-exiting branch to 0, drastically inflating the frequency of the
loop (in the range of billions).

There is no run-time profile info for edges inside the loop, so branch
probabilities are estimated.  These are the resulting branch and block
frequencies for the loop body:

                check_1 (16)
            (8) /  |
            eq_1   | (8)
                \  |
                check_2 (16)
            (8) /  |
            eq_2   | (8)
                \  |
                check_3 (16)
            (1) /  |
       (loop exit) | (15)
                   |
              (back edge)

First we thread eq_1 -> check_2 to check_3.  Frequencies are updated to
remove the frequency of eq_1 from check_2 and then from the false edge
leaving check_2.  Changed frequencies are highlighted with * *:

                check_1 (16)
            (8) /  |
           eq_1~   | (8)
           /       |
          /     check_2 (*8*)
         /  (8) /  |
         \  eq_2   | (*0*)
          \     \  |
           ` --- check_3 (16)
            (1) /  |
       (loop exit) | (15)
                   |
              (back edge)

Next we thread eq_1 -> check_3 and eq_2 -> check_3 to check_1 as new
back edges.  Frequencies are updated to remove the frequency of eq_1 and
eq_3 from check_3 and then the false edge leaving check_3 (changed
frequencies are highlighted with * *):

                  check_1 (16)
              (8) /  |
             eq_1~   | (8)
             /       |
            /     check_2 (*8*)
           /  (8) /  |
          /-- eq_2~  | (*0*)
  (back edge)        |
                  check_3 (*0*)
            (*0*) /  |
         (loop exit) | (*0*)
                     |
                (back edge)

As a result, the loop exit edge ends up with 0 frequency which in turn makes
the loop header to have maximum frequency.

There are a few potential problems here:

1. The profile data seems odd.  There is a single profile sample of the
loop being entered.  On the other hand, there are no weights inside the
loop.

2. Based on static estimation we shouldn't set edges to "extreme"
values, i.e. extremely likely or unlikely.

3. We shouldn't create profile metadata that is calculated from static
estimation.  I am not sure what policy is but it seems to make sense to
treat profile metadata as something that is known to originate from
profiling.  Estimated probabilities should only be reflected in BPI/BFI.

Any one of these would probably fix the immediate problem.  I went for 3
because I think it's a good policy to have and added a FIXME about 2.

Differential Revision: https://reviews.llvm.org/D24118

llvm-svn: 280713
2016-09-06 16:08:33 +00:00
Sanjay Patel e341c919b0 fix FileCheck variables for test added with r280677
The script (utils/update_test_checks.py) seems to have problems 
with variable names that start with the same string. 

llvm-svn: 280679
2016-09-05 23:49:32 +00:00
Gor Nishanov ccabaca273 [Coroutines] Part12: Handle alloca address-taken
Summary:
Move early uses of spilled variables after CoroBegin.

For example, if a parameter had address taken, we may end up with the code
like:
        define @f(i32 %n) {
          %n.addr = alloca i32
          store %n, %n.addr
          ...
          call @coro.begin

This patch fixes the problem by moving uses of spilled variables after CoroBegin.

Reviewers: majnemer

Subscribers: mehdi_amini, llvm-commits

Differential Revision: https://reviews.llvm.org/D24234

llvm-svn: 280678
2016-09-05 23:45:45 +00:00
Sanjay Patel eea2ef7862 [InstCombine] don't assert that division-by-constant has been folded (PR30281)
This is effectively a revert of:
https://reviews.llvm.org/rL280115

And this should fix
https://llvm.org/bugs/show_bug.cgi?id=30281:

llvm-svn: 280677
2016-09-05 23:38:22 +00:00
Sanjay Patel 46f9df5b71 [InstCombine] revert r280637 because it causes test failures on an ARM bot
http://lab.llvm.org:8011/builders/clang-cmake-armv7-a15/builds/14952/steps/ninja%20check%201/logs/FAIL%3A%20LLVM%3A%3Aicmp.ll

llvm-svn: 280676
2016-09-05 22:36:32 +00:00
Oliver Stannard ef38d53a7e [SimplifyCFG] Add test for sinking inline asm in if/else
This test code previously caused a failure in the module verifier,
because SimplifyCFG created this invalid instruction, which tries to
take the address of inline asm:
  %.sink = select i1 %1, i64 ()* asm "mov $0, #1", "=r", i64 ()* asm %"mov $0, #2", "=r"

This has been fixed recently, presumably by James Molloy's patches that
re-wrote and changed parts of SimplifyCFG, so this patch just adds a
regression test for it.

Differential Revision: https://reviews.llvm.org/D24231

llvm-svn: 280660
2016-09-05 13:49:26 +00:00
Gor Nishanov 0e18f75a92 [Coroutines] Part11: Add final suspend handling.
Summary:
A frontend may designate a particular suspend to be final, by setting the second argument of the coro.suspend intrinsic to true. Such a suspend point has two properties:

* it is possible to check whether a suspended coroutine is at the final suspend point via coro.done intrinsic;
* a resumption of a coroutine stopped at the final suspend point leads to undefined behavior. The only possible action for a coroutine at a final suspend point is destroying it via coro.destroy intrinsic.

This patch adds final suspend handling logic to CoroEarly and CoroSplit passes.
Now, the final suspend point example from docs\Coroutines.rst compiles and produces expected result (see test/Transform/Coroutines/ex5.ll).

Reviewers: majnemer

Subscribers: mehdi_amini, llvm-commits

Differential Revision: https://reviews.llvm.org/D24068

llvm-svn: 280646
2016-09-05 04:44:30 +00:00
Sanjay Patel c641e9d6ff [InstCombine] allow icmp (and X, C2), C1 folds for splat constant vectors
The code to calculate 'UsesRemoved' could be simplified.
As-is, that code is a victim of PR30273:
https://llvm.org/bugs/show_bug.cgi?id=30273

llvm-svn: 280637
2016-09-04 20:58:27 +00:00
Dorit Nuzman abd15f69b2 [InstCombine] Preserve llvm.mem.parallel_loop_access metadata when replacing
memcpy with ld/st.

When InstCombine replaces a memcpy with loads+stores it does not copy over the
llvm.mem.parallel_loop_access from the memcpy instruction. This patch fixes
that.

Differential Revision: https://reviews.llvm.org/D23499

llvm-svn: 280617
2016-09-04 07:49:39 +00:00
Joseph Tremoulet e92e0a9042 Fix inliner funclet unwind memoization
Summary:
The inliner may need to determine where a given funclet unwinds to,
and this determination may depend on other funclets throughout the
funclet tree.  The code that performs this walk in getUnwindDestToken
memoizes results to avoid redundant computations.  In the case that
a funclet's unwind destination is derived from its ancestor, there's
code to walk back down the tree from the ancestor updating the memo
map of its descendants to record the unwind destination.  This change
fixes that code to account for the case that some descendant has a
different unwind destination, which can happen if that unwind dest
is a descendant of the EHPad being queried and thus didn't determine
its unwind destination.

Also update test inline-funclets.ll, which is supposed to cover such
scenarios, to include a case that fails an assertion without this fix
but passes with it.

Fixes PR29151.


Reviewers: majnemer

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D24117

llvm-svn: 280610
2016-09-04 01:23:20 +00:00
Matt Arsenault 46a0382ab2 AMDGPU: Do basic folding of class intrinsic
This allows more of the OCML builtin library to be
constant folded.

llvm-svn: 280586
2016-09-03 07:06:58 +00:00
Wei Mi c37307a5f4 Fix buildbot error.
Add -mtriple=x86_64-unknown-linux-gnu for the test and move it to CodeGen/X86.

llvm-svn: 280568
2016-09-03 01:43:28 +00:00
Xinliang David Li 40afd5c9e4 [Profile] handle select instruction in 'expect' lowering
Builtin expect lowering currently ignores select. This patch
fixes the issue

Differential Revision: http://reviews.llvm.org/D24166

llvm-svn: 280547
2016-09-02 22:03:40 +00:00
Sanjay Patel 70277411d3 [InstCombine] auto-generate assertions for tighter checking
llvm-svn: 280531
2016-09-02 19:38:37 +00:00
Wei Mi c54d1298f5 Split the store of a wide value merged from an int-fp pair into multiple stores.
For the store of a wide value merged from a pair of values, especially int-fp pair,
sometimes it is more efficent to split it into separate narrow stores, which can
remove the bitwise instructions or sink them to colder places.

Now the feature is only enabled on x86 target, and only store of int-fp pair is
splitted. It is possible that the application scope gets extended with perf evidence
support in the future.

Differential Revision: https://reviews.llvm.org/D22840

llvm-svn: 280505
2016-09-02 17:17:04 +00:00
Sanjay Patel 521f19f249 [InsttCombine] fold insertelement of constant into shuffle with constant operand (PR29126)
The motivating case occurs with SSE/AVX scalar intrinsics, so this is a first step towards
shrinking that to a single shufflevector.

Note that the transform is intentionally limited to shuffles that are equivalent to vector
selects to avoid creating arbitrary shuffle masks that may not lower well.

This should solve PR29126:
https://llvm.org/bugs/show_bug.cgi?id=29126

Differential Revision: https://reviews.llvm.org/D23886

llvm-svn: 280504
2016-09-02 17:05:43 +00:00
Matthew Simpson b65c230eab [LV] Ensure reverse interleaved group GEPs remain uniform
For uniform instructions, we're only required to generate a scalar value for
the first vector lane of each unroll iteration. Thus, if we have a reverse
interleaved group, computing the member index off the scalar GEP corresponding
to the last vector lane of its pointer operand technically makes the GEP
non-uniform. We should compute the member index off the first scalar GEP
instead.

I've added the updated member index computation to the existing reverse
interleaved group test.

llvm-svn: 280497
2016-09-02 16:19:22 +00:00
Andrea Di Biagio 805815f407 [instsimplify] Fix incorrect folding of an ordered fcmp with a vector of all NaN.
This patch fixes a crash caused by an incorrect folding of an ordered comparison
between a packed floating point vector and a splat vector of NaN.

An ordered comparison between a vector and a constant vector of NaN, should
always be folded into a constant vector where each element is i1 false.

Since revision 266175, SimplifyFCmpInst folds the ordered fcmp into a scalar
'false'. Later on, this would cause an assertion failure, since the value type
of the folded value doesn't match the expected value type of the uses of the
original instruction: "Assertion failed: New->getType() == getType() &&
"replaceAllUses of value with new value of different type!".

This patch fixes the issue and adds a test case to the already existing test
InstSimplify/floating-point-compares.ll.

Differential Revision: https://reviews.llvm.org/D24143

llvm-svn: 280488
2016-09-02 14:47:43 +00:00
Alexey Bataev ed27d69037 [InstCombine] Add test for insertelementinsts with constants.
Added a tests that shows that several insertelementinsts with constant
indexes/data are not folded into a single shuffleinst.

llvm-svn: 280474
2016-09-02 09:00:53 +00:00
James Molloy f3cf2a494b [SimplifyCFG] Add a workaround to fix PR30188
We're sinking stores, which is a good thing, but in the process creating selects for the store address operand, which SROA/Mem2Reg can't look through, which caused serious regressions.

The real fix is in SROA, which I'll be looking into.

llvm-svn: 280470
2016-09-02 07:29:00 +00:00
NAKAMURA Takumi 9aec5c3797 llvm/test/Transforms/GCOVProfiling/three-element-mdnode.ll: Use %/T instead of %T, not to emit backslashes.
llvm-svn: 280451
2016-09-02 01:33:00 +00:00
Sanjay Patel 199a8e6a92 [InstCombine] add tests to show potential shuffle+insert folds
llvm-svn: 280403
2016-09-01 19:14:19 +00:00
Sanjay Patel dd861964d1 [InstCombine] remove fold of an icmp pattern that should never happen
While removing a scalar shackle from an icmp fold, I noticed that I couldn't find any tests to trigger
this code path.

The 'and' shrinking transform should be handled by InstCombiner::foldCastedBitwiseLogic()
or eliminated with InstSimplify. The icmp narrowing is part of InstCombiner::foldICmpWithCastAndCast().

Differential Revision: https://reviews.llvm.org/D24031 

llvm-svn: 280370
2016-09-01 14:20:43 +00:00
James Molloy 88cad7e5cf [SimplifyCFG] Handle tail-sinking of more than 2 incoming branches
This was a real restriction in the original version of SinkIfThenCodeToEnd. Now it's been rewritten, the restriction can be lifted.

As part of this, we handle a very common and useful case where one of the incoming branches is actually conditional. Consider:

   if (a)
     x(1);
   else if (b)
     x(2);

This produces the following CFG:

         [if]
        /    \
      [x(1)] [if]
        |     | \
        |     |  \
        |  [x(2)] |
         \    |  /
          [ end ]

[end] has two unconditional predecessor arcs and one conditional. The conditional refers to the implicit empty 'else' arc. This same pattern can also be caused by an empty default block in a switch.

We can't sink the call to x() down to end because no call to x() happens on the third incoming arc (assume that x() has sideeffects for the sake of argument; if something is safe to speculate we could indeed sink nevertheless but this cannot happen in the general case and causes many extra selects).

We are now able to detect this case and split off the unconditional arcs to a common successor:

         [if]
        /    \
      [x(1)] [if]
        |     | \
        |     |  \
        |  [x(2)] |
         \   /    |
     [sink.split] |
           \     /
           [ end ]

Now we can sink the call to x() into %sink.split. This can cause significant code simplification in many testcases.

llvm-svn: 280364
2016-09-01 12:58:13 +00:00
James Molloy eec6df3193 [SimplifyCFG] Change the algorithm in SinkThenElseCodeToEnd
r279460 rewrote this function to be able to handle more than two incoming edges and took pains to ensure this didn't regress anything.

This time we change the logic for determining if an instruction should be sunk. Previously we used a single pass greedy algorithm - sink instructions until one requires more than one PHI node or we run out of instructions to sink.

This had the problem that sinking instructions that had non-identical but trivially the same operands needed extra logic so we sunk them aggressively. For example:

    %a = load i32* %b          %d = load i32* %b
    %c = gep i32* %a, i32 0    %e = gep i32* %d, i32 1

Sinking %c and %e would naively require two PHI merges as %a != %d. But the loads are obviously equivalent (and maybe can't be hoisted because there is no common predecessor).

This is why we implemented the fairly complex function areValuesTriviallySame(), to look through trivial differences like this. However it's just not clever enough.

Instead, throw areValuesTriviallySame away, use pointer equality to check equivalence of operands and switch to a two-stage algorithm.

In the "scan" stage, we look at every sinkable instruction in isolation from end of block to front. If it's sinkable, we keep track of all operands that required PHI merging.

In the "sink" stage, we iteratively sink the last non-terminator in the source blocks. But when calculating how many PHIs are actually required to be inserted (to work out if we should stop or not) we remove any values that have already been sunk from the set of PHI-merges required, which allows us to be more aggressive.

This turns an algorithm with potentially recursive lookahead (looking through GEPs, casts, loads and any other instruction potentially not CSE'd) to two linear scans.

llvm-svn: 280351
2016-09-01 10:44:35 +00:00
Hal Finkel 40d7f5c277 Add a counter-function insertion pass
As discussed in https://reviews.llvm.org/D22666, our current mechanism to
support -pg profiling, where we insert calls to mcount(), or some similar
function, is fundamentally broken. We insert these calls in the frontend, which
means they get duplicated when inlining, and so the accumulated execution
counts for the inlined-into functions are wrong.

Because we don't want the presence of these functions to affect optimizaton,
they should be inserted in the backend. Here's a pass which would do just that.
The knowledge of the name of the counting function lives in the frontend, so
we're passing it here as a function attribute. Clang will be updated to use
this mechanism.

Differential Revision: https://reviews.llvm.org/D22825

llvm-svn: 280347
2016-09-01 09:42:39 +00:00
Nick Lewycky 97e49ac59e Add -fprofile-dir= to clang.
-fprofile-dir=path allows the user to specify where .gcda files should be
emitted when the program is run. In particular, this is the first flag that
causes the .gcno and .o files to have different paths, LLVM is extended to
support this. -fprofile-dir= does not change the file name in the .gcno (and
thus where lcov looks for the source) but it does change the name in the .gcda
(and thus where the runtime library writes the .gcda file). It's different from
a GCOV_PREFIX because a user can observe that the GCOV_PREFIX_STRIP will strip
paths off of -fprofile-dir= but not off of a supplied GCOV_PREFIX.

To implement this we split -coverage-file into -coverage-data-file and
-coverage-notes-file to specify the two different names. The !llvm.gcov
metadata node grows from a 2-element form {string coverage-file, node dbg.cu}
to 3-elements, {string coverage-notes-file, string coverage-data-file, node
dbg.cu}. In the 3-element form, the file name is already "mangled" with
.gcno/.gcda suffixes, while the 2-element form left that to the middle end
pass.

llvm-svn: 280306
2016-08-31 23:04:32 +00:00
Sanjay Patel 0d70831d73 [InstCombine] allow icmp (shr exact X, C2), C fold for splat constant vectors
The enhancement to foldICmpDivConstant ( http://llvm.org/viewvc/llvm-project?view=revision&revision=280299 )
allows us to remove the ConstantInt check; no other changes needed.

llvm-svn: 280300
2016-08-31 22:18:43 +00:00
Sanjay Patel 541aef4661 [InstCombine] allow icmp (div X, Y), C folds for splat constant vectors
Converting all of the overflow ops to APInt looked risky, so I've left that as a TODO.

llvm-svn: 280299
2016-08-31 21:57:21 +00:00
Geoff Berry 8d84605f25 [EarlyCSE] Optionally use MemorySSA. NFC.
Summary:
Use MemorySSA, if requested, to do less conservative memory dependency
checking.

This change doesn't enable the MemorySSA enhanced EarlyCSE in the
default pipelines, so should be NFC.

Reviewers: dberlin, sanjoy, reames, majnemer

Subscribers: mcrosier, llvm-commits

Differential Revision: http://reviews.llvm.org/D19821

llvm-svn: 280279
2016-08-31 19:24:10 +00:00
Geoff Berry 64f5ed172a [EarlyCSE] Allow forwarding a non-invariant load into an invariant load.
Reviewers: sanjoy

Subscribers: mcrosier, llvm-commits

Differential Revision: https://reviews.llvm.org/D23935

llvm-svn: 280265
2016-08-31 17:45:31 +00:00
Philip Reames 2b1084ac93 [statepoints][experimental] Add support for live-in semantics of values in deopt bundles
This is a first step towards supporting deopt value lowering and reporting entirely with the register allocator. I hope to build on this in the near future to support live-on-return semantics, but I have a use case which allows me to test and investigate code quality with just the live-in semantics so I've chosen to start there. For those curious, my use cases is our implementation of the "__llvm_deoptimize" function we bind to @llvm.deoptimize. I'm choosing not to hard code that fact in the patch and instead make it configurable via function attributes.

The basic approach here is modelled on what is done for the "Live In" values on stackmaps and patchpoints. (A secondary goal here is to remove one of the last barriers to merging the pseudo instructions.) We start by adding the operands directly to the STATEPOINT SDNode. Once we've lowered to MI, we extend the remat logic used by the register allocator to fold virtual register uses into StackMap::Indirect entries as needed. This does rely on the fact that the register allocator rematerializes. If it didn't along some code path, we could end up with more vregs than physical registers and fail to allocate.

Today, we *only* fold in the register allocator. This can create some weird effects when combined with arguments passed on the stack because we don't fold them appropriately. I have an idea how to fix that, but it needs this patch in place to work on that effectively. (There's some weird interaction with the scheduler as well, more investigation needed.)

My near term plan is to land this patch off-by-default, experiment in my local tree to identify any correctness issues and then start fixing codegen problems one by one as I find them. Once I have the live-in lowering fully working (both correctness and code quality), I'm hoping to move on to the live-on-return semantics. Note: I don't have any *known* miscompiles with this patch enabled, but I'm pretty sure I'll find at least a couple. Thus, the "experimental" tag and the fact it's off by default.

Differential Revision: https://reviews.llvm.org/D24000

llvm-svn: 280250
2016-08-31 15:12:17 +00:00
James Molloy cacfc16109 Revert "[SimplifyCFG] Change the algorithm in SinkThenElseCodeToEnd"
This reverts commit r280216 - it caused buildbot failures.

llvm-svn: 280234
2016-08-31 13:16:52 +00:00
James Molloy 76c9d423a7 Revert "[SimplifyCFG] Handle tail-sinking of more than 2 incoming branches"
This reverts commit r280217. r280216 caused buildbot failures - backing out the entire chain.

llvm-svn: 280233
2016-08-31 13:16:45 +00:00
James Molloy 06a45483a1 Revert "[SimplifyCFG] Add a workaround to fix PR30188"
This reverts commit r280219. r280216 caused buildbot failures - backing out the entire chain.

llvm-svn: 280232
2016-08-31 13:16:36 +00:00
James Molloy 8a66a39cbf Revert "[SimplifyCFG] Fix bootstrap failure after r280220"
This reverts commit r280228. r280216 caused buildbot failures - backing out the entire sequence.

llvm-svn: 280231
2016-08-31 13:16:30 +00:00
James Molloy b7efa6c227 [SimplifyCFG] Fix bootstrap failure after r280220
We check that a sinking candidate is used by only one PHI node during our legality checks. However for instructions that are used by other sinking candidates our heuristic is less conservative. This can result in a candidate actually being illegal when we come to sink it because of how we sunk a predecessor. Do the used-by-only-one-PHI checks again during sinking to ensure we don't crash.

llvm-svn: 280228
2016-08-31 12:33:48 +00:00
James Molloy 171fdac7ce [SimplifyCFG] Add a workaround to fix PR30188
We're sinking stores, which is a good thing, but in the process creating selects for the store address operand, which SROA/Mem2Reg can't look through, which caused serious regressions.

The real fix is in SROA, which I'll be looking into.

llvm-svn: 280219
2016-08-31 10:46:45 +00:00
James Molloy c53b40b509 [SimplifyCFG] Handle tail-sinking of more than 2 incoming branches
This was a real restriction in the original version of SinkIfThenCodeToEnd. Now it's been rewritten, the restriction can be lifted.

As part of this, we handle a very common and useful case where one of the incoming branches is actually conditional. Consider:

   if (a)
     x(1);
   else if (b)
     x(2);

This produces the following CFG:

         [if]
        /    \
      [x(1)] [if]
        |     | \
        |     |  \
        |  [x(2)] |
         \    |  /
          [ end ]

[end] has two unconditional predecessor arcs and one conditional. The conditional refers to the implicit empty 'else' arc. This same pattern can also be caused by an empty default block in a switch.

We can't sink the call to x() down to end because no call to x() happens on the third incoming arc (assume that x() has sideeffects for the sake of argument; if something is safe to speculate we could indeed sink nevertheless but this cannot happen in the general case and causes many extra selects).

We are now able to detect this case and split off the unconditional arcs to a common successor:

         [if]
        /    \
      [x(1)] [if]
        |     | \
        |     |  \
        |  [x(2)] |
         \   /    |
     [sink.split] |
           \     /
           [ end ]

Now we can sink the call to x() into %sink.split. This can cause significant code simplification in many testcases.

llvm-svn: 280217
2016-08-31 10:46:33 +00:00
James Molloy 55bd04cd20 [SimplifyCFG] Change the algorithm in SinkThenElseCodeToEnd
r279460 rewrote this function to be able to handle more than two incoming edges and took pains to ensure this didn't regress anything.

This time we change the logic for determining if an instruction should be sunk. Previously we used a single pass greedy algorithm - sink instructions until one requires more than one PHI node or we run out of instructions to sink.

This had the problem that sinking instructions that had non-identical but trivially the same operands needed extra logic so we sunk them aggressively. For example:

    %a = load i32* %b          %d = load i32* %b
    %c = gep i32* %a, i32 0    %e = gep i32* %d, i32 1

Sinking %c and %e would naively require two PHI merges as %a != %d. But the loads are obviously equivalent (and maybe can't be hoisted because there is no common predecessor).

This is why we implemented the fairly complex function areValuesTriviallySame(), to look through trivial differences like this. However it's just not clever enough.

Instead, throw areValuesTriviallySame away, use pointer equality to check equivalence of operands and switch to a two-stage algorithm.

In the "scan" stage, we look at every sinkable instruction in isolation from end of block to front. If it's sinkable, we keep track of all operands that required PHI merging.

In the "sink" stage, we iteratively sink the last non-terminator in the source blocks. But when calculating how many PHIs are actually required to be inserted (to work out if we should stop or not) we remove any values that have already been sunk from the set of PHI-merges required, which allows us to be more aggressive.

This turns an algorithm with potentially recursive lookahead (looking through GEPs, casts, loads and any other instruction potentially not CSE'd) to two linear scans.

llvm-svn: 280216
2016-08-31 10:46:23 +00:00
James Molloy 923e98c232 [SimplifyCFG] Tail-merge calls with sideeffects
This was deliberately disabled during my rewrite of SinkIfThenToEnd to keep behaviour
at least vaguely consistent with the previous version and keep it as close to NFC as
I could.

There's no real reason not to merge sideeffect calls though, so let's do it! Small fixup
along the way to ensure we don't create indirect calls.

Should fix PR28964.

llvm-svn: 280215
2016-08-31 10:46:16 +00:00
Gor Nishanov 50d7fb974f [Coroutines] Part 10: Add coroutine promise support.
Summary:
1) CoroEarly now lowers llvm.coro.promise intrinsic that allows to obtain
a coroutine promise pointer from a coroutine frame and vice versa.

2) CoroFrame now interprets Promise argument of llvm.coro.begin to
place CoroutinPromise alloca at a deterministic offset from the coroutine frame.

Now, the coroutine promise example from docs\Coroutines.rst compiles and produces expected result (see test/Transform/Coroutines/ex4.ll).

Reviewers: majnemer

Subscribers: llvm-commits, mehdi_amini

Differential Revision: https://reviews.llvm.org/D23993

llvm-svn: 280184
2016-08-31 00:35:41 +00:00
Alina Sbirlea 3f8f7840bf [LoadStoreVectorizer] Change VectorSet to Vector to match head and tail positions. Resolves PR29148.
Summary:
LSV was using two vector sets (heads and tails) to track pairs of adjiacent position to vectorize.
A recent optimization is trying to obtain the longest chain to vectorize and assumes the positions
in heads(H) and tails(T) match, which is not the case is there are multiple tails for the same head.

e.g.:
i1: store a[0]
i2: store a[1]
i3: store a[1]
Leads to:
H: i1
T: i2 i3
Instead of:
H: i1 i1
T: i2 i3
So the positions for instructions that follow i3 will have different indexes in H/T.
This patch resolves PR29148.

This issue also surfaced the fact that if the chain is too long, and TLI
returns a "not-fast" answer, the whole chain will be abandoned for
vectorization, even though a smaller one would be beneficial.
Added a testcase and FIXME for this.

Reviewers: tstellarAMD, arsenm, jlebar

Subscribers: mzolotukhin, wdng, llvm-commits

Differential Revision: https://reviews.llvm.org/D24057

llvm-svn: 280179
2016-08-30 23:53:59 +00:00
Sanjay Patel ddb53dd080 [InstCombine] add tests to show type limitations of InsertRangeTest and callers
llvm-svn: 280175
2016-08-30 23:16:59 +00:00
Michael Kuperstein 2954d1db77 [LoopVectorizer] Predicate instructions in blocks with several incoming edges
We don't need to limit predication to blocks that have a single incoming
edge, we just need to use the right mask.
This fixes PR30172.

Differential Revision: https://reviews.llvm.org/D24009

llvm-svn: 280148
2016-08-30 20:22:21 +00:00
Daniel Berlin c943d72d94 IntrArgMemOnly is only defined (and current AA machinery only sanely supports) pointer arguments, and these intrinsics have vector of pointer arguments. Remove ArgMemOnly until we either have the machinery, define a new attribute, or something similar
llvm-svn: 280143
2016-08-30 19:58:48 +00:00
Sanjay Patel b37145712e [InstCombine] replace divide-by-constant checks with asserts; NFC
These folds already have tests for scalar and vector types, except 
for the vector div-by-0 case, so I'm adding tests for that.

llvm-svn: 280115
2016-08-30 17:31:34 +00:00
James Molloy d13b1239e4 [SimplifyCFG] Properly CSE metadata in SinkThenElseCodeToEnd
This was missing, meaning the metadata in sunk instructions was potentially bogus and could cause miscompiles.

llvm-svn: 280072
2016-08-30 10:56:08 +00:00
Teresa Johnson 8c1bc986b5 [ThinLTO] Indirect call promotion fixes for promoted local functions
Summary:
Fix a couple issues limiting the application of indirect call promotion
in ThinLTO mode:
- Invoke indirect call promotion before globalopt, since it may
  eliminate imported functions which appear unreferenced.
- Invoke indirect call promotion with InLTO=true so that the PGOFuncName
  metadata is used to get the name for locals which would have been
  renamed during promotion.

Reviewers: davidxl, mehdi_amini

Subscribers: Prazek, llvm-commits, mehdi_amini

Differential Revision: https://reviews.llvm.org/D24004

llvm-svn: 280024
2016-08-29 22:46:56 +00:00
Matthew Simpson df19502b16 [LV] Move insertelement sequence after scalar definitions
After r279649 when getting a vector value from VectorLoopValueMap, we create an
insertelement sequence on-demand if the value has been scalarized instead of
vectorized. We previously inserted this insertelement sequence before the
value's first vector user. However, this insert location is problematic if that
user is the phi node of a first-order recurrence. With this patch, we move the
insertelement sequence after the last scalar instruction we created when
scalarizing the value. Thus, the value's vector definition in the new loop will
immediately follow its scalar definitions. This should fix PR30183.

Reference: https://llvm.org/bugs/show_bug.cgi?id=30183
llvm-svn: 280001
2016-08-29 20:14:04 +00:00
David Majnemer e8fd5f9ffd [SimplifyCFG] Hoisting invalidates metadata
We forgot to remove optimization metadata when performing hosting during
FoldTwoEntryPHINode.

This fixes PR29163.

llvm-svn: 279980
2016-08-29 17:14:08 +00:00
Anna Thomas 2bc129c5fd [StatepointsForGC] Rematerialize in the presence of PHIs
Summary:
While walking the use chain for identifying rematerializable values in RS4GC,
add the case where the current value and base value are the same PHI nodes.

This will aid rematerialization of geps and casts instead of relocating.

Reviewers: sanjoy, reames, igor

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D23920

llvm-svn: 279975
2016-08-29 15:41:59 +00:00
Sanjay Patel 25475bcc0c [Constant] remove fdiv and frem from canTrap()
Assuming the default FP env, we should not treat fdiv and frem any differently in terms of
trapping behavior than any other FP op. Ie, FP ops do not trap with the default FP env.

This matches how we treat the fdiv/frem in IR with isSafeToSpeculativelyExecute() and in 
the backend after:
https://reviews.llvm.org/rL279970

llvm-svn: 279973
2016-08-29 15:27:17 +00:00
Sanjay Patel 20f02b271b [SimplifyCFG] rename test file, regenerate checks, and add test
The fdiv test shows a problem similar to:
https://reviews.llvm.org/rL279970

llvm-svn: 279972
2016-08-29 14:57:53 +00:00
Gor Nishanov dce9b02677 [Coroutines] Part 9: Add cleanup subfunction.
Summary:
[Coroutines] Part 9: Add cleanup subfunction.

This patch completes coroutine heap allocation elision. Now, the heap elision example from docs\Coroutines.rst compiles and produces expected result (see test/Transform/Coroutines/ex3.ll)

Intrinsic Changes:
* coro.free gets a token parameter tying it to coro.id to allow reliably discovering all coro.frees associated with a particular coroutine.
* coro.id gets an extra parameter that points back to a coroutine function. This allows to check whether a coro.id describes the enclosing function or it belongs to a different function that was later inlined.

CoroSplit now creates three subfunctions:
# f$resume - resume logic
# f$destroy - cleanup logic, followed by a deallocation code
# f$cleanup - just the cleanup code

CoroElide pass during devirtualization replaces coro.destroy with either f$destroy or f$cleanup depending whether heap elision is performed or not.

Other fixes, improvements:
* Fixed buglet in Shape::buildFrame that was not creating coro.save properly if coroutine has more than one suspend point.

* Switched to using variable width suspend index field (no longer limited to 32 bit index field can be as little as i1 or as large as i<whatever-size_t-is>)

Reviewers: majnemer

Subscribers: llvm-commits, mehdi_amini

Differential Revision: https://reviews.llvm.org/D23844

llvm-svn: 279971
2016-08-29 14:34:12 +00:00
Sanjay Patel 5c5311f4e5 [InstCombine] use m_APInt to allow icmp (and X, Y), C folds for splat constant vectors
llvm-svn: 279937
2016-08-28 18:18:00 +00:00
Elena Demikhovsky 3622fbfc68 [Loop Vectorizer] Fixed memory confilict checks.
Fixed a bug in run-time checks for possible memory conflicts inside loop.
The bug is in Low <-> High boundaries calculation. The High boundary should be calculated as "last memory access pointer + element size".

Differential revision: https://reviews.llvm.org/D23176

llvm-svn: 279930
2016-08-28 08:53:53 +00:00
Adam Nemet cef3314156 [Inliner] Report when inlining fails because callee's def is unavailable
Summary:
This is obviously an interesting case because it may motivate code
restructuring or LTO.

Reporting this requires instantiation of ORE in the loop where the call
sites are first gathered.  I've checked compile-time
overhead *with* -Rpass-with-hotness and the worst slow-down was 6% in
mcf and quickly tailing off.  As before without -Rpass-with-hotness
there is no overhead.

Because this could be a pretty noisy diagnostics, it is currently
qualified as 'verbose'.  As of this patch, 'verbose' diagnostics are
only emitted with -Rpass-with-hotness, i.e. when the output is expected
to be filtered.

Reviewers: eraman, chandlerc, davidxl, hfinkel

Subscribers: tejohnson, Prazek, davide, llvm-commits

Differential Revision: https://reviews.llvm.org/D23415

llvm-svn: 279860
2016-08-26 20:21:05 +00:00
Tim Shen 3ad8b43cc2 [MemCpy] Add comments for r279769
Differential Revision: https://reviews.llvm.org/D23846

llvm-svn: 279778
2016-08-25 21:03:46 +00:00
Tim Shen a3dbead2d6 [MemCpy] Check for alias in performMemCpyToMemSetOptzn, instead of the identity of two operands
Summary:
This fixes pr29105. The reason is that lifetime marks creates new
aliasing pointers the original ones, but before this patch aliases
were not checked in performMemCpyToMemSetOptzn.

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D23846

llvm-svn: 279769
2016-08-25 19:27:26 +00:00
Sebastian Pop 5f0d0e60d1 GVN-hoist: fix hoistingFromAllPaths for loops (PR29034)
It is invalid to hoist stores or loads if they are not executed on all paths
from the hoisting point to the exit of the function. In the testcase, there are
paths in the loop that do not execute the stores or the loads, and so hoisting
them within the loop is unsafe.

The problem is that the current implementation of hoistingFromAllPaths is
incomplete: it walks all blocks dominated by the hoisting point, and does not
return false when the loop contains a path on which the hoisted ld/st is
not executed.

Differential Revision: https://reviews.llvm.org/D23843

llvm-svn: 279732
2016-08-25 11:55:47 +00:00
Xinliang David Li cad3a995a4 [Profile] Propagate branch metadata properly in instcombine
Differential Revision: http://reviews.llvm.org/D23590

llvm-svn: 279693
2016-08-25 00:26:32 +00:00
Sanjay Patel d398d4a39e [InstCombine] use m_APInt to allow icmp eq/ne (shr X, C2), C folds for splat constant vectors
llvm-svn: 279677
2016-08-24 22:22:06 +00:00
Matthew Simpson abd2be1e2e [LV] Unify vector and scalar maps
This patch unifies the data structures we use for mapping instructions from the
original loop to their corresponding instructions in the new loop. Previously,
we maintained two distinct maps for this purpose: WidenMap and ScalarIVMap.
WidenMap maintained the vector values each instruction from the old loop was
represented with, and ScalarIVMap maintained the scalar values each scalarized
induction variable was represented with. With this patch, all values created
for the new loop are maintained in VectorLoopValueMap.

The change allows for several simplifications. Previously, when an instruction
was scalarized, we had to insert the scalar values into vectors in order to
maintain the mapping in WidenMap. Then, if a user of the scalarized value was
also scalar, we had to extract the scalar values from the temporary vector we
created. We now aovid these unnecessary scalar-to-vector-to-scalar conversions.
If a scalarized value is used by a scalar instruction, the scalar value is used
directly. However, if the scalarized value is needed by a vector instruction,
we generate the needed insertelement instructions on-demand.

A common idiom in several locations in the code (including the scalarization
code), is to first get the vector values an instruction from the original loop
maps to, and then extract a particular scalar value. This patch adds
getScalarValue for this purpose along side getVectorValue as an interface into
VectorLoopValueMap. These functions work together to return the requested
values if they're available or to produce them if they're not.

The mapping has also be made less permissive. Entries can be added to
VectorLoopValue map with the new initVector and initScalar functions.
getVectorValue has been modified to return a constant reference to the mapped
entries.

There's no real functional change with this patch; however, in some cases we
will generate slightly different code. For example, instead of an insertelement
sequence following the definition of an instruction, it will now precede the
first use of that instruction. This can be seen in the test case changes.

Differential Revision: https://reviews.llvm.org/D23169

llvm-svn: 279649
2016-08-24 18:23:17 +00:00
Sanjoy Das ff855b6020 [SCCP] Don't delete side-effecting instructions
I'm not sure if the `!isa<CallInst>(Inst) &&
!isa<TerminatorInst>(Inst))` bit is correct either, but this fixes the
case we know is broken.

llvm-svn: 279647
2016-08-24 18:10:21 +00:00
Gil Rapaport 550148b2f6 [Loop Vectorizer] Support predication of div/rem
div/rem instructions in basic blocks that require predication currently prevent
vectorization. This patch extends the existing mechanism for predicating stores
to handle other instructions and leverages it to predicate divs and rems.

Differential Revision: https://reviews.llvm.org/D22918

llvm-svn: 279620
2016-08-24 11:37:57 +00:00
Gor Nishanov 241b041fba [Coroutines] Part 8: Coroutine Frame Building algorithm
Summary:
This patch adds coroutine frame building algorithm. Now, simple coroutines such as ex0.ll and ex1.ll (first examples from docs\Coroutines.rst can be compiled).

Documentation and overview is here: http://llvm.org/docs/Coroutines.html.

Upstreaming sequence (rough plan)
1.Add documentation. (https://reviews.llvm.org/D22603)
2.Add coroutine intrinsics. (https://reviews.llvm.org/D22659)
...

7. Split coroutine into subfunctions. (https://reviews.llvm.org/D23461)
8. Coroutine Frame Building algorithm  <= we are here
9. Add f.cleanup subfunction.
10+. The rest of the logic

Reviewers: majnemer

Subscribers: mehdi_amini, llvm-commits

Differential Revision: https://reviews.llvm.org/D23586

llvm-svn: 279609
2016-08-24 04:44:35 +00:00
Matthew Simpson df2ab917ad [SLP] Avoid signed integer overflow
The test case included with r279125 exposed an existing signed integer
overflow. Since getTreeCost can return INT_MAX, we can't sum this cost together
with other costs, such as getReductionCost.

This patch removes the possibility of assigning a cost of INT_MAX. Since we
were previously using INT_MAX as an indicator for "should not vectorize", we
now explicitly check this condition with "isTreeTinyAndNotFullyVectorizable"
before computing a cost.

This patch adds a run-line to the test case used for r279125 that ensures we
don't vectorize. Previously, this line would vectorize the test case by chance
due to undefined behavior in the cost calculation.

Differential Revision: https://reviews.llvm.org/D23723

llvm-svn: 279562
2016-08-23 20:48:50 +00:00
Sanjay Patel 6946e2ade3 [InstSimplify] allow icmp with constant folds for splat vectors, part 2
Completes the m_APInt changes for simplifyICmpWithConstant().

Other commits in this series:
https://reviews.llvm.org/rL279492
https://reviews.llvm.org/rL279530
https://reviews.llvm.org/rL279534
https://reviews.llvm.org/rL279538

llvm-svn: 279543
2016-08-23 18:00:51 +00:00
Sanjay Patel 200e3cbfb0 [InstSimplify] allow icmp with constant folds for splat vectors, part 1
llvm-svn: 279538
2016-08-23 17:30:56 +00:00
Sanjay Patel ada2bb3d5d [InstSimplify] add tests to show missing vector icmp folds
llvm-svn: 279534
2016-08-23 17:13:38 +00:00
Sanjay Patel 5c269d0b7a [InstSimplify] move icmp with constant tests to another file; NFC
...because like the corresponding code, this is just too big to keep adding to.
And the next step is to add a vector version of each of these tests to show
missed folds.

Also, auto-generate CHECK lines and add comments for the tests that correspond to
the source code.

llvm-svn: 279530
2016-08-23 16:46:53 +00:00
Sanjay Patel a392049419 [InstCombine] use m_APInt to allow icmp (shr exact X, Y), 0 folds for splat constant vectors
llvm-svn: 279472
2016-08-22 20:45:06 +00:00
James Molloy 5bf2114265 [SimplifyCFG] Rewrite SinkThenElseCodeToEnd
[Recommitting now an unrelated assertion in SROA is sorted out]

The new version has several advantages:
  1) IMSHO it's more readable and neater
  2) It handles loads and stores properly
  3) It can handle any number of incoming blocks rather than just two. I'll be taking advantage of this in a followup patch.

With this change we can now finally sink load-modify-store idioms such as:

    if (a)
      return *b += 3;
    else
      return *b += 4;

    =>

    %z = load i32, i32* %y
    %.sink = select i1 %a, i32 5, i32 7
    %b = add i32 %z, %.sink
    store i32 %b, i32* %y
    ret i32 %b

When this works for switches it'll be even more powerful.

Round 4. This time we should handle all instructions correctly, and not replace any operands that need to be constant with variables.

This was really hard to determine safely, so the helper function should be put into the Instruction API. I'll do that as a followup.

llvm-svn: 279460
2016-08-22 19:07:15 +00:00
Jun Bum Lim ec8b8cc595 [InstCombine] Allow sinking from unique predecessor with multiple edges
Summary: We can allow sinking if the single user block has only one unique predecessor, regardless of the number of edges. Note that a switch statement with multiple cases can have the same destination.

Reviewers: mcrosier, majnemer, spatel, reames

Subscribers: reames, mcrosier, llvm-commits

Differential Revision: https://reviews.llvm.org/D23722

llvm-svn: 279448
2016-08-22 18:21:56 +00:00
James Molloy 475f4a763f Revert "[SimplifyCFG] Rewrite SinkThenElseCodeToEnd"
This reverts commit r279443. It caused buildbot failures.

llvm-svn: 279447
2016-08-22 18:13:12 +00:00
James Molloy 353052698a [SimplifyCFG] Rewrite SinkThenElseCodeToEnd
The new version has several advantages:
  1) IMSHO it's more readable and neater
  2) It handles loads and stores properly
  3) It can handle any number of incoming blocks rather than just two. I'll be taking advantage of this in a followup patch.

With this change we can now finally sink load-modify-store idioms such as:

    if (a)
      return *b += 3;
    else
      return *b += 4;

    =>

    %z = load i32, i32* %y
    %.sink = select i1 %a, i32 5, i32 7
    %b = add i32 %z, %.sink
    store i32 %b, i32* %y
    ret i32 %b

When this works for switches it'll be even more powerful.

Round 4. This time we should handle all instructions correctly, and not replace any operands that need to be constant with variables.

This was really hard to determine safely, so the helper function should be put into the Instruction API. I'll do that as a followup.

llvm-svn: 279443
2016-08-22 17:40:23 +00:00
Artur Pilipenko bc76ecada0 Revert -r278267 [ValueTracking] An improvement to IR ValueTracking on Non-negative Integers
This change cause performance regression on MultiSource/Benchmarks/TSVC/Symbolics-flt/Symbolics-flt from LNT and some other bechmarks.

See https://reviews.llvm.org/D18777 for details.

llvm-svn: 279433
2016-08-22 13:14:07 +00:00
Artur Pilipenko b78ad9d41f Revert -r278269 [IndVarSimplify] Eliminate zext of a signed IV when the IV is known to be non-negative
This change needs to be reverted in order to revert -r278267 which cause performance regression on MultiSource/Benchmarks/TSVC/Symbolics-flt/Symbolics-flt from LNT and some other bechmarks.

See comments on https://reviews.llvm.org/D18777 for details.

llvm-svn: 279432
2016-08-22 13:12:07 +00:00
Balaram Makam a927aa4ad0 [PM] Port LoopDataPrefetch AArch64 tests to new pass manager
Reviewers: mcrosier, tejohnson

Subscribers: aemerson, rengolin, mcrosier, llvm-commits

Differential Revision: https://reviews.llvm.org/D23724

llvm-svn: 279431
2016-08-22 12:59:58 +00:00
Sanjay Patel 643d21a62c [InstCombine] use m_APInt to allow icmp (shl X, Y), C folds for splat constant vectors, part 4
This concludes the fixes for icmp+shl in this series:
https://reviews.llvm.org/rL279339
https://reviews.llvm.org/rL279398
https://reviews.llvm.org/rL279399

llvm-svn: 279401
2016-08-21 17:10:07 +00:00
Sanjay Patel 163a5ab799 remove FIXME comment; fixed by previous commit
llvm-svn: 279400
2016-08-21 16:40:42 +00:00
Sanjay Patel 7ffcde7422 [InstCombine] use m_APInt to allow icmp (shl X, Y), C folds for splat constant vectors, part 3
This is a partial enablement (move the ConstantInt guard down).

llvm-svn: 279399
2016-08-21 16:35:34 +00:00
Sanjay Patel 7e09f13fed [InstCombine] use m_APInt to allow icmp (shl X, Y), C folds for splat constant vectors, part 2
This is a partial enablement (move the ConstantInt guard down).

llvm-svn: 279398
2016-08-21 16:28:22 +00:00
Matthew Simpson 235e479984 Reapply "[SLP] Initialize VectorizedValue when gathering"
The test case included in r279125 exposed existing undefined behavior in the
SLP vectorizer that it did not introduce. This patch reapplies the original
patch, but modifies the test case to avoid hitting the undefined behavior. This
allows us to close PR28330 while keeping the UBSan bot happy. The undefined
behavior the original test uncovered will be addressed in a follow-on patch.

Reference: https://llvm.org/bugs/show_bug.cgi?id=28330
llvm-svn: 279370
2016-08-20 14:49:02 +00:00
Vitaly Buka cc7db13bf0 Revert "[SLP] Initialize VectorizedValue when gathering" to fix ubsan bot.
This reverts commit r279125.

https://reviews.llvm.org/D23410

llvm-svn: 279363
2016-08-20 07:09:39 +00:00
Xinliang David Li 3bf2d58a21 [Profile] add test with large counts
llvm-svn: 279361
2016-08-20 05:28:42 +00:00
Sanjay Patel fa7de606c4 [InstCombine] use m_APInt to allow icmp (shl X, Y), C folds for splat constant vectors, part 1
This is a partial enablement (move the ConstantInt guard down) because there are many
different folds here and one of the later ones will require reworking 'isSignBitCheck'.

llvm-svn: 279339
2016-08-19 22:33:26 +00:00
Reid Kleckner 98a48afa5d Revert "[SimplifyCFG] Rewrite SinkThenElseCodeToEnd"
This reverts commit r279229. It breaks intrinsic function calls in
diamonds.

llvm-svn: 279313
2016-08-19 20:22:39 +00:00
Reid Kleckner a871d3872a Fix regression in InstCombine introduced by r278944
The intended transform is:
  // Simplify icmp eq (or (ptrtoint P), (ptrtoint Q)), 0
  // -> and (icmp eq P, null), (icmp eq Q, null).

P and Q are both pointer types, but may have different types. We need
two calls to getNullValue() to make the icmps.

llvm-svn: 279271
2016-08-19 16:53:18 +00:00
David Majnemer 5554edabef [CloneFunction] Don't remove unrelated nodes from the CGSSC
CGSCC use a WeakVH to track call sites.  RAUW a call within a function
can result in that WeakVH getting confused about whether or not the call
site is still around.

llvm-svn: 279268
2016-08-19 16:37:40 +00:00
Sanjay Patel a867afe094 [InstCombine] use m_APInt to allow icmp (shl 1, Y), C folds for splat constant vectors
llvm-svn: 279266
2016-08-19 16:12:16 +00:00
Sanjay Patel 57b12d3876 [InstCombine] use m_APInt to allow icmp X, C folds for splat constant vectors
Of course, we really need to refactor and fix all of the cmp predicates, 
but this one is interesting because without it, we later perform an 
information-losing transform of icmp (shl 1, Y), C, and we can't recover
the better fold.

llvm-svn: 279263
2016-08-19 15:40:44 +00:00
Sanjay Patel 78111a7617 [InstCombine] add tests for missing vector icmp folds
llvm-svn: 279259
2016-08-19 15:27:28 +00:00
Sanjay Patel 14cdf1968f [InstCombine] add missing tests for basic icmp folds
These are implicitly included as part of larger test cases, but they don't 
exist stand-alone (and don't happen for vectors...).

llvm-svn: 279257
2016-08-19 15:21:45 +00:00
James Molloy 11a1936b70 [SimplifyCFG] Rewrite SinkThenElseCodeToEnd
The new version has several advantages:
  1) IMSHO it's more readable and neater
  2) It handles loads and stores properly
  3) It can handle any number of incoming blocks rather than just two. I'll be taking advantage of this in a followup patch.

With this change we can now finally sink load-modify-store idioms such as:

    if (a)
      return *b += 3;
    else
      return *b += 4;

    =>

    %z = load i32, i32* %y
    %.sink = select i1 %a, i32 5, i32 7
    %b = add i32 %z, %.sink
    store i32 %b, i32* %y
    ret i32 %b

When this works for switches it'll be even more powerful.

llvm-svn: 279229
2016-08-19 10:10:27 +00:00
Amaury Sechet 763c59dc9a Make cltz and cttz zero undef when the operand cannot be zero in InstCombine
Summary: Also add popcount(n) == bitsize(n)  -> n == -1 transformation.

Reviewers: majnemer, spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D23134

llvm-svn: 279141
2016-08-18 20:43:50 +00:00
Sanjay Patel 40e8ca46ad [InstCombine] use m_APInt to allow icmp (trunc X, Y), C folds for splat constant vectors
This is a sibling of:
https://reviews.llvm.org/rL278859
https://reviews.llvm.org/rL278935
https://reviews.llvm.org/rL278945
https://reviews.llvm.org/rL279066
https://reviews.llvm.org/rL279077
https://reviews.llvm.org/rL279101

llvm-svn: 279133
2016-08-18 20:28:54 +00:00
Matthew Simpson 11db6b6b8c [SLP] Initialize VectorizedValue when gathering
We abort building vectorizable trees in some cases (e.g., if the maximum
recursion depth is reached, if the region size is too large, etc.). If this
happens for a reduction, we can be left with a root entry that needs to be
gathered. For these cases, we need make sure we actually set VectorizedValue to
the resulting vector.

This patch ensures we properly set VectorizedValue, and it also ensures the
insertelement sequence generated for the gathers is inserted at the correct
location.

Reference: https://llvm.org/bugs/show_bug.cgi?id=28330
Differential Revison: https://reviews.llvm.org/D23410

llvm-svn: 279125
2016-08-18 19:50:32 +00:00
Sanjay Patel fa5ca2bf46 [InstCombine] use m_APInt to allow icmp (udiv X, Y), C folds for splat constant vectors
This is a sibling of:
https://reviews.llvm.org/rL278859
https://reviews.llvm.org/rL278935
https://reviews.llvm.org/rL278945
https://reviews.llvm.org/rL279066
https://reviews.llvm.org/rL279077

llvm-svn: 279101
2016-08-18 17:55:59 +00:00
Artur Pilipenko 615b820af6 CVP. Turn marking adds as no wrap (introduced by r278107) off by default
It causes a regression on our internal benchmark. Introduce cvp-dont-process flag and set it off by default while investigating the regression.

llvm-svn: 279082
2016-08-18 16:08:35 +00:00
Sanjay Patel 6347807f87 [InstCombine] use m_APInt to allow icmp (mul X, Y), C folds for splat constant vectors
This is a sibling of:
https://reviews.llvm.org/rL278859
https://reviews.llvm.org/rL278935
https://reviews.llvm.org/rL278945
https://reviews.llvm.org/rL279066

llvm-svn: 279077
2016-08-18 15:44:44 +00:00
Sanjay Patel 4c5e60d95c [InstCombine] use m_APInt to allow icmp (xor X, Y), C folds for splat constant vectors
This is a sibling of:
https://reviews.llvm.org/rL278859
https://reviews.llvm.org/rL278935
https://reviews.llvm.org/rL278945

llvm-svn: 279066
2016-08-18 14:10:48 +00:00
Chad Rosier 83f6bbc154 [Reassociate] Add test for PR28367.
llvm-svn: 279063
2016-08-18 13:22:37 +00:00
Sanjay Patel 3c92db7560 [InstCombine] add test for missing vector icmp fold
Also, add a scalar test to demonstrate one of the intermediate folds that
is necessary to accomplish the existing, multi-step test. And simplify
the vector tests to only check the final piece of that multi-step transform.

llvm-svn: 278995
2016-08-17 22:18:57 +00:00
Sanjay Patel 84ff18ba92 [InstCombine] minimize tests and autogenerate checks
llvm-svn: 278960
2016-08-17 19:56:10 +00:00
Sanjay Patel 63e14a07e8 [InstCombine] use m_APInt to allow icmp (or X, Y), C folds for splat constant vectors
This is a sibling of:
https://reviews.llvm.org/rL278859
https://reviews.llvm.org/rL278935

llvm-svn: 278945
2016-08-17 16:38:57 +00:00
Sanjay Patel f636d762ed [InstCombine] add tests for missing vector icmp folds
llvm-svn: 278943
2016-08-17 16:23:15 +00:00
Chad Rosier ea7e4647db Revert "Reassociate: Reprocess RedoInsts after each inst".
This reverts commit r258830, which introduced a bug described in PR28367.

PR28367

llvm-svn: 278938
2016-08-17 15:54:39 +00:00
Sanjay Patel 4f7eb2aa95 [InstCombine] use m_APInt to allow icmp (add X, Y), C folds for splat constant vectors
This is a sibling of:
https://reviews.llvm.org/rL278859

llvm-svn: 278935
2016-08-17 15:24:30 +00:00
Chad Rosier a6822f64f3 Revert "[Reassociate] Avoid iterator invalidation when negating value."
This reverts commit r278928 due to lit test failures.

llvm-svn: 278929
2016-08-17 14:31:34 +00:00
Chad Rosier cf3e8121a6 [Reassociate] Avoid iterator invalidation when negating value.
Differential Revision: https://reviews.llvm.org/D23464
PR28367

llvm-svn: 278928
2016-08-17 14:16:45 +00:00
Chandler Carruth 67fc52f067 [PM] Port the always inliner to the new pass manager in a much more
minimal and boring form than the old pass manager's version.

This pass does the very minimal amount of work necessary to inline
functions declared as always-inline. It doesn't support a wide array of
things that the legacy pass manager did support, but is alse ... about
20 lines of code. So it has that going for it. Notably things this
doesn't support:

- Array alloca merging
  - To support the above, bottom-up inlining with careful history
    tracking and call graph updates
- DCE of the functions that become dead after this inlining.
- Inlining through call instructions with the always_inline attribute.
  Instead, it focuses on inlining functions with that attribute.

The first I've omitted because I'm hoping to just turn it off for the
primary pass manager. If that doesn't pan out, I can add it here but it
will be reasonably expensive to do so.

The second should really be handled by running global-dce after the
inliner. I don't want to re-implement the non-trivial logic necessary to
do comdat-correct DCE of functions. This means the -O0 pipeline will
have to be at least 'always-inline,global-dce', but that seems
reasonable to me. If others are seriously worried about this I'd like to
hear about it and understand why. Again, this is all solveable by
factoring that logic into a utility and calling it here, but I'd like to
wait to do that until there is a clear reason why the existing
pass-based factoring won't work.

The final point is a serious one. I can fairly easily add support for
this, but it seems both costly and a confusing construct for the use
case of the always inliner running at -O0. This attribute can of course
still impact the normal inliner easily (although I find that
a questionable re-use of the same attribute). I've started a discussion
to sort out what semantics we want here and based on that can figure out
if it makes sense ta have this complexity at O0 or not.

One other advantage of this design is that it should be quite a bit
faster due to checking for whether the function is a viable candidate
for inlining exactly once per function instead of doing it for each call
site.

Anyways, hopefully a reasonable starting point for this pass.

Differential Revision: https://reviews.llvm.org/D23299

llvm-svn: 278896
2016-08-17 02:56:20 +00:00
Sanjay Patel 7ad324b396 [InstCombine] add tests for fold with no coverage and missing vector fold
llvm-svn: 278867
2016-08-16 23:18:42 +00:00
Sanjay Patel e47df1ac62 [InstCombine] use m_APInt to allow icmp (sub X, Y), C folds for splat constant vectors
llvm-svn: 278859
2016-08-16 21:53:19 +00:00
David Majnemer 00940fb854 Make MDNode::intersect faster than O(n * m)
It is pretty easy to get it down to O(nlogn + mlogm).  This
implementation has the added benefit of automatically deduplicating
entries between the two sets.

llvm-svn: 278837
2016-08-16 18:48:37 +00:00
David Majnemer fa0f1e660b Don't passively concatenate MDNodes
I have audited all the callers of concatenate and none require duplicate
entries to service concatenation.
These duplicates serve no purpose but to needlessly embiggen the IR.

N.B. Layering getMostGenericAliasScope on top of concatenate makes it
O(nlogn + mlogm) instead of O(n*m).

llvm-svn: 278836
2016-08-16 18:48:34 +00:00
Gor Nishanov 74309fa014 [Coroutines] Part 7: Split coroutine into subfunctions
Summary:
This patch adds simple coroutine splitting logic to CoroSplit pass.

Documentation and overview is here: http://llvm.org/docs/Coroutines.html.

Upstreaming sequence (rough plan)
1.Add documentation. (https://reviews.llvm.org/D22603)
2.Add coroutine intrinsics. (https://reviews.llvm.org/D22659)
...
7. Split coroutine into subfunctions <= we are here
8. Coroutine Frame Building algorithm
9. Handle coroutine with unwinds
10+. The rest of the logic

Reviewers: majnemer

Subscribers: llvm-commits, mehdi_amini

Differential Revision: https://reviews.llvm.org/D23461

llvm-svn: 278830
2016-08-16 18:04:14 +00:00
David Majnemer 5c5df6283a [InstSimplify] Fold gep (gep V, C), (xor V, -1) to C-1
llvm-svn: 278779
2016-08-16 06:13:46 +00:00
Sanjay Patel 46a68ba618 [InstCombine] add tests for missing vector icmp folds
llvm-svn: 278768
2016-08-16 00:48:38 +00:00
Sanjay Patel f1bf21c56b [InstCombine] add tests for missing vector icmp folds
llvm-svn: 278765
2016-08-16 00:27:12 +00:00
Sanjay Patel df77a4dbb0 [InstCombine] add tests for missing vector icmp folds
llvm-svn: 278757
2016-08-15 22:43:52 +00:00
Sanjay Patel 41520e1712 [InstCombine] add tests for missing vector icmp folds
llvm-svn: 278751
2016-08-15 21:47:50 +00:00
Sanjay Patel 638b613101 [InstCombine] add tests for missing vector icmp folds
llvm-svn: 278747
2016-08-15 21:37:24 +00:00
Sanjay Patel 55d87a88cc update tests to use FileCheck and exact checking
llvm-svn: 278741
2016-08-15 21:02:25 +00:00
Sanjay Patel 3e9acec2fa [InstCombine] add tests for missing vector icmp folds
llvm-svn: 278737
2016-08-15 20:56:11 +00:00
Sanjay Patel b37bd6d7b7 [InstCombine] add test for missing vector icmp fold
llvm-svn: 278727
2016-08-15 20:02:40 +00:00
Sanjay Patel 7d98be81cc [InstCombine] add tests for vector icmp folds
llvm-svn: 278726
2016-08-15 19:58:21 +00:00
Sanjay Patel b860859611 [InstCombine] add tests for missing vector icmp folds
llvm-svn: 278717
2016-08-15 19:16:33 +00:00
Sanjay Patel 6866b82a05 update test to use FileCheck and autogenerated checks
llvm-svn: 278714
2016-08-15 18:56:10 +00:00
Sanjay Patel 2044a8eba9 [InstCombine] add tests for missing vector icmp folds
llvm-svn: 278709
2016-08-15 18:45:10 +00:00
Sanjay Patel aaf34d1bfc [InstCombine] add test for missing vector icmp fold
llvm-svn: 278708
2016-08-15 18:39:54 +00:00
Sanjay Patel 195eb9340a minimize test
llvm-svn: 278707
2016-08-15 18:35:44 +00:00
Sanjay Patel 3f506daf8c remove unnecessary IR comments about uses
llvm-svn: 278705
2016-08-15 18:32:50 +00:00
Sanjay Patel d391b0d69e [InstCombine] add tests for missing vector icmp folds
llvm-svn: 278704
2016-08-15 18:26:56 +00:00
Sanjay Patel cbd62a082c [InstCombine] add tests for missing vector icmp folds
llvm-svn: 278689
2016-08-15 17:55:39 +00:00
Sanjay Patel 566b348987 [InstCombine] auto-generate exact checks
Note that several of these tests belong in InstSimplify rather than
InstCombine because they return existing operands or constants.

llvm-svn: 278684
2016-08-15 17:19:07 +00:00
Sanjay Patel a7b9bb3785 [InstCombine] add tests for missing vector icmp folds
llvm-svn: 278683
2016-08-15 17:10:35 +00:00
Reid Kleckner 70a600b8bb Revert "[SimplifyCFG] Rewrite SinkThenElseCodeToEnd"
This reverts commit r278660.

It causes downstream assertion failure in InstCombine on shuffle
instructions. Comes up in __mm_swizzle_epi32.

llvm-svn: 278672
2016-08-15 15:42:31 +00:00
James Molloy 9a3c82f5cf [SimplifyCFG] Rewrite SinkThenElseCodeToEnd
The new version has several advantages:
  1) IMSHO it's more readable and neater
  2) It handles loads and stores properly
  3) It can handle any number of incoming blocks rather than just two. I'll be taking advantage of this in a followup patch.

With this change we can now finally sink load-modify-store idioms such as:

    if (a)
      return *b += 3;
    else
      return *b += 4;

    =>

    %z = load i32, i32* %y
    %.sink = select i1 %a, i32 5, i32 7
    %b = add i32 %z, %.sink
    store i32 %b, i32* %y
    ret i32 %b

When this works for switches it'll be even more powerful.

llvm-svn: 278660
2016-08-15 08:04:56 +00:00
James Molloy 196ad0823e [LSR] Don't try and create post-inc expressions on non-rotated loops
If a loop is not rotated (for example when optimizing for size), the latch is not the backedge. If we promote an expression to post-inc form, we not only increase register pressure and add a COPY for that IV expression but for all IVs!

Motivating testcase:

    void f(float *a, float *b, float *c, int n) {
      while (n-- > 0)
        *c++ = *a++ + *b++;
    }

It's imperative that the pointer increments be located in the latch block and not the header block; if not, we cannot use post-increment loads and stores and we have to keep both the post-inc and pre-inc values around until the end of the latch which bloats register usage.

llvm-svn: 278658
2016-08-15 07:53:03 +00:00
Sanjay Patel 52fe9ae990 [InstCombine] add test for missing vector icmp fold
llvm-svn: 278639
2016-08-14 22:56:46 +00:00
Sanjay Patel 7e57b00274 [InstCombine] add tests for vector icmp folds
llvm-svn: 278637
2016-08-14 22:44:10 +00:00
Sanjay Patel 8554f70c07 [InstCombine] add test for potentially missing vector icmp fold
llvm-svn: 278636
2016-08-14 22:30:07 +00:00
Sanjay Patel beebe05af1 [InstCombine] add test for missing vector icmp fold
llvm-svn: 278635
2016-08-14 22:29:27 +00:00
Sanjay Patel ba1f9fbddc [InstCombine] add tests for missing vector icmp folds
llvm-svn: 278634
2016-08-14 22:28:50 +00:00
Sanjay Patel f6559404d5 [InstCombine] remove unnecessary function attributes from tests
llvm-svn: 278633
2016-08-14 21:48:21 +00:00
Sanjay Patel b44ca3bfa9 [InstCombine] add tests for missing vector icmp folds
llvm-svn: 278632
2016-08-14 21:36:22 +00:00
Sanjay Patel bbb3dffd0a [InstCombine] add test for missing vector icmp fold
llvm-svn: 278631
2016-08-14 21:05:08 +00:00
Sanjay Patel 66a3457a4c [InstCombine] add test for missing vector icmp fold
llvm-svn: 278630
2016-08-14 20:39:42 +00:00
Sanjoy Das 2143447c73 [IRCE] Create llvm::Loop instances for cloned out loops
llvm-svn: 278618
2016-08-14 01:04:46 +00:00
Sanjoy Das 7a18a238c6 [IRCE] Don't iterate on loops that were cloned out
IRCE has the ability to further version pre-loops and post-loops that it
created, but this isn't useful at all.  This change teaches IRCE to
leave behind some metadata in the loops it creates (by cloning the main
loop) so that these new loops are not re-processed by IRCE.

Today this bug is hidden by another bug -- IRCE does not update LoopInfo
properly so the loop pass manager does not re-invoke IRCE on the loops
it split out.  However, once the latter is fixed the bug addressed in
this change causes IRCE to infinite-loop in some cases (e.g. it splits
out a pre-loop, a pre-pre-loop from that, a pre-pre-pre-loop from that
and so on).

llvm-svn: 278617
2016-08-14 01:04:36 +00:00
Sanjoy Das 1b1272f515 [IRCE] Fix test case; NFC
The (negative) test case is supposed to check that IRCE does not muck
with range checks it cannot handle, not that it does the right thing in
the absence of profiling information.

llvm-svn: 278612
2016-08-13 23:36:40 +00:00
Sanjoy Das 2a2f14d7ab [IRCE] Be resilient in the face of non-simplified loops
Loops containing `indirectbr` may not be in simplified form, even after
running LoopSimplify.  Reject then gracefully, instead of tripping an
assert.

llvm-svn: 278611
2016-08-13 23:36:35 +00:00
Mehdi Amini 8c629ecf3a Revert "Revert "Invariant start/end intrinsics overloaded for address space""
This reverts commit 32fc6488e48eafc0ca1bac1bd9cbf0008224d530.

llvm-svn: 278609
2016-08-13 23:31:24 +00:00
Mehdi Amini 164ac651da Revert "Invariant start/end intrinsics overloaded for address space"
This reverts commit r276447.

llvm-svn: 278608
2016-08-13 23:27:32 +00:00
Teresa Johnson 1eca6bc6a7 [PM] Port LoopDataPrefetch to new pass manager
Summary:
Refactor the existing support into a LoopDataPrefetch implementation
class and a LoopDataPrefetchLegacyPass class that invokes it.
Add a new LoopDataPrefetchPass for the new pass manager that utilizes
the LoopDataPrefetch implementation class.

Reviewers: mehdi_amini

Subscribers: sanjoy, mzolotukhin, nemanjai, llvm-commits

Differential Revision: https://reviews.llvm.org/D23483

llvm-svn: 278591
2016-08-13 04:11:27 +00:00
Sanjoy Das 3502511548 [IndVars] Ignore (s|z)exts that don't extend the induction variable
`IVVisitor::visitCast` used to have the invariant that if the
instruction it was passed was a sext or zext instruction, the result of
the instruction would be wider than the induction variable.  This is no
longer true after rL275037, so this change teaches `IndVarSimplify` s
implementation of `IVVisitor::visitCast` to work with the relaxed
invariant.

A corresponding change to SimplifyIndVar to preserve the said invariant
after rL275037 would also work, but given how `IVVisitor::visitCast` is
spelled (no indication of said invariant), I figured the current fix is
cleaner.

Fixes PR28935.

llvm-svn: 278584
2016-08-13 00:58:31 +00:00
Tim Shen c9c0d2dcb5 [LoopVectorize] Detect loops in the innermost loop before creating InnerLoopVectorizer
InnerLoopVectorizer shouldn't handle a loop with cycles inside the loop
body, even if that cycle isn't a natural loop.

Fixes PR28541.

Differential Revision: https://reviews.llvm.org/D22952

llvm-svn: 278573
2016-08-12 22:47:13 +00:00
Reid Kleckner 6ee00a2602 [Inliner] Don't treat inalloca allocas as static
They aren't static, and moving them to the entry block across something
else will only result in tears.

Root cause of http://crbug.com/636558.

llvm-svn: 278571
2016-08-12 22:23:04 +00:00
Michael Kuperstein 31b8399beb [PM] Port LowerInvoke to the new pass manager
llvm-svn: 278531
2016-08-12 17:28:27 +00:00
Dehao Chen c0a1e432c7 Fine tuning of sample profile propagation algorithm.
Summary: The refined propagation algorithm is more accurate and robust.

Reviewers: davidxl, dnovillo

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D23224

llvm-svn: 278522
2016-08-12 16:22:12 +00:00
Artur Pilipenko 2e8f82d962 [LVI] Take guards into account
Teach LVI to gather control dependant constraints from guards.

Reviewed By: sanjoy

Differential Revision: https://reviews.llvm.org/D23358

llvm-svn: 278518
2016-08-12 15:52:23 +00:00
Artur Pilipenko b623088abe [LVI] Fix potential memory corruption in getValueFromCondition
Rewrite Visited[Cond] = getValueFromConditionImpl(..., Visited) statement which can lead to a memory corruption since getValueFromConditionImpl changes Visited map and invalidates the iterators.

llvm-svn: 278514
2016-08-12 15:08:15 +00:00
Artur Pilipenko 6669f253d5 [LVI] Take range metadata into account while calculating icmp condition constraints
Take range metadata into account for conditions like this:

%length = load i32, i32* %length_ptr, !range !{i32 0, i32 2147483647}
%cmp = icmp ult i32 %a, %length

This is a common pattern for range checks where the length of the array is dynamically loaded.

Reviewed By: sanjoy

Differential Revision: https://reviews.llvm.org/D23267

llvm-svn: 278496
2016-08-12 10:14:11 +00:00
Artur Pilipenko 635625855f [LVI] Handle any predicate in comparisons like icmp <pred> (add Val, Offset), ...
Currently LVI can only gather value constraints from comparisons like:

* icmp <pred> Val, ...
* icmp ult (add Val, Offset), ...

In fact we can handle any predicate in latter comparisons.

Reviewed By: sanjoy

Differential Revision: https://reviews.llvm.org/D23357

llvm-svn: 278493
2016-08-12 10:05:11 +00:00
Gor Nishanov 0f303accde [Coroutines]: Part6b: Add coro.id intrinsic.
Summary:
1. Make coroutine representation more robust against optimization that may duplicate instruction by introducing coro.id intrinsics that returns a token that will get fed into coro.alloc and coro.begin. Due to coro.id returning a token, it won't get duplicated and can be used as reliable indicator of coroutine identify when a particular coroutine call gets inlined.
2. Move last three arguments of coro.begin into coro.id as they will be shared if coro.begin will get duplicated.
3. doc + test + code updated to support the new intrinsic.

Reviewers: mehdi_amini, majnemer

Subscribers: mehdi_amini, llvm-commits

Differential Revision: https://reviews.llvm.org/D23412

llvm-svn: 278481
2016-08-12 05:45:49 +00:00