10693 Commits

Author SHA1 Message Date
Peter Collingbourne 2974856ad4 Use branch funnels for virtual calls when retpoline mitigation is enabled.
The retpoline mitigation for variant 2 of CVE-2017-5715 inhibits the
branch predictor, and as a result it can lead to a measurable loss of
performance. We can reduce the performance impact of retpolined virtual
calls by replacing them with a special construct known as a branch
funnel, which is an instruction sequence that implements virtual calls
to a set of known targets using a binary tree of direct branches. This
allows the processor to speculatively execute valid implementations of the
virtual function without allowing for speculative execution of calls
to arbitrary addresses.
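
As a rough illustration of the idea (not the code this patch generates), a branch funnel can be pictured as a search over a sorted set of known vtable addresses in which every leaf is a direct call. The sketch below models vtable addresses as small integers; the names `funnel3`, `implA`, `implB`, `implC` and the address constants are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Three hypothetical implementations of one virtual function.
static int implA(int x) { return x + 1; }
static int implB(int x) { return x * 2; }
static int implC(int x) { return x - 3; }

// Hypothetical, sorted "vtable addresses" for the three classes.
const std::uintptr_t kVTableA = 0x1000;
const std::uintptr_t kVTableB = 0x2000;
const std::uintptr_t kVTableC = 0x3000;

// A branch funnel modelled in C++: compare the vtable address against
// known values and reach each target through a direct call only.
// The caller must pass one of the three known addresses.
int funnel3(std::uintptr_t vtable, int arg) {
  assert(vtable == kVTableA || vtable == kVTableB || vtable == kVTableC);
  if (vtable < kVTableB)
    return implA(arg);  // left leaf
  if (vtable == kVTableB)
    return implB(arg);  // middle leaf
  return implC(arg);    // right leaf
}
```

No indirect call appears in `funnel3`, so every path a processor could speculate down ends in one of the known implementations.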

This patch extends the whole-program devirtualization pass to replace
certain virtual calls with calls to branch funnels, which are
represented using a new llvm.icall.jumptable intrinsic. It also extends
the LowerTypeTests pass to recognize the new intrinsic, generate code
for the branch funnels (x86_64 only for now) and lay out virtual tables
as required for each branch funnel.

The implementation supports full LTO as well as ThinLTO, and extends the
ThinLTO summary format used for whole-program devirtualization to
support branch funnels.

For more details see RFC:
http://lists.llvm.org/pipermail/llvm-dev/2018-January/120672.html

Differential Revision: https://reviews.llvm.org/D42453

llvm-svn: 327163
2018-03-09 19:11:44 +00:00
Renato Golin bc94b98c44 [LV] Adding test for r327109
llvm-svn: 327155
2018-03-09 18:02:36 +00:00
Farhana Aleen a7cb31123c [AMDGPU] Supported ds_read_b128 generation; Widened vector length for local address-space.
Summary: Starting from GCN 2nd generation, the ISA supports ds_read_b128 in addition to ds_read_b64.
         This patch adds the ds_read_b128 instruction pattern and generation of this instruction.
         In the vectorizer, this patch also widens the vector length so that the vectorizer generates
         128-bit loads for the local address space, which get translated to ds_read_b128.
         Since the performance benefit is not clear, the compiler generates ds_read_b128 only under -amdgpu-ds128.

Author: FarhanaAleen

Reviewed By: rampitec, arsenm

Subscribers: llvm-commits, AMDGPU

Differential Revision: https://reviews.llvm.org/D44210

llvm-svn: 327153
2018-03-09 17:41:39 +00:00
Chad Rosier 95d9ccb2a0 [JumpThreading] Don't restrict cast-traversal to i1
In r263618, JumpThreading learned to look through simple cast instructions, but
only if the source of those cast instructions was a phi/cmp i1 (in an effort to
limit compile time effects). I think this condition is too restrictive. For
switches with limited value range, InstCombine will readily introduce an extra
trunc instruction to a smaller integer type (e.g. from i8 to i2), leaving us in
the somewhat perverse situation that jump-threading would work before running
instcombine, but not after. Since instcombine produces this pattern, I think we
need to consider it canonical and support it in JumpThreading.  In general,
for limiting recursion, I think the existing restriction to phi and cmp nodes
should be sufficient to avoid looking through unprofitable chains of
instructions.

Patch by Keno Fischer!
Differential Revision: https://reviews.llvm.org/D42262

llvm-svn: 327150
2018-03-09 16:43:46 +00:00
Sanjay Patel 3675b8cece [InstSimplify] fix FP infinite hex constant values in tests; NFC
Really should improve this...

llvm-svn: 327144
2018-03-09 16:14:02 +00:00
Stefan Pintilie ef7c4976bb Revert "[PowerPC] LSR tunings for PowerPC"
Revert the rest of the LSR tune commit.
It seems that the LSR tune commit breaks internal tests.
Reverting the commit.

llvm-svn: 327143
2018-03-09 16:08:55 +00:00
Stefan Pintilie 7f879a8467 Revert "[PowerPC] Move test to correct location."
Revert part of the LSR tune commit.

llvm-svn: 327142
2018-03-09 16:08:48 +00:00
Eric Christopher 3caa0fd050 Revert "[ThinLTO] Keep available_externally symbols live"
This reverts commit r327041 and the followup attempts at fixing the testcase as they're still failing.

llvm-svn: 327094
2018-03-09 01:25:18 +00:00
Adrian Prantl 5b477be72a LowerDbgDeclare: ignore dbg.declares for allocas with volatile access
There is no point in lowering a dbg.declare describing an alloca that
has volatile loads or stores as users, since the alloca cannot be
elided. Lowering the dbg.declare will result in larger debug info that
may also have worse coverage than just describing the alloca.

rdar://problem/34496278

llvm-svn: 327092
2018-03-09 00:45:04 +00:00
Sanjay Patel ee770e9c4e [Reassociate] fix test to be independent of FP undef
llvm-svn: 327071
2018-03-08 22:05:27 +00:00
Sanjay Patel 2ee7b9349d [ConstantFold] fp_binop undef, undef --> undef
These are uncontroversial and independent of the proposed LangRef edits (D44216).

I tried to fix tests that would fold away:
rL327004
rL327028
rL327030
rL327034

I'm not sure if the Reassociate tests are meaningless yet, but they probably will be 
as we add more folds, so if anyone has suggestions or wants to fix those, please do.

Differential Revision: https://reviews.llvm.org/D44258

llvm-svn: 327058
2018-03-08 20:42:49 +00:00
Vlad Tsyrklevich c9a1a6e964 Specify that test from r327041 requires asserts
llvm-svn: 327051
2018-03-08 19:46:19 +00:00
Vlad Tsyrklevich f6337d3a73 Fix test failure introduced in r327041
The "Assertion: `...' failed" error message format is not identical
across platforms.

llvm-svn: 327047
2018-03-08 19:20:08 +00:00
Vlad Tsyrklevich 7b66ef1036 [ThinLTO] Keep available_externally symbols live
Summary:
This change fixes PR36483. The bug was originally introduced by a change
that marked non-prevailing symbols dead. This broke LowerTypeTests
handling of available_externally functions, which are non-prevailing.
LowerTypeTests uses liveness information to avoid emitting thunks for
unused functions.

Marking available_externally functions dead is incorrect: the functions
are used even though the function definitions are not. This change keeps them
live, and lets the EliminateAvailableExternally/GlobalDCE passes remove
them later instead.

I've also enabled EliminateAvailableExternally for all optimization
levels; I believe it being disabled for O1 was an oversight.

Reviewers: pcc, tejohnson

Reviewed By: tejohnson

Subscribers: grimar, mehdi_amini, inglorion, eraman, llvm-commits

Differential Revision: https://reviews.llvm.org/D43690

llvm-svn: 327041
2018-03-08 18:48:03 +00:00
Sanjay Patel 31051f8314 [InstCombine] add min/max tests with not ops; NFC
These are based on:
https://bugs.llvm.org/show_bug.cgi?id=35875
It's not clear if/how instcombine can reduce these,
but we should have the tests here either way to 
document current behavior.

llvm-svn: 327039
2018-03-08 18:34:23 +00:00
Sanjay Patel 788a4336cd [StructurizeCFG] fix test to be independent of FP undef
llvm-svn: 327028
2018-03-08 17:13:57 +00:00
Sanjay Patel e755bf87a5 [StructurizeCFG] auto-generate full checks; NFC
Not sure what the intent of this test is, but this will change when we fix FP undef constant folding.

llvm-svn: 327022
2018-03-08 16:25:37 +00:00
Sanjay Patel faf9b0f322 [InstCombine] regenerate checks; NFC
We may not need any of these tests after rL327012, but leaving 
them here for now until that's confirmed.

llvm-svn: 327014
2018-03-08 15:46:38 +00:00
Sanjay Patel a87b74f72b [InstSimplify] add more tests for FP undef; NFC
llvm-svn: 327012
2018-03-08 15:39:39 +00:00
Sanjay Patel 4ad3c32144 [InstCombine, NewGVN] remove FP undef from tests
I'm trying to preserve the intent of these tests by using 
non-undef operands; if we fix FP undef folding these tests
will not pass.

llvm-svn: 327004
2018-03-08 14:57:08 +00:00
Stefan Pintilie f8c2dce236 [PowerPC] Move test to correct location.
Test was added in r326906 to an incorrect location.
Moving the test to PPC CodeGen directory as the test is PPC specific.

llvm-svn: 326923
2018-03-07 18:27:10 +00:00
Justin Lebar eccfbf1bcd Re-land: Teach CorrelatedValuePropagation to reduce the width of udiv/urem instructions.
Summary:
If the operands of a udiv/urem can be proved to fit within a smaller
power-of-two-sized type, reduce the width of the udiv/urem.

Backed out for failing an assert in clang bootstrap builds.  Re-landing
with a fix for handling non-power-of-two inputs (e.g. udiv i24).
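
A minimal sketch of the width selection described above, assuming the upper bounds of both operands are already known. The helper names and the 8-bit minimum width are assumptions made for illustration, not taken from the commit message.

```cpp
#include <cassert>
#include <cstdint>

// Number of bits needed to represent v (0 needs 0 bits).
unsigned activeBits(std::uint64_t v) {
  unsigned n = 0;
  while (v != 0) {
    ++n;
    v >>= 1;
  }
  return n;
}

// Smallest power of two that is >= n, for 1 <= n <= 64.
unsigned powerOf2Ceil(unsigned n) {
  assert(n >= 1 && n <= 64);
  unsigned p = 1;
  while (p < n)
    p <<= 1;
  return p;
}

// Returns the narrower power-of-two width to use for a udiv/urem whose
// operands are bounded by maxLhs and maxRhs, or 0 if narrowing does not
// produce a strictly smaller type (e.g. a 17-bit value in an i24 udiv).
unsigned narrowedWidth(unsigned origWidth, std::uint64_t maxLhs,
                       std::uint64_t maxRhs) {
  unsigned bits = activeBits(maxLhs > maxRhs ? maxLhs : maxRhs);
  unsigned width = powerOf2Ceil(bits < 8 ? 8 : bits);  // assumed 8-bit floor
  return width < origWidth ? width : 0;
}
```

Comparing the rounded-up width against the original width is what keeps a non-power-of-two input such as i24 from being "narrowed" to a wider i32.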

Original Differential Revision: https://reviews.llvm.org/D44102

llvm-svn: 326908
2018-03-07 16:56:49 +00:00
Stefan Pintilie f8438e8e59 [PowerPC] LSR tunings for PowerPC
The purpose of this patch is to have LSR generate better code on Power.
This is done by overriding isLSRCostLess.

Differential Revision: https://reviews.llvm.org/D40855

llvm-svn: 326906
2018-03-07 16:53:09 +00:00
Justin Lebar eeeb0eb049 Revert rL326898: "Teach CorrelatedValuePropagation to reduce the width of udiv/urem instructions."
Breaks bootstrap builds: clang built with this patch asserts while
building MCDwarf.cpp: Assertion `castIsValid(op, S, Ty) && "Invalid
cast!"' failed.

llvm-svn: 326900
2018-03-07 16:05:43 +00:00
Justin Lebar cb9e89c39b Teach CorrelatedValuePropagation to reduce the width of udiv/urem instructions.
Summary:
If the operands of a udiv/urem can be proved to fit within a smaller
power-of-two-sized type, reduce the width of the udiv/urem.

Reviewers: spatel, sanjoy

Subscribers: llvm-commits, hiraditya

Differential Revision: https://reviews.llvm.org/D44102

llvm-svn: 326898
2018-03-07 15:11:13 +00:00
Sven van Haastregt 19f531d31e [LoadStoreVectorizer] Differentiate between <1 x T> and T
The LoadStoreVectorizer thought that <1 x T> and T were the same types
when merging stores, leading to a crash later.

Patch by Erik Hogeman.

Differential Revision: https://reviews.llvm.org/D44014

llvm-svn: 326884
2018-03-07 10:29:28 +00:00
Evgeny Stupachenko 204ade4102 Add early exit on reassociation of 0 expression.
Summary:

Before this patch, an attempt to reassociate ((v * 16) * 0) * 1 fell into an infinite loop.

Reviewers: pankajchawla

Differential Revision: http://reviews.llvm.org/D41467

From: Evgeny Stupachenko <evstupac@gmail.com>
                         <evgeny.v.stupachenko@intel.com>
llvm-svn: 326861
2018-03-07 02:17:08 +00:00
Sebastian Pop bf6e1c26cf DA: remove uses of GEP, only ask SCEV
Dependence Analysis (DA) has been broken for quite some time,
as it uses the GEP representation to "identify" multi-dimensional arrays.
It even wrongly detects multi-dimensional arrays in single nested loops:

from test/Analysis/DependenceAnalysis/Coupled.ll, example @couple6
;; for (long int i = 0; i < 50; i++) {
;; A[i][3*i - 6] = i;
;; *B++ = A[i][i];

DA used to detect two subscripts, which makes no sense in the LLVM IR
or in C/C++ semantics, as there are no guarantees as in Fortran of
subscripts not overlapping into a next array dimension:

maximum nesting levels = 1
SrcPtrSCEV = %A
DstPtrSCEV = %A
using GEPs
subscript 0
    src = {0,+,1}<nuw><nsw><%for.body>
    dst = {0,+,1}<nuw><nsw><%for.body>
    class = 1
    loops = {1}
subscript 1
    src = {-6,+,3}<nsw><%for.body>
    dst = {0,+,1}<nuw><nsw><%for.body>
    class = 1
    loops = {1}
Separable = {}
Coupled = {1}

With the current patch, DA will correctly work on only one dimension:

maximum nesting levels = 1
SrcSCEV = {(-2424 + %A)<nsw>,+,1212}<%for.body>
DstSCEV = {%A,+,404}<%for.body>
subscript 0
    src = {(-2424 + %A)<nsw>,+,1212}<%for.body>
    dst = {%A,+,404}<%for.body>
    class = 1
    loops = {1}
Separable = {0}
Coupled = {}

This change removes all uses of GEP from DA, and we now only rely
on the SCEV representation.
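
To make the single-dimension view concrete, here is a small sketch of how a two-subscript access collapses into one affine function of the loop counter. The row length of 100 elements and the 4-byte element size are assumptions chosen for illustration; `AddRec` and `linearize` are hypothetical names, not LLVM APIs.

```cpp
#include <cassert>

// An affine recurrence {start,+,step} over the loop counter i.
struct AddRec {
  long start;
  long step;
};

// Byte offset of A[a1*i + b1][a2*i + b2], for rows of rowLen elements and
// elemSize bytes per element, expressed as one affine function of i.
AddRec linearize(long a1, long b1, long a2, long b2, long rowLen,
                 long elemSize) {
  assert(rowLen > 0 && elemSize > 0);
  AddRec r;
  r.start = elemSize * (rowLen * b1 + b2);  // offset at i = 0
  r.step = elemSize * (rowLen * a1 + a2);   // change in offset per iteration
  return r;
}
```

Under those assumed values, subscripts `3*i - 6` in both dimensions give `{-2424,+,1212}` and subscripts `i` in both dimensions give `{0,+,404}`, the same start/step pairs that appear in the second dump above.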

The patch does not turn on -da-delinearize by default, and so the DA analysis
will be more conservative in the case of multi-dimensional memory accesses in
nested loops.

I disabled some interchange tests, as the DA is not able to disambiguate
the dependence anymore. To make DA stronger, we may need to
compute a bound on the number of iterations based on the access functions
and array dimensions.

The patch cleans up all the CHECKs in test/Transforms/LoopInterchange/*.ll to
avoid checking for snippets of LLVM IR: this form of checking is very hard to
maintain. Instead, we now check for output of the pass that is more meaningful
than dozens of lines of LLVM IR. Some tests now require -debug messages and are thus
only enabled with asserts.

Patch written by Sebastian Pop and Aditya Kumar.

Differential Revision: https://reviews.llvm.org/D35430

llvm-svn: 326837
2018-03-06 21:55:59 +00:00
Sanjay Patel ed2211d50f [PatternMatch] define m_Not using m_Xor and cst_pred_ty
Using cst_pred_ty in the definition allows us to match vectors with undef elements.

This is a continuation of an effort to make all pattern matchers allow undef elements in vectors:
rL325437
rL325466
D43792

Differential Revision: https://reviews.llvm.org/D44076

llvm-svn: 326823
2018-03-06 18:19:42 +00:00
Florian Hahn 517dc51c48 [CallSiteSplitting] Do not crash when BB's terminator changes.
Change doCallSiteSplitting to iterate until we reach the terminator instruction.
tryToSplitCallSite can replace BB's terminator in case BB is a successor of
itself. Then IE will be invalidated and we also have to check the current
terminator.

Reviewers: junbuml, davidxl, davide, fhahn

Reviewed By: fhahn, junbuml

Differential Revision: https://reviews.llvm.org/D43824

llvm-svn: 326793
2018-03-06 14:00:58 +00:00
Daniel Neilson 82daad31fe [RewriteStatepoints] Fix stale parse points
Summary:
RewriteStatepointsForGC collects parse points for further processing.
During the collection if a callsite is found in an unreachable block
(DominatorTree::isReachableFromEntry()) then all unreachable blocks are
removed by removeUnreachableBlocks(). Some of the removed blocks could
have been reachable according to DominatorTree::isReachableFromEntry().
In this case the collected parse points became stale and resulted in a
crash when accessed.

The fix is to unconditionally canonicalize the IR to
removeUnreachableBlocks and then collect the parse points.

The added test crashes with the old version and passes with this patch.

Patch by Yevgeny Rouban!

Reviewed by: Anna

Differential Revision: https://reviews.llvm.org/D43929

llvm-svn: 326748
2018-03-05 22:27:30 +00:00
Alexey Bataev 625ce229b1 [SLP] Additional tests for stores vectorization, NFC.
llvm-svn: 326740
2018-03-05 20:20:12 +00:00
Daniel Neilson bdda115e19 [InstCombine] Don't blow up in foldICmpWithCastAndCast on vector icmp instructions.
Summary:
Presently, InstCombiner::foldICmpWithCastAndCast() implicitly assumes that it is
only invoked with icmp instructions of integer type. If that assumption is broken,
and it is called with an icmp of vector type, then it fails (asserts/crashes).

This patch addresses the deficiency. It allows it to simplify
icmp (ptrtoint x), (ptrtoint/c) of vector type into a compare of the inputs,
much as is done when the type is integer.

Reviewers: apilipenko, fedor.sergeev, mkazantsev, anna

Reviewed By: anna

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D44063

llvm-svn: 326730
2018-03-05 18:05:51 +00:00
Craig Topper 8452faceae [InstCombine] Add constant vector support to getMinimumFPType for visitFPTrunc.
This patch teaches getMinimumFPType to support shrinking a vector of ConstantFPs. This should improve our ability to combine vector fptrunc with fp binops.

Differential Revision: https://reviews.llvm.org/D43774

llvm-svn: 326729
2018-03-05 18:04:12 +00:00
Florian Hahn 0b7c6422fb [IPSCCP] Add getCompare which returns either true, false, undef or null.
getCompare returns true, false or undef constants if the comparison can
be evaluated, or nullptr if it cannot. This is in line with what
ConstantExpr::getCompare returns. It also allows us to use
ConstantExpr::getCompare for comparing constants.

Reviewers: davide, mssimpso, dberlin, anna

Reviewed By: davide

Differential Revision: https://reviews.llvm.org/D43761

llvm-svn: 326720
2018-03-05 17:33:50 +00:00
Xin Tong 8345c0e3a5 [MergeICmp] We can discard initial blocks that do other work
Summary:
We can discard initial blocks that do other work.
We do not need to limit ourselves to just the first block in the chain.

Reviewers: courbet, davide

Reviewed By: courbet

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D44029

llvm-svn: 326698
2018-03-05 13:54:47 +00:00
Fedor Indutny f9e09c1dd0 [CallSiteSplitting] properly split musttail calls
Summary:
`musttail` calls can't be naively split. The split blocks must
include not only the call instruction itself, but also (optional)
`bitcast` and `return` instructions that follow it.

Clone `bitcast` and `ret`, place them into the split blocks, and
remove the tail block when done.

Reviewers: junbuml, mcrosier, davidxl, davide, fhahn

Reviewed By: fhahn

Subscribers: JDevlieghere, llvm-commits

Differential Revision: https://reviews.llvm.org/D43729

llvm-svn: 326666
2018-03-03 21:40:14 +00:00
Sanjay Patel 9119b844a3 [InstCombine] add test for vectors with undef elts; NFC
llvm-svn: 326661
2018-03-03 18:00:15 +00:00
Sanjay Patel 1a8d5c3d1f [InstCombine] (~X) - (~Y) --> Y - X
llvm-svn: 326660
2018-03-03 17:53:25 +00:00
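
The `(~X) - (~Y) --> Y - X` fold in r326660 holds because `~X` equals `-X - 1` in two's complement, so the two `-1` terms cancel. A small sketch that checks both sides with wrapping 32-bit arithmetic; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Left-hand side of the fold: (~X) - (~Y), with wrapping unsigned arithmetic.
std::uint32_t notNotSub(std::uint32_t x, std::uint32_t y) {
  return (~x) - (~y);
}

// Right-hand side of the fold: Y - X.
std::uint32_t folded(std::uint32_t x, std::uint32_t y) {
  return y - x;
}
```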
Sanjay Patel 73eb2d2555 [InstCombine] add tests for notnotsub; NFC
As shown in D44043, we may need this fold in the backend,
but it's also missing in the IR optimizer.

llvm-svn: 326659
2018-03-03 17:20:37 +00:00
Chandler Carruth a4619d9944 [ThinLTO] Revert r325320: Import global variables
This caused some links to fail with ThinLTO due to missing symbols as
well as causing some binaries to have failures at runtime. We're working
with the author to get a test case, but want to get the tree green
again.

Further, it appears to introduce a data race. While the test usage of
threads was disabled in r325361 & r325362, that isn't an acceptable fix.
I've reverted both of these as well. This code needs to be thread safe.
Test cases for this are already on the original commit thread.

llvm-svn: 326638
2018-03-02 23:40:08 +00:00
Sanjay Patel 2fd0acf05a [InstCombine] partly fix FMF for fmul+log2 fold
The code was checking that all of the instructions in the 
sequence are 'fast', but that's not necessary. The final 
multiply is all that we need to check (tests adjusted). 
The fmul doesn't need to be fully 'fast' either, but that 
can be another patch.

llvm-svn: 326608
2018-03-02 20:32:46 +00:00
Sanjay Patel bb7228703a [InstCombine] add tests for rL169025; NFC
This narrow fold was added with no motivation or test cases 
a bit over 5 years ago. Removing a constant operand is a 
good canonicalization? We should handle Y*2.0 too then?

llvm-svn: 326606
2018-03-02 19:26:13 +00:00
Craig Topper 18799f4c07 [InstCombine] Allow fptrunc (fpext X) to be reduced to a single fpext/fptrunc
If we are only truncating bits from the extend we should be able to just use a smaller extend.

If we are truncating more than the extend we should be able to just use a fptrunc since the presence of the fpextend shouldn't affect rounding.
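
The two cases above can be sketched as a decision on bit widths. This is a simplification that assumes the floating-point types are ordered by width (e.g. 16, 32, 64 bits); `CastKind` and `foldTruncOfExt` are hypothetical names, and the equal-width case is included only for completeness.

```cpp
#include <cassert>

enum CastKind { NoCast, FPExt, FPTrunc };

// Which single cast can replace fptrunc(fpext X), given the bit width of
// X's type (srcBits), the extended type (midBits), and the result (dstBits).
CastKind foldTruncOfExt(unsigned srcBits, unsigned midBits, unsigned dstBits) {
  assert(srcBits < midBits && dstBits < midBits);
  if (dstBits > srcBits)
    return FPExt;    // only truncating bits the extend added: smaller extend
  if (dstBits < srcBits)
    return FPTrunc;  // truncating past the source type: plain truncate
  return NoCast;     // same type on both ends: the casts cancel
}
```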

Differential Revision: https://reviews.llvm.org/D43970

llvm-svn: 326595
2018-03-02 18:16:51 +00:00
Yaxun Liu 3c42f1c3c9 LoopUnroll: respect pragma unroll when AllowRemainder is disabled
Currently when AllowRemainder is disabled, pragma unroll count is not
respected even though there is no remainder. This bug causes a loop to be
fully unrolled in many cases even though the user specifies an unroll
count. This especially affects OpenCL/CUDA since in many cases a loop
contains convergent instructions and currently AllowRemainder is
disabled for such loops.
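
A hedged sketch of the intended behavior: the pragma count is honored when a remainder loop is allowed, or when the count divides the trip multiple so that no remainder is needed. `pragmaUnrollCount` is a hypothetical helper; the unroller's real heuristics take more inputs into account.

```cpp
#include <cassert>

// Returns the unroll count to use for a loop carrying a pragma unroll
// count, or 0 if the pragma count cannot be honored.
unsigned pragmaUnrollCount(unsigned tripMultiple, unsigned pragmaCount,
                           bool allowRemainder) {
  if (pragmaCount == 0)
    return 0;  // no pragma count was specified
  if (allowRemainder || tripMultiple % pragmaCount == 0)
    return pragmaCount;  // no remainder needed, or a remainder is allowed
  return 0;
}
```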

Differential Revision: https://reviews.llvm.org/D43826

llvm-svn: 326585
2018-03-02 16:22:32 +00:00
Clement Courbet c9119b3b6a [MergeICmps] Revert accidentally submitted failing test case.
Reverts r326574.

llvm-svn: 326582
2018-03-02 14:53:33 +00:00
Clement Courbet 1de4a183d4 [MergeIcmps] Add the test case from PR36557.
Summary: See PR36557.

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D44009

llvm-svn: 326574
2018-03-02 14:34:39 +00:00
Craig Topper 8cb5fc15ee [InstCombine] Add more test cases to fpextend.ll.
This includes the test cases from D43970 and additional tests for combining (fptrunc (binop (fpext), (fpext))) where the pre-extended types don't match the trunc and therefore can't be completely removed.

llvm-svn: 326528
2018-03-02 01:34:42 +00:00
Fedor Indutny 1571b1271e [ArgumentPromotion] don't break musttail invariant PR36543
Summary:
Do not break musttail invariant by promoting arguments of musttail
callee or caller.

Reviewers: sanjoy, dberlin, hfinkel, george.burgess.iv, fhahn, rnk

Reviewed By: rnk

Subscribers: rnk, llvm-commits

Differential Revision: https://reviews.llvm.org/D43926

llvm-svn: 326521
2018-03-02 00:59:27 +00:00
Craig Topper 113446ca37 [InstCombine] Simplify test cases by removing loads/stores that aren't required for what is being tested.
The loads and stores were getting the data and storing the results. There's no reason we can't just use function arguments and return.

llvm-svn: 326515
2018-03-02 00:27:44 +00:00
Sanjay Patel d0cdb2f861 [InstCombine] allow fmul fold with less than 'fast'
This is a retry of r326502 with updates to the reassociate 
test file that I missed the first time.

@test15_reassoc in the supposed -reassociate test file 
(except that it tests 2 other passes too...) shows that
there's no clear responsibility for reassociation transforms.

Instcombine now gets that case, but only because the
constant values are identical. Otherwise, it would still
miss that pattern. 

Reassociate doesn't get that case because it hasn't been 
updated to use less than 'fast' FMF.

llvm-svn: 326513
2018-03-02 00:14:51 +00:00
Sanjay Patel f2664d0663 [Reassociate] regenerate checks; NFC
llvm-svn: 326511
2018-03-01 23:41:03 +00:00
Sanjay Patel eb5d046890 revert r326502: [InstCombine] allow fmul fold with less than 'fast'
I forgot that I added tests for 'reassoc' to -reassociate, but
surprisingly that file calls -instcombine too, so it is affected.
I'll update that file and try again.

llvm-svn: 326510
2018-03-01 23:39:24 +00:00
Sanjay Patel 7373ae5c9a [InstCombine] allow fmul fold with less than 'fast'
llvm-svn: 326502
2018-03-01 22:53:47 +00:00
Craig Topper f1a7c6755d [InstCombine] Auto-generate complete checks. NFC
llvm-svn: 326474
2018-03-01 20:05:07 +00:00
Sanjay Patel d696e93cb6 [InstCombine] remove stale comments for tests; NFC
llvm-svn: 326448
2018-03-01 16:28:32 +00:00
Sanjay Patel 3fd43a843b [InstCombine] move/add tests for fmul reassociation; NFC
This transform may be out-of-scope for instcombine, 
but this is only documenting the current behavior.

llvm-svn: 326442
2018-03-01 15:30:44 +00:00
Sanjay Patel 237e52674f [InstCombine] auto-generate full checks; NFC
llvm-svn: 326440
2018-03-01 15:13:42 +00:00
Max Kazantsev f8d2969abb [SCEV] Smart range calculation for SCEVUnknown Phis
The range of SCEVUnknown Phi which merges values `X1, X2, ..., XN`
can be evaluated as `U(Range(X1), Range(X2), ..., Range(XN))`.
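
A minimal sketch of the union step, assuming simple closed, non-wrapping integer intervals (ranges in LLVM can also wrap around, which this ignores). `Range` and `unionOfRanges` are hypothetical names; the result is the smallest single interval covering every incoming range.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A closed interval [lo, hi].
struct Range {
  long lo;
  long hi;
};

// Smallest single interval that covers the ranges of all incoming values
// of a phi. Requires at least one incoming range.
Range unionOfRanges(const std::vector<Range>& incoming) {
  assert(!incoming.empty());
  Range r = incoming[0];
  for (std::size_t i = 1; i < incoming.size(); ++i) {
    if (incoming[i].lo < r.lo)
      r.lo = incoming[i].lo;
    if (incoming[i].hi > r.hi)
      r.hi = incoming[i].hi;
  }
  return r;
}
```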

Reviewed By: sanjoy
Differential Revision: https://reviews.llvm.org/D43810

llvm-svn: 326418
2018-03-01 06:56:48 +00:00
Reid Kleckner 3762a089d7 [IPSCCP] do not break musttail invariant (PR36485)
Do not replace results of `musttail` calls with a constant if the
call itself can't be removed.

Do not zap returns of `musttail` callees, if the call site can't be
removed and replaced with a constant.

Do not zap returns of `musttail`-calling blocks, this breaks
invariant too.

Patch by Fedor Indutny

Differential Revision: https://reviews.llvm.org/D43695

llvm-svn: 326404
2018-03-01 01:19:18 +00:00
Reid Kleckner cb9611ca67 [DAE] don't remove args of musttail target/caller
`musttail` requires identical signatures of caller and callee. Removing
arguments breaks `musttail` semantics.

PR36441

Patch by Fedor Indutny

Differential Revision: https://reviews.llvm.org/D43708

llvm-svn: 326394
2018-03-01 00:09:35 +00:00
Sanjay Patel eaf5a120ed [InstCombine] simplify code for X * -1.0 --> -X; NFC
I've added random FMF to one of the tests to show those are propagated.

llvm-svn: 326377
2018-02-28 22:30:04 +00:00
Jonas Devlieghere 9ca064552a [GlobalOpt] don't change CC of musttail calle(e|r)
When a function has a musttail call, its cc is fixed to be equal to the
cc of the musttail callee. In such a case (and in the case of the musttail
callee), GlobalOpt should not change the cc to fastcc as it will break
the invariant.

This fixes PR36546

Patch by: Fedor Indutny (indutny)

Differential revision: https://reviews.llvm.org/D43859

llvm-svn: 326376
2018-02-28 22:28:44 +00:00
Sanjay Patel 356e77f550 [InstCombine] auto-generate complete checks; NFC
llvm-svn: 326331
2018-02-28 16:53:45 +00:00
Xin Tong 8ba674e43b [MergeICmp] Fix a bug in MergeICmp that can lead to a block being processed more than once.
Summary:
Fix a bug in MergeICmp that can lead to a BCECmp block being processed more than once and eventually lead to a broken LLVM module.
The problem is that if the non-constant value is not produced by the last block, the producer will be processed once when its parent block
is processed and a second time when the last block is processed.

We end up with two identical BCECmpBlocks in the merge queue, which eventually leads to a broken LLVM module.

Reviewers: courbet, davide

Reviewed By: courbet

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D43825

llvm-svn: 326318
2018-02-28 12:08:00 +00:00
Mohammad Shahid ddeee12f59 [SLP] Added new tests and updated existing for jumbled load, NFC.
llvm-svn: 326303
2018-02-28 04:19:34 +00:00
Sanjay Patel bf28a8fc01 [InstSimplify] add tests for FP with undef operand; NFC
Are any of these correct?

llvm-svn: 326241
2018-02-27 20:17:18 +00:00
Craig Topper 301991080e [ValueTracking] Teach cannotBeOrderedLessThanZeroImpl to look through ExtractElement.
This is similar to what's done in computeKnownBits and computeSignBits. Don't do anything fancy; just collect information valid for any element.

Differential Revision: https://reviews.llvm.org/D43789

llvm-svn: 326237
2018-02-27 19:53:45 +00:00
Sanjay Patel 8529dd5ee1 [ARM] add loop vectorizer test based on 482.sphinx3 from SPEC2006; NFC
This is a slight reduction of one of the benchmarks
that suffered with D43079. Cost model changes should
not cause this test to remain scalarized.

llvm-svn: 326221
2018-02-27 18:33:24 +00:00
Sanjay Patel 04d1d79ee5 [AArch64] add SLP test based on TSVC; NFC
This is a slight reduction of one of the benchmarks
that suffered with D43079. Cost model changes should
not cause this test to remain scalarized.

llvm-svn: 326217
2018-02-27 18:06:15 +00:00
Florian Hahn 1807c516c7 [NewGVN] Update phi-of-ops def block when updating existing ValuePHI.
In case we update a ValuePHI node created earlier, we could update it
based on a different OpPHI which could be in a different block.
We need to update the TempToBlock mapping reflecting the new block,
otherwise we would end up placing the new phi node in a wrong block.

This problem is exposed by the test case in
https://bugs.llvm.org/show_bug.cgi?id=36504.

This patch fixes a slightly simpler problem than in the bug report. In
the bug's reproducer, the additional problem is that we are re-using a
ValuePHI node with too few incoming values for the new OpPHI. If this
patch makes sense, I will follow it up with a patch that creates a new
PHI node if the existing PHI node has a different number of incoming
values.

Reviewers: davide, dberlin

Reviewed By: dberlin

Differential Revision: https://reviews.llvm.org/D43770

llvm-svn: 326181
2018-02-27 09:34:51 +00:00
Adam Nemet b424cd5d61 Make test agnostic to cost model
This was causing bot failures on greendragon

llvm-svn: 326169
2018-02-27 05:41:16 +00:00
Evgeny Stupachenko f1c058d99b Fix r326154 buildbots test fail
Summary:

Add specific mtriples to tests added in r326154.

From: Evgeny Stupachenko <evstupac@gmail.com>
                         <evgeny.v.stupachenko@intel.com>
llvm-svn: 326158
2018-02-27 01:33:11 +00:00
Evgeny Stupachenko a732611ac8 Fix PR36032, PR35432
Summary:

The change fixes an assertion failure in ScalarEvolutionExpander.cpp:
  assert(ExitCount != SE.getCouldNotCompute() && "Invalid loop count");

Reviewers: sbaranga

Differential Revision: http://reviews.llvm.org/D42604

From: Evgeny Stupachenko <evstupac@gmail.com>
                         <evgeny.v.stupachenko@intel.com>
llvm-svn: 326154
2018-02-27 00:17:31 +00:00
Sanjay Patel 66911b16e6 [InstCombine, InstSimplify] add tests with undef elements in constant FP vectors; NFC
llvm-svn: 326148
2018-02-26 23:23:02 +00:00
Craig Topper 69c8972fd1 [ValueTracking] Teach cannotBeOrderedLessThanZeroImpl to handle vector constants.
Summary: This allows vector fabs to be removed in more cases.

Reviewers: spatel, arsenm, RKSimon

Reviewed By: spatel

Subscribers: wdng, llvm-commits

Differential Revision: https://reviews.llvm.org/D43739

llvm-svn: 326138
2018-02-26 22:33:17 +00:00
Simon Pilgrim 9929f90740 [X86][SSE] Reduce FADD/FSUB/FMUL costs on later targets (PR36280)
Agner's tables indicate that for SSE42+ targets (Core2 and later) we can reduce the FADD/FSUB/FMUL costs down to 1, which should fix the Himeno benchmark.

Note: the AVX512 FDIV costs look rather dodgy, but this isn't part of this patch.

Differential Revision: https://reviews.llvm.org/D43733

llvm-svn: 326133
2018-02-26 22:10:17 +00:00
Alexey Bataev b44e2b75e8 [SLP] Added new test + fixed some checks, NFC.
llvm-svn: 326117
2018-02-26 20:01:24 +00:00
Craig Topper 43fb1cdef7 [InstCombine] Add test cases with vector constants to fpextend.ll
llvm-svn: 326115
2018-02-26 19:36:37 +00:00
Craig Topper b284e8b9b4 [InstCombine] Switch to using FileCheck instead of grep. Auto-generate checks. NFC
llvm-svn: 326114
2018-02-26 19:36:36 +00:00
Sanjay Patel 31a90468e1 [InstCombine] allow fdiv folds with less than fully 'fast' ops
Note: gcc appears to allow this fold with -freciprocal-math alone, 
but clang/llvm require more than that with this patch. The wording
in the definitions seems fuzzy enough that it could go either way,
but we'll err on the conservative side of FMF interpretation.

This patch also changes the newly created fmul to have FMF propagated
by the last fdiv rather than intersecting the FMF of the fdivs. This
matches the behavior of other folds near here. The new fmul is only 
used to produce an intermediate op for the final fdiv result, so it
shouldn't be any stricter than that result. The previous behavior
could result in dropping FMF via other folds in instcombine or CSE.

Differential Revision: https://reviews.llvm.org/D43398

llvm-svn: 326098
2018-02-26 16:02:45 +00:00
Renato Golin 9d1b2acaaa [LV] Move isLegalMasked* functions from Legality to CostModel
All SIMD architectures can emulate masked load/store/gather/scatter
through element-wise condition check, scalar load/store, and
insert/extract. Therefore, bailing out of vectorization as legality
failure, when they return false, is incorrect. We should proceed to cost
model and determine profitability.
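
The emulation mentioned above can be sketched for a masked load as follows: each lane checks its mask bit, performs a scalar load if the bit is set, and inserts the value into the result. This is an illustrative model only; `maskedLoadScalarized` is a hypothetical name and the pass-through vector stands in for the value of disabled lanes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Emulates a masked vector load with scalar operations. Lanes whose mask
// bit is false keep the corresponding pass-through value and do not touch
// memory.
std::vector<int> maskedLoadScalarized(const int* ptr,
                                      const std::vector<bool>& mask,
                                      const std::vector<int>& passthru) {
  assert(mask.size() == passthru.size());
  std::vector<int> result(passthru);
  for (std::size_t lane = 0; lane < mask.size(); ++lane) {
    if (mask[lane])              // element-wise condition check
      result[lane] = ptr[lane];  // scalar load, then insert into the lane
  }
  return result;
}
```

Whether this sequence is worth generating is a cost question rather than a legality question, which is the point the commit message makes.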

This patch is to address the vectorizer's architectural limitation
described above. As such, I tried to keep the cost model and
vectorize/don't-vectorize behavior nearly unchanged. Cost model tuning
should be done separately.

Please see
http://lists.llvm.org/pipermail/llvm-dev/2018-January/120164.html for
RFC and the discussions.

Closes D43208.

Patch by: Hideki Saito <hideki.saito@intel.com>

llvm-svn: 326079
2018-02-26 11:06:36 +00:00
Florian Hahn ed45836253 [LoopInterchange] Add test case for D43236.
llvm-svn: 326078
2018-02-26 10:46:25 +00:00
Craig Topper aee341ef28 [InstSimplify] Add test cases for removal of vector fabs on known positive.
llvm-svn: 326050
2018-02-25 06:51:52 +00:00
Craig Topper 2b8f051aaa [InstSimplify] Remove unused parameter from test cases.
llvm-svn: 326049
2018-02-25 06:51:51 +00:00
Adam Nemet e4e1de60aa Revert "StructurizeCFG: Test for branch divergence correctly"
This reverts commit r325881.

Breaks many bots

llvm-svn: 326037
2018-02-24 17:29:09 +00:00
Sanjay Patel db53d1847b [InstSimplify] sqrt(X) * sqrt(X) --> X
This was misplaced in InstCombine. We can loosen the FMF as a follow-up step.

llvm-svn: 325965
2018-02-23 22:20:13 +00:00
Sanjay Patel d32104e1b2 [InstCombine] allow fmul-sqrt folds with less than full -ffast-math
Also, add a Builder method for intrinsics to reduce code duplication for clients.

llvm-svn: 325960
2018-02-23 21:16:12 +00:00
Matt Davis 708271849a [Test] Fix the test to output to /dev/null instead of redirecting.
The redirection was confusing the windows build machine.

llvm-svn: 325937
2018-02-23 19:03:04 +00:00
Matt Davis 523c656e25 [Debug] Add dbg.value intrinsics for PHIs created during LCSSA.
Summary:
This patch is an enhancement to propagate dbg.value information when Phis are created on behalf of LCSSA.
I noticed a case where a value carried across a loop was reported as <optimized out>.

Specifically this case:
```
int bar(int x, int y) {
  return x + y;
}

int foo(int size) {
  int val = 0;
  for (int i = 0; i < size; ++i) {
    val = bar(val, i);  // Both val and i are correct
  }
  return val; // <optimized out>
}
```

In the above case, after all of the interesting computation completes our value
is reported as "optimized out." This change will add a dbg.value to correct this.

This patch also moves the dbg.value insertion routine from LoopRotation.cpp 
into Local.cpp, so that we can share it in both places (LoopRotation and LCSSA).

Reviewers: mzolotukhin, aprantl, vsk, davide

Reviewed By: aprantl, vsk

Subscribers: dberlin, llvm-commits

Differential Revision: https://reviews.llvm.org/D42551

llvm-svn: 325926
2018-02-23 17:38:27 +00:00
Nicolai Haehnle 43c1115cd4 StructurizeCFG: Test for branch divergence correctly
Summary:
This fixes cases like the new test @nonuniform. In that test, %cc itself
is a uniform value; however, when reading it after the end of the loop in
basic block %if, its value is effectively non-uniform.

This problem was encountered in
https://bugs.freedesktop.org/show_bug.cgi?id=103743; however, this change
in itself is not sufficient to fix that bug, as there is another issue
in the AMDGPU backend.

Change-Id: I32bbffece4a32f686fab54964dae1a5dd72949d4

Reviewers: arsenm, rampitec, jlebar

Subscribers: wdng, tpr, llvm-commits

Differential Revision: https://reviews.llvm.org/D40546

llvm-svn: 325881
2018-02-23 10:45:46 +00:00
Bjorn Steinbrink 983d6c3f18 Mark MergedLoadStoreMotion as not preserving MemDep results
Summary:
MemDep caches results that signify that a dependence is non-local, and
there is currently no way to invalidate such cache entries.
Unfortunately, when MLSM sinks a store, a non-local dependence can
become a local one, and MemDep then gives wrong answers.
The easiest way out here is to just say that MLSM does indeed not
preserve MemDep results.

Reviewers: davide, Gerolf

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D43177

llvm-svn: 325880
2018-02-23 10:41:57 +00:00
Daniel Neilson 20c9207be3 [AlignmentFromAssumptions] Set source and dest alignments of memory intrinsics separately
Summary:
This change is part of step five in the series of changes to remove alignment argument from
memcpy/memmove/memset in favour of alignment attributes. In particular, this changes the
AlignmentFromAssumptions pass to cease using the old getAlignment()/setAlignment API of
MemoryIntrinsic in favour of getting/setting source & dest specific alignments through
the new API. This allows us to simplify some of the code in this pass and also be more
aggressive about setting the source and destination alignments separately.

Steps:
Step 1) Remove alignment parameter and create alignment parameter attributes for
memcpy/memmove/memset. ( rL322965, rC322964, rL322963 )
Step 2) Expand the IRBuilder API to allow creation of memcpy/memmove with differing
source and dest alignments. ( rL323597 )
Step 3) Update Clang to use the new IRBuilder API. ( rC323617 )
Step 4) Update Polly to use the new IRBuilder API. ( rL323618 )
Step 5) Update LLVM passes that create memcpy/memmove calls to use the new IRBuilder API,
and those that use MemIntrinsicInst::[get|set]Alignment() to use [get|set]DestAlignment()
and [get|set]SourceAlignment() instead. ( rL323886, rL323891, rL324148, rL324273, rL324278,
rL324384, rL324395, rL324402, rL324626, rL324642, rL324653, rL324654, rL324773, rL324774,
rL324781, rL324784, rL324955, rL324960 )
Step 6) Remove the single-alignment IRBuilder API for memcpy/memmove, and the
MemIntrinsicInst::[get|set]Alignment() methods.

Reference
   http://lists.llvm.org/pipermail/llvm-dev/2015-August/089384.html
   http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151109/312083.html

Reviewers: hfinkel, bollu, reames

Reviewed By: reames

Subscribers: reames, llvm-commits

Differential Revision: https://reviews.llvm.org/D43081

llvm-svn: 325816
2018-02-22 18:55:59 +00:00
Luke Cheeseman 6c1e6bbe0c [FunctionAttrs][ArgumentPromotion][GlobalOpt] Disable some optimisation passes for naked functions
- Fix for bug 36078.
- Prevent the functionattrs, function-attrs, globalopt and argpromotion passes
  from changing naked functions.
- These passes can perform some alterations to the functions that should not be
  applied. An example is removing parameters that are seemingly not used because
  they are only referenced in the inline assembly. Another example is marking
  the function as fastcc.

llvm-svn: 325788
2018-02-22 14:42:08 +00:00
Sanjay Patel 92b7371113 [InstCombine] add fmul multi-use test; NFC
Also, rename tests to make their intent clearer.

llvm-svn: 325785
2018-02-22 14:27:16 +00:00
Simon Pilgrim 864949d5e9 [SLPVectorizer][X86] Add load extend tests (PR36091)
llvm-svn: 325772
2018-02-22 12:19:34 +00:00
Sanjay Patel 9befaeb582 [InstCombine] add some random FMF to tests so we know it's not dropped; NFC
llvm-svn: 325734
2018-02-21 22:48:28 +00:00
Sanjay Patel d53da082a0 [AArch64] fix IR names to not be 'tmp' because that gives the CHECK script problems
llvm-svn: 325718
2018-02-21 20:48:14 +00:00
Sanjay Patel ffe51e450f [AArch64] add SLP test for matmul (PR36280); NFC
This is a slight reduction of one of the benchmarks
that suffered with D43079. Cost model changes should
not cause this test to remain scalarized.

llvm-svn: 325717
2018-02-21 20:34:16 +00:00
Alexey Bataev 650f639d33 [LV] Fix test checks, NFC
llvm-svn: 325699
2018-02-21 16:48:23 +00:00
Alexey Bataev cdd0675ddc [SLP] Fix test checks, NFC.
llvm-svn: 325689
2018-02-21 15:32:58 +00:00
Silviu Baranga 10ad93c6bf [SCEV] Temporarily disable loop versioning for the purpose
of turning SCEVUnknowns of PHIs into AddRecExprs.

This feature is now hidden behind the -scev-version-unknown flag.

Fixes PR36032 and PR35432.

llvm-svn: 325687
2018-02-21 15:20:32 +00:00
Vedant Kumar 56492f9177 [BDCE] Salvage debug info from dying insts
This results in 15 additional unique source variables in a stage2 build
of FileCheck (at '-Os -g'), with a negligible increase in the size of
the .debug_loc section.

llvm-svn: 325660
2018-02-21 01:55:33 +00:00
Sanjay Patel e6143904b9 revert r325515: [TTI CostModel] change default cost of FP ops to 1 (PR36280)
There are too many perf regressions resulting from this, so we need to 
investigate (and add tests for) targets like ARM and AArch64 before 
trying to reinstate.

llvm-svn: 325658
2018-02-21 01:42:52 +00:00
Sanjay Patel 6f716a7c5e [InstCombine] C / -X --> -C / X
We already do this in DAGCombiner, but it should
also be good to eliminate the fsub use in IR.

This is similar to rL325648.

llvm-svn: 325649
2018-02-21 00:01:45 +00:00
Sanjay Patel d8dd0151fc [InstCombine] -X / C --> X / -C for FP
We already do this in DAGCombiner, but it should 
also be good to eliminate the fsub use in IR.

llvm-svn: 325648
2018-02-20 23:51:16 +00:00
Sanjay Patel 8357371861 [InstCombine] add tests for fdiv with negated op and constant op; NFC
llvm-svn: 325644
2018-02-20 23:34:43 +00:00
Sanjay Patel 3e569ac0cc [PatternMatch] allow vector matches with m_FNeg
llvm-svn: 325642
2018-02-20 23:29:05 +00:00
Sanjoy Das 737fa40ffa [DSE] Don't DSE stores that subsequent memmove calls read from
Summary:
We used to remove the first memmove in cases like this:

  memmove(p, p+2, 8);
  memmove(p, p+2, 8);

which is incorrect.  Fix this by changing isPossibleSelfRead to what was most
likely the intended behavior.
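
A small sketch of why the first call is not dead, assuming a buffer of at least 10 bytes behind `p`. The helper names `shift_twice` and `shift_once` are hypothetical and only model the program before and after the incorrect elimination.

```c
#include <assert.h>
#include <string.h>

/* Both calls from the example above: the second memmove reads bytes that
   the first one wrote, so the first store is still live. */
void shift_twice(unsigned char *p) {
  assert(p);
  memmove(p, p + 2, 8);
  memmove(p, p + 2, 8);
}

/* What remains if the first memmove is (incorrectly) removed. */
void shift_once(unsigned char *p) {
  assert(p);
  memmove(p, p + 2, 8);
}
```

Starting from the bytes 0..11, `shift_twice` leaves 4,5,6,7,8,9,8,9 in the first eight positions while `shift_once` leaves 2,3,4,5,6,7,8,9, so the two programs are observably different.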

Historical note: the buggy code was added in https://reviews.llvm.org/rL120974
to address PR8728.

Reviewers: rsmith

Subscribers: mcrosier, llvm-commits, jlebar

Differential Revision: https://reviews.llvm.org/D43425

llvm-svn: 325641
2018-02-20 23:19:34 +00:00
Sanjay Patel 4f65e0d008 [InstCombine] auto-generate full checks; NFC
llvm-svn: 325639
2018-02-20 23:08:47 +00:00
Sanjay Patel 088f4690f5 [InstCombine] add test for vector -X/-Y; NFC
m_FNeg doesn't match vector types.

llvm-svn: 325637
2018-02-20 22:46:38 +00:00
Benjamin Kramer 1516dd70bb Fix broken test from r325630.
llvm-svn: 325634
2018-02-20 22:30:16 +00:00
Benjamin Kramer fd0630665b [MemoryBuiltins] Check nobuiltin status when identifying calls to free.
This is usually not a problem because this code's main purpose is
eliminating unused new/delete pairs. We got deletes of nullptr or
nobuiltin deletes of builtin new wrong though.

llvm-svn: 325630
2018-02-20 22:00:33 +00:00
Sanjay Patel e29caaa9c5 [PatternMatch] enhance m_SignMask() to ignore undef elements in vectors
llvm-svn: 325623
2018-02-20 21:02:40 +00:00
Sanjay Patel ff7b777bbe [InstSimplify] add tests for m_SignMask with undef vector elements; NFC
llvm-svn: 325622
2018-02-20 20:53:35 +00:00
Alexey Bataev 42bcec7d38 [LV] Fix test checks, NFC.
llvm-svn: 325617
2018-02-20 19:49:25 +00:00
Alexey Bataev 47dfd249f0 [SLP] Fix tests checks, NFC.
llvm-svn: 325605
2018-02-20 18:11:50 +00:00
Sanjay Patel 90f4c8ec29 [InstCombine] fold fdiv with non-splat divisor to fmul: X/C --> X * (1/C)
llvm-svn: 325590
2018-02-20 16:08:15 +00:00
Sanjay Patel 1d14779aed [InstCombine] allow fdiv with constant dividend folds with less than full -ffast-math
It's possible that we could allow this with either 'arcp' or 'reassoc' alone, but this
should be conservatively better than what we have right now. GCC allows this with
only -freciprocal-math.

The last test is changed to show a case that is expected to fold, but we need D43398.

llvm-svn: 325533
2018-02-19 21:46:52 +00:00
Sanjay Patel e82cc6fcc5 [InstCombine] move fdiv tests; NFC
Also, use vector constants just to prove that already works.

llvm-svn: 325530
2018-02-19 21:13:39 +00:00
Sanjay Patel 3e8a76abfd [TTI CostModel] change default cost of FP ops to 1 (PR36280)
This change was mentioned at least as far back as:
https://bugs.llvm.org/show_bug.cgi?id=26837#c26
...and I found a real program that is harmed by this: 
Himeno running on AMD Jaguar gets 6% slower with SLP vectorization:
https://bugs.llvm.org/show_bug.cgi?id=36280
...but the change here appears to solve that bug only accidentally.

The div/rem costs for x86 look very wrong in some cases, but that's already true, 
so we can fix those in follow-up patches. There's also evidence that more cost model
changes are needed to solve SLP problems as shown in D42981, but that's an independent 
problem (though the solution may be adjusted after this change is made).

Differential Revision: https://reviews.llvm.org/D43079

llvm-svn: 325515
2018-02-19 16:11:44 +00:00
Ivan A. Kosarev f03f579d1d [Transforms] Propagate new-format TBAA tags on simplification of memory-transfer intrinsics
With this patch in place, when a new-format TBAA tag is available
for a memory-transfer intrinsic call, we prefer propagating that
new-format tag. Otherwise, we fall back to the old approach where
we try to construct a proper TBAA access tag from 'tbaa.struct'
metadata.

Differential Revision: https://reviews.llvm.org/D41543

llvm-svn: 325488
2018-02-19 12:10:20 +00:00
Sanjay Patel adf6e88c74 [PatternMatch, InstSimplify] enhance m_AllOnes() to ignore undef elements in vectors
Loosening the matcher definition reveals a subtle bug in InstSimplify (we should not
assume that because an operand constant matches that it's safe to return it as a result).

So I'm making that change here too (that diff could be independent, but I'm not sure how 
to reveal it before the matcher change).

This also seems like a good reason to *not* include matchers that capture the value.
We don't want to encourage the potential misstep of propagating undef values when it's
not allowed/intended.

I didn't include the capture variant option here or in the related rL325437 (m_One), 
but it already exists for other constant matchers.

llvm-svn: 325466
2018-02-18 18:05:08 +00:00
Sanjay Patel 7faceaed31 [InstSimplify] add tests with vector undef elts; NFC
llvm-svn: 325465
2018-02-18 17:39:09 +00:00
Sanjay Patel f569578373 [PatternMatch] enhance m_One() to ignore undef elements in vectors
llvm-svn: 325437
2018-02-17 16:00:42 +00:00
Sanjay Patel a6a1426cf1 [InstSimplify, InstCombine] add tests with vector undef elts; NFC
These would fold if the m_One pattern matcher accounted for undef elts.

llvm-svn: 325436
2018-02-17 15:55:40 +00:00
Sanjay Patel 841ca95219 [InstSimplify] add vector select tests with undef elts in condition; NFC
llvm-svn: 325419
2018-02-17 01:18:53 +00:00
Sanjay Patel 870fbda805 [InstCombine] add FMF to better show current fdiv fold behavior; NFC
llvm-svn: 325365
2018-02-16 17:46:50 +00:00
Eugene Leviant 8c83b9b8c5 [ThinLTO] Fix data race in test #2
Switched to the right option (-thinlto-threads)

llvm-svn: 325362
2018-02-16 17:25:03 +00:00
Eugene Leviant c9724d9149 [ThinLTO] Fix data race in test
llvm-svn: 325361
2018-02-16 16:56:33 +00:00
Brian M. Rzycki f1a7df5ef2 [JumpThreading] PR36133 enable/disable DominatorTree for LVI analysis
Summary:
The LazyValueInfo pass caches a copy of the DominatorTree when available.
Whenever there are pending DominatorTree updates within JumpThreading's
DeferredDominance object we cannot use the cached DT for LVI analysis.
This commit adds the new methods enableDT() and disableDT() to LVI.
JumpThreading also sets the appropriate usage model before calling LVI
analysis methods.

Fixes https://bugs.llvm.org/show_bug.cgi?id=36133

Reviewers: sebpop, dberlin, kuhar

Reviewed by: sebpop, kuhar

Subscribers: uabelho, llvm-commits, aprantl, hiraditya, a.elovikov

Differential Revision: https://reviews.llvm.org/D42717

llvm-svn: 325356
2018-02-16 16:35:17 +00:00
Ivan A. Kosarev 53270d0fa6 [Transforms] Propagate TBAA info in SROA
Now that we have the new TBAA metadata format that is capable of
representing accesses to aggregates, we can propagate TBAA access
tags from memory setting and transferring intrinsics to load and
store instructions and vice versa.

Since SROA produces lots of new loads and stores on optimized
builds, this change significantly decreases the share of
undecorated memory accesses on such builds.

Differential Revision: https://reviews.llvm.org/D41563

llvm-svn: 325329
2018-02-16 10:10:29 +00:00
Eugene Leviant 7331a0bf1c [ThinLTO] Import global variables
Differential revision: https://reviews.llvm.org/D43077

llvm-svn: 325320
2018-02-16 08:11:04 +00:00
Vedant Kumar 3dc6de619a Remove brittle check lines from a test, NFC
llvm-svn: 325310
2018-02-16 01:21:01 +00:00
Vedant Kumar 616fdb00df [GVN] Partially revert debug info salvage change (r325063)
In r325063, we salvaged debug values from dying instructions in
GVN::processBlock() and GVN::performScalarPRE().

The change in performScalarPRE(), while correct, is unhelpful. It
introduced a call to salvageDebugInfo() which was immediately followed
by a RAUW, meaning it prevented the RAUW from efficiently updating
dbg.value intrinsics.  This commit reverts the mistake and tightens up
the affected test case.

llvm-svn: 325308
2018-02-16 01:15:20 +00:00
Vedant Kumar 1df820ecd7 [DCE] Salvage debug info from dead insts
This results in small increases in the size of the .debug_loc section
and the number of unique source variables in a stage2 build of opt.

llvm-svn: 325301
2018-02-15 22:26:18 +00:00
Brian Gesiak a5e3675bd3 [Coroutines] Don't move stores for allocator args
Summary:
The behavior described in Coroutines TS `[dcl.fct.def.coroutine]/7`
allows coroutine parameters to be passed into allocator functions.
The instructions to store values into the alloca'd parameters must not
be moved past the frame allocation, otherwise uninitialized values are
passed to the allocator.

Test Plan: `check-llvm`

Reviewers: rsmith, GorNishanov, eric_niebler

Reviewed By: GorNishanov

Subscribers: compnerd, EricWF, llvm-commits

Differential Revision: https://reviews.llvm.org/D43000

llvm-svn: 325285
2018-02-15 19:31:45 +00:00
Vedant Kumar 24953dc876 [SCCP] Test that constant propagation updates debug info, NFC
This extends an existing test to check that SCCP updates the operands of
relevant dbg.value instructions as it does its work.

llvm-svn: 325281
2018-02-15 19:13:04 +00:00
Alexey Bataev 862c476fc2 [SLP] Fix the test for the reversed stores, NFC.
llvm-svn: 325268
2018-02-15 17:11:50 +00:00
Alexey Bataev ac619599d8 [SLP] Added test for reversed stores, NFC.
llvm-svn: 325265
2018-02-15 16:56:49 +00:00
Sanjay Patel 9174416e89 [InstCombine] test fdiv folds better; NFC
We had redundant tests, but no tests for extra uses or vectors.
'fast' is an overly conservative requirement for these folds.

llvm-svn: 325262
2018-02-15 16:28:15 +00:00
Sanjay Patel 339b4d338d [InstCombine] allow sin/cos transforms with 'reassoc'
The variable name 'AllowReassociate' is a lie at this point because
it's set to 'isFast()' which is more than the 'reassoc' FMF after
rL317488.

In D41286, we showed that this transform may be valid even with strict
math by brute force checking every 32-bit float result.

There's a potential problem here because we're replacing with a tan()
libcall rather than a hypothetical LLVM tan intrinsic. So we might
set errno when we should be guaranteed not to do that. But that's
independent of this change.

llvm-svn: 325247
2018-02-15 15:07:12 +00:00
Sanjay Patel 6a0f667077 [InstCombine] allow X / C -> X * (1.0/C) for vector splat FP constants
llvm-svn: 325237
2018-02-15 13:55:52 +00:00
Max Kazantsev 6e4ce23add [NFC] Fix metadata placement in test
llvm-svn: 325215
2018-02-15 07:13:18 +00:00
Max Kazantsev c5941d12f4 [SCEV] Favor isKnownViaSimpleReasoning over constant ranges check
There is a more powerful but still simple function `isKnownViaSimpleReasoning` that
performs the constant range check plus a few additional checks. We use it in some places
(e.g. when proving implications), while in other places we only check constant ranges.

Currently, the indvar simplifier fails to remove the check in the following loop:

  int inc = ...;
  for (int i = inc, j = inc - 1; i < 200; ++i, ++j)
    if (i > j) { ... }

This patch replaces all usages of `isKnownPredicateViaConstantRanges` with
`isKnownViaSimpleReasoning` to have smarter proofs. In particular, it fixes the
case above.
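
To see why the check in that loop is removable, note that `i - j == 1` on every iteration, so `i > j` always holds. The sketch below counts iterations where the check fails; the function name `count_failed_checks` is hypothetical, and it assumes `inc > INT_MIN` so that `inc - 1` does not overflow.

```c
#include <assert.h>
#include <limits.h>

/* Runs the loop from the example above and counts how often the
   `i > j` check is false. Because j starts at i - 1 and both are
   incremented together, the count is expected to stay at zero. */
int count_failed_checks(int inc) {
  assert(inc > INT_MIN); /* keeps inc - 1 from overflowing */
  int failed = 0;
  for (int i = inc, j = inc - 1; i < 200; ++i, ++j)
    if (!(i > j))
      ++failed;
  return failed;
}
```

Constant ranges alone cannot prove this because `inc` is unknown; the proof needs the relationship between the two induction variables.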

Reviewed-By: sanjoy
Differential Revision: https://reviews.llvm.org/D43175

llvm-svn: 325214
2018-02-15 07:09:00 +00:00
Sanjay Patel af5d499cb9 [InstCombine] add tests and comments for fdiv X, C; NFC
llvm-svn: 325161
2018-02-14 19:54:51 +00:00
Craig Topper 1c19cc1745 [InstCombine] Don't fold select(C, Z, binop(select(C, X, Y), W)) -> select(C, Z, binop(Y, W)) if the binop is rem or div.
The select may have been preventing a division by zero or INT_MIN/-1 so removing it might not be safe.
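
A minimal sketch of the hazard, written in C but evaluating the division unconditionally the way an IR `select` operand would be. The function names are hypothetical, and placing the inner select in the divisor position is an assumption made for illustration.

```c
#include <assert.h>
#include <limits.h>

/* Original shape: the inner select swaps in a safe divisor exactly when
   the outer select is going to discard the quotient anyway. */
int guarded_div(int w, int y, int z) {
  int c = (y == 0) || (w == INT_MIN && y == -1);
  int divisor = c ? 1 : y; /* select(C, 1, Y) */
  assert(divisor != 0);
  int quotient = w / divisor; /* always executed, like the IR binop */
  return c ? z : quotient;    /* select(C, Z, quotient) */
}

/* After the fold the divisor would be plain Y, so the unconditional
   division is undefined for exactly the inputs the select guarded. */
int folded_div_would_be_undefined(int w, int y) {
  return (y == 0) || (w == INT_MIN && y == -1);
}
```

The fold preserves the selected value, but it removes the guard from an operation that can trap, which is why it is only safe for binops without undefined behavior.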

Fixes PR36362.

Differential Revision: https://reviews.llvm.org/D43276

llvm-svn: 325148
2018-02-14 18:08:33 +00:00
Sanjay Patel 48f671bb27 [InstCombine] regenerate checks; NFC
llvm-svn: 325144
2018-02-14 17:37:32 +00:00
Alexey Bataev 7f246e003a [SLP] Allow vectorization of reversed loads.
Summary:
Reversed loads are handled as gathering. But we can just reshuffle
these values. Patch adds support for vectorization of reversed loads.
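
As a rough illustration of the idea (not the code the SLP vectorizer emits), the sketch below compares four scalar loads in reversed order with one contiguous load followed by a reversing shuffle. The function names and the fixed width of four lanes are assumptions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Scalar form: consecutive addresses are loaded in reversed order. */
void scalar_reversed(int32_t *dst, const int32_t *src) {
  assert(dst && src);
  dst[0] = src[3];
  dst[1] = src[2];
  dst[2] = src[1];
  dst[3] = src[0];
}

/* Vector-style form: one wide contiguous load, then a reverse shuffle. */
void load_then_shuffle(int32_t *dst, const int32_t *src) {
  assert(dst && src);
  static const int reverse_mask[4] = {3, 2, 1, 0};
  int32_t wide[4];
  memcpy(wide, src, sizeof wide); /* stands in for a single vector load */
  for (int i = 0; i < 4; ++i)
    dst[i] = wide[reverse_mask[i]];
}
```

Both functions produce the same result, which is why a single load plus a shuffle can replace the gather.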

Reviewers: RKSimon, spatel, mkuper, hfinkel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D43022

llvm-svn: 325134
2018-02-14 15:29:15 +00:00
Florian Hahn b4e3bad89b Recommit r325001: [CallSiteSplitting] Support splitting of blocks with instrs before call.
For basic blocks with instructions between the beginning of the block
and a call we have to duplicate the instructions before the call in all
split blocks and add PHI nodes for uses of the duplicated instructions
after the call.

Currently, the threshold for the number of instructions before a call
is quite low, to keep the impact on binary size low.

Reviewers: junbuml, mcrosier, davidxl, davide

Reviewed By: junbuml

Differential Revision: https://reviews.llvm.org/D41860

llvm-svn: 325126
2018-02-14 13:59:12 +00:00
Florian Hahn c6296fea3f [LoopInterchange] Incrementally update the dominator tree.
We can use incremental dominator tree updates to avoid re-calculating
the dominator tree after interchanging 2 loops.

Reviewers: dmgreen, kuhar

Reviewed By: kuhar

Differential Revision: https://reviews.llvm.org/D43176

llvm-svn: 325122
2018-02-14 13:13:15 +00:00
Petar Jovanovic 1768957c82 [Utils] Salvage the debug info of DCE'ed 'and' instructions
Preserve debug info from a dead 'and' instruction with a constant.

Patch by Djordje Todorovic.

Differential Revision: https://reviews.llvm.org/D43163

llvm-svn: 325119
2018-02-14 13:10:35 +00:00
Elena Demikhovsky 945b7e5aa6 Adding a width of the GEP index to the Data Layout.
This makes the width of the GEP index, which is used for address calculation, one of the pointer properties in the Data Layout:
p[address space]:size:memory_size:alignment:pref_alignment:index_size_in_bits.
The index size parameter is optional; if not specified, it is equal to the pointer size.

Until now, the InstCombiner normalized GEPs and extended the index operand to the pointer width.
This works fine if you can convert a pointer to an integer for address calculation, and all registered targets do this.
But some ISAs have a very restricted instruction set for pointer calculation. In the discussion linked below, it was decided to retrieve the GEP index width from the Data Layout.
http://lists.llvm.org/pipermail/llvm-dev/2018-January/120416.html

I added an interface to the Data Layout and I changed the InstCombiner and some other passes to take the Index width into account.
This change does not affect any in-tree target. I added tests to cover data layouts with explicitly specified index size.

Differential Revision: https://reviews.llvm.org/D42123

llvm-svn: 325102
2018-02-14 06:58:08 +00:00
Sanjay Patel 3ce76ad26f [InstCombine] put tests of mul with neg operand(s) together; NFC
llvm-svn: 325066
2018-02-13 23:02:12 +00:00
Vedant Kumar 1d5d31b706 [GVN] Salvage debug info from dead insts
This preserves an additional 581 unique source variables in a stage2
build of clang (according to `llvm-dwarfdump --statistics`). It
increases the size of the .debug_loc section by 0.1% (or 87139 bytes).

Differential Revision: https://reviews.llvm.org/D43255

llvm-svn: 325063
2018-02-13 22:27:17 +00:00
Sanjay Patel 7558d860af [InstCombine] (lshr X, 31) * Y --> (ashr X, 31) & Y
This replaces the bit-tracking based fold that did the same thing,
but only for scalars and only indirectly.

There is no evidence in existing regression tests that the greater 
power of bit-tracking was needed here, but we should be aware of 
this potential loss of optimization.
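
The equivalence behind this fold can be checked with a short sketch for 32-bit values. The function names are hypothetical, and the arithmetic shift is modeled with unsigned negation because right-shifting a negative signed value is implementation-defined in C.

```c
#include <assert.h>
#include <stdint.h>

/* (lshr X, 31) * Y: the shifted value is 0 or 1, so the product is 0 or Y. */
uint32_t lshr_mul(uint32_t x, uint32_t y) {
  return (x >> 31) * y;
}

/* (ashr X, 31) & Y: the shift yields all zeros or all ones, so the mask
   either clears Y or keeps it unchanged. */
uint32_t ashr_and(uint32_t x, uint32_t y) {
  uint32_t mask = (uint32_t)0 - (x >> 31); /* portable model of ashr X, 31 */
  assert(mask == 0u || mask == UINT32_MAX);
  return mask & y;
}
```

Both forms return Y when the sign bit of X is set and 0 otherwise; the second form avoids the multiply.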

llvm-svn: 325062
2018-02-13 22:24:37 +00:00
Sanjay Patel fdb3b036cc [InstCombine] add vector tests, fix comments; NFC
The scalar folds are done indirectly and use potentially
expensive value tracking calls. That can be improved
along with the enhancement to support vector types.

llvm-svn: 325051
2018-02-13 21:19:42 +00:00
Sanjay Patel cb8ac00f73 [InstCombine] (bool X) * Y --> X ? Y : 0
This is both a functional improvement for vectors and an
efficiency improvement for scalars. The existing code below
the new folds does the same thing for scalars, but in an 
indirect and expensive way.

llvm-svn: 325048
2018-02-13 20:41:22 +00:00
Sanjay Patel 2e2958497f [InstCombine] fix test comment and add vector test; NFC
llvm-svn: 325039
2018-02-13 18:48:27 +00:00
Sanjay Patel b13fcd52ed [InstCombine, InstSimplify] (re)move tests, regenerate checks; NFC
The InstCombine integer mul test file had tests that belong in InstSimplify 
(including fmul tests). Move things to where they belong and auto-generate
complete checks for everything.

llvm-svn: 325037
2018-02-13 18:22:53 +00:00
Vedant Kumar 35fc103e1e [DeadStoreElimination] Salvage debug info from dead insts
According to `llvm-dwarfdump --statistics` this salvages 43 additional
unique source variables in a stage2 build of clang. It increases the
size of the .debug_loc section by 0.002% (or 2864 bytes).

Differential Revision: https://reviews.llvm.org/D43220

llvm-svn: 325035
2018-02-13 18:15:26 +00:00
Yaxun Liu 0124b5484c [AMDGPU] Change constant addr space to 4
Differential Revision: https://reviews.llvm.org/D43170

llvm-svn: 325030
2018-02-13 18:00:25 +00:00
Florian Hahn 35d744d388 Revert r325001: [CallSiteSplitting] Support splitting of blocks with instrs before call.
Due to memsan not being happy with the array of ValueToValue maps.

llvm-svn: 325009
2018-02-13 14:48:39 +00:00
Ivan A. Kosarev 4a381b444e [IR] Fix creating mutable versions of TBAA access tags
Due to a typo in D41565, mutable TBAA tags created with
createMutableTBAAAccessTag() lose their base types. This patch
fixes that typo and updates the tests accordingly.

Differential Revision: https://reviews.llvm.org/D42364

llvm-svn: 325008
2018-02-13 14:44:25 +00:00
Florian Hahn b0884b6443 [CallSiteSplitting] Support splitting of blocks with instrs before call.
For basic blocks with instructions between the beginning of the block
and a call we have to duplicate the instructions before the call in all
split blocks and add PHI nodes for uses of the duplicated instructions
after the call.

Currently, the threshold for the number of instructions before a call
is quite low, to keep the impact on binary size low.

Reviewers: junbuml, mcrosier, davidxl, davide

Reviewed By: junbuml

Differential Revision: https://reviews.llvm.org/D41860

llvm-svn: 325001
2018-02-13 12:00:48 +00:00
Florian Hahn 1f95ef1815 [LoopInterchange] Check number of latch successors before accessing them.
In cases where the OuterMostLoopLatchBI only has a single successor,
accessing the second successor will fail.

This fixes a failure when building the test-suite with loop-interchange
enabled.

Reviewers: mcrosier, karthikthecool, davide

Reviewed by: karthikthecool

Differential Revision: https://reviews.llvm.org/D42906

llvm-svn: 324994
2018-02-13 10:02:52 +00:00
Vedant Kumar 388fac5de6 [Utils] Salvage debug info from all no-op casts
We already try to salvage debug values from no-op bitcasts and inttoptr
instructions: we should handle ptrtoint instructions as well.

This saves an additional 24,444 debug values in a stage2 build of clang,
and (according to llvm-dwarfdump --statistics) provides an additional
289 unique source variables.

llvm-svn: 324982
2018-02-13 03:34:23 +00:00
Vedant Kumar 4011c26cc7 [Utils] Salvage debug info of DCE'ed mul/sdiv/srem instructions
Here are the numbers of additional debug values salvaged in a stage2
build of clang:

  63 SALVAGE: MUL
  1250 SALVAGE: SDIV

(No values were salvaged from `srem` instructions in this experiment,
but it's a simple case to handle so we might as well.)

llvm-svn: 324976
2018-02-13 01:09:52 +00:00
Vedant Kumar 31ec356a48 [Utils] Salvage debug info of DCE'ed shl/lshr/ashr instructions
Here are the numbers of additional debug values salvaged in a stage2
build of clang:

  1912 SALVAGE: ASHR
   405 SALVAGE: LSHR
   249 SALVAGE: SHL

llvm-svn: 324975
2018-02-13 01:09:49 +00:00
Vedant Kumar 47b16c45d7 [Utils] Salvage the debug info of DCE'ed 'sub' instructions
This salvages 14 debug values in a stage2 build of clang.

llvm-svn: 324974
2018-02-13 01:09:47 +00:00
Vedant Kumar 96b7dc041b [Utils] Salvage the debug info of DCE'ed 'xor' instructions
This salvages 259 debug values in a stage2 build of clang.

Differential Revision: https://reviews.llvm.org/D43207

llvm-svn: 324973
2018-02-13 01:09:46 +00:00
Sanjay Patel 246d769232 [InstSimplify] allow exp/log simplifications with only 'reassoc' FMF
These intrinsic folds were added with D41381, but only allowed with isFast().
That's more than necessary because FMF has 'reassoc' to apply to these
kinds of folds after D39304, and that's all we need in these cases.

Differential Revision: https://reviews.llvm.org/D43160

llvm-svn: 324967
2018-02-12 23:51:23 +00:00
Sanjay Patel c5d5933bd5 [InstSimplify] change tests to 'fast' to reflect current folds
The diff to use 'reassoc' is part of D43160; it should not have
been made with rL324961. Reverting that part here, so we'll
see the intended diff with the code change.

llvm-svn: 324963
2018-02-12 23:39:10 +00:00
Sanjay Patel de3e889a88 [InstSimplify] consolidate tests for log-exp inverse folds
Some tests didn't add much value because we already show stronger
constraints for the folds in other tests, so the weaker versions
were deleted.

Moved the remaining tests into 1 file because the folds are 
very similar and handled from 1 place in the code.

llvm-svn: 324961
2018-02-12 23:18:11 +00:00
Daniel Neilson 2363da9236 [InstCombine] Simplify MemTransferInst's source and dest alignments separately
Summary:
This change is part of step five in the series of changes to remove alignment argument from
memcpy/memmove/memset in favour of alignment attributes. In particular, this changes the
InstCombine pass to cease using the deprecated MemoryIntrinsic::getAlignment() method, and
instead we use the separate getSourceAlignment and getDestAlignment APIs to simplify
the source and destination alignment attributes separately.

Steps:
Step 1) Remove alignment parameter and create alignment parameter attributes for
memcpy/memmove/memset. ( rL322965, rC322964, rL322963 )
Step 2) Expand the IRBuilder API to allow creation of memcpy/memmove with differing
source and dest alignments. ( rL323597 )
Step 3) Update Clang to use the new IRBuilder API. ( rC323617 )
Step 4) Update Polly to use the new IRBuilder API. ( rL323618 )
Step 5) Update LLVM passes that create memcpy/memmove calls to use the new IRBuilder API,
and those that use MemIntrinsicInst::[get|set]Alignment() to use [get|set]DestAlignment()
and [get|set]SourceAlignment() instead. ( rL323886, rL323891, rL324148, rL324273, rL324278,
rL324384, rL324395, rL324402, rL324626, rL324642, rL324653, rL324654, rL324773, rL324774,
rL324781, rL324784, rL324955 )
Step 6) Remove the single-alignment IRBuilder API for memcpy/memmove, and the
MemIntrinsicInst::[get|set]Alignment() methods.

Reference
   http://lists.llvm.org/pipermail/llvm-dev/2015-August/089384.html
   http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151109/312083.html

Reviewers: majnemer, bollu, efriedma

Reviewed By: efriedma

Subscribers: efriedma, llvm-commits

Differential Revision: https://reviews.llvm.org/D42871

llvm-svn: 324960
2018-02-12 23:06:55 +00:00
Daniel Neilson 095d72989d [SafeStack] Use updated CreateMemCpy API to set more accurate source and destination alignments.
Summary:
This change is part of step five in the series of changes to remove alignment argument from
memcpy/memmove/memset in favour of alignment attributes. In particular, this changes the
creation of memcpys in the SafeStack pass to set the alignment of the destination object to
its stack alignment while separately setting the alignment of the source byval argument to
its own alignment.

Steps:
Step 1) Remove alignment parameter and create alignment parameter attributes for
memcpy/memmove/memset. ( rL322965, rC322964, rL322963 )
Step 2) Expand the IRBuilder API to allow creation of memcpy/memmove with differing
source and dest alignments. ( rL323597 )
Step 3) Update Clang to use the new IRBuilder API. ( rC323617 )
Step 4) Update Polly to use the new IRBuilder API. ( rL323618 )
Step 5) Update LLVM passes that create memcpy/memmove calls to use the new IRBuilder API,
and those that use MemIntrinsicInst::[get|set]Alignment() to use [get|set]DestAlignment()
and [get|set]SourceAlignment() instead. (rL323886, rL323891, rL324148, rL324273, rL324278,
rL324384, rL324395, rL324402, rL324626, rL324642, rL324653, rL324654, rL324773, rL324774,
rL324781, rL324784 )
Step 6) Remove the single-alignment IRBuilder API for memcpy/memmove, and the
MemIntrinsicInst::[get|set]Alignment() methods.

Reference
   http://lists.llvm.org/pipermail/llvm-dev/2015-August/089384.html
   http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151109/312083.html

Reviewers: eugenis, bollu

Reviewed By: eugenis

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D42710

llvm-svn: 324955
2018-02-12 22:39:47 +00:00
Vedant Kumar a8b8d3225b Move the debuginfo-dce-or test into debuginfo-variables.ll, NFC
llvm-svn: 324933
2018-02-12 21:02:45 +00:00
Sanjay Patel 4a4f35f324 [InstCombine] X / (X * Y) --> 1.0 / Y
This is similar to the instsimplify fold added with D42385 ( rL323716 )
...but this can't be in instsimplify because we're creating/morphing
a different instruction.

llvm-svn: 324927
2018-02-12 19:39:21 +00:00
Sanjay Patel ee4257f676 [InstCombine] add tests for missing fdiv fold; NFC
llvm-svn: 324926
2018-02-12 19:23:39 +00:00
Sanjay Patel e4fcb00290 [InstCombine] regenerate checks; NFC
llvm-svn: 324924
2018-02-12 19:14:01 +00:00
Jun Bum Lim 144eb593dd [LICM] update BlockColors after splitting predecessors
Update BlockColors after splitting predecessors. Do not allow splitting
EHPad for sinking when the BlockColors is not empty, so we can
simply assign predecessor's color to the new block.

Fixes PR36184

llvm-svn: 324916
2018-02-12 17:56:55 +00:00
Alexey Bataev ca2396e673 [SLP] Take user instructions cost into consideration in insertelement vectorization.
Summary:
For better vectorization results, we should take into consideration the
cost of the user insertelement instructions when we try to
vectorize sequences that build the whole vector. I.e. if we have the
following scalar code:
```
<Scalar code>
insertelement <ScalarCode>, ...
```
we should consider the cost of the last `insertelement` instructions as
the cost of the scalar code.

Reviewers: RKSimon, spatel, hfinkel, mkuper

Subscribers: javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D42657

llvm-svn: 324893
2018-02-12 14:54:48 +00:00
Sanjay Patel 510d647a4d [InstCombine] X / (X * Y) -> 1 / Y if the multiplication does not overflow
The related cases for (X * Y) / X were handled in rL124487.

https://rise4fun.com/Alive/6k9

The division in these tests is subsequently eliminated by existing instcombines
for 1/X.
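
The unsigned form of this fold can be checked exhaustively at a small bit width. The sketch below is an illustration only, not the InstCombine code; the 8-bit width and the helper name `fold_holds_for_all_u8` are assumptions chosen for brevity.

```python
BITS = 8                     # assumed width for the check; not taken from the patch
LIMIT = 1 << BITS

def fold_holds_for_all_u8():
    """Check X udiv (X * Y) == 1 udiv Y whenever X * Y does not wrap."""
    for x in range(1, LIMIT):          # x == 0 would make the divisor zero
        for y in range(1, LIMIT):      # y == 0 would make the divisor zero
            product = x * y
            if product >= LIMIT:       # multiplication overflows: fold is not claimed
                continue
            if x // product != 1 // y:
                return False
    return True

print(fold_holds_for_all_u8())
```

The no-overflow condition matters: at 8 bits, X = 32 and Y = 9 wrap to X * Y = 32, so X / (X * Y) is 1 while 1 / Y is 0.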

llvm-svn: 324843
2018-02-11 17:20:32 +00:00
Sanjay Patel aee107f30d [InstCombine] add tests for div-mul folds; NFC
The related cases for (X * Y) / X were handled in rL124487.

llvm-svn: 324840
2018-02-11 16:52:44 +00:00
Craig Topper 4dccffc84a [X86] Change signatures of avx512 packed fp compare intrinsics to return a vXi1 mask type to be closer to an fcmp.
Summary:
This patch changes the signature of the avx512 packed fp compare intrinsics to return a vXi1 vector and no longer take a mask as input. The casts to scalar type will now need to be explicit in the IR. The masking node will now be an explicit 'and' in the IR.

This makes the intrinsic look much more similar to an fcmp instruction that we wish we could use for these but can't. We already use icmp instructions for integer compares.

Previously the lowering step of isel would turn the intrinsic into an X86 specific ISD node and emit the masking nodes as well as some bitcasts. This means DAG combines can't see the vXi1 type until somewhat late, making it more difficult to combine out gpr<->mask transition sequences. By exposing the vXi1 type explicitly in the IR and initial SelectionDAG we give earlier DAG combines and even InstCombine the chance to see it and optimize it.

This should make any issues with gpr<->mask sequences the same between integer and fp. Meaning we only have to fix them once.

Reviewers: spatel, delena, RKSimon, zvi

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D43137

llvm-svn: 324827
2018-02-10 23:33:55 +00:00
Simon Pilgrim 19495198af [InstCombine] Add constant vector support for ~(C >> Y) --> ~C >> Y
Includes adding m_NonNegative constant pattern matcher

llvm-svn: 324825
2018-02-10 21:46:09 +00:00
Mircea Trofin 73b96d6dcf [LV] Fix analyzeInterleaving when -pass-remarks enabled
Summary:
If -pass-remarks=loop-vectorize, atomic ops will be seen by
analyzeInterleaving(), even though canVectorizeMemory() == false. This
is because we are requesting extra analysis instead of bailing out.

In such a case, we end up with a Group in both Load- and StoreGroups,
and then we'll try to access freed memory when traversing LoadGroups after having released the Group when iterating over StoreGroups.

The fix is to include mayWriteToMemory() when validating that two
instructions are the same kind of memory operation.

Reviewers: mssimpso, davidxl

Reviewed By: davidxl

Subscribers: hsaito, fhahn, llvm-commits

Differential Revision: https://reviews.llvm.org/D43064

llvm-svn: 324786
2018-02-10 00:07:45 +00:00
Vedant Kumar 04386d8e3d [Utils] Salvage debug info from dead 'or' instructions
Extend salvageDebugInfo to preserve the debug info from a dead 'or'
with a constant.

Patch by Ismail Badawi!

Differential Revision: https://reviews.llvm.org/D43129

llvm-svn: 324764
2018-02-09 19:19:55 +00:00
Simon Pilgrim 0919a8c130 [InstCombine] Add vector xor tests
This doesn't cover everything in InstCombiner.visitXor yet, but increases coverage for a lot of tests

llvm-svn: 324753
2018-02-09 17:45:45 +00:00
Simon Pilgrim 9620f4b746 [InstCombine] Add constant vector support for X udiv C, where C >= signbit
llvm-svn: 324728
2018-02-09 10:43:59 +00:00
Dmitry Mikulin 87e1c4c8de Minor tweak to test case.
llvm-svn: 324670
2018-02-08 23:10:07 +00:00
Dmitry Mikulin 5cf73cea9c [ThinLTO] Skip BlockAddresses while replacing uses in function import.
Differential Revision: https://reviews.llvm.org/D43027

llvm-svn: 324658
2018-02-08 22:14:56 +00:00
Simon Pilgrim 3f462e90cc Regenerate test
llvm-svn: 324639
2018-02-08 19:28:05 +00:00
Simon Pilgrim f30656add3 [InstCombine] Add vector udiv tests
Tests for X udiv C, where C >= signbit

llvm-svn: 324635
2018-02-08 18:58:00 +00:00
Simon Pilgrim 1889f26b94 [InstCombine] Add m_Negative pattern matching
Allows us to add non-uniform constant vector support for "X urem C -> X < C ? X : X - C, where C >= signbit."
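
The rewrite works because C >= signbit implies X < 2 * C, so at most one subtraction is needed. A small exhaustive check, as a sketch that assumes an 8-bit width (the names `urem_via_select` and `check_all` are illustrative, not LLVM APIs):

```python
BITS = 8                       # assumed width for the check
SIGNBIT = 1 << (BITS - 1)

def urem_via_select(x, c):
    """X urem C rewritten as compare + select + subtract (valid for C >= signbit)."""
    return x if x < c else x - c

def check_all():
    """Compare the rewrite against a plain remainder for every 8-bit X and every C >= signbit."""
    for c in range(SIGNBIT, 1 << BITS):
        for x in range(1 << BITS):
            if x % c != urem_via_select(x, c):
                return False
    return True

print(check_all())
```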

llvm-svn: 324631
2018-02-08 18:36:01 +00:00
Simon Pilgrim 11a02589c1 [InstCombine] Add vector urem tests.
Improve coverage of InstCombiner::visitURem for vector types

llvm-svn: 324629
2018-02-08 18:10:08 +00:00
Simon Pilgrim ab689cb638 [InstCombine] Regenerate vector mul tests.
llvm-svn: 324627
2018-02-08 17:54:24 +00:00
Daniel Neilson fb99a493be [LoopIdiom] Be more aggressive when setting alignment in memcpy
Summary:
This change is part of step five in the series of changes to remove alignment argument from
memcpy/memmove/memset in favour of alignment attributes. In particular, this changes the
LoopIdiom pass to cease using the old IRBuilder CreateMemCpy single-alignment APIs in
favour of the new API that allows setting source and destination alignments independently.
This allows us to be slightly more aggressive in setting the alignment of memcpy calls that
loop idiom creates.

Steps:
Step 1) Remove alignment parameter and create alignment parameter attributes for
memcpy/memmove/memset. ( rL322965, rC322964, rL322963 )
Step 2) Expand the IRBuilder API to allow creation of memcpy/memmove with differing
source and dest alignments. ( rL323597 )
Step 3) Update Clang to use the new IRBuilder API. ( rC323617 )
Step 4) Update Polly to use the new IRBuilder API. ( rL323618 )
Step 5) Update LLVM passes that create memcpy/memmove calls to use the new IRBuilder API,
and those that use MemIntrinsicInst::[get|set]Alignment() to use [get|set]DestAlignment()
and [get|set]SourceAlignment() instead. ( rL323886, rL323891, rL324148, rL324273, rL324278,
rL324384, rL324395, rL324402 )
Step 6) Remove the single-alignment IRBuilder API for memcpy/memmove, and the
MemIntrinsicInst::[get|set]Alignment() methods.

Reference
   http://lists.llvm.org/pipermail/llvm-dev/2015-August/089384.html
   http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151109/312083.html

llvm-svn: 324626
2018-02-08 17:33:08 +00:00
Sanjay Patel 574fb73c89 [SLPVectorizer] auto-generate complete checks; NFC
llvm-svn: 324616
2018-02-08 15:32:28 +00:00
Sanjay Patel 124392f038 [SLPVectorizer] auto-generate complete checks; NFC
llvm-svn: 324615
2018-02-08 15:30:39 +00:00
Sanjay Patel e2c5e9a970 [SLPVectorizer] move RUN line to top-of-file; NFC
I was confused what we were checking because the RUN line was
in the middle of the file.

llvm-svn: 324614
2018-02-08 15:28:49 +00:00
Simon Pilgrim 2a90acd17a [InstCombine] Fix issue with X udiv (POW2_C1 << N) for non-splat constant vectors
foldUDivShl was assuming that the input was a scalar or a splat constant

llvm-svn: 324613
2018-02-08 15:19:38 +00:00
Sanjay Patel cfa5c03039 [SLPVectorizer] auto-generate complete checks; NFC
llvm-svn: 324612
2018-02-08 15:16:26 +00:00
Sanjay Patel 42b8c23cc6 [LoopVectorize] auto-generate complete checks; NFC
llvm-svn: 324611
2018-02-08 15:13:47 +00:00
Sanjay Patel a60aec1ab7 [ValueTracking] don't crash when assumptions conflict (PR36270)
The last assume in the test says that %B12 is 0. 
The first assume says that %and1 is less than %B12. 
Therefore, %and1 is unsigned less than 0...does not compute.

That means this line:
Known.Zero.setHighBits(RHSKnown.countMinLeadingZeros() + 1);
...tries to set more bits than exist.
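
The arithmetic behind the failure can be modelled in a few lines. This is a toy stand-in, not APInt or KnownBits: the 32-bit width and both helper names are assumptions made for illustration.

```python
BIT_WIDTH = 32  # assumed width for illustration

def set_high_bits(known_zero, hi_bits, bit_width=BIT_WIDTH):
    """Toy model of setting the top `hi_bits` bits of a bit_width-wide mask."""
    if hi_bits > bit_width:
        raise ValueError("cannot set %d high bits in a %d-bit value"
                         % (hi_bits, bit_width))
    if hi_bits == 0:
        return known_zero
    mask = ((1 << hi_bits) - 1) << (bit_width - hi_bits)
    return known_zero | mask

def min_leading_zeros_of_known_zero_value(bit_width=BIT_WIDTH):
    """If every bit of the RHS is known to be zero, all bits are leading zeros."""
    return bit_width

# With conflicting assumptions the RHS is known to be 0, so the request
# becomes 32 + 1 = 33 high bits in a 32-bit value.
try:
    set_high_bits(0, min_leading_zeros_of_known_zero_value() + 1)
except ValueError as err:
    print("rejected:", err)
```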

Differential Revision: https://reviews.llvm.org/D43052

llvm-svn: 324610
2018-02-08 14:52:40 +00:00
Simon Pilgrim 94cc89d5f2 [InstCombine] Fix issue with X udiv 2^C -> X >> C for non-splat constant vectors
foldUDivPow2Cst was assuming that the input was a scalar or a splat constant

llvm-svn: 324608
2018-02-08 14:46:10 +00:00
Simon Pilgrim 0b9f3912ce [InstCombine] Improve mul(x, pow2) -> shl combine for vector constants
Refactor getLogBase2Vector into getLogBase2 to accept all scalars/vectors. Generalize from ConstantDataVector to support all constant vectors.

llvm-svn: 324603
2018-02-08 14:10:01 +00:00
Serguei Katkov c8016e7a65 [Loop Predication] Teach LP about reverse loops with uge and sge latch conditions
Add support for uge and sge latch conditions to Loop Predication for
reverse loops.

Reviewers: apilipenko, mkazantsev, sanjoy, anna
Reviewed By: anna
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D42837

llvm-svn: 324589
2018-02-08 10:34:08 +00:00
Serguei Katkov 66182d6c38 [SimplifyCFG] Re-apply Relax restriction for folding unconditional branches
The commit rL308422 introduced a restriction on folding unconditional
branches. Specifically, if an empty block with an unconditional branch leads to
the header of a loop, then elimination of this basic block is prohibited.
However, this condition seems unnecessarily strict.
If elimination of this basic block does not introduce more back edges,
then we can eliminate the block.

The patch implements this relaxation of the restriction.

The test profile/Linux/counter_promo_nest.c in compiler-rt project
is updated to meet this change.

Reviewers: efriedma, mcrosier, pacxx, hsung, davidxl
Reviewed By: pacxx
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D42691

llvm-svn: 324572
2018-02-08 07:16:29 +00:00
Mircea Trofin 06ac8cfbd1 Verify profile data confirms large loop trip counts.
Summary:
Loops with inequality comparers, such as:

   // unsigned bound
   for (unsigned i = 1; i < bound; ++i) {...}

have getSmallConstantMaxTripCount report a large maximum static
trip count - in this case, 0xfffffffe. However, profiling info
may show that the trip count is much smaller, and thus
counter-recommend vectorization.
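
The static maximum follows directly from the loop shape. A quick sketch of the arithmetic, assuming a 32-bit `unsigned` (the helper `trip_count` is illustrative and not part of LLVM):

```python
UINT_MAX = 0xFFFFFFFF  # assuming a 32-bit unsigned

def trip_count(bound, start=1):
    """Number of iterations of: for (unsigned i = start; i < bound; ++i)."""
    return max(bound - start, 0)

# The largest possible bound gives the largest static trip count.
print(hex(trip_count(UINT_MAX)))  # 0xfffffffe
```

A profile showing, say, a handful of iterations per entry contradicts that static bound, which is the case the change guards against.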

This change:
- flips loop-vectorize-with-block-frequency on by default.
- validates profiled loop frequency data supports vectorization,
  when static info appears to not counter-recommend it. Absence
  of profile data means we rely on static data, just as we've
  done so far.

Reviewers: twoh, mkuper, davidxl, tejohnson, Ayal

Reviewed By: davidxl

Subscribers: bkramer, llvm-commits

Differential Revision: https://reviews.llvm.org/D42946

llvm-svn: 324543
2018-02-07 23:29:52 +00:00
Alexey Bataev cd8d6de381 [SLP] Add a tests for PR36280, NFC.
llvm-svn: 324510
2018-02-07 20:11:37 +00:00
Max Kazantsev b299ade2c5 Re-enable "[SCEV] Make isLoopEntryGuardedByCond a bit smarter"
The failures happened because of assert which was overconfident about
SCEV's proving capabilities and is generally not valid.

Differential Revision: https://reviews.llvm.org/D42835

llvm-svn: 324473
2018-02-07 11:16:29 +00:00
Serguei Katkov 69246ca787 Revert [SCEV] Make isLoopEntryGuardedByCond a bit smarter
Revert rL324453 commit which causes buildbot failures.

Differential Revision: https://reviews.llvm.org/D42835

llvm-svn: 324462
2018-02-07 09:10:08 +00:00
Max Kazantsev dd5ee6f5d9 [SCEV] Make isLoopEntryGuardedByCond a bit smarter
Sometimes `isLoopEntryGuardedByCond` cannot prove predicate `a > b` directly.
But it is a common situation when `a >= b` is known from ranges and `a != b` is
known from a dominating condition. This patch teaches SCEV to sum these facts
together and prove strict comparison via non-strict one.
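
The underlying implication is simple to state: `a >= b` together with `a != b` gives `a > b`. A brute-force check over a small range, as a sketch (both function names are hypothetical and unrelated to the SCEV API):

```python
def strict_from_nonstrict(a_ge_b, a_ne_b):
    """Combine two known facts: a >= b and a != b together give a > b."""
    return a_ge_b and a_ne_b

def check(values):
    """Verify the combination matches a direct strict comparison for all pairs."""
    values = list(values)
    return all(strict_from_nonstrict(a >= b, a != b) == (a > b)
               for a in values for b in values)

print(check(range(-8, 8)))
```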

Differential Revision: https://reviews.llvm.org/D42835

llvm-svn: 324453
2018-02-07 07:56:26 +00:00
Michael Zolotukhin cae66ba5f8 The xfailed test from r324448 passed on one of the bots: remove it entirely for now.
llvm-svn: 324451
2018-02-07 06:54:11 +00:00
Michael Zolotukhin 1713dd5b8d Xfail the test added in r324445 until the underlying issue in LoopSink is fixed.
llvm-svn: 324448
2018-02-07 06:11:50 +00:00
Michael Zolotukhin e82e83fcce Follow-up for r324429: "[LCSSAVerification] Run verification only when asserts are enabled."
Before r324429 we essentially didn't have a verification of LCSSA, so
no wonder that it has been broken: currently loop-sink breaks it (the
attached test illustrates the failure).

It was detected during a stage2 RA build, so to unbreak it I'm disabling
the check for now.

llvm-svn: 324445
2018-02-07 04:24:44 +00:00
Alexey Bataev 1e593fe73e [SLP] Update test checks, NFC.
llvm-svn: 324387
2018-02-06 20:00:05 +00:00
Simon Pilgrim 9f2ae7e2d1 [InstCombine][ValueTracking] Match non-uniform constant power-of-two vectors
Generalize existing constant matching to work with non-uniform constant vectors as well.

Differential Revision: https://reviews.llvm.org/D42818

llvm-svn: 324369
2018-02-06 18:39:23 +00:00
Simon Pilgrim e11c64162c Regenerate vector-urem test. NFCI.
llvm-svn: 324357
2018-02-06 16:10:12 +00:00
Clement Courbet a7a1746865 [MergeICmps] Handle chains with several complex BCE basic blocks.
 - Fix condition for detecting that a complex basic block was the first in
   the chain.
 - Add tests.

This was caught by buildbots when submitting rL324319.

llvm-svn: 324341
2018-02-06 12:25:33 +00:00
Hiroshi Inoue ba3585eaf2 [ThinLTO] fix test failure without x86 backend
This patch moves ThinLTOBitcodeWriter/module-asm.ll test case into x86 directory to avoid a test failure when x86 backend is not enabled.

llvm-svn: 324316
2018-02-06 07:03:09 +00:00
Peter Collingbourne 29c6f4833c ThinLTOBitcodeWriter: Do not include module-level inline asm in the merged module.
If the inline asm provides the definition of a symbol, this can result
in duplicate symbol errors.

Differential Revision: https://reviews.llvm.org/D42944

llvm-svn: 324313
2018-02-06 03:29:18 +00:00
Teresa Johnson 791c98e4c8 [ThinLTO] Remove dead and dropped symbol declarations when possible
Summary:
Removing the dropped symbols will prevent indirect call promotion in the
ThinLTO Backend from adding a new reference to a symbol, which can
result in linker unsats. This can happen when we compile with a sample
profile collected from one binary but used for another, which may have
profiled targets that aren't used in the new binary.

Note that until dropDeadSymbols handles variables and aliases (in
progress), we may not be able to remove the declaration and can still
have an issue.

Reviewers: grimar, davidxl

Subscribers: mehdi_amini, inglorion, llvm-commits, eraman

Differential Revision: https://reviews.llvm.org/D42816

llvm-svn: 324299
2018-02-06 00:43:39 +00:00
Sanjay Patel d7c702b451 [LoopStrengthReduce, x86] don't add cost for a cmp that will be macro-fused (PR35681)
In the motivating case from PR35681 and represented by the macro-fuse-cmp test:
https://bugs.llvm.org/show_bug.cgi?id=35681
...there's a 37 -> 31 byte size win for the loop because we eliminate the big base 
address offsets.

SPEC2017 on Ryzen shows no significant perf difference.

Differential Revision: https://reviews.llvm.org/D42607

llvm-svn: 324289
2018-02-05 23:43:05 +00:00
Sanjay Patel 49aafec2e6 [InstCombine] don't try to evaluate instructions with >1 use (revert r324014)
This example causes a compile-time explosion:

define i16 @foo(i16 %in) {
  %x = zext i16 %in to i32
  %a1 = mul i32 %x, %x
  %a2 = mul i32 %a1, %a1
  %a3 = mul i32 %a2, %a2
  %a4 = mul i32 %a3, %a3
  %a5 = mul i32 %a4, %a4
  %a6 = mul i32 %a5, %a5
  %a7 = mul i32 %a6, %a6
  %a8 = mul i32 %a7, %a7
  %a9 = mul i32 %a8, %a8
  %a10 = mul i32 %a9, %a9
  %a11 = mul i32 %a10, %a10
  %a12 = mul i32 %a11, %a11
  %a13 = mul i32 %a12, %a12
  %a14 = mul i32 %a13, %a13
  %a15 = mul i32 %a14, %a14
  %a16 = mul i32 %a15, %a15
  %a17 = mul i32 %a16, %a16
  %a18 = mul i32 %a17, %a17
  %a19 = mul i32 %a18, %a18
  %a20 = mul i32 %a19, %a19
  %a21 = mul i32 %a20, %a20
  %a22 = mul i32 %a21, %a21
  %a23 = mul i32 %a22, %a22
  %a24 = mul i32 %a23, %a23
  %T = trunc i32 %a24 to i16
  ret i16 %T
}
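
One plausible way to see why such a chain is expensive: every `mul` uses the previous value twice, so a recursive walk that visits each operand separately, without caching, doubles its work at every level. The toy model below counts visits under that assumption; it is not the canEvaluate* code, and both function names are hypothetical.

```python
def visits_without_cache(depth):
    """Count nodes visited when both operands of each mul are walked separately."""
    if depth == 0:
        return 1              # reached the zext leaf
    # %aN = mul %a(N-1), %a(N-1): the same operand is walked twice
    return 1 + 2 * visits_without_cache(depth - 1)

def visits_with_cache(depth):
    """Count nodes visited when each distinct value is walked only once."""
    return depth + 1

print(visits_without_cache(24))  # 33554431, i.e. 2**25 - 1
print(visits_with_cache(24))     # 25
```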

 

llvm-svn: 324276
2018-02-05 21:50:32 +00:00
Sanjay Patel 1c84dd9a8f [InstCombine] add test corresponding to r324252 (PR36225); NFC
As PR36225 shows, we definitely don't want to enable the
canEvaluate* logic with phis. 

There's still a question of whether we should just revert 
r324014 completely because it exposes a compile-time sinkhole
(although that problem might exist independently).

llvm-svn: 324266
2018-02-05 19:59:52 +00:00
Sanjay Patel e9a153f414 [InstCombine] add unsigned saturation subtraction canonicalizations
This is the instcombine part of unsigned saturation canonicalization.
Backend patches already committed: 
https://reviews.llvm.org/D37510
https://reviews.llvm.org/D37534

It converts unsigned saturated subtraction patterns to forms recognized 
by the backend:
(a > b) ? a - b : 0 -> ((a > b) ? a : b) - b
(b < a) ? a - b : 0 -> ((a > b) ? a : b) - b
(b > a) ? 0 : a - b -> ((a > b) ? a : b) - b
(a < b) ? 0 : a - b -> ((a > b) ? a : b) - b
((a > b) ? b - a : 0) -> -(((a > b) ? a : b) - b)
((b < a) ? b - a : 0) -> -(((a > b) ? a : b) - b)
((b > a) ? 0 : b - a) -> -(((a > b) ? a : b) - b)
((a < b) ? 0 : b - a) -> -(((a > b) ? a : b) - b)
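
The equivalence between the select forms and the max-based forms can be checked exhaustively at a small width. A sketch assuming 8-bit wrapping arithmetic; the function names are illustrative only:

```python
BITS = 8                      # assumed width for the check
MASK = (1 << BITS) - 1

def usub_sat_select(a, b):
    """(a > b) ? a - b : 0"""
    return (a - b) & MASK if a > b else 0

def usub_sat_canonical(a, b):
    """((a > b) ? a : b) - b, i.e. umax(a, b) - b"""
    return (max(a, b) - b) & MASK

def neg_usub_sat_select(a, b):
    """(a > b) ? b - a : 0, with wrapping subtraction"""
    return (b - a) & MASK if a > b else 0

def neg_usub_sat_canonical(a, b):
    """-(umax(a, b) - b), with wrapping negation"""
    return (-(max(a, b) - b)) & MASK

def check_all():
    """Compare both pattern families over every pair of 8-bit values."""
    for a in range(1 << BITS):
        for b in range(1 << BITS):
            if usub_sat_select(a, b) != usub_sat_canonical(a, b):
                return False
            if neg_usub_sat_select(a, b) != neg_usub_sat_canonical(a, b):
                return False
    return True

print(check_all())
```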

Patch by Yulia Koval!

Differential Revision: https://reviews.llvm.org/D41480

llvm-svn: 324255
2018-02-05 17:53:29 +00:00
Hans Wennborg 22db17cf43 Revert r323472 "[Debug] Add dbg.value intrinsics for PHIs created during LCSSA."
This broke the Chromium build; see PR36238.

> This patch is an enhancement to propagate dbg.value information when
> Phis are created on behalf of LCSSA.  I noticed a case where a value
> carried across a loop was reported as <optimized out>.
>
> Specifically this case:
>
>   int bar(int x, int y) {
>     return x + y;
>   }
>
>   int foo(int size) {
>     int val = 0;
>     for (int i = 0; i < size; ++i) {
>       val = bar(val, i);  // Both val and i are correct
>     }
>     return val; // <optimized out>
>   }
>
> In the above case, after all of the interesting computation completes
> our value is reported as "optimized out." This change will add a
> dbg.value to correct this.
>
> This patch also moves the dbg.value insertion routine from
> LoopRotation.cpp into Local.cpp, so that we can share it in both places
> (LoopRotation and LCSSA).
>
> Patch by Matt Davis!
>
> Differential Revision: https://reviews.llvm.org/D42551

llvm-svn: 324247
2018-02-05 16:10:42 +00:00
Serguei Katkov 276b32bb14 Revert [SimplifyCFG] Relax restriction for folding unconditional branches
The patch causes the failure of the test
compiler-rt/test/profile/Linux/counter_promo_nest.c

To unblock buildbot, revert the patch while investigation is in progress.

Differential Revision: https://reviews.llvm.org/D42691

llvm-svn: 324214
2018-02-05 09:05:43 +00:00
Max Kazantsev f7667483c1 [NFC] Add tests for PR35743
llvm-svn: 324209
2018-02-05 08:09:49 +00:00
Serguei Katkov 6e93980e82 [SimplifyCFG] Relax restriction for folding unconditional branches
The commit rL308422 introduced a restriction on folding unconditional
branches. Specifically, if an empty block with an unconditional branch leads to
the header of a loop, then elimination of this basic block is prohibited.
However, this condition seems unnecessarily strict.
If elimination of this basic block does not introduce more back edges,
then we can eliminate the block.

The patch implements this relaxation of the restriction.

Reviewers: efriedma, mcrosier, pacxx, hsung, davidxl
Reviewed By: pacxx
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D42691

llvm-svn: 324208
2018-02-05 07:56:43 +00:00
Serguei Katkov ec7029c286 Re-apply [SCEV] Fix isLoopEntryGuardedByCond usage
ScalarEvolution::isKnownPredicate invokes isLoopEntryGuardedByCond without checking
that SCEV is available at the entry point of the loop. This is incorrect and is fixed by the patch.

Two bugs additionally fixed:
assert is moved after the check whether loop is not a nullptr.
Usage of isLoopEntryGuardedByCond in ScalarEvolution::isImpliedCondOperandsViaNoOverflow
is guarded by isAvailableAtLoopEntry.

Reviewers: sanjoy, mkazantsev, anna, dorit, reames
Reviewed By: mkazantsev
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D42417

llvm-svn: 324204
2018-02-05 05:49:47 +00:00
Florian Hahn 642637aab4 [PartialInliner] Update test (NFC).
llvm-svn: 324199
2018-02-04 18:40:24 +00:00
Florian Hahn 8f804fc07d [InlineFunction] Set arg attrs even if there only are VarArg attrs.
When using the partial inliner, we might have attributes for forwarded
varargs, but the CodeExtractor does not create an empty argument
attribute set for regular arguments in that case, because it does not know
of the additional arguments. So in case we have attributes for VarArgs, we
also have to make sure we create (empty) attributes for all regular arguments.

This fixes PR36210.

llvm-svn: 324197
2018-02-04 18:27:47 +00:00
Chad Rosier a097bc69df [LV] Use Demanded Bits and ValueTracking for reduction type-shrinking
The type-shrinking logic in reduction detection, although narrow in scope, is
also rather ad-hoc, which has led to bugs (e.g., PR35734). This patch modifies
the approach to rely on the demanded bits and value tracking analyses, if
available. We currently perform type-shrinking separately for reductions and
other instructions in the loop. Long-term, we should probably think about
computing minimal bit widths in a more complete way for the loops we want to
vectorize.

PR35734
Differential Revision: https://reviews.llvm.org/D42309

llvm-svn: 324195
2018-02-04 15:42:24 +00:00
David Green 9688ed61fe Remove unneeded -debug argument from new test
llvm-svn: 324176
2018-02-03 17:33:50 +00:00
David Green 7174023f57 [InstCombine] Allow common type conversions to i8/i16/i32
This, in instcombine, allows conversions to i8/i16/i32 (very
common cases) even if the resulting type is not legal according
to the data layout. This can often open up extra combine
opportunities.

Differential Revision: https://reviews.llvm.org/D42424

llvm-svn: 324174
2018-02-03 16:51:03 +00:00
Sanjay Patel a767ee5af0 [InstCombine] make sure tests are providing coverage for the stated pattern; NFC
Without extra instructions and uses, swapMayExposeCSEOpportunities() would change
the icmp (as seen in the check lines), so we were not actually testing patterns 
that should be handled by D41480.

llvm-svn: 324143
2018-02-02 21:40:54 +00:00
Sanjay Patel 5b8cb26bcc [InstCombine] add baseline tests for unsigned saturated sub (D41480); NFC
llvm-svn: 324109
2018-02-02 17:43:16 +00:00
Yaxun Liu 2a22c5deff [AMDGPU] Switch to the new addr space mapping by default
This requires corresponding clang change.

Differential Revision: https://reviews.llvm.org/D40955

llvm-svn: 324101
2018-02-02 16:07:16 +00:00
Sanjay Patel 3343fcef86 [InstCombine] allow multi-use values in canEvaluate* if all uses are in 1 inst
This is the enhancement suggested in D42536 to fix a shortcoming in 
regular InstCombine's canEvaluate* functionality.
When we have multiple uses of a value, but they're all in one instruction, we can 
allow that expression to be narrowed or widened for the same cost as a single-use 
value.

AFAICT, this can only matter for multiply: sub/and/or/xor/select would be simplified 
away if the operands are the same value; add becomes shl; shifts with a variable shift 
amount aren't handled.

Differential Revision: https://reviews.llvm.org/D42739

llvm-svn: 324014
2018-02-01 21:55:53 +00:00
David Green 184df0c35d Revert commit rL323951
Looks like it's causing timeouts on at least ppc64le
buildbots.

llvm-svn: 323959
2018-02-01 13:05:25 +00:00
David Green e11f0545db [InstCombine] Allow common type conversions to i8/i16/i32
This, in instcombine, allows conversions to i8/i16/i32 (very
common cases) even if the resulting type is not legal according
to the data layout. This can often open up extra combine
opportunities.

Differential Revision: https://reviews.llvm.org/D42424

llvm-svn: 323951
2018-02-01 11:06:18 +00:00
Mikael Holmen 6d06976e74 [LSR] Don't force bases of foldable formulae to the final type.
Summary:
Before emitting code for scaled registers, we prevent
SCEVExpander from hoisting any scaled addressing mode
by emitting all the bases first. However, these bases
are being forced to the final type, resulting in some
odd code.

For example, if the type of the base is an integer and
the final type is a pointer, we will emit an inttoptr
for the base, a ptrtoint for the scale, and then a
'reverse' GEP where the GEP pointer is actually the base
integer and the index is the pointer. It's more intuitive
to use the pointer as a pointer and the integer as index.

Patch by: Bevin Hansson

Reviewers: atrick, qcolombet, sanjoy

Reviewed By: qcolombet

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D42103

llvm-svn: 323946
2018-02-01 06:38:34 +00:00
Amjad Aboud b86b771c02 [AggressiveInstCombine] Fixed TruncCombine class to handle TruncInst leaf node correctly.
This covers the case where TruncInst leaf node is a constant expression.
See PR36121 for more details.

Differential Revision: https://reviews.llvm.org/D42622

llvm-svn: 323926
2018-01-31 22:39:05 +00:00
Puyan Lotfi 43e94b15ea Followup on Proposal to move MIR physical register namespace to '$' sigil.
Discussed here:

http://lists.llvm.org/pipermail/llvm-dev/2018-January/120320.html

In preparation for adding support for named vregs we are changing the sigil for
physical registers in MIR to '$' from '%'. This will prevent name clashes of
named physical registers with named vregs.

llvm-svn: 323922
2018-01-31 22:04:26 +00:00
Marek Olsak 8f2df9d26c [SeparateConstOffsetFromGEP] Fix up addrspace in the AMDGPU test
llvm-svn: 323913
2018-01-31 20:49:19 +00:00
Marek Olsak 13e4741275 AMDGPU: Add intrinsics llvm.amdgcn.cvt.{pknorm.i16, pknorm.u16, pk.i16, pk.u16}
Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye

Differential Revision: https://reviews.llvm.org/D41663

llvm-svn: 323908
2018-01-31 20:18:04 +00:00
Marek Olsak 8e7d149a31 [SeparateConstOffsetFromGEP] Preserve metadata when splitting GEPs
Summary:
!amdgpu.uniform needs to be preserved for AMDGPU, otherwise bad things
happen.

Reviewers: arsenm, nhaehnle, jingyue, broune, majnemer, bjarke.roune, dblaikie

Subscribers: wdng, tpr, llvm-commits

Differential Revision: https://reviews.llvm.org/D42744

llvm-svn: 323907
2018-01-31 20:17:52 +00:00
Daniel Neilson be58a220e9 [CodeGenPrepare] Improve source and dest alignments of memory intrinsics independently
Summary:
  This change is part of step five in the series of changes to remove alignment argument from
memcpy/memmove/memset in favour of alignment attributes. In particular, this changes the
CodeGenPrepare pass to be more aggressive in improving the source and destination alignments
of memcpy/memmove/memset by exploiting our new ability to record independent alignments
for each argument.

Steps:
Step 1) Remove alignment parameter and create alignment parameter attributes for
memcpy/memmove/memset. ( rL322965, rC322964, rL322963 )
Step 2) Expand the IRBuilder API to allow creation of memcpy/memmove with differing
source and dest alignments. ( rL323597 )
Step 3) Update Clang to use the new IRBuilder API. ( rC323617 )
Step 4) Update Polly to use the new IRBuilder API. ( rL323618 )
Step 5) Update LLVM passes that create memcpy/memmove calls to use the new IRBuilder API,
and those that use MemIntrinsicInst::[get|set]Alignment() to use [get|set]DestAlignment()
and [get|set]SourceAlignment() instead. ( rL323886 )
Step 6) Remove the single-alignment IRBuilder API for memcpy/memmove, and the
MemIntrinsicInst::[get|set]Alignment() methods.

Reference
   http://lists.llvm.org/pipermail/llvm-dev/2015-August/089384.html
   http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151109/312083.html

llvm-svn: 323891
2018-01-31 17:24:53 +00:00
Sanjay Patel fd58ade81c [InstCombine] move related tests into the same file; NFC
llvm-svn: 323882
2018-01-31 15:47:59 +00:00
Sanjay Patel 8c74a9a155 [InstCombine] add tests to show limit of canEvaluate* ; NFC
llvm-svn: 323881
2018-01-31 15:28:39 +00:00
Amjad Aboud d895bff5f2 [AggressiveInstCombine] Make TruncCombine class ignore unreachable basic blocks.
Because dead code may contain non-standard IR that causes infinite looping or crashes in underlying analysis.
See PR36134 for more details.

Differential Revision: https://reviews.llvm.org/D42683

llvm-svn: 323862
2018-01-31 10:41:31 +00:00
Alexey Bataev 1c8f53f47d [SLP] Add extra test for extractelement shuffle, NFC.
llvm-svn: 323815
2018-01-30 21:06:06 +00:00
Sanjay Patel ffb37a29d1 [LoopStrengthReduce] add test to show potential macro-fusion-based diff (PR35681); NFC
This is the baseline output for the test proposed with D42607.

llvm-svn: 323806
2018-01-30 19:17:38 +00:00
Simon Pilgrim 073f089c6e [X86][XOP] Update isVectorShiftByScalarCheap with cases covered by XOP
Similar to D42437, XOP supports variable shift for v16i8/v8i16/v4i32/v2i64 types.

Differential Revision: https://reviews.llvm.org/D42526

llvm-svn: 323797
2018-01-30 18:10:21 +00:00
Petar Jovanovic 9208e8fbf6 [DeadArgumentElimination] Preserve llvm.dbg.values's first argument
When removing a return value, the Dead Argument Elimination pass clobbers the first
argument of llvm.dbg.value for live arguments of that function by replacing
it with nullptr. In the next pass it will be deleted, so debug location
information about those arguments is lost. This change fixes it.

Patch by Djordje Todorovic.

Differential Revision: https://reviews.llvm.org/D42541

llvm-svn: 323784
2018-01-30 16:42:04 +00:00
Zaara Syeda 1f59ae311b Re-commit : [PowerPC] Add handling for ColdCC calling convention and a pass to mark
candidates with coldcc attribute.

This recommits r322721 reverted due to sanitizer memory leak build bot failures.

Original commit message:
This patch adds support for the coldcc calling convention for Power.
This changes the set of non-volatile registers. It includes a pass to stress
test the implementation by marking all static directly called functions with
the coldcc attribute through the option -enable-coldcc-stress-test. It also
includes an option, -ppc-enable-coldcc, to add the coldcc attribute to
functions which are cold at all call sites based on BlockFrequencyInfo when
the containing function does not call any non cold functions.

Differential Revision: https://reviews.llvm.org/D38413

llvm-svn: 323778
2018-01-30 16:17:22 +00:00
Daniel Neilson 594f443b06 [RS4GC] Handle call/invoke instructions as base defining values of vectors
Summary:
 There's an asymmetry in the definitions of findBaseDefiningValueOfVector() and
findBaseDefiningValue() of RS4GC. The latter handles call and invoke instructions,
and the former does not. This appears to be a simple oversight. This patch remedies
the oversight by adding the call and invoke cases to findBaseDefiningValueOfVector().

Reviewers: DaniilSuchkov, anna

Reviewed By: anna

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D42653

llvm-svn: 323764
2018-01-30 14:43:41 +00:00
Sanjay Patel 1aef27f5cd [DSE] make sure memory is not modified before partial store merging (PR36129)
We missed a critical check in D30703. We must make sure that no intermediate 
store is sitting between the stores that we want to merge.

This should fix:
https://bugs.llvm.org/show_bug.cgi?id=36129

Differential Revision: https://reviews.llvm.org/D42663

llvm-svn: 323759
2018-01-30 13:53:59 +00:00
Sanjay Patel 83f056604c [InstSimplify] (X * Y) / Y --> X for relaxed floating-point ops
This is the FP counterpart that was mentioned in PR35709:
https://bugs.llvm.org/show_bug.cgi?id=35709
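
A minimal sketch of why this fold is limited to relaxed floating-point ops, using Python floats (IEEE-754 doubles) as a stand-in for strict FP semantics. The helper name `mul_then_div` is hypothetical and only illustrative; this is not the InstSimplify code. When the intermediate product overflows, the unfolded expression no longer equals X:

```python
import math

def mul_then_div(x, y):
    # Evaluate (X * Y) / Y with strict IEEE-754 double semantics.
    return (x * y) / y

# When the product is exactly representable, the result is X.
exact_case = mul_then_div(3.0, 2.0)

# When X * Y overflows to infinity, dividing by Y cannot recover X.
overflow_case = mul_then_div(1e308, 10.0)

print(exact_case, overflow_case)
```

Folding to X changes the observable result in the second case, which strict semantics do not permit.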

Differential Revision: https://reviews.llvm.org/D42385

llvm-svn: 323716
2018-01-30 00:18:37 +00:00
Sanjay Patel d023a9b777 [DSE] add test for PR36129; NFC
We can miscompile because we're not checking if the memory might
be modified between the seemingly redundant store ops.

llvm-svn: 323704
2018-01-29 22:50:08 +00:00
Alexey Bataev 9c5c103283 [SLP] Fix for PR32086: Count InsertElementInstr of the same elements as shuffle.
Summary:
If the same value is going to be vectorized several times in the same
tree entry, this entry is considered to be a gather entry and the cost of
this gather is counted as the cost of InsertElementInstrs for each gathered
value. But we can consider these elements as ShuffleInstr with
SK_PermuteSingle shuffle kind.

Reviewers: spatel, RKSimon, mkuper, hfinkel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D38697

llvm-svn: 323662
2018-01-29 16:08:52 +00:00
Alexey Bataev 10f5c9e765 [SLP] Add a test with extract for PR32086, NFC.
llvm-svn: 323661
2018-01-29 15:56:52 +00:00
Davide Italiano 8b797a0fd2 [CVP] Don't Replace incoming values from unreachable blocks with undef.
This pretty much reverts r322006, except that we keep the test,
because we work around the issue exposed in a different way (a
recursion limit in value tracking). There's still probably some
sequence that exposes this problem, and the proper way to fix that
for somebody who has time is outlined in the code review.

llvm-svn: 323630
2018-01-29 05:59:55 +00:00
Hiroshi Inoue c8e9245816 [NFC] fix trivial typos in comments and documents
"to to" -> "to"

llvm-svn: 323628
2018-01-29 05:17:03 +00:00
Florian Hahn 1636651e35 [InlineCost] Mark functions accessing varargs as not viable.
This prevents functions accessing varargs from being inlined if they
have the alwaysinline attribute.

Reviewers: efriedma, rnk, davide

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D42556

llvm-svn: 323619
2018-01-28 19:11:49 +00:00
Alexey Bataev f86be12182 Revert "[SLP] Fix for PR32086: Count InsertElementInstr of the same elements as shuffle."
This reverts commit r323530 to fix possible problems in users' code.

llvm-svn: 323581
2018-01-27 02:42:21 +00:00
Sanjay Patel 5bce08ddff [x86] auto-generate complete checks; NFC
llvm-svn: 323571
2018-01-26 22:06:07 +00:00
Vedant Kumar e48597a50e [InstCombine] Preserve debug values for eliminable casts
A cast from A to B is eliminable if its result is cast to C, and if
the pair of casts could just be expressed as a single cast. E.g., here,
%c1 is eliminable:

  %c1 = zext i16 %A to i32
  %c2 = sext i32 %c1 to i64
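
As a rough sanity check of the example above, the following sketch models the two casts on plain integers and compares them with a single zero-extension from i16 to i64. The helpers `zext`, `sext`, `two_casts` and `one_cast` are hypothetical names for illustration, and the choice of a single zext as the combined cast is this sketch's reading of the example, not InstCombine's implementation:

```python
MASK64 = (1 << 64) - 1

def zext(value, from_bits):
    """Zero-extend: interpret the low from_bits bits as unsigned."""
    return value & ((1 << from_bits) - 1)

def sext(value, from_bits):
    """Sign-extend: interpret the low from_bits bits as two's complement."""
    value &= (1 << from_bits) - 1
    sign_bit = 1 << (from_bits - 1)
    return (value ^ sign_bit) - sign_bit

def two_casts(a):
    c1 = zext(a, 16)              # %c1 = zext i16 %A to i32
    return sext(c1, 32) & MASK64  # %c2 = sext i32 %c1 to i64 (as a bit pattern)

def one_cast(a):
    return zext(a, 16) & MASK64   # a single zext from i16 to i64

# Exhaustively compare both forms over every 16-bit input.
casts_agree = all(two_casts(a) == one_cast(a) for a in range(1 << 16))
print(casts_agree)
```

The two forms agree because a zero-extended i16 value never has bit 31 set, so the following sign-extension only adds zero bits.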

InstCombine optimizes away eliminable casts. This patch teaches it to
insert a dbg.value intrinsic pointing to the final result, so that local
variables pointing to the eliminable result are preserved.

Differential Revision: https://reviews.llvm.org/D42566

llvm-svn: 323570
2018-01-26 22:02:52 +00:00
Alexey Bataev 7ad4e31c3b [SLP] Test for trunc vectorization, NFC.
llvm-svn: 323556
2018-01-26 20:07:55 +00:00
Alexey Bataev 167003df28 [SLP] Fix for PR32086: Count InsertElementInstr of the same elements as shuffle.
Summary:
If the same value is going to be vectorized several times in the same
tree entry, this entry is considered to be a gather entry and the cost of
this gather is counted as the cost of InsertElementInstrs for each gathered
value. But we can consider these elements as ShuffleInstr with
SK_PermuteSingle shuffle kind.

Reviewers: spatel, RKSimon, mkuper, hfinkel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D38697

llvm-svn: 323530
2018-01-26 14:31:09 +00:00
Florian Hahn 212afb9fd9 [CallSiteSplitting] Fix infinite loop when recording conditions.
Fix infinite loop when recording conditions by correctly marking basic
blocks as visited.

Fixes https://bugs.llvm.org/show_bug.cgi?id=36105

llvm-svn: 323515
2018-01-26 10:36:50 +00:00
Vedant Kumar 6394df9fc4 [Debug] LCSSA: Insert dbg.value at the first available insertion point
Inserting a dbg.value instruction at the start of a basic block with a
landingpad instruction triggers a verifier failure. We should be OK if
we insert the instruction a bit later.

Speculative fix for the bot failure described here:
https://reviews.llvm.org/D42551

llvm-svn: 323482
2018-01-25 23:48:29 +00:00
Vedant Kumar 60f54084bf [Debug] Add dbg.value intrinsics for PHIs created during LCSSA.
This patch is an enhancement to propagate dbg.value information when
Phis are created on behalf of LCSSA.  I noticed a case where a value
carried across a loop was reported as <optimized out>.

Specifically this case:

  int bar(int x, int y) {
    return x + y;
  }

  int foo(int size) {
    int val = 0;
    for (int i = 0; i < size; ++i) {
      val = bar(val, i);  // Both val and i are correct
    }
    return val; // <optimized out>
  }

In the above case, after all of the interesting computation completes
our value is reported as "optimized out." This change will add a
dbg.value to correct this.

This patch also moves the dbg.value insertion routine from
LoopRotation.cpp into Local.cpp, so that we can share it in both places
(LoopRotation and LCSSA).

Patch by Matt Davis!

Differential Revision: https://reviews.llvm.org/D42551

llvm-svn: 323472
2018-01-25 21:37:07 +00:00
Alexey Bataev 102d4b59f9 Revert "[SLP] Fix for PR32086: Count InsertElementInstr of the same elements as shuffle."
This reverts commit r323441 to fix buildbots.

llvm-svn: 323447
2018-01-25 17:28:12 +00:00
Alexey Bataev c8cfa14b6d [SLP] Fix for PR32086: Count InsertElementInstr of the same elements as shuffle.
Summary:
If the same value is going to be vectorized several times in the same
tree entry, this entry is considered to be a gather entry and the cost of
this gather is counted as the cost of InsertElementInstrs for each gathered
value. But we can consider these elements as ShuffleInstr with
SK_PermuteSingle shuffle kind.

Reviewers: spatel, RKSimon, mkuper, hfinkel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D38697

llvm-svn: 323441
2018-01-25 16:45:18 +00:00
Sanjay Patel 1d68112c4b [InstCombine] narrow masked zexted binops (PR35792)
This is guarded by shouldChangeType(), so the tests show that
we don't do the fold if the narrower type is not legal. Note
that there is a proposal (D42424) that would change the results
for the specific cases shown in these tests. That difference is
also discussed in PR35792:
https://bugs.llvm.org/show_bug.cgi?id=35792

Alive proofs for the cases handled here as well as the bitwise 
logic binops that we should already do better on:
https://rise4fun.com/Alive/c97
https://rise4fun.com/Alive/Lc5E
https://rise4fun.com/Alive/kdf

llvm-svn: 323437
2018-01-25 16:34:36 +00:00
Sanjay Patel 0f95dd234d [InstCombine] add tests for PR35792; NFC
llvm-svn: 323436
2018-01-25 16:03:44 +00:00
Alexey Bataev a0b2c78efc Revert "[SLP] Fix for PR32086: Count InsertElementInstr of the same elements as shuffle."
This reverts commit r323430 to fix buildbots.

llvm-svn: 323432
2018-01-25 15:20:29 +00:00
Alexey Bataev ad51fe3644 [SLP] Fix for PR32086: Count InsertElementInstr of the same elements as shuffle.
Summary:
If the same value is going to be vectorized several times in the same
tree entry, this entry is considered to be a gather entry and the cost of
this gather is counted as the cost of InsertElementInstrs for each gathered
value. But we can consider these elements as ShuffleInstr with
SK_PermuteSingle shuffle kind.

Reviewers: spatel, RKSimon, mkuper, hfinkel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D38697

llvm-svn: 323430
2018-01-25 15:01:36 +00:00
Amjad Aboud f1f57a3137 Another try to commit 323321 (aggressive instruction combine).
llvm-svn: 323416
2018-01-25 12:06:32 +00:00
Sanjay Patel 60c13c7712 [InstCombine] fix datalayout in test file
The only part of the datalayout that should matter for these tests
is the part that specifies the legal int widths ('n*'). But there
was a bug - that part of the string was not correctly separated with
the expected '-' character, so we were testing as if there were no
legal int widths at all. Removed the leading cruft so we have some 
legal ints to test with.

I noticed this while testing a potential change to the way we 
transform shifts and sexts in D42424.

llvm-svn: 323377
2018-01-24 21:36:45 +00:00
Alexey Bataev 0affccc8d7 Revert "[SLP] Fix for PR32086: Count InsertElementInstr of the same elements as shuffle."
This reverts commit r323348 because of the broken buildbots.

llvm-svn: 323359
2018-01-24 18:36:51 +00:00
Nicolai Haehnle 4afb64e4c6 Revert r321751, "StructurizeCFG: Fix broken backedge detection"
It causes regressions in various OpenGL test suites.

Keep the test cases introduced by r321751 as XFAIL, and add a test case
for the regression.

Change-Id: I90b4cc354f68cebe5fcef1f2422dc8fe1c6d3514
Bugzilla: https://bugs.llvm.org/show_bug.cgi?id=36015
llvm-svn: 323355
2018-01-24 18:02:05 +00:00
Alexey Bataev 4bd8e5332f [SLP] Fix for PR32086: Count InsertElementInstr of the same elements as shuffle.
Summary:
If the same value is going to be vectorized several times in the same
tree entry, this entry is considered to be a gather entry and the cost of
this gather is counted as the cost of InsertElementInstrs for each gathered
value. But we can consider these elements as ShuffleInstr with
SK_PermuteSingle shuffle kind.

Reviewers: spatel, RKSimon, mkuper, hfinkel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D38697

llvm-svn: 323348
2018-01-24 17:50:53 +00:00
Zvi Rackover 51f0d64b9c InstSimplify: If divisor element is undef simplify to undef
Summary:
If any vector divisor element is undef, we can arbitrarily choose it to be
zero, which would make the div/rem an undef value by definition.

Reviewers: spatel, reames

Reviewed By: spatel

Subscribers: magabari, llvm-commits

Differential Revision: https://reviews.llvm.org/D42485

llvm-svn: 323343
2018-01-24 17:22:00 +00:00
Simon Pilgrim f15886eb30 Regenerate shuffle sink test
llvm-svn: 323328
2018-01-24 14:59:02 +00:00
Amjad Aboud d53504e379 Reverted 323321.
llvm-svn: 323326
2018-01-24 14:48:49 +00:00
Amjad Aboud e4453233d7 [InstCombine] Introducing Aggressive Instruction Combine pass (-aggressive-instcombine).
Combine expression patterns to form expressions with fewer, simple instructions.
This pass does not modify the CFG.

For example, this pass reduces the width of expressions post-dominated by a
TruncInst to a smaller width when applicable.

It differs from the instcombine pass in that it contains pattern optimizations
that require higher complexity than O(1); thus, it should run fewer times than
the instcombine pass.

Differential Revision: https://reviews.llvm.org/D38313

llvm-svn: 323321
2018-01-24 12:42:42 +00:00
Max Kazantsev 0f720e1296 [NFC] Remove overconfident assert from IRCE
This patch removes the assert that SCEV is able to prove that a value is
non-negative. In fact, SCEV can sometimes be unable to do this because
its cache does not update properly. This assert will be restored once this
problem is resolved.

llvm-svn: 323309
2018-01-24 07:51:41 +00:00
Hiroshi Inoue 501931b117 [NFC] fix trivial typos in comments
"the the" -> "the"

llvm-svn: 323302
2018-01-24 05:04:35 +00:00
Zvi Rackover b5447b1e7c X86: Update isVectorShiftByScalarCheap with cases covered by AVX512BW
Summary:
AVX512BW adds support for variable shift amount for 16-bit element
vectors.

Reviewers: craig.topper, RKSimon, spatel

Reviewed By: RKSimon

Subscribers: rengolin, tschuett, llvm-commits

Differential Revision: https://reviews.llvm.org/D42437

llvm-svn: 323292
2018-01-24 01:36:40 +00:00
Sanjay Patel c4ed9ed276 [SLPVectorizer] add test for PR13837; NFC
This was probably fixed long ago, but I don't see a test
that lines up with the example and target in the bug report:
https://bugs.llvm.org/show_bug.cgi?id=13837
...so adding it here.

llvm-svn: 323269
2018-01-23 22:04:17 +00:00
Simon Pilgrim 67b21313ae Add bdver shuffle sink tests.
llvm-svn: 323268
2018-01-23 22:03:57 +00:00
Volkan Keles dc40be75f8 [llvm-extract] Support extracting basic blocks
Summary:
Currently, there is no way to extract a basic block from a function easily. This patch
extends llvm-extract to extract the specified basic block(s).

Reviewers: loladiro, rafael, bogner

Reviewed By: bogner

Subscribers: hintonda, mgorny, qcolombet, llvm-commits

Differential Revision: https://reviews.llvm.org/D41638

llvm-svn: 323266
2018-01-23 21:51:34 +00:00
Simon Pilgrim bf3c39877e Regenerate select test. NFCI.
llvm-svn: 323265
2018-01-23 21:50:46 +00:00
Simon Pilgrim a075ce04d8 Regenerate shuffle sink test. NFCI.
llvm-svn: 323264
2018-01-23 21:50:11 +00:00
Alexey Bataev 4f74a31c0e Revert "[SLP] Fix for PR32086: Count InsertElementInstr of the same elements as shuffle."
This reverts commit r323246 because of the broken buildbots.

llvm-svn: 323252
2018-01-23 20:11:27 +00:00
Alexey Bataev 6719e2418c [SLP] Fix for PR32086: Count InsertElementInstr of the same elements as shuffle.
Summary:
If the same value is going to be vectorized several times in the same
tree entry, this entry is considered to be a gather entry and the cost of
this gather is counted as the cost of InsertElementInstrs for each gathered
value. But we can consider these elements as ShuffleInstr with
SK_PermuteSingle shuffle kind.

Reviewers: spatel, RKSimon, mkuper, hfinkel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D38697

llvm-svn: 323246
2018-01-23 19:30:26 +00:00
Zvi Rackover 87937e8ed2 X86 Tests: Add AVX512BW config to CodeGenPrepare test. NFC
Case points out that we don't consider shifts supported by AVX512BW
in isVectorShiftByScalarCheap()

llvm-svn: 323242
2018-01-23 19:20:39 +00:00
Serguei Katkov 17e5794f11 [CGP] Fix the GV handling in complex addressing mode
If in complex addressing mode the difference is in GV then
base reg should not be installed because we plan to use
base reg as a merge point of different GVs.

This is a fix for PR35980.

Reviewers: reames, john.brawn, santosh
Reviewed By: john.brawn
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D42230

llvm-svn: 323192
2018-01-23 12:07:49 +00:00
Anton Bikineev 82f61151b3 [InstSimplify] (X << Y) % X -> 0
llvm-svn: 323182
2018-01-23 09:27:47 +00:00
Chandler Carruth c58f2166ab Introduce the "retpoline" x86 mitigation technique for variant #2 of the speculative execution vulnerabilities disclosed today, specifically identified by CVE-2017-5715, "Branch Target Injection", and is one of the two halves to Spectre.
Summary:
First, we need to explain the core of the vulnerability. Note that this
is a very incomplete description, please see the Project Zero blog post
for details:
https://googleprojectzero.blogspot.com/2018/01/reading-privileged-memory-with-side.html

The basis for branch target injection is to direct speculative execution
of the processor to some "gadget" of executable code by poisoning the
prediction of indirect branches with the address of that gadget. The
gadget in turn contains an operation that provides a side channel for
reading data. Most commonly, this will look like a load of secret data
followed by a branch on the loaded value and then a load of some
predictable cache line. The attacker then uses timing of the processor's
cache to determine which direction the branch took *in the speculative
execution*, and in turn what one bit of the loaded value was. Due to the
nature of these timing side channels and the branch predictor on Intel
processors, this allows an attacker to leak data only accessible to
a privileged domain (like the kernel) back into an unprivileged domain.

The goal is simple: avoid generating code which contains an indirect
branch that could have its prediction poisoned by an attacker. In many
cases, the compiler can simply use directed conditional branches and
a small search tree. LLVM already has support for lowering switches in
this way and the first step of this patch is to disable jump-table
lowering of switches and introduce a pass to rewrite explicit indirectbr
sequences into a switch over integers.
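
The difference in control-flow shape can be modeled loosely as follows. This is only a conceptual sketch in Python, which has no notion of direct versus indirect machine branches; the handler names and the four-case switch are hypothetical and are not taken from the patch. The table form picks its callee from data, while the search-tree form reaches each callee through comparisons against constants:

```python
def on_zero(): return "zero"
def on_one(): return "one"
def on_two(): return "two"
def on_three(): return "three"
def on_default(): return "default"

HANDLERS = {0: on_zero, 1: on_one, 2: on_two, 3: on_three}

def dispatch_via_table(case):
    # Jump-table shape: the callee is looked up, so the target is data.
    return HANDLERS.get(case, on_default)()

def dispatch_via_search_tree(case):
    # Search-tree shape: only comparisons with constants select the path,
    # and every call site names its target explicitly.
    if case < 2:
        if case == 0:
            return on_zero()
        if case == 1:
            return on_one()
    else:
        if case == 2:
            return on_two()
        if case == 3:
            return on_three()
    return on_default()

print([dispatch_via_search_tree(c) for c in range(-1, 5)])
```

Both functions compute the same result; what differs is whether an attacker-influenced prediction of an indirect target is involved at all.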

However, there is no fully general alternative to indirect calls. We
introduce a new construct we call a "retpoline" to implement indirect
calls in a non-speculatable way. It can be thought of loosely as
a trampoline for indirect calls which uses the RET instruction on x86.
Further, we arrange for a specific call->ret sequence which ensures the
processor predicts the return to go to a controlled, known location. The
retpoline then "smashes" the return address pushed onto the stack by the
call with the desired target of the original indirect call. The result
is a predicted return to the next instruction after a call (which can be
used to trap speculative execution within an infinite loop) and an
actual indirect branch to an arbitrary address.

On 64-bit x86 ABIs, this is especially easily done in the compiler by
using a guaranteed scratch register to pass the target into this device.
For 32-bit ABIs there isn't a guaranteed scratch register and so several
different retpoline variants are introduced to use a scratch register if
one is available in the calling convention and to otherwise use direct
stack push/pop sequences to pass the target address.

This "retpoline" mitigation is fully described in the following blog
post: https://support.google.com/faqs/answer/7625886

We also support a target feature that disables emission of the retpoline
thunk by the compiler to allow for custom thunks if users want them.
These are particularly useful in environments like kernels that
routinely do hot-patching on boot and want to hot-patch their thunk to
different code sequences. They can write this custom thunk and use
`-mretpoline-external-thunk` *in addition* to `-mretpoline`. In this
case, on x86-64 the thunk names must be:
```
  __llvm_external_retpoline_r11
```
or on 32-bit:
```
  __llvm_external_retpoline_eax
  __llvm_external_retpoline_ecx
  __llvm_external_retpoline_edx
  __llvm_external_retpoline_push
```
And the target of the retpoline is passed in the named register, or in
the case of the `push` suffix on the top of the stack via a `pushl`
instruction.

There is one other important source of indirect branches in x86 ELF
binaries: the PLT. These patches also include support for LLD to
generate PLT entries that perform a retpoline-style indirection.

The only other indirect branches remaining that we are aware of are from
precompiled runtimes (such as crt0.o and similar). The ones we have
found are not really attackable, and so we have not focused on them
here, but eventually these runtimes should also be replicated for
retpoline-ed configurations for completeness.

For kernels or other freestanding or fully static executables, the
compiler switch `-mretpoline` is sufficient to fully mitigate this
particular attack. For dynamic executables, you must compile *all*
libraries with `-mretpoline` and additionally link the dynamic
executable and all shared libraries with LLD and pass `-z retpolineplt`
(or use similar functionality from some other linker). We strongly
recommend also using `-z now` as non-lazy binding allows the
retpoline-mitigated PLT to be substantially smaller.

When manually applying transformations similar to `-mretpoline` to the
Linux kernel, we observed very small performance hits to applications
running typical workloads, and relatively minor hits (approximately 2%)
even for extremely syscall-heavy applications. This is largely due to
the small number of indirect branches that occur in performance
sensitive paths of the kernel.

When using these patches on statically linked applications, especially
C++ applications, you should expect to see a much more dramatic
performance hit. For microbenchmarks that are switch-, indirect-, or
virtual-call heavy we have seen overheads ranging from 10% to 50%.

However, real-world workloads exhibit substantially lower performance
impact. Notably, techniques such as PGO and ThinLTO dramatically reduce
the impact of hot indirect calls (by speculatively promoting them to
direct calls) and allow optimized search trees to be used to lower
switches. If you need to deploy these techniques in C++ applications, we
*strongly* recommend that you ensure all hot call targets are statically
linked (avoiding PLT indirection) and use both PGO and ThinLTO. Well
tuned servers using all of these techniques saw 5% - 10% overhead from
the use of retpoline.

We will add detailed documentation covering these components in
subsequent patches, but wanted to make the core functionality available
as soon as possible. Happy for more code review, but we'd really like to
get these patches landed and backported ASAP for obvious reasons. We're
planning to backport this to both 6.0 and 5.0 release streams and get
a 5.0 release with just this cherry picked ASAP for distros and vendors.

This patch is the work of a number of people over the past month: Eric, Reid,
Rui, and myself. I'm mailing it out as a single commit due to the time
sensitive nature of landing this and the need to backport it. Huge thanks to
everyone who helped out here, and everyone at Intel who helped out in
discussions about how to craft this. Also, credit goes to Paul Turner (at
Google, but not an LLVM contributor) for much of the underlying retpoline
design.

Reviewers: echristo, rnk, ruiu, craig.topper, DavidKreitzer

Subscribers: sanjoy, emaste, mcrosier, mgorny, mehdi_amini, hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D41723

llvm-svn: 323155
2018-01-22 22:05:25 +00:00
Serguei Katkov f38041dc3e Revert [SCEV] Fix isLoopEntryGuardedByCond usage
It causes buildbot failures. The newly added assert fires.
It seems not all usages of isLoopEntryGuardedByCond are fixed.

llvm-svn: 323079
2018-01-22 07:47:02 +00:00
Serguei Katkov 50714a1cbc [SCEV] Fix isLoopEntryGuardedByCond usage
ScalarEvolution::isKnownPredicate invokes isLoopEntryGuardedByCond without checking
that the SCEV is available at the entry point of the loop. This is incorrect and is fixed by this patch.

Reviewers: sanjoy, mkazantsev, anna, dorit
Reviewed By: mkazantsev
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D42165

llvm-svn: 323077
2018-01-22 07:31:41 +00:00
Sanjay Patel 9530f18864 [InstCombine] (X << Y) / X -> 1 << Y
...when the shift is known to not overflow with the matching
signed-ness of the division.

This closes an optimization gap caused by canonicalizing mul
by power-of-2 to shl as shown in PR35709:
https://bugs.llvm.org/show_bug.cgi?id=35709
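
A small brute-force sketch of the unsigned case, assuming 8-bit integers to keep the search space tiny. The helpers `shl` and `fold_holds` are hypothetical names for illustration, not the InstCombine code. The sketch checks the identity whenever the shift does not wrap and shows one wrapping input where it fails:

```python
BITS = 8
MASK = (1 << BITS) - 1

def shl(x, y):
    """Return (result, wrapped) for an unsigned BITS-wide left shift."""
    full = x << y
    return full & MASK, full > MASK

def fold_holds(x, y):
    # Does (X << Y) / X equal 1 << Y for these concrete values?
    shifted, _ = shl(x, y)
    return shifted // x == (1 << y)

# Every nonzero X and in-range Y whose shift does not wrap.
no_wrap_cases = [
    (x, y)
    for x in range(1, MASK + 1)
    for y in range(BITS)
    if not shl(x, y)[1]
]
always_ok = all(fold_holds(x, y) for x, y in no_wrap_cases)

# With wrapping the identity breaks: (192 << 1) truncates to 128 in 8 bits.
wrapping_ok = fold_holds(192, 1)

print(always_ok, wrapping_ok)
```

The failing input is why the fold needs the shift to be known not to overflow.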

Patch by Anton Bikineev!

Differential Revision: https://reviews.llvm.org/D42032

llvm-svn: 323068
2018-01-21 16:14:51 +00:00
Sanjay Patel 7a44e4d594 [InstSimplify] add baseline tests for (X << Y) % X -> 0; NFC
This is the 'rem' counterpart to D42032 and would be folded by
D42341.

Patch by Anton Bikineev.

Differential Revision: https://reviews.llvm.org/D42342

llvm-svn: 323067
2018-01-21 15:36:15 +00:00
Sanjay Patel 439132185d [InstCombine] add baseline tests for (X << Y) / X -> 1 << Y; NFC
This fold is proposed in D42032.

llvm-svn: 323043
2018-01-20 16:13:40 +00:00
Craig Topper 0d797a34d8 [X86] Add support for passing 'prefer-vector-width' function attribute into X86Subtarget and exposing via X86's getRegisterWidth TTI interface.
This will cause the vectorizers to do some limiting of the vector widths they create. This is not a strict limit. There are cases I know of where the loop vectorizer will still generate larger vectors.

I've written this in such a way that the interface will only return a properly supported width (0/128/256/512) even if the attribute says something funny like 384 or 10.

This has been split from D41895 with the remainder in a follow up commit.

llvm-svn: 323015
2018-01-20 00:26:08 +00:00
Akira Hatanaka 73ceb50d85 [ObjCARC] Do not turn a call to @objc_autoreleaseReturnValue into a call
to @objc_autorelease if its operand is a PHI and the PHI has an
equivalent value that is used by a return instruction.

For example, ARC optimizer shouldn't replace the call in the following
example, as doing so breaks the AutoreleaseRV/RetainRV optimization:

  %v1 = bitcast i32* %v0 to i8*
  br label %bb3
bb2:
  %v3 = bitcast i32* %v2 to i8*
  br label %bb3
bb3:
  %p = phi i8* [ %v1, %bb1 ], [ %v3, %bb2 ]
  %retval = phi i32* [ %v0, %bb1 ], [ %v2, %bb2 ] ; equivalent to %p
  %v4 = tail call i8* @objc_autoreleaseReturnValue(i8* %p)
  ret i32* %retval

Also, make sure ObjCARCContract replaces @objc_autoreleaseReturnValue's
operand uses with its value so that the call gets tail-called.

rdar://problem/15894705

llvm-svn: 323009
2018-01-19 23:51:13 +00:00
Jakub Kuderski d2e371f046 [Dominators] Visit affected node candidates found at different root levels
Summary:
This patch attempts to fix the DomTree incremental insertion bug found in PR35969 (https://bugs.llvm.org/show_bug.cgi?id=35969).

When performing an insertion into a piece of unreachable CFG, we may find the same node at different levels. When this happens, the node can turn out to be affected when we find it starting from a node with a lower level in the tree. The level at which we start visitation affects whether we consider a node affected or not.

This patch tracks the lowest level at which each node was visited during insertion and allows it to be visited multiple times, if it can cause it to be considered affected.

Reviewers: brzycki, davide, dberlin, grosser

Reviewed By: brzycki

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D42231

llvm-svn: 322993
2018-01-19 21:27:24 +00:00
Daniel Neilson 1e68724d24 Remove alignment argument from memcpy/memmove/memset in favour of alignment attributes (Step 1)
Summary:
 This is a resurrection of work first proposed and discussed in Aug 2015:
   http://lists.llvm.org/pipermail/llvm-dev/2015-August/089384.html
and initially landed (but then backed out) in Nov 2015:
   http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151109/312083.html

 The @llvm.memcpy/memmove/memset intrinsics currently have an explicit argument
which is required to be a constant integer. It represents the alignment of the
dest (and source), and so must be the minimum of the actual alignment of the
two.

 This change is the first in a series that allows source and dest to each
have their own alignments by using the alignment attribute on their arguments.

 In this change we:
1) Remove the alignment argument.
2) Add alignment attributes to the source & dest arguments. We, temporarily,
   require that the alignments for source & dest be equal.

 For example, code which used to read:
  call void @llvm.memcpy.p0i8.p0i8.i32(i8* %dest, i8* %src, i32 100, i32 4, i1 false)
will now read
  call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 4 %dest, i8* align 4 %src, i32 100, i1 false)

 Downstream users may have to update their lit tests that check for
@llvm.memcpy/memmove/memset call/declaration patterns. The following extended sed script
may help with updating the majority of your tests, but it does not catch all possible
patterns so some manual checking and updating will be required.

s~declare void @llvm\.mem(set|cpy|move)\.p([^(]*)\((.*), i32, i1\)~declare void @llvm.mem\1.p\2(\3, i1)~g
s~call void @llvm\.memset\.p([^(]*)i8\(i8([^*]*)\* (.*), i8 (.*), i8 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.memset.p\1i8(i8\2* \3, i8 \4, i8 \5, i1 \6)~g
s~call void @llvm\.memset\.p([^(]*)i16\(i8([^*]*)\* (.*), i8 (.*), i16 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.memset.p\1i16(i8\2* \3, i8 \4, i16 \5, i1 \6)~g
s~call void @llvm\.memset\.p([^(]*)i32\(i8([^*]*)\* (.*), i8 (.*), i32 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.memset.p\1i32(i8\2* \3, i8 \4, i32 \5, i1 \6)~g
s~call void @llvm\.memset\.p([^(]*)i64\(i8([^*]*)\* (.*), i8 (.*), i64 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.memset.p\1i64(i8\2* \3, i8 \4, i64 \5, i1 \6)~g
s~call void @llvm\.memset\.p([^(]*)i128\(i8([^*]*)\* (.*), i8 (.*), i128 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.memset.p\1i128(i8\2* \3, i8 \4, i128 \5, i1 \6)~g
s~call void @llvm\.memset\.p([^(]*)i8\(i8([^*]*)\* (.*), i8 (.*), i8 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.memset.p\1i8(i8\2* align \6 \3, i8 \4, i8 \5, i1 \7)~g
s~call void @llvm\.memset\.p([^(]*)i16\(i8([^*]*)\* (.*), i8 (.*), i16 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.memset.p\1i16(i8\2* align \6 \3, i8 \4, i16 \5, i1 \7)~g
s~call void @llvm\.memset\.p([^(]*)i32\(i8([^*]*)\* (.*), i8 (.*), i32 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.memset.p\1i32(i8\2* align \6 \3, i8 \4, i32 \5, i1 \7)~g
s~call void @llvm\.memset\.p([^(]*)i64\(i8([^*]*)\* (.*), i8 (.*), i64 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.memset.p\1i64(i8\2* align \6 \3, i8 \4, i64 \5, i1 \7)~g
s~call void @llvm\.memset\.p([^(]*)i128\(i8([^*]*)\* (.*), i8 (.*), i128 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.memset.p\1i128(i8\2* align \6 \3, i8 \4, i128 \5, i1 \7)~g
s~call void @llvm\.mem(cpy|move)\.p([^(]*)i8\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i8 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.mem\1.p\2i8(i8\3* \4, i8\5* \6, i8 \7, i1 \8)~g
s~call void @llvm\.mem(cpy|move)\.p([^(]*)i16\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i16 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.mem\1.p\2i16(i8\3* \4, i8\5* \6, i16 \7, i1 \8)~g
s~call void @llvm\.mem(cpy|move)\.p([^(]*)i32\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i32 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.mem\1.p\2i32(i8\3* \4, i8\5* \6, i32 \7, i1 \8)~g
s~call void @llvm\.mem(cpy|move)\.p([^(]*)i64\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i64 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.mem\1.p\2i64(i8\3* \4, i8\5* \6, i64 \7, i1 \8)~g
s~call void @llvm\.mem(cpy|move)\.p([^(]*)i128\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i128 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.mem\1.p\2i128(i8\3* \4, i8\5* \6, i128 \7, i1 \8)~g
s~call void @llvm\.mem(cpy|move)\.p([^(]*)i8\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i8 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.mem\1.p\2i8(i8\3* align \8 \4, i8\5* align \8 \6, i8 \7, i1 \9)~g
s~call void @llvm\.mem(cpy|move)\.p([^(]*)i16\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i16 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.mem\1.p\2i16(i8\3* align \8 \4, i8\5* align \8 \6, i16 \7, i1 \9)~g
s~call void @llvm\.mem(cpy|move)\.p([^(]*)i32\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i32 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.mem\1.p\2i32(i8\3* align \8 \4, i8\5* align \8 \6, i32 \7, i1 \9)~g
s~call void @llvm\.mem(cpy|move)\.p([^(]*)i64\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i64 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.mem\1.p\2i64(i8\3* align \8 \4, i8\5* align \8 \6, i64 \7, i1 \9)~g
s~call void @llvm\.mem(cpy|move)\.p([^(]*)i128\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i128 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.mem\1.p\2i128(i8\3* align \8 \4, i8\5* align \8 \6, i128 \7, i1 \9)~g
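
To make the effect of these rules concrete, here is a minimal sketch that ports only the two i32 memcpy/memmove rules to Python's re module and applies them to a sample call. The function name migrate and the sample IR lines are illustrative assumptions, not part of the patch; the patterns are copied from the sed rules above, with sed's \N written as \g<N>.

```python
import re

# Python ports of the two i32 memcpy/memmove sed rules (assumed to be run as
# extended regexes). Order matters: the "i32 [01]" rule has to run first.
RULES = [
    # Alignment 0 or 1: drop the alignment argument, emit no align attribute.
    (re.compile(r"call void @llvm\.mem(cpy|move)\.p([^(]*)i32\(i8([^*]*)\* (.*), "
                r"i8([^*]*)\* (.*), i32 (.*), i32 [01], i1 ([^)]*)\)"),
     r"call void @llvm.mem\g<1>.p\g<2>i32(i8\g<3>* \g<4>, i8\g<5>* \g<6>, "
     r"i32 \g<7>, i1 \g<8>)"),
    # Any other alignment: move it onto both pointer arguments as "align N".
    (re.compile(r"call void @llvm\.mem(cpy|move)\.p([^(]*)i32\(i8([^*]*)\* (.*), "
                r"i8([^*]*)\* (.*), i32 (.*), i32 ([0-9]*), i1 ([^)]*)\)"),
     r"call void @llvm.mem\g<1>.p\g<2>i32(i8\g<3>* align \g<8> \g<4>, "
     r"i8\g<5>* align \g<8> \g<6>, i32 \g<7>, i1 \g<9>)"),
]


def migrate(line):
    """Apply the rules in order to one line of textual IR (hypothetical helper)."""
    for pattern, replacement in RULES:
        line = pattern.sub(replacement, line)
    return line


if __name__ == "__main__":
    old = ("call void @llvm.memcpy.p0i8.p0i8.i32"
           "(i8* %dst, i8* %src, i32 16, i32 4, i1 false)")
    print(migrate(old))
```

The single old alignment value ends up on both pointers because the old signature could not express different source and destination alignments.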

 The remaining changes in the series will:
Step 2) Expand the IRBuilder API to allow creation of memcpy/memmove with differing
   source and dest alignments.
Step 3) Update Clang to use the new IRBuilder API.
Step 4) Update Polly to use the new IRBuilder API.
Step 5) Update LLVM passes that create memcpy/memmove calls to use the new IRBuilder API,
        and those that use MemIntrinsicInst::[get|set]Alignment() to use
        getDestAlignment() and getSourceAlignment() instead.
Step 6) Remove the single-alignment IRBuilder API for memcpy/memmove, and the
        MemIntrinsicInst::[get|set]Alignment() methods.

Reviewers: pete, hfinkel, lhames, reames, bollu

Reviewed By: reames

Subscribers: niosHD, reames, jholewinski, qcolombet, jfb, sanjoy, arsenm, dschuff, dylanmckay, mehdi_amini, sdardis, nemanjai, david2050, nhaehnle, javed.absar, sbc100, jgravelle-google, eraman, aheejin, kbarton, JDevlieghere, asb, rbar, johnrusso, simoncook, jordy.potman.lists, apazos, sabuasal, llvm-commits

Differential Revision: https://reviews.llvm.org/D41675

llvm-svn: 322965
2018-01-19 17:13:12 +00:00
Alexey Bataev fa80c47c6a [SLP] Fix vectorization for tree with trunc to minimum required bit width.
Summary:
If the vectorized tree has truncate to minimum required bit width and
the vector type of the cast operation after the truncation is the same
as the vector type of the cast operands, count cost of the vector cast
operation as 0, because this cast will be later removed.
Also, if the vectorization tree root operations are integer cast operations, do not consider them as candidates for truncation. Truncating them would only create additional copies of the same vector/scalar operations, which instcombine would then remove.

Reviewers: RKSimon, spatel, mkuper, hfinkel, mssimpso

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D41948

llvm-svn: 322946
2018-01-19 14:40:13 +00:00
John Brawn 2867bd72c0 [InstCombine] Make foldSelectOpOp able to handle two-operand getelementptr
Three (or more) operand getelementptrs could plausibly also be handled, but
handling only two-operand fits in easily with the existing BinaryOperator
handling.

Differential Revision: https://reviews.llvm.org/D39958

llvm-svn: 322930
2018-01-19 10:05:15 +00:00
Sanjay Patel a19b748f6d [InstSimplify] regenerate checks and add tests for commutes; NFC
llvm-svn: 322907
2018-01-18 23:11:24 +00:00
Alexey Bataev a2d6fe4ab4 [SLP] Fix test checks, NFC.
llvm-svn: 322865
2018-01-18 17:34:27 +00:00
Craig Topper 83b0a98902 [X86] Use vmovdqu64/vmovdqa64 for unmasked integer vector stores for consistency with loads.
Previously we used 64 for vXi64 stores and 32 for everything else. This change uses 64 for everything, just like we do for loads.

llvm-svn: 322820
2018-01-18 07:44:09 +00:00
Rafael Espindola 9fbc040599 Make GlobalValues with non-default visibility dso_local.
This is similar to r322317, but for visibility. It is not as neat
because we have to special case extern_weak.

The idea is the same as the previous change, make the transition to
explicit dso_local easier for the frontends. With this they only have
to add dso_local to symbols where we need some external information to
decide if it is dso_local (like it being part of an ELF executable).

llvm-svn: 322806
2018-01-18 02:08:23 +00:00
Zaara Syeda c9dc7b451b Revert [PowerPC] This reverts commit rL322721
Failing build bots. Revert the commit now.

llvm-svn: 322748
2018-01-17 20:00:15 +00:00
Sanjay Patel 218a0b51dd [InstCombine] add baseline tests for D39958; NFC
llvm-svn: 322733
2018-01-17 19:04:18 +00:00
Zaara Syeda 8e951fd2f6 [PowerPC] Add handling for ColdCC calling convention and a pass to mark
candidates with coldcc attribute.

This patch adds support for the coldcc calling convention for Power.
This changes the set of non-volatile registers. It includes a pass to stress
test the implementation by marking all static directly called functions with
the coldcc attribute through the option -enable-coldcc-stress-test. It also
includes an option, -ppc-enable-coldcc, to add the coldcc attribute to
functions which are cold at all call sites based on BlockFrequencyInfo when
the containing function does not call any non cold functions.

Differential Revision: https://reviews.llvm.org/D38413

llvm-svn: 322721
2018-01-17 18:22:55 +00:00
Sanjay Patel aa766efd09 [InstCombine] fix demanded-bits propagation for zext/trunc
I was comparing the demanded-bits implementations between InstCombine
and TargetLowering as part of investigating questions in D42088 and
noticed that this was wrong in IR. We were losing all of the prior
known bits when we got back to the 'zext'.

llvm-svn: 322662
2018-01-17 14:39:28 +00:00
Sanjay Patel 178deccb63 [InstCombine] add test to show hole in demanded bits; NFC
llvm-svn: 322660
2018-01-17 14:27:35 +00:00
Ivan A. Kosarev 4d0ff0c74d [Transforms] Support making mutable versions of new-format TBAA access tags
Differential Revision: https://reviews.llvm.org/D41565

llvm-svn: 322650
2018-01-17 13:29:54 +00:00
Florian Hahn c6c89bffdc [CallSiteSplitting] Pass list of (BB, Conditions) pairs to splitCallSite.
This removes some duplication from splitCallSite and makes it easier to
add additional code dealing with each predecessor. It also allows us to
split for more than 2 predecessors, although that is not enabled for
now.

Reviewers: junbuml, mcrosier, davidxl, davide

Reviewed By: junbuml

Differential Revision: https://reviews.llvm.org/D41858

llvm-svn: 322599
2018-01-16 22:13:15 +00:00
Alexey Bataev 6977dbcc7b [SLP] Fix for PR32164: Improve vectorization of reverse order of extract operations.
Summary: Sometimes vectorization of insertelement instructions with extractelement operands may produce an extra shuffle operation if these operands are in reverse order. This patch tries to improve the situation by reordering the operands to remove the extra shuffle.

Reviewers: mkuper, hfinkel, RKSimon, spatel

Subscribers: mzolotukhin, llvm-commits

Differential Revision: https://reviews.llvm.org/D33954

llvm-svn: 322579
2018-01-16 18:17:01 +00:00
Hiroshi Inoue 99a8faa615 [SROA] fix assertion failure
This patch fixes the assertion failure in SROA reported in PR35657.
PR35657 reports the assertion failure due to r319522 (splitting for non-whole-alloca slices), but this problem can happen even without r319522.

The problem exists in a check for reusing an existing alloca when rewriting partitions. As the original comment said, we can reuse the existing alloca if the new alloca has the same type and offset as the existing one. But the code checks only the type of the alloca and then checks the offset using an assert.
In a corner case with out-of-bounds access (e.g. @PR35657 function added in unit test), it is possible that the two allocas have the same type but different offsets.

This patch makes the check of the offset in the if condition, and re-enables the splitting for non-whole-alloca slices.

Differential Revision: https://reviews.llvm.org/D41981

llvm-svn: 322533
2018-01-16 06:23:05 +00:00
Andrei Elovikov 7457aa0bce [LV] Don't call recordVectorLoopValueForInductionCast for newly-created IV from a trunc.
Summary:
This method is supposed to be called for IVs that have casts in their use-def
chains that are completely ignored after vectorization under PSE. However, for
truncates of such IVs the same InductionDescriptor is used during
creation/widening of both original IV based on PHINode and new IV based on
TruncInst.

This leads to unintended second call to recordVectorLoopValueForInductionCast
with a VectorLoopVal set to the newly created IV for a trunc and causes an
assert due to attempt to store new information for already existing entry in the
map. This is wrong and should not be done.

Fixes PR35773.

Reviewers: dorit, Ayal, mssimpso

Reviewed By: dorit

Subscribers: RKSimon, dim, dcaballe, hsaito, llvm-commits, hiraditya

Differential Revision: https://reviews.llvm.org/D41913

llvm-svn: 322473
2018-01-15 10:56:07 +00:00
Sanjay Patel 4158eff0f8 [InstSimplify] fold implied null ptr check (PR35790)
This extends rL322327 to handle the pointer cast and should solve:
https://bugs.llvm.org/show_bug.cgi?id=35790

Name: or_eq_zero
  %isnull = icmp eq i64* %p, null
  %x = ptrtoint i64* %p to i64
  %somebits = and i64 %x, %y
  %somebits_are_zero = icmp eq i64 %somebits, 0
  %or = or i1 %somebits_are_zero, %isnull
  =>
  %or = %somebits_are_zero

Name: and_ne_zero
  %isnotnull = icmp ne i64* %p, null
  %x = ptrtoint i64* %p to i64
  %somebits = and i64 %x, %y
  %somebits_are_not_zero = icmp ne i64 %somebits, 0
  %and = and i1 %somebits_are_not_zero, %isnotnull
  =>
  %and = %somebits_are_not_zero

https://rise4fun.com/Alive/CQ3
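
As an informal cross-check of the two folds above (not a substitute for the Alive proof), the following sketch enumerates every pair of 8-bit values. It assumes that the null pointer converts to integer 0 under ptrtoint and uses 8 bits instead of i64 only to keep the loop small; the function names are illustrative.

```python
from itertools import product

BITS = 8  # assumption: a narrow width stands in for i64 to keep the check fast
VALUES = range(1 << BITS)


def or_eq_zero_before(p, y):
    # %or = or i1 %somebits_are_zero, %isnull
    return ((p & y) == 0) or (p == 0)


def or_eq_zero_after(p, y):
    # %or = %somebits_are_zero
    return (p & y) == 0


def and_ne_zero_before(p, y):
    # %and = and i1 %somebits_are_not_zero, %isnotnull
    return ((p & y) != 0) and (p != 0)


def and_ne_zero_after(p, y):
    # %and = %somebits_are_not_zero
    return (p & y) != 0


def folds_hold():
    """True if both rewrites agree with the originals for every (p, y) pair."""
    return all(
        or_eq_zero_before(p, y) == or_eq_zero_after(p, y)
        and and_ne_zero_before(p, y) == and_ne_zero_after(p, y)
        for p, y in product(VALUES, repeat=2)
    )


if __name__ == "__main__":
    print(folds_hold())
```

The null check is redundant because p == 0 already forces p & y == 0, so it never changes the result of the or (and, dually, of the and).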

llvm-svn: 322439
2018-01-13 15:44:44 +00:00
Sanjay Patel 6691e40980 [InstSimplify] add tests for implied ptr cmp with null (PR35790); NFC
llvm-svn: 322411
2018-01-12 22:16:26 +00:00
Brian M. Rzycki 9b7ae23256 [JumpThreading] Preservation of DT and LVI across the pass
Summary:
See D37528 for a previous (non-deferred) version of this
patch and its description.

Preserves dominance in a deferred manner using a new class
DeferredDominance. This reduces the performance impact of
updating the DominatorTree at every edge insertion and
deletion. A user may call DDT->flush() within JumpThreading
for an up-to-date DT. This patch currently has one flush()
at the end of runImpl() to ensure DT is preserved across
the pass.
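
The deferral idea can be sketched roughly as follows. This is a toy model in Python, not the DeferredDominance class: every name except flush() is hypothetical, and the "dominator tree" is reduced to a set of CFG edges plus a counter standing in for the expensive recomputation.

```python
class DeferredEdgeUpdates:
    """Toy model: queue CFG edge changes and apply them in one batch."""

    def __init__(self, edges):
        self.edges = set(edges)   # edges the (imaginary) tree currently reflects
        self.pending = []         # queued ("insert" | "delete", src, dst) updates
        self.recomputations = 0   # stands in for the cost of a tree update

    def insert_edge(self, src, dst):
        self.pending.append(("insert", src, dst))

    def delete_edge(self, src, dst):
        self.pending.append(("delete", src, dst))

    def flush(self):
        """Apply all queued updates at once; a no-op if nothing is pending."""
        if not self.pending:
            return self.edges
        for kind, src, dst in self.pending:
            if kind == "insert":
                self.edges.add((src, dst))
            else:
                self.edges.discard((src, dst))
        self.pending.clear()
        self.recomputations += 1  # one update for the whole batch
        return self.edges


if __name__ == "__main__":
    updates = DeferredEdgeUpdates({("entry", "a"), ("a", "exit")})
    updates.insert_edge("entry", "exit")
    updates.delete_edge("a", "exit")
    print(sorted(updates.flush()), updates.recomputations)
```

In this model the cost is paid once per flush() rather than once per edge change, which is the trade-off the description above relies on.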

LVI is also preserved to help subsequent passes such as
CorrelatedValuePropagation. LVI is simpler to maintain and
is done immediately (not deferred). The code to perform the
preservation was minimally altered and simply marked as
preserved for the PassManager to be informed.

This extends the analysis available to JumpThreading for
future enhancements such as threading across loop headers.

Reviewers: dberlin, kuhar, sebpop

Reviewed By: kuhar, sebpop

Subscribers: mgorny, dmgreen, kuba, rnk, rsmith, hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D40146

llvm-svn: 322401
2018-01-12 21:06:48 +00:00
Max Kazantsev ef0576000c [IRCE][NFC] Make range check's End a non-null SCEV
Currently, IRC contains `Begin` and `Step` as SCEVs and `End` as value.
Aside from that, `End` can also be `nullptr` which can be later conditionally
converted into a non-null SCEV.

To make this logic more transparent, this patch makes `End` a SCEV and
calculates it early, so that it is never null.

Differential Revision: https://reviews.llvm.org/D39590

llvm-svn: 322364
2018-01-12 10:00:26 +00:00
Serguei Katkov a757d65cec [LoopDeletion] Handle users in unreachable block
This is a fix for PR35884.

When we want to delete a dead loop, we must clean up uses in unreachable blocks;
otherwise we'll get an assert during deletion of instructions from the loop.

Reviewers: anna, davide
Reviewed By: anna
Subscribers: llvm-commits, lebedev.ri
Differential Revision: https://reviews.llvm.org/D41943

llvm-svn: 322357
2018-01-12 07:24:43 +00:00
Sanjay Patel 6ef6aa987c [InstSimplify] fold implied cmp with zero (PR35790)
This doesn't handle the more complicated case in the bug report yet:
https://bugs.llvm.org/show_bug.cgi?id=35790

For that, we have to match / look through a cast.

llvm-svn: 322327
2018-01-11 23:27:37 +00:00
Sanjay Patel ac0edcb3f3 [InstSimplify] add tests for implied cmp with zero (PR35790); NFC
llvm-svn: 322323
2018-01-11 22:48:07 +00:00
Rafael Espindola e4b0231c63 Make internal/private GVs implicitly dso_local.
While updating clang tests for having clang set dso_local I noticed
that:

- There are *a lot* of tests to update.
- Many of the updates are redundant.

They are redundant because a GV is "obviously dso_local". This patch
starts formalizing that a bit by requiring that internal and private
GVs be dso_local too. Since they all are, we don't have to print
dso_local to the textual representation, making it a bit more compact
and easier to read.

llvm-svn: 322317
2018-01-11 22:15:05 +00:00
Fiona Glaser efe6a84e5b [Sink] Really really fix predicate in legality check
LoadInst isn't enough; we need to include intrinsics that perform loads too.

All side-effecting intrinsics and such are already covered by the isSafe
check, so we just need to care about things that read from memory.

D41960, originally from D33179.

llvm-svn: 322311
2018-01-11 21:28:57 +00:00
Benjamin Kramer 738e6e7cb0 [InstCombine] Apply the fix from r322284 for sin / cos -> tan too
llvm-svn: 322285
2018-01-11 15:33:21 +00:00
Benjamin Kramer 44993ede60 [InstCombine] For cos/sin -> tan copy attributes from cos instead of the
parent function

Ideally we should merge the attributes from the functions somehow, but
this is obviously an improvement over taking random attributes from the
caller which will trip up the verifier if they're nonsensical for a
unary intrinsic call.

llvm-svn: 322284
2018-01-11 15:19:02 +00:00
Sanjay Patel e63d8dda5a [ValueTracking] recognize min/max-of-min/max with notted ops (PR35875)
This was originally planned as the fix for:
https://bugs.llvm.org/show_bug.cgi?id=35834
...but simpler transforms handled that case, so I implemented a 
lesser solution. It turns out we need to handle the case with 'not'
ops too because the real code example that we are trying to solve:
https://bugs.llvm.org/show_bug.cgi?id=35875
...has extra uses of the intermediate values, so we can't rely on 
smaller canonicalizations to get us to the goal.

As with rL321672, I've tried to show every possibility in the
codegen tests because that's the simplest way to prove we're doing
the right thing in the wide variety of permutations of this pattern.

We can also show an InstCombine win because we added a fold for
this case in:
rL321998 / D41603

An Alive proof for one variant of the pattern to show that the 
InstCombine and codegen results are correct:
https://rise4fun.com/Alive/vd1

Name: min3_nots
  %nx = xor i8 %x, -1
  %ny = xor i8 %y, -1
  %nz = xor i8 %z, -1
  %cmpxz = icmp slt i8 %nx, %nz
  %minxz = select i1 %cmpxz, i8 %nx, i8 %nz
  %cmpyz = icmp slt i8 %ny, %nz
  %minyz = select i1 %cmpyz, i8 %ny, i8 %nz
  %cmpyx = icmp slt i8 %y, %x
  %r = select i1 %cmpyx, i8 %minxz, i8 %minyz
=>
  %cmpxyz = icmp slt i8 %minxz, %ny
  %r = select i1 %cmpxyz, i8 %minxz, i8 %ny

Name: min3_nots_alt
  %nx = xor i8 %x, -1
  %ny = xor i8 %y, -1
  %nz = xor i8 %z, -1
  %cmpxz = icmp slt i8 %nx, %nz
  %minxz = select i1 %cmpxz, i8 %nx, i8 %nz
  %cmpyz = icmp slt i8 %ny, %nz
  %minyz = select i1 %cmpyz, i8 %ny, i8 %nz
  %cmpyx = icmp slt i8 %y, %x
  %r = select i1 %cmpyx, i8 %minxz, i8 %minyz
=>
  %xz = icmp sgt i8 %x, %z
  %maxxz = select i1 %xz, i8 %x, i8 %z
  %xyz = icmp sgt i8 %maxxz, %y
  %maxxyz = select i1 %xyz, i8 %maxxz, i8 %y
  %r = xor i8 %maxxyz, -1
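
For readers without access to Alive, the same equivalences can be spot-checked by brute force. The sketch below is only an informal check under stated assumptions: it models the IR with signed 4-bit integers instead of i8 so that all triples can be enumerated quickly, and the function names are illustrative.

```python
from itertools import product

BITS = 4  # assumption: 4-bit signed values stand in for i8
VALUES = range(-(1 << (BITS - 1)), 1 << (BITS - 1))


def source(x, y, z):
    """Left-hand side shared by min3_nots and min3_nots_alt."""
    nx, ny, nz = ~x, ~y, ~z            # xor with -1 is bitwise not
    minxz = nx if nx < nz else nz
    minyz = ny if ny < nz else nz
    return minxz if y < x else minyz


def target_min(x, y, z):
    """Right-hand side of min3_nots: min(min(~x, ~z), ~y)."""
    nx, ny, nz = ~x, ~y, ~z
    minxz = nx if nx < nz else nz
    return minxz if minxz < ny else ny


def target_max(x, y, z):
    """Right-hand side of min3_nots_alt: ~max(max(x, z), y)."""
    maxxz = x if x > z else z
    maxxyz = maxxz if maxxz > y else y
    return ~maxxyz


def all_agree():
    """True if both targets match the source for every (x, y, z) triple."""
    return all(
        source(x, y, z) == target_min(x, y, z) == target_max(x, y, z)
        for x, y, z in product(VALUES, repeat=3)
    )


if __name__ == "__main__":
    print(all_agree())
```

Python's ~v equals -v - 1, which stays inside the signed range used here, so no explicit wrapping is needed.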

llvm-svn: 322283
2018-01-11 15:13:47 +00:00
Sanjay Patel e0df4650f8 [InstCombine] add min3-with-nots test (PR35875); NFC
llvm-svn: 322281
2018-01-11 14:53:45 +00:00
Dmitry Venikov e5fbf591a7 [InstCombine] Missed optimization in math expression: sin(x) / cos(x) => tan(x)
Summary: This patch enables folding sin(x) / cos(x) -> tan(x) and cos(x) / sin(x) -> 1 / tan(x) under the -ffast-math flag
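
A quick numeric illustration of the two identities, using Python's math module rather than LLVM itself. The sample range and tolerances are arbitrary choices for this sketch; the two sides are only compared up to rounding, since they need not be bit-identical.

```python
import math

# Sample points in (-1.5, 1.5), skipping 0 where cos(x) / sin(x) is undefined.
SAMPLES = [i / 100 for i in range(-150, 151) if i != 0]


def identities_hold(samples, rel_tol=1e-12):
    """Check sin/cos ~= tan and cos/sin ~= 1/tan within a relative tolerance."""
    for x in samples:
        s, c, t = math.sin(x), math.cos(x), math.tan(x)
        if not math.isclose(s / c, t, rel_tol=rel_tol):
            return False
        if not math.isclose(c / s, 1.0 / t, rel_tol=rel_tol):
            return False
    return True


if __name__ == "__main__":
    print(identities_hold(SAMPLES))
```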

Reviewers: hfinkel, spatel

Reviewed By: spatel

Subscribers: andrew.w.kaylor, efriedma, scanon, llvm-commits

Differential Revision: https://reviews.llvm.org/D41286

llvm-svn: 322255
2018-01-11 06:33:00 +00:00
Alexey Bataev 90e29b81d6 [SLP] Add/update tests for SLP vectorizer, NFC.
llvm-svn: 322225
2018-01-10 21:29:18 +00:00
Sanjay Patel d04026ea43 [InstCombine] add test to show missed bswap; NFC
D41353 / D41233 are proposing to alter the shl/and canonicalization,
but I think that would just move an existing pattern-matching hole
to a different place.

llvm-svn: 322206
2018-01-10 18:47:21 +00:00
Bjorn Pettersson 3851496e6e Avoid inlining if there are byval arguments with non-alloca address space
Summary:
After teaching InlineCost more about address spaces ()
another fault was detected in the inliner. If an argument has
the byval attribute the parameter might be copied to an alloca.
That part seems to work fine even if the argument has a different
address space than the alloca address space. However, if the
address spaces differ, then the inlined function still might
refer to the parameter using the original address space (the
inliner does not handle that situation very well).

This patch avoids the problem by simply disallowing inlining
when there are byval arguments with address space that differs
from the alloca address space.

I'm not really sure how to transform the code if we want to
get inlining for this situation. I assume that it never has
been working, and that the fixes in r321809 just exposed an
old problem.

Fault found by skatkov (Serguei Katkov). It is mentioned in
follow up comments to https://reviews.llvm.org/D40455.

Reviewers: skatkov

Reviewed By: skatkov

Subscribers: uabelho, eraman, llvm-commits, haicheng

Differential Revision: https://reviews.llvm.org/D41898

llvm-svn: 322181
2018-01-10 13:01:18 +00:00
Vlad Tsyrklevich cdec22ef9a LowerTypeTests: Add limited support for aliases
Summary:
LowerTypeTests moves some function definitions from individual object
files to the merged module, leaving a stub to be called in the merged
module's jump table. If an alias was pointing to such a function
definition LowerTypeTests would fail because the alias would be left
without a definition to point to.

This change 1) emits information about aliases to the ThinLTO summary,
2) replaces aliases pointing to function definitions that are moved to
the merged module with function declarations, and 3) re-emits those
aliases in the merged module pointing to the correct function
definitions.

The patch does not correctly fix all possible mis-uses of aliases in
LowerTypeTests. For example, it does not handle aliases with a different
type from the pointed to function.

The addition of alias data increases the size of Chrome build artifacts
by less than 1%.

Reviewers: pcc

Reviewed By: pcc

Subscribers: mehdi_amini, eraman, mgrang, llvm-commits, eugenis, kcc

Differential Revision: https://reviews.llvm.org/D41741

llvm-svn: 322139
2018-01-10 00:00:51 +00:00
Michael Zolotukhin 1f562176e9 [LoopRotate] Detect loops with indirect branches better (we're giving up on them).
llvm-svn: 322137
2018-01-09 23:54:35 +00:00
Chris Bieneman abdea268c1 [IPSCCP] Remove calls without side effects
Summary:
When performing constant propagation for call instructions, we have historically replaced all uses of the return value of a call but not removed the call itself. This is required for correctness if the calls have side effects; however, the compiler should be able to safely remove calls that don't have side effects.

This allows the compiler to completely fold away calls to functions that have no side effects if the inputs are constant and the output can be determined at compile time.

Reviewers: davide, sanjoy, bruno, dberlin

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D38856

llvm-svn: 322125
2018-01-09 21:58:46 +00:00