Commit Graph

10585 Commits

Author SHA1 Message Date
Roman Lebedev 6aca33534b [InstSimplify] peek through unsigned FP casts for sign-bit compares (PR36682)
This pattern came up in PR36682 / D44390
https://bugs.llvm.org/show_bug.cgi?id=36682
https://reviews.llvm.org/D44390
https://godbolt.org/g/oKvT5H

See also D44421, D44424

Reviewers: spatel, majnemer, efriedma, arsenm

Reviewed By: spatel

Subscribers: wdng, llvm-commits

Differential Revision: https://reviews.llvm.org/D44425

llvm-svn: 327642
2018-03-15 16:17:46 +00:00
Matthew Simpson c1c4ad6e64 [ConstantFolding, InstSimplify] Handle more vector GEPs
This patch addresses some additional cases where the compiler crashes upon
encountering vector GEPs. This should fix PR36116.

Differential Revision: https://reviews.llvm.org/D44219
Reference: https://bugs.llvm.org/show_bug.cgi?id=36116

llvm-svn: 327638
2018-03-15 16:00:29 +00:00
Sanjay Patel 43f71eade0 [InstSimplify] add tests with NaN operand for fp binops; NFC
llvm-svn: 327631
2018-03-15 14:48:39 +00:00
Sanjay Patel a4f42f2cfd [PatternMatch, InstSimplify] allow undef elements when matching any vector FP zero
This matcher implementation appears to be slightly more efficient than 
the generic constant check that it is replacing because every use was 
for matching FP patterns, but the previous code would check int and 
pointer type nulls too. 

llvm-svn: 327627
2018-03-15 14:29:27 +00:00
Sanjay Patel 8f063d0c70 [InstSimplify] remove 'nsz' requirement for frem 0, X
From the LangRef definition for frem: 
"The value produced is the floating-point remainder of the two operands. 
This is the same output as a libm ‘fmod‘ function, but without any 
possibility of setting errno. The remainder has the same sign as the 
dividend. This instruction is assumed to execute in the default 
floating-point environment."

llvm-svn: 327626
2018-03-15 14:04:31 +00:00
Ulrich Weigand f4ceef8d3f [Debug] Retain both copies of debug intrinsics in HoistThenElseCodeToIf
When hoisting common code from the "then" and "else" branches of a condition
to before the "if", the HoistThenElseCodeToIf routine will attempt to merge
the debug location associated with the two original copies of the hoisted
instruction.

This is a problem in the special case where the hoisted instruction is a
debug info intrinsic, since for those the debug location is considered
part of the intrinsic and attempting to modify it may resut in invalid
IR.  This is the underlying cause of PR36410.

This patch fixes the problem by handling debug info intrinsics specially:
instead of hoisting one copy and merging the two locations, the code now
simply hoists both copies, each with its original location intact.  Note
that this is still only done in the case where both original copies are
otherwise (i.e. apart from location metadata) identical.

Reviewed By: aprantl

Differential Revision: https://reviews.llvm.org/D44312

llvm-svn: 327622
2018-03-15 12:28:48 +00:00
Fedor Sergeev 194a407bda [New PM][IRCE] port of Inductive Range Check Elimination pass to the new pass manager
There are two nontrivial details here:
* Loop structure update interface is quite different with new pass manager,
  so the code to add new loops was factored out

* BranchProbabilityInfo is not a loop analysis, so it can not be just getResult'ed from
  within the loop pass. It cant even be queried through getCachedResult as LoopCanonicalization
  sequence (e.g. LoopSimplify) might invalidate BPI results.

  Complete solution for BPI will likely take some time to discuss and figure out,
  so for now this was partially solved by making BPI optional in IRCE
  (skipping a couple of profitability checks if it is absent).

Most of the IRCE tests got their corresponding new-pass-manager variant enabled.
Only two of them depend on BPI, both marked with TODO, to be turned on when BPI
starts being available for loop passes.

Reviewers: chandlerc, mkazantsev, sanjoy, asbirlea
Reviewed By: mkazantsev
Differential Revision: https://reviews.llvm.org/D43795

llvm-svn: 327619
2018-03-15 11:01:19 +00:00
Andrei Elovikov f9b8035f3c [LoopUnroll] Ignore ephemeral values when checking full unroll profitability.
Summary:
Before this patch call graph is like this in the LoopUnrollPass:

  tryToUnrollLoop
    ApproximateLoopSize
      collectEphemeralValues
      /* Use collected ephemeral values */
    computeUnrollCount
      analyzeLoopUnrollCost
        /* Bail out from the analysis if loop contains CallInst */

This patch moves collection of the ephemeral values to the tryToUnrollLoop
function and passes the collected values into both ApproximateLoopsize (as
before) and additionally starts using them in analyzeLoopUnrollCost:

  tryToUnrollLoop
    collectEphemeralValues
    ApproximateLoopSize(EphValues)
      /* Use EphValues */
    computeUnrollCount(EphValues)
      analyzeLoopUnrollCost(EphValues)
        /* Ignore ephemeral values - they don't contribute to the final cost */
        /* Bail out from the analysis if loop contains CallInst */

Reviewers: mzolotukhin, evstupac, sanjoy

Reviewed By: evstupac

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D43931

llvm-svn: 327617
2018-03-15 09:59:15 +00:00
Sanjay Patel e4e3f79b83 [InstSimplify] add tests for frem and vectors with undef; NFC
These should all be folded. The vector tests need to have
m_AnyZero updated to ignore undef elements, but we need to
be careful not to return the existing value in that case
and unintentionally propagate undef.

llvm-svn: 327585
2018-03-14 22:45:58 +00:00
Philip Reames 0adbb19409 [EarlyCSE] Exploit open ended invariant.start scopes
If we have an invariant.start with no corresponding invariant.end, then the memory location becomes invariant indefinitely after the invariant.start. As a result, anything dominated by the start is guaranteed to see the value the memory location had when the invariant.start executed.

This patch adds an AvailableInvariants table which tracks the generation a particular memory location became invariant and then uses that information to allow value forwarding that would otherwise be disallowed by potentially aliasing stores. (Reminder: In EarlyCSE everything clobbers everything by default.)

This should be compatible with the MemorySSA variant, but design is generational. We can and should add first class support for invariant.start within MemorySSA at a later time.  I took a quick look at doing so, but probably need some input from a MemorySSA expert.

Differential Revision: https://reviews.llvm.org/D43716

llvm-svn: 327577
2018-03-14 21:35:06 +00:00
Sanjay Patel 11f7f9908b [InstSimplify] fix folds for (0.0 - X) + X --> 0 (PR27151)
As shown in:
https://bugs.llvm.org/show_bug.cgi?id=27151
...the existing fold could miscompile when X is NaN.

The fold was also dependent on 'ninf' but that's not necessary.

From IEEE-754 (with default rounding which we can assume for these opcodes):
"When the sum of two operands with opposite signs (or the difference of two 
operands with like signs) is exactly zero, the sign of that sum (or difference) 
shall be +0...However, x + x = x − (−x) retains the same sign as x even when 
x is zero."

llvm-svn: 327575
2018-03-14 21:23:27 +00:00
Sanjay Patel fd82fd000c [InstSimplify] add tests to show missing/broken fadd folds (PR27151, PR26958); NFC
llvm-svn: 327554
2018-03-14 18:52:40 +00:00
Sanjay Patel e011d7964d [InstSimplify] regenerate checks; NFC
llvm-svn: 327553
2018-03-14 18:49:57 +00:00
Roman Lebedev 60d24445dd [InstSimplify] [NFC] cast-unsigned-icmp-cmp-0.ll - don't run instcombine
As disscussed in post-commit review of D44421, there is simply
no reason to run instcombine on this testcase.

llvm-svn: 327541
2018-03-14 17:59:12 +00:00
Roman Lebedev 978aae7614 [InstSimplify] [NFC] Add tests for peeking through unsigned FP casts for sign compares (PR36682)
Summary:
This pattern came up in PR36682 / D44390
https://bugs.llvm.org/show_bug.cgi?id=36682
https://reviews.llvm.org/D44390
https://godbolt.org/g/oKvT5H

Looking at the IR pattern in question, as per [[ https://github.com/rutgers-apl/alive-nj | alive-nj ]], for all the type combinations i checked
(input: `i16`, `i32`, `i64`; intermediate: `half`/`i16`, `float`/`i32`, `double`/`i64`)
for the following `icmp` comparisons the `uitofp`+`bitcast`+`icmp` can be evaluated to a boolean:
* `slt 0`
* `sgt -1`
I did not check vectors, but i'm guessing it's the same there.
{F5889242}

Thus all these cases are in the testcase (along with the vector variant with additional `undef` element in the middle).
There are no negative patterns here (unless alive-nj lied/is broken), all of these should be optimized.

Reviewers: spatel, majnemer, efriedma, arsenm

Reviewed By: spatel

Subscribers: wdng, llvm-commits

Differential Revision: https://reviews.llvm.org/D44421

llvm-svn: 327535
2018-03-14 17:31:08 +00:00
Roman Lebedev 6ab60358ca [InstCombine] [NFC] Add tests for peeking through unsigned FP casts for zero-equality compares (PR36682)
Summary:
This pattern came up in PR36682 / D44390
https://bugs.llvm.org/show_bug.cgi?id=36682
https://reviews.llvm.org/D44390
https://godbolt.org/g/oKvT5H

Looking at the IR pattern in question, as per [[ https://github.com/rutgers-apl/alive-nj | alive-nj ]], for all the type combinations i checked
(input: `i16`, `i32`, `i64`; intermediate: `half`/`i16`, `float`/`i32`, `double`/`i64`)
for the following `icmp` comparisons the `uitofp`+`bitcast` can be dropped:
* `eq 0`
* `ne 0`
I did not check vectors, but i'm guessing it's the same there.
{F5889189}

Thus all these cases are in the testcase (along with the vector variant with additional `undef` element in the middle).
There are no negative patterns here (unless alive-nj lied/is broken), all of these should be optimized.

Generated with
{F5889196}

Reviewers: spatel, majnemer, efriedma, arsenm

Reviewed By: spatel

Subscribers: wdng, llvm-commits

Differential Revision: https://reviews.llvm.org/D44416

llvm-svn: 327534
2018-03-14 17:31:03 +00:00
Hiroshi Yamauchi e6a3dc7699 Simplify more cases of logical ops of masked icmps.
Summary:
For example,

((X & 255) != 0) && ((X & 15) == 8) -> ((X & 15) == 8).
((X & 7) != 0) && ((X & 15) == 8) -> false.

Reviewers: davidxl

Reviewed By: davidxl

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D43835

llvm-svn: 327450
2018-03-13 21:13:18 +00:00
Brian M. Rzycki 252165b27a [LazyValueInfo] PR33357 prevent infinite recursion on BinaryOperator
Summary:
It is possible for LVI to encounter instructions that are not in valid
SSA form and reference themselves. One example is the following:
  %tmp4 = and i1 %tmp4, undef
Before this patch LVI would recurse until running out of stack memory
and crashed.  This patch marks these self-referential instructions as
Overdefined and aborts analysis on the instruction.

Fixes https://bugs.llvm.org/show_bug.cgi?id=33357

Reviewers: craig.topper, anna, efriedma, dberlin, sebpop, kuhar

Reviewed by: dberlin

Subscribers: uabelho, spatel, a.elovikov, fhahn, eli.friedman, mzolotukhin, spop, evandro, davide, llvm-commits

Differential Revision: https://reviews.llvm.org/D34135

llvm-svn: 327432
2018-03-13 18:14:10 +00:00
Sanjay Patel 204edeca56 [InstCombine] fix fmul reassociation to avoid creating an extra fdiv
This was supposed to be an NFC refactoring that will eventually allow
eliminating the isFast() predicate, but there's a rare possibility
that we would pessimize the code as shown in the test case because
we failed to check 'hasOneUse()' properly. This version also removes
an inefficiency of the old code; we would look for: 
(X * C) * C1 --> X * (C * C1)
...but that pattern is always handled by 
SimplifyAssociativeOrCommutative().

llvm-svn: 327404
2018-03-13 14:46:32 +00:00
Daniel Neilson 41e781d5f1 [SROA] Take advantage of separate alignments for memcpy source and destination
Summary:
This change is part of step five in the series of changes to remove alignment argument from
memcpy/memmove/memset in favour of alignment attributes. In particular, this changes the
SROA pass to cease using the old getAlignment() & setAlignment() APIs of MemoryIntrinsic in
favour of getting source & dest specific alignments through the new API. This allows us
to enhance visitMemTransferInst to be more aggressive setting the alignment in memcpy
calls that it creates, as well as to only change the alignment of a memcpy/memmove
argument that it replaces.

Steps:
Step 1) Remove alignment parameter and create alignment parameter attributes for
memcpy/memmove/memset. ( rL322965, rC322964, rL322963 )
Step 2) Expand the IRBuilder API to allow creation of memcpy/memmove with differing
source and dest alignments. ( rL323597 )
Step 3) Update Clang to use the new IRBuilder API. ( rC323617 )
Step 4) Update Polly to use the new IRBuilder API. ( rL323618 )
Step 5) Update LLVM passes that create memcpy/memmove calls to use the new IRBuilder API,
and those that use use MemIntrinsicInst::[get|set]Alignment() to use [get|set]DestAlignment()
and [get|set]SourceAlignment() instead. ( rL323886, rL323891, rL324148, rL324273, rL324278,
rL324384, rL324395, rL324402, rL324626, rL324642, rL324653, rL324654, rL324773, rL324774,
rL324781, rL324784, rL324955, rL324960, rL325816 )
Step 6) Remove the single-alignment IRBuilder API for memcpy/memmove, and the
MemIntrinsicInst::[get|set]Alignment() methods.

Reference
   http://lists.llvm.org/pipermail/llvm-dev/2015-August/089384.html
   http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151109/312083.html

Reviewers: chandlerc, bollu, efriedma

Reviewed By: efriedma

Subscribers: efriedma, eraman, llvm-commits

Differential Revision: https://reviews.llvm.org/D42974

llvm-svn: 327398
2018-03-13 14:25:33 +00:00
Eugene Leviant 6f42a2cd91 [Evaluator] Evaluate load/store with bitcast
Differential revision: https://reviews.llvm.org/D43457

llvm-svn: 327381
2018-03-13 10:19:50 +00:00
Clement Courbet 9f0b3170bc [MergeICmps] Make sure that the comparison only has one use.
Summary: Fixes PR36557.

Reviewers: trentxintong, spatel

Subscribers: mstorsjo, llvm-commits

Differential Revision: https://reviews.llvm.org/D44083

llvm-svn: 327372
2018-03-13 07:05:55 +00:00
Serguei Katkov bbfbf21ddc Revert [SCEV] Fix isKnownPredicate
It is a revert of rL327362 which causes build bot failures with assert like

Assertion `isAvailableAtLoopEntry(RHS, L) && "RHS is not available at Loop Entry"' failed.

llvm-svn: 327363
2018-03-13 06:36:00 +00:00
Serguei Katkov b05574c0d3 [SCEV] Fix isKnownPredicate
IsKnownPredicate is updated to implement the following algorithm
proposed by @sanjoy and @mkazantsev :
isKnownPredicate(Pred, LHS, RHS) {
  Collect set S all loops on which either LHS or RHS depend.
  If S is non-empty
    a. Let PD be the element of S which is dominated by all other elements of S
    b. Let E(LHS) be value of LHS on entry of PD.
       To get E(LHS), we should just take LHS and replace all AddRecs that
       are attached to PD on with their entry values.
       Define E(RHS) in the same way.
    c. Let B(LHS) be value of L on backedge of PD.
       To get B(LHS), we should just take LHS and replace all AddRecs that
       are attached to PD on with their backedge values.
       Define B(RHS) in the same way.
    d. Note that E(LHS) and E(RHS) are automatically available on entry of PD,
       so we can assert on that.
    e. Return true if isLoopEntryGuardedByCond(Pred, E(LHS), E(RHS)) &&
                      isLoopBackedgeGuardedByCond(Pred, B(LHS), B(RHS))
Return true if Pred, L, R is known from ranges, splitting etc.
}
This is follow-up for https://reviews.llvm.org/D42417.

Reviewers: sanjoy, mkazantsev, reames
Reviewed By: sanjoy, mkazantsev
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D43507

llvm-svn: 327362
2018-03-13 06:10:27 +00:00
Vlad Tsyrklevich aab6000684 Reland r327041: [ThinLTO] Keep available_externally symbols live
Summary:
This change fixes PR36483. The bug was originally introduced by a change
that marked non-prevailing symbols dead. This broke LowerTypeTests
handling of available_externally functions, which are non-prevailing.
LowerTypeTests uses liveness information to avoid emitting thunks for
unused functions.

Marking available_externally functions dead is incorrect, the functions
are used though the function definitions are not. This change keeps them
live, and lets the EliminateAvailableExternally/GlobalDCE passes remove
them later instead.

(Reland with a suspected fix for a unit test failure I haven't been able
to reproduce locally)

Reviewers: pcc, tejohnson

Reviewed By: tejohnson

Subscribers: grimar, mehdi_amini, inglorion, eraman, llvm-commits

Differential Revision: https://reviews.llvm.org/D43690

llvm-svn: 327360
2018-03-13 05:08:48 +00:00
Sanjay Patel a71f95404f [InstCombine] add test to show fmul transform creates extra fdiv; NFC
Also, move fmul reassociation tests to the same file as other fmul transforms.

llvm-svn: 327342
2018-03-12 23:10:08 +00:00
Volkan Keles 4ecdb44a64 BlockExtractor: Don’t delete functions directly
Blocks may have function calls, so don’t erase functions
directly to avoid erasing a function that has a user.

llvm-svn: 327340
2018-03-12 22:28:18 +00:00
Sanjay Patel 1dbb6b7282 [PatternMatch] enhance m_NaN() to ignore undef elements in vectors
llvm-svn: 327339
2018-03-12 22:18:47 +00:00
Saleem Abdulrasool 8b342680bf ObjCARC: teach the cloner about funclets
In the case that the CallInst that is being moved has an associated
operand bundle which is a funclet, the move will construct an invalid
instruction.  The new site will have a different token and needs to be
reassociated with the new instruction.

Unfortunately, there is no way to alter the bundle after the
construction of the instruction.  Replace the call instruction cloning
with a custom helper to clone the instruction and reassociate the
funclet token.

llvm-svn: 327336
2018-03-12 21:46:09 +00:00
Sanjay Patel 5b034c83d6 [InstSimplify] add fcmp tests for constant NaN vector with undef elt; NFC
llvm-svn: 327335
2018-03-12 21:44:17 +00:00
Sanjay Patel a0d8d127c6 [PatternMatch, InstSimplify] allow undef elements when matching vector -0.0
This is the FP equivalent of D42818. Use it for the few cases in InstSimplify 
with -0.0 folds (that's the only current use of m_NegZero()).

Differential Revision: https://reviews.llvm.org/D43792

llvm-svn: 327307
2018-03-12 18:17:01 +00:00
Roman Lebedev 92e359ed67 [InstCombine] [NFC] Add tests for peeking through FP casts for sign-bit compares (PR36682)
Summary:
This pattern came up in PR36682:
https://bugs.llvm.org/show_bug.cgi?id=36682
https://godbolt.org/g/LhuD9A

Tests for proposed fix in D44367.

Looking at the IR pattern in question, as per [[ https://github.com/rutgers-apl/alive-nj | alive-nj ]], for all the type combinations i checked
(input: `i16`, `i32`, `i64`; intermediate: `half`/`i16`, `float`/`i32`, `double`/`i64`)
for the following `icmp` comparisons the `sitofp`+`bitcast` can be dropped:
* `eq 0`
* `ne 0`
* `slt 0`
* `sle 0`
* `sge 0`
* `sgt 0`
* `slt 1`
* `sge 1`
* `sle -1`
* `sgt -1`
I did not check vectors, but i'm guessing it's the same there.
{F5887419}

Thus all these cases are in the testcase (along with the vector variant with additional `undef` element in the middle).
There are no negative patterns here (unless alive-nj lied/is broken), all of these should be optimized.

Generated with {F5887551}

Reviewers: spatel, majnemer, efriedma, arsenm

Reviewed By: spatel

Subscribers: nlopes, wdng, llvm-commits

Differential Revision: https://reviews.llvm.org/D44390

llvm-svn: 327301
2018-03-12 17:43:02 +00:00
Sanjay Patel 49b7dc2968 [InstSimplify] add test for m_NegZero with undef elt; NFC
llvm-svn: 327287
2018-03-12 15:47:32 +00:00
Eugene Leviant 19e238746b [ThinLTO] Recommit of import global variables
This wasreverted in r326638 due to link problems and fixed
afterwards

llvm-svn: 327254
2018-03-12 10:30:50 +00:00
Justin Lebar 24b6640b1b Back out "Re-land: Teach CorrelatedValuePropagation to reduce the width of udiv/urem instructions."
This reverts r326908, originally landed as D44102.

Reverted for causing performance regressions on x86.  (These regressions
are not yet understood.)

llvm-svn: 327252
2018-03-12 09:26:09 +00:00
Serguei Katkov a20e05bb94 [CGP] Fix the remove of matched phis in complex addressing mode
When we replace the Phi we created with matched ones it is possible that
there are two identical phi nodes in IR. And matcher is smart enough to find that
new created phi matches both of them. So we try to replace our phi node with
matched ones twice and what is bad we delete our phi node twice causing a crash.

As soon as we found that we have two identical Phi nodes it makes sense to do
a clean-up and replace one phi node by other one.
The patch implements it.

Reviewers: john.brawn, reames
Reviewed By: john.brawn
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D43758

llvm-svn: 327250
2018-03-12 03:50:07 +00:00
Sanjay Patel 6ee46702d5 [InstCombine] add tests for casted sign-bit cmp (PR36682); NFC
llvm-svn: 327243
2018-03-11 16:45:31 +00:00
Sanjay Patel 4222716822 [InstSimplify] fp_binop X, undef --> NaN
The variable operand could be NaN, so it's always safe to propagate NaN.

llvm-svn: 327212
2018-03-10 16:51:28 +00:00
Sanjay Patel e5606b4fa5 [ConstantFold] fp_binop AnyConstant, undef --> NaN
With the updated LangRef ( D44216 / rL327138 ) in place, we can proceed with more constant folding.

I'm intentionally taking the conservative path here: no matter what the constant or the FMF, we can 
always fold to NaN. This is because the undef operand can be chosen as NaN, and in our simplified 
default FP env, nothing else happens - NaN just propagates to the result. If we find some way/need 
to propagate undef instead, that can be added subsequently.

The tests show that we always choose the same quiet NaN constant (0x7FF8000000000000 in IR text). 
There were suggestions to improve that with a 'NaN' string token or not always print a 64-bit hex 
value, but those are independent changes. We might also consider setting/propagating the payload of 
NaN constants as an enhancement.

Differential Revision: https://reviews.llvm.org/D44308

llvm-svn: 327208
2018-03-10 15:56:25 +00:00
Florian Hahn a7dcfa746e [PartialInlining] Use isInlineViable to detect constructs preventing inlining.
Use isInlineViable to prevent inlining of functions with non-inlinable
constructs, in case cost analysis is skipped.

Reviewers: efriedma, sfertile, davide, davidxl

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D42846

llvm-svn: 327207
2018-03-10 14:53:44 +00:00
Ulrich Weigand 019dd2316d Revert "[Debug] Retain both sets of debug intrinsics in HoistThenElseCodeToIf"
This reverts commit r327175 as problems in debug info generation were shown.

llvm-svn: 327176
2018-03-09 22:00:10 +00:00
Ulrich Weigand fa4e63c0d6 [Debug] Retain both sets of debug intrinsics in HoistThenElseCodeToIf
When hoisting common code from the "then" and "else" branches of a condition
to before the "if", there is no need to require that debug intrinsics match
before moving them (and merging them).  Instead, we can simply always keep
all debug intrinsics from both sides of the "if".

This fixes PR36410, which describes a problem where as a result of the attempt
to merge debug locations for two debug intrinsics we end up with an invalid
intrinsic, where the scope indicated in the !dbg location no longer matches
the scope of the variable tracked by the intrinsic.

In addition, this has the benefit that we no longer throw away information
that is actually still valid, helping to generate better debug data.

Reviewed By: vsk

Differential Revision: https://reviews.llvm.org/D44312

llvm-svn: 327175
2018-03-09 21:37:07 +00:00
Peter Collingbourne 2974856ad4 Use branch funnels for virtual calls when retpoline mitigation is enabled.
The retpoline mitigation for variant 2 of CVE-2017-5715 inhibits the
branch predictor, and as a result it can lead to a measurable loss of
performance. We can reduce the performance impact of retpolined virtual
calls by replacing them with a special construct known as a branch
funnel, which is an instruction sequence that implements virtual calls
to a set of known targets using a binary tree of direct branches. This
allows the processor to speculately execute valid implementations of the
virtual function without allowing for speculative execution of of calls
to arbitrary addresses.

This patch extends the whole-program devirtualization pass to replace
certain virtual calls with calls to branch funnels, which are
represented using a new llvm.icall.jumptable intrinsic. It also extends
the LowerTypeTests pass to recognize the new intrinsic, generate code
for the branch funnels (x86_64 only for now) and lay out virtual tables
as required for each branch funnel.

The implementation supports full LTO as well as ThinLTO, and extends the
ThinLTO summary format used for whole-program devirtualization to
support branch funnels.

For more details see RFC:
http://lists.llvm.org/pipermail/llvm-dev/2018-January/120672.html

Differential Revision: https://reviews.llvm.org/D42453

llvm-svn: 327163
2018-03-09 19:11:44 +00:00
Renato Golin bc94b98c44 [LV] Adding test for r327109
llvm-svn: 327155
2018-03-09 18:02:36 +00:00
Farhana Aleen a7cb31123c [AMDGPU] Supported ds_read_b128 generation; Widened vector length for local address-space.
Summary: Starting from GCN 2nd generation, ISA supports ds_read_b128 on top of ds_read_b64.
         This patch supports ds_read_b128 instruction pattern and generation of this instruction.
         In the vectorizer, this patch also widen the vector length so that vectorizer generates
         128 bit loads for local address-space which gets translated to ds_read_b128.
         Since the performance benefit is not clear; compiler generates ds_read_b128 under -amdgpu-ds128.

Author: FarhanaAleen

Reviewed By: rampitec, arsenm

Subscribers: llvm-commits, AMDGPU

Differential Revision: https://reviews.llvm.org/D44210

llvm-svn: 327153
2018-03-09 17:41:39 +00:00
Chad Rosier 95d9ccb2a0 [JumpThreading] Don't restrict cast-traversal to i1
In r263618, JumpThreading learned to look trough simple cast instructions, but
only if the source of those cast instructions was a phi/cmp i1 (in an effort to
limit compile time effects). I think this condition is too restrictive. For
switches with limited value range, InstCombine will readily introduce an extra
trunc instruction to a smaller integer type (e.g. from i8 to i2), leaving us in
the somewhat perverse situation that jump-threading would work before running
instcombine, but not after. Since instcombine produces this pattern, I think we
need to consider it canonical and support it in JumpThreading.  In general,
for limiting recursion, I think the existing restriction to phi and cmp nodes
should be sufficient to avoid looking through unprofitable chains of
instructions.

Patch by Keno Fischer!
Differential Revision: https://reviews.llvm.org/D42262

llvm-svn: 327150
2018-03-09 16:43:46 +00:00
Sanjay Patel 3675b8cece [InstSimplify] fix FP infinite hex constant values in tests; NFC
Really should improve this...

llvm-svn: 327144
2018-03-09 16:14:02 +00:00
Stefan Pintilie ef7c4976bb Revert "[PowerPC] LSR tunings for PowerPC"
Revert the rest of the LST tune commit.
It seems that the LSR tune commit breaks internal tests.
Reverting the commit.

llvm-svn: 327143
2018-03-09 16:08:55 +00:00
Stefan Pintilie 7f879a8467 Revert "[PowerPC] Move test to correct location."
Revert part of the LSR tune commit.

llvm-svn: 327142
2018-03-09 16:08:48 +00:00
Eric Christopher 3caa0fd050 Revert "[ThinLTO] Keep available_externally symbols live"
This reverts commit r327041 and the followup attempts at fixing the testcase as they're still failing.

llvm-svn: 327094
2018-03-09 01:25:18 +00:00
Adrian Prantl 5b477be72a LowerDbgDeclare: ignore dbg.declares for allocas with volatile access
There is no point in lowering a dbg.declare describing an alloca that
has volatile loads or stores as users, since the alloca cannot be
elided. Lowering the dbg.declare will result in larger debug info that
may also have worse coverage than just describing the alloca.

rdar://problem/34496278

llvm-svn: 327092
2018-03-09 00:45:04 +00:00
Sanjay Patel ee770e9c4e [Reassociate] fix test to be independent of FP undef
llvm-svn: 327071
2018-03-08 22:05:27 +00:00
Sanjay Patel 2ee7b9349d [ConstantFold] fp_binop undef, undef --> undef
These are uncontroversial and independent of a proposed LangRef edits (D44216).

I tried to fix tests that would fold away:
rL327004
rL327028
rL327030
rL327034

I'm not sure if the Reassociate tests are meaningless yet, but they probably will be 
as we add more folds, so if anyone has suggestions or wants to fix those, please do.

Differential Revision: https://reviews.llvm.org/D44258

llvm-svn: 327058
2018-03-08 20:42:49 +00:00
Vlad Tsyrklevich c9a1a6e964 Specify that test from r327041 requires asserts
llvm-svn: 327051
2018-03-08 19:46:19 +00:00
Vlad Tsyrklevich f6337d3a73 Fix test failure introduced in r327041
The "Assertion: `...' failed" error message format is not identical
across platforms.

llvm-svn: 327047
2018-03-08 19:20:08 +00:00
Vlad Tsyrklevich 7b66ef1036 [ThinLTO] Keep available_externally symbols live
Summary:
This change fixes PR36483. The bug was originally introduced by a change
that marked non-prevailing symbols dead. This broke LowerTypeTests
handling of available_externally functions, which are non-prevailing.
LowerTypeTests uses liveness information to avoid emitting thunks for
unused functions.

Marking available_externally functions dead is incorrect, the functions
are used though the function definitions are not. This change keeps them
live, and lets the EliminateAvailableExternally/GlobalDCE passes remove
them later instead.

I've also enabled EliminateAvailableExternally for all optimization
levels, I believe it being disabled for O1 was an oversight.

Reviewers: pcc, tejohnson

Reviewed By: tejohnson

Subscribers: grimar, mehdi_amini, inglorion, eraman, llvm-commits

Differential Revision: https://reviews.llvm.org/D43690

llvm-svn: 327041
2018-03-08 18:48:03 +00:00
Sanjay Patel 31051f8314 [InstCombine] add min/max tests with not ops; NFC
These are based on:
https://bugs.llvm.org/show_bug.cgi?id=35875
It's not clear if/how instcombine can reduce these,
but we should have the tests here either way to 
document current behavior.

llvm-svn: 327039
2018-03-08 18:34:23 +00:00
Sanjay Patel 788a4336cd [StructurizeCFG] fix test to be independent of FP undef
llvm-svn: 327028
2018-03-08 17:13:57 +00:00
Sanjay Patel e755bf87a5 [StructurizeCFG] auto-generate full checks; NFC
Not sure what the intent of this test is, but this will change when we fix FP undef constant folding.

llvm-svn: 327022
2018-03-08 16:25:37 +00:00
Sanjay Patel faf9b0f322 [InstCombine] regenerate checks; NFC
We may not need any of these tests after rL327012, but leaving 
them here for now until that's confirmed.

llvm-svn: 327014
2018-03-08 15:46:38 +00:00
Sanjay Patel a87b74f72b [InstSimplify] add more tests for FP undef; NFC
llvm-svn: 327012
2018-03-08 15:39:39 +00:00
Sanjay Patel 4ad3c32144 [InstCombine, NewGVN] remove FP undef from tests
I'm trying to preserve the intent of these tests by using 
non-undef operands; if we fix FP undef folding these tests
will not pass.

llvm-svn: 327004
2018-03-08 14:57:08 +00:00
Stefan Pintilie f8c2dce236 [PowerPC] Move test to correct location.
Test was added in r326906 to an incorrect location.
Moving the test to PPC CodeGen directory as the test is PPC specific.

llvm-svn: 326923
2018-03-07 18:27:10 +00:00
Justin Lebar eccfbf1bcd Re-land: Teach CorrelatedValuePropagation to reduce the width of udiv/urem instructions.
Summary:
If the operands of a udiv/urem can be proved to fit within a smaller
power-of-two-sized type, reduce the width of the udiv/urem.

Backed out for failing an assert in clang bootstrap builds.  Re-landing
with a fix for handling non-power-of-two inputs (e.g. udiv i24).

Original Differential Revision: https://reviews.llvm.org/D44102

llvm-svn: 326908
2018-03-07 16:56:49 +00:00
Stefan Pintilie f8438e8e59 [PowerPC] LSR tunings for PowerPC
The purpose of this patch is to have LSR generate better code on Power.
This is done by overriding isLSRCostLess.

Differential Revision: https://reviews.llvm.org/D40855

llvm-svn: 326906
2018-03-07 16:53:09 +00:00
Justin Lebar eeeb0eb049 Revert rL326898: "Teach CorrelatedValuePropagation to reduce the width of udiv/urem instructions."
Breaks bootstrap builds: clang built with this patch asserts while
building MCDwarf.cpp: Assertion `castIsValid(op, S, Ty) && "Invalid
cast!"' failed.

llvm-svn: 326900
2018-03-07 16:05:43 +00:00
Justin Lebar cb9e89c39b Teach CorrelatedValuePropagation to reduce the width of udiv/urem instructions.
Summary:
If the operands of a udiv/urem can be proved to fit within a smaller
power-of-two-sized type, reduce the width of the udiv/urem.

Reviewers: spatel, sanjoy

Subscribers: llvm-commits, hiraditya

Differential Revision: https://reviews.llvm.org/D44102

llvm-svn: 326898
2018-03-07 15:11:13 +00:00
Sven van Haastregt 19f531d31e [LoadStoreVectorizer] Differentiate between <1 x T> and T
The LoadStoreVectorizer thought that <1 x T> and T were the same types
when merging stores, leading to a crash later.

Patch by Erik Hogeman.

Differential Revision: https://reviews.llvm.org/D44014

llvm-svn: 326884
2018-03-07 10:29:28 +00:00
Evgeny Stupachenko 204ade4102 Add early exit on reassociation of 0 expression.
Summary:

Before the patch a try to reassociate ((v * 16) * 0) * 1 fall into infinite loop

Reviewers: pankajchawla

Differential Revision: http://reviews.llvm.org/D41467

From: Evgeny Stupachenko <evstupac@gmail.com>
                         <evgeny.v.stupachenko@intel.com>
llvm-svn: 326861
2018-03-07 02:17:08 +00:00
Sebastian Pop bf6e1c26cf DA: remove uses of GEP, only ask SCEV
It's been quite some time the Dependence Analysis (DA) is broken,
as it uses the GEP representation to "identify" multi-dimensional arrays.
It even wrongly detects multi-dimensional arrays in single nested loops:

from test/Analysis/DependenceAnalysis/Coupled.ll, example @couple6
;; for (long int i = 0; i < 50; i++) {
;; A[i][3*i - 6] = i;
;; *B++ = A[i][i];

DA used to detect two subscripts, which makes no sense in the LLVM IR
or in C/C++ semantics, as there are no guarantees as in Fortran of
subscripts not overlapping into a next array dimension:

maximum nesting levels = 1
SrcPtrSCEV = %A
DstPtrSCEV = %A
using GEPs
subscript 0
    src = {0,+,1}<nuw><nsw><%for.body>
    dst = {0,+,1}<nuw><nsw><%for.body>
    class = 1
    loops = {1}
subscript 1
    src = {-6,+,3}<nsw><%for.body>
    dst = {0,+,1}<nuw><nsw><%for.body>
    class = 1
    loops = {1}
Separable = {}
Coupled = {1}

With the current patch, DA will correctly work on only one dimension:

maximum nesting levels = 1
SrcSCEV = {(-2424 + %A)<nsw>,+,1212}<%for.body>
DstSCEV = {%A,+,404}<%for.body>
subscript 0
    src = {(-2424 + %A)<nsw>,+,1212}<%for.body>
    dst = {%A,+,404}<%for.body>
    class = 1
    loops = {1}
Separable = {0}
Coupled = {}

This change removes all uses of GEP from DA, and we now only rely
on the SCEV representation.

The patch does not turn on -da-delinearize by default, and so the DA analysis
will be more conservative in the case of multi-dimensional memory accesses in
nested loops.

I disabled some interchange tests, as the DA is not able to disambiguate
the dependence anymore. To make DA stronger, we may need to
compute a bound on the number of iterations based on the access functions
and array dimensions.

The patch cleans up all the CHECKs in test/Transforms/LoopInterchange/*.ll to
avoid checking for snippets of LLVM IR: this form of checking is very hard to
maintain. Instead, we now check for output of the pass that are more meaningful
than dozens of lines of LLVM IR. Some tests now require -debug messages and thus
only enabled with asserts.

Patch written by Sebastian Pop and Aditya Kumar.

Differential Revision: https://reviews.llvm.org/D35430

llvm-svn: 326837
2018-03-06 21:55:59 +00:00
Sanjay Patel ed2211d50f [PatternMatch] define m_Not using m_Xor and cst_pred_ty
Using cst_pred_ty in the definition allows us to match vectors with undef elements.

This is a continuation of an effort to make all pattern matchers allow undef elements in vectors:
rL325437
rL325466
D43792

Differential Revision: https://reviews.llvm.org/D44076

llvm-svn: 326823
2018-03-06 18:19:42 +00:00
Florian Hahn 517dc51c48 [CallSiteSplitting] Do not crash when BB's terminator changes.
Change doCallSiteSplitting to iterate until we reach the terminator instruction.
tryToSplitCallSite can replace BB's terminator in case BB is a successor of
itself. Then IE will be invalidated and we also have to check the current
terminator.

Reviewers: junbuml, davidxl, davide, fhahn

Reviewed By: fhahn, junbuml

Differential Revision: https://reviews.llvm.org/D43824

llvm-svn: 326793
2018-03-06 14:00:58 +00:00
Daniel Neilson 82daad31fe [RewriteStatepoints] Fix stale parse points
Summary:
RewriteStatepointsForGC collects parse points for further processing.
During the collection if a callsite is found in an unreachable block
(DominatorTree::isReachableFromEntry()) then all unreachable blocks are
removed by removeUnreachableBlocks(). Some of the removed blocks could
have been reachable according to DominatorTree::isReachableFromEntry().
In this case the collected parse points became stale and resulted in a
crash when accessed.

The fix is to unconditionally canonicalize the IR to
removeUnreachableBlocks and then collect the parse points.

The added test crashes with the old version and passes with this patch.

Patch by Yevgeny Rouban!

Reviewed by: Anna

Differential Revision: https://reviews.llvm.org/D43929

llvm-svn: 326748
2018-03-05 22:27:30 +00:00
Alexey Bataev 625ce229b1 [SLP] Additional tests for stores vectorization, NFC.
llvm-svn: 326740
2018-03-05 20:20:12 +00:00
Daniel Neilson bdda115e19 [InstCombine] Don't blow up in foldICmpWithCastAndCast on vector icmp instructions.
Summary:
Presently, InstCombiner::foldICmpWithCastAndCast() implicitly assumes that it is
only invoked with icmp instructions of integer type. If that assumption is broken,
and it is called with an icmp of vector type, then it fails (asserts/crashes).

This patch addresses the deficiency. It allows it to simplify
icmp (ptrtoint x), (ptrtoint/c) of vector type into a compare of the inputs,
much as is done when the type is integer.

Reviewers: apilipenko, fedor.sergeev, mkazantsev, anna

Reviewed By: anna

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D44063

llvm-svn: 326730
2018-03-05 18:05:51 +00:00
Craig Topper 8452faceae [InstCombine] Add constant vector support to getMinimumFPType for visitFPTrunc.
This patch teaches getMinimumFPType to support shrinking a vector of ConstantFPs. This should improve our ability to combine vector fptrunc with fp binops.

Differential Revision: https://reviews.llvm.org/D43774

llvm-svn: 326729
2018-03-05 18:04:12 +00:00
Florian Hahn 0b7c6422fb [IPSCCP] Add getCompare which returns either true, false, undef or null.
getCompare returns true, false or undef constants if the comparison can
be evaluated, or nullptr if it cannot. This is in line with what
ConstantExpr::getCompare returns. It also allows us to use
ConstantExpr::getCompare for comparing constants.

Reviewers: davide, mssimpso, dberlin, anna

Reviewed By: davide

Differential Revision: https://reviews.llvm.org/D43761

llvm-svn: 326720
2018-03-05 17:33:50 +00:00
Xin Tong 8345c0e3a5 [MergeICmp] We can discard initial blocks that do other work
Summary:
 We can discard initial blocks that do other work
We do not need to limit ourselves to just the first block in the chain.

Reviewers: courbet, davide

Reviewed By: courbet

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D44029

llvm-svn: 326698
2018-03-05 13:54:47 +00:00
Fedor Indutny f9e09c1dd0 [CallSiteSplitting] properly split musttail calls
Summary:
`musttail` calls can't be naively splitted. The split blocks must
include not only the call instruction itself, but also (optional)
`bitcast` and `return` instructions that follow it.

Clone `bitcast` and `ret`, place them into the split blocks, and
remove the tail block when done.

Reviewers: junbuml, mcrosier, davidxl, davide, fhahn

Reviewed By: fhahn

Subscribers: JDevlieghere, llvm-commits

Differential Revision: https://reviews.llvm.org/D43729

llvm-svn: 326666
2018-03-03 21:40:14 +00:00
Sanjay Patel 9119b844a3 [InstCombine] add test for vectors with undef elts; NFC
llvm-svn: 326661
2018-03-03 18:00:15 +00:00
Sanjay Patel 1a8d5c3d1f [InstCombine] (~X) - (~Y) --> Y - X
llvm-svn: 326660
2018-03-03 17:53:25 +00:00
Sanjay Patel 73eb2d2555 [InstCombine] add tests for notnotsub; NFC
As shown in D44043, we may need this fold in the backend,
but it's also missing in the IR optimizer.

llvm-svn: 326659
2018-03-03 17:20:37 +00:00
Chandler Carruth a4619d9944 [ThinLTO] Revert r325320: Import global variables
This caused some links to fail with ThinLTO due to missing symbols as
well as causing some binaries to have failures at runtime. We're working
with the author to get a test case, but want to get the tree green
again.

Further, it appears to introduce a data race. While the test usage of
threads was disabled in r325361 & r325362, that isn't an acceptable fix.
I've reverted both of these as well. This code needs to be thread safe.
Test cases for this are already on the original commit thread.

llvm-svn: 326638
2018-03-02 23:40:08 +00:00
Sanjay Patel 2fd0acf05a [InstCombine] partly fix FMF for fmul+log2 fold
The code was checking that all of the instructions in the 
sequence are 'fast', but that's not necessary. The final 
multiply is all that we need to check (tests adjusted). 
The fmul doesn't need to be fully 'fast' either, but that 
can be another patch.

llvm-svn: 326608
2018-03-02 20:32:46 +00:00
Sanjay Patel bb7228703a [InstCombine] add tests for rL169025; NFC
This narrow fold was added with no motivation or test cases 
a bit over 5 years ago. Removing a constant operand is a 
good canonicalization? We should handle Y*2.0 too then?

llvm-svn: 326606
2018-03-02 19:26:13 +00:00
Craig Topper 18799f4c07 [InstCombine] Allow fptrunc (fpext X)) to be reduced to a single fpext/ftrunc
If we are only truncating bits from the extend we should be able to just use a smaller extend.

If we are truncating more than the extend we should be able to just use a fptrunc since the presense of the fpextend shouldn't affect rounding.

Differential Revision: https://reviews.llvm.org/D43970

llvm-svn: 326595
2018-03-02 18:16:51 +00:00
Yaxun Liu 3c42f1c3c9 LoopUnroll: respect pragma unroll when AllowRemainder is disabled
Currently when AllowRemainder is disabled, pragma unroll count is not
respected even though there is no remainder. This bug causes a loop
fully unrolled in many cases even though the user specifies a unroll
count. Especially it affects OpenCL/CUDA since in many cases a loop
contains convergent instructions and currently AllowRemainder is
disabled for such loops.

Differential Revision: https://reviews.llvm.org/D43826

llvm-svn: 326585
2018-03-02 16:22:32 +00:00
Clement Courbet c9119b3b6a [MergeICmps] Revert accidentally submitted failing test case.
Reverts r326574.

llvm-svn: 326582
2018-03-02 14:53:33 +00:00
Clement Courbet 1de4a183d4 [MergeIcmps] Add the test case from PR36557.
Summary: See PR36557.

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D44009

llvm-svn: 326574
2018-03-02 14:34:39 +00:00
Craig Topper 8cb5fc15ee [InstCombine] Add more test case to fpextend.ll.
This includes the test cases from D43970 and additional tests for combining (fptrunc (binop (fpext), (fpext))) where the pre-extended types don't match the trunc and therefore can't be completely removed.

llvm-svn: 326528
2018-03-02 01:34:42 +00:00
Fedor Indutny 1571b1271e [ArgumentPromotion] don't break musttail invariant PR36543
Summary:
Do not break musttail invariant by promoting arguments of musttail
callee or caller.

Reviewers: sanjoy, dberlin, hfinkel, george.burgess.iv, fhahn, rnk

Reviewed By: rnk

Subscribers: rnk, llvm-commits

Differential Revision: https://reviews.llvm.org/D43926

llvm-svn: 326521
2018-03-02 00:59:27 +00:00
Craig Topper 113446ca37 [InstCombine] Simplify test cases by removing loads/stores that aren't required for what is being tested.
The loads and stores were getting the data and storing the results. There's no reason we can't just use function arguments and return.

llvm-svn: 326515
2018-03-02 00:27:44 +00:00
Sanjay Patel d0cdb2f861 [InstCombine] allow fmul fold with less than 'fast'
This is a retry of r326502 with updates to the reassociate 
test file that I missed the first time.

@test15_reassoc in the supposed -reassociate test file 
(except that it tests 2 other passes too...) shows that
there's no clear responsiblity for reassociation transforms.

Instcombine now gets that case, but only because the
constant values are identical. Otherwise, it would still
miss that pattern. 

Reassociate doesn't get that case because it hasn't been 
updated to use less than 'fast' FMF.

llvm-svn: 326513
2018-03-02 00:14:51 +00:00
Sanjay Patel f2664d0663 [Reassociate] regenerate checks; NFC
llvm-svn: 326511
2018-03-01 23:41:03 +00:00
Sanjay Patel eb5d046890 revert r326502: [InstCombine] allow fmul fold with less than 'fast'
I forgot that I added tests for 'reassoc' to -reassociate, but
suprisingly that file calls -instcombine too, so it is affected.
I'll update that file and try again.

llvm-svn: 326510
2018-03-01 23:39:24 +00:00
Sanjay Patel 7373ae5c9a [InstCombine] allow fmul fold with less than 'fast'
llvm-svn: 326502
2018-03-01 22:53:47 +00:00
Craig Topper f1a7c6755d [InstCombine] Auto-generate complete checks. NFC
llvm-svn: 326474
2018-03-01 20:05:07 +00:00
Sanjay Patel d696e93cb6 [InstCombine] remove stale comments for tests; NFC
llvm-svn: 326448
2018-03-01 16:28:32 +00:00
Sanjay Patel 3fd43a843b [InstCombine] move/add tests for fmul reassociation; NFC
This transform may be out-of-scope for instcombine, 
but this is only documenting the current behavior.

llvm-svn: 326442
2018-03-01 15:30:44 +00:00
Sanjay Patel 237e52674f [InstCombine] auto-generate full checks; NFC
llvm-svn: 326440
2018-03-01 15:13:42 +00:00
Max Kazantsev f8d2969abb [SCEV] Smart range calculation for SCEVUnknown Phis
The range of SCEVUnknown Phi which merges values `X1, X2, ..., XN`
can be evaluated as `U(Range(X1), Range(X2), ..., Range(XN))`.

Reviewed By: sanjoy
Differential Revision: https://reviews.llvm.org/D43810

llvm-svn: 326418
2018-03-01 06:56:48 +00:00
Reid Kleckner 3762a089d7 [IPSCCP] do not break musttail invariant (PR36485)
Do not replace results of `musttail` calls with a constant if the
call itself can't be removed.

Do not zap returns of `musttail` callees, if the call site can't be
removed and replaced with a constant.

Do not zap returns of `musttail`-calling blocks, this breaks
invariant too.

Patch by Fedor Indutny

Differential Revision: https://reviews.llvm.org/D43695

llvm-svn: 326404
2018-03-01 01:19:18 +00:00
Reid Kleckner cb9611ca67 [DAE] don't remove args of musttail target/caller
`musttail` requires identical signatures of caller and callee. Removing
arguments breaks `musttail` semantics.

PR36441

Patch by Fedor Indutny

Differential Revision: https://reviews.llvm.org/D43708

llvm-svn: 326394
2018-03-01 00:09:35 +00:00
Sanjay Patel eaf5a120ed [InstCombine] simplify code for X * -1.0 --> -X; NFC
I've added random FMF to one of the tests to show those are propagated.

llvm-svn: 326377
2018-02-28 22:30:04 +00:00
Jonas Devlieghere 9ca064552a [GlobalOpt] don't change CC of musttail calle(e|r)
When the function has musttail call - its cc is fixed to be equal to the
cc of the musttail callee. In such case (and in the case of the musttail
callee), GlobalOpt should not change the cc to fastcc as it will break
the invariant.

This fixes PR36546

Patch by: Fedor Indutny (indutny)

Differential revision: https://reviews.llvm.org/D43859

llvm-svn: 326376
2018-02-28 22:28:44 +00:00
Sanjay Patel 356e77f550 [InstCombine] auto-generate complete checks; NFC
llvm-svn: 326331
2018-02-28 16:53:45 +00:00
Xin Tong 8ba674e43b [MergeICmp] Fix a bug in MergeICmp that can lead to a block being processed more than once.
Summary:
Fix a bug in MergeICmp that can lead to a BCECmp block being processed more than once and eventually lead to a broken LLVM module.
The problem is that if the non-constant value is not produced by the last block, the producer will be processed once when the its parent block
is processed and second time when the last block is processed.

We end up having 2 same BCECmpBlock in the merge queue. And eventually lead to a broken LLVM module.

Reviewers: courbet, davide

Reviewed By: courbet

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D43825

llvm-svn: 326318
2018-02-28 12:08:00 +00:00
Mohammad Shahid ddeee12f59 [SLP] Added new tests and updated existing for jumbled load, NFC.
llvm-svn: 326303
2018-02-28 04:19:34 +00:00
Sanjay Patel bf28a8fc01 [InstSimplify] add tests for FP with undef operand; NFC
Are any of these correct?

llvm-svn: 326241
2018-02-27 20:17:18 +00:00
Craig Topper 301991080e [ValueTracking] Teach cannotBeOrderedLessThanZeroImpl to look through ExtractElement.
This is similar to what's done in computeKnownBits and computeSignBits. Don't do anything fancy just collect information valid for any element.

Differential Revision: https://reviews.llvm.org/D43789

llvm-svn: 326237
2018-02-27 19:53:45 +00:00
Sanjay Patel 8529dd5ee1 [ARM] add loop vectorizer test based on 482.sphinx3 from SPEC2006; NFC
This is a slight reduction of one of the benchmarks
that suffered with D43079. Cost model changes should
not cause this test to remain scalarized.

llvm-svn: 326221
2018-02-27 18:33:24 +00:00
Sanjay Patel 04d1d79ee5 [AArch64] add SLP test based on TSVC; NFC
This is a slight reduction of one of the benchmarks
that suffered with D43079. Cost model changes should
not cause this test to remain scalarized.

llvm-svn: 326217
2018-02-27 18:06:15 +00:00
Florian Hahn 1807c516c7 [NewGVN] Update phi-of-ops def block when updating existing ValuePHI.
In case we update a ValuePHI node created earlier, we could update it
based on a different OpPHI which could be in a different block.
We need to update the TempToBlock mapping reflecting the new block,
otherwise we would end up placing the new phi node in a wrong block.

This problem is exposed by the test case in
https://bugs.llvm.org/show_bug.cgi?id=36504.

This patch fixes a slightly simpler problem than in the bug report. In
the bug's re-producer, the additional problem is that we are re-using a
ValuePHI node with to few incoming values for the new OpPHI. If this
patch makes sense, I will follow it up with a patch that creates a new
PHI node if the existing PHI node has a different number of incoming
values.

Reviewers: davide, dberlin

Reviewed By: dberlin

Differential Revision: https://reviews.llvm.org/D43770

llvm-svn: 326181
2018-02-27 09:34:51 +00:00
Adam Nemet b424cd5d61 Make test agnostic to cost model
This was causing bot failures on greendragon

llvm-svn: 326169
2018-02-27 05:41:16 +00:00
Evgeny Stupachenko f1c058d99b Fix r326154 buildbots test fail
Summary:

Add specific mtriples to tests added in r326154.

From: Evgeny Stupachenko <evstupac@gmail.com>
                         <evgeny.v.stupachenko@intel.com>
llvm-svn: 326158
2018-02-27 01:33:11 +00:00
Evgeny Stupachenko a732611ac8 Fix PR36032, PR35432
Summary:

The change fix an assert fail at ScalarEvolutionExpander.cpp:
  assert(ExitCount != SE.getCouldNotCompute() && "Invalid loop count");

Reviewers: sbaranga

Differential Revision: http://reviews.llvm.org/D42604

From: Evgeny Stupachenko <evstupac@gmail.com>
                         <evgeny.v.stupachenko@intel.com>
llvm-svn: 326154
2018-02-27 00:17:31 +00:00
Sanjay Patel 66911b16e6 [InstCombine, InstSimplify] add tests with undef elements in constant FP vectors; NFC
llvm-svn: 326148
2018-02-26 23:23:02 +00:00
Craig Topper 69c8972fd1 [ValueTracking] Teach cannotBeOrderedLessThanZeroImpl to handle vector constants.
Summary: This allows vector fabs to be removed in more cases.

Reviewers: spatel, arsenm, RKSimon

Reviewed By: spatel

Subscribers: wdng, llvm-commits

Differential Revision: https://reviews.llvm.org/D43739

llvm-svn: 326138
2018-02-26 22:33:17 +00:00
Simon Pilgrim 9929f90740 [X86][SSE] Reduce FADD/FSUB/FMUL costs on later targets (PR36280)
Agner's tables indicate that for SSE42+ targets (Core2 and later) we can reduce the FADD/FSUB/FMUL costs down to 1, which should fix the Himeno benchmark.

Note: the AVX512 FDIV costs look rather dodgy, but this isn't part of this patch.

Differential Revision: https://reviews.llvm.org/D43733

llvm-svn: 326133
2018-02-26 22:10:17 +00:00
Alexey Bataev b44e2b75e8 [SLP] Added new test + fixed some checks, NFC.
llvm-svn: 326117
2018-02-26 20:01:24 +00:00
Craig Topper 43fb1cdef7 [InstCombine] Add test cases with vector constants to fpextend.ll
llvm-svn: 326115
2018-02-26 19:36:37 +00:00
Craig Topper b284e8b9b4 [InstCombine] Switch to using FileCheck instead of grep. Auto-generate checks. NFC
llvm-svn: 326114
2018-02-26 19:36:36 +00:00
Sanjay Patel 31a90468e1 [InstCombine] allow fdiv folds with less than fully 'fast' ops
Note: gcc appears to allow this fold with -freciprocal-math alone, 
but clang/llvm require more than that with this patch. The wording
in the definitions seems fuzzy enough that it could go either way,
but we'll err on the conservative side of FMF interpretation.

This patch also changes the newly created fmul to have FMF propagated
by the last fdiv rather than intersecting the FMF of the fdivs. This
matches the behavior of other folds near here. The new fmul is only 
used to produce an intermediate op for the final fdiv result, so it
shouldn't be any stricter than that result. The previous behavior
could result in dropping FMF via other folds in instcombine or CSE.

Differential Revision: https://reviews.llvm.org/D43398

llvm-svn: 326098
2018-02-26 16:02:45 +00:00
Renato Golin 9d1b2acaaa [LV] Move isLegalMasked* functions from Legality to CostModel
All SIMD architectures can emulate masked load/store/gather/scatter
through element-wise condition check, scalar load/store, and
insert/extract. Therefore, bailing out of vectorization as legality
failure, when they return false, is incorrect. We should proceed to cost
model and determine profitability.

This patch is to address the vectorizer's architectural limitation
described above. As such, I tried to keep the cost model and
vectorize/don't-vectorize behavior nearly unchanged. Cost model tuning
should be done separately.

Please see
http://lists.llvm.org/pipermail/llvm-dev/2018-January/120164.html for
RFC and the discussions.

Closes D43208.

Patch by: Hideki Saito <hideki.saito@intel.com>

llvm-svn: 326079
2018-02-26 11:06:36 +00:00
Florian Hahn ed45836253 [LoopInterchange] Add test case for D43236.
llvm-svn: 326078
2018-02-26 10:46:25 +00:00
Craig Topper aee341ef28 [InstSimplify] Add test cases for removal of vector fabs on known positive.
llvm-svn: 326050
2018-02-25 06:51:52 +00:00
Craig Topper 2b8f051aaa [InstSimplify] Remove unused parameter from test cases.
llvm-svn: 326049
2018-02-25 06:51:51 +00:00
Adam Nemet e4e1de60aa Revert "StructurizeCFG: Test for branch divergence correctly"
This reverts commit r325881.

Breaks many bots

llvm-svn: 326037
2018-02-24 17:29:09 +00:00
Sanjay Patel db53d1847b [InstSimplify] sqrt(X) * sqrt(X) --> X
This was misplaced in InstCombine. We can loosen the FMF as a follow-up step.

llvm-svn: 325965
2018-02-23 22:20:13 +00:00
Sanjay Patel d32104e1b2 [InstCombine] allow fmul-sqrt folds with less than full -ffast-math
Also, add a Builder method for intrinsics to reduce code duplication for clients.

llvm-svn: 325960
2018-02-23 21:16:12 +00:00
Matt Davis 708271849a [Test] Fix the test to output to /dev/null instead of redirecting.
The redirection was confusing the windows build machine.

llvm-svn: 325937
2018-02-23 19:03:04 +00:00
Matt Davis 523c656e25 [Debug] Add dbg.value intrinsics for PHIs created during LCSSA.
Summary:
This patch is an enhancement to propagate dbg.value information when Phis are created on behalf of LCSSA.
I noticed a case where a value carried across a loop was reported as <optimized out>.

Specifically this case:
```
int bar(int x, int y) {
  return x + y;
}

int foo(int size) {
  int val = 0;
  for (int i = 0; i < size; ++i) {
    val = bar(val, i);  // Both val and i are correct
  }
  return val; // <optimized out>
}
```

In the above case, after all of the interesting computation completes our value
is reported as "optimized out." This change will add a dbg.value to correct this.

This patch also moves the dbg.value insertion routine from LoopRotation.cpp 
into Local.cpp, so that we can share it in both places (LoopRotation and LCSSA).

Reviewers: mzolotukhin, aprantl, vsk, davide

Reviewed By: aprantl, vsk

Subscribers: dberlin, llvm-commits

Differential Revision: https://reviews.llvm.org/D42551

llvm-svn: 325926
2018-02-23 17:38:27 +00:00
Nicolai Haehnle 43c1115cd4 StructurizeCFG: Test for branch divergence correctly
Summary:
This fixes cases like the new test @nonuniform. In that test, %cc itself
is a uniform value; however, when reading it after the end of the loop in
basic block %if, its value is effectively non-uniform.

This problem was encountered in
https://bugs.freedesktop.org/show_bug.cgi?id=103743; however, this change
in itself is not sufficient to fix that bug, as there is another issue
in the AMDGPU backend.

Change-Id: I32bbffece4a32f686fab54964dae1a5dd72949d4

Reviewers: arsenm, rampitec, jlebar

Subscribers: wdng, tpr, llvm-commits

Differential Revision: https://reviews.llvm.org/D40546

llvm-svn: 325881
2018-02-23 10:45:46 +00:00
Bjorn Steinbrink 983d6c3f18 Mark MergedLoadStoreMotion as not preserving MemDep results
Summary:
MemDep caches results that signify that a dependence is non-local, and
there is currently no way to invalidate such cache entries.
Unfortunately, when MLSM sinks a store that can result in a non-local
dependence becoming a local one, and then MemDep gives wrong answers.
The easiest way out here is to just say that MLSM does indeed not
preserve MemDep results.

Reviewers: davide, Gerolf

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D43177

llvm-svn: 325880
2018-02-23 10:41:57 +00:00
Daniel Neilson 20c9207be3 [AlignmentFromAssumptions] Set source and dest alignments of memory intrinsiscs separately
Summary:
This change is part of step five in the series of changes to remove alignment argument from
memcpy/memmove/memset in favour of alignment attributes. In particular, this changes the
AlignmentFromAssumptions pass to cease using the old getAlignment()/setAlignment API of
MemoryIntrinsic in favour of getting/setting source & dest specific alignments through
the new API. This allows us to simplify some of the code in this pass and also be more
aggressive about setting the source and destination alignments separately.

Steps:
Step 1) Remove alignment parameter and create alignment parameter attributes for
memcpy/memmove/memset. ( rL322965, rC322964, rL322963 )
Step 2) Expand the IRBuilder API to allow creation of memcpy/memmove with differing
source and dest alignments. ( rL323597 )
Step 3) Update Clang to use the new IRBuilder API. ( rC323617 )
Step 4) Update Polly to use the new IRBuilder API. ( rL323618 )
Step 5) Update LLVM passes that create memcpy/memmove calls to use the new IRBuilder API,
and those that use use MemIntrinsicInst::[get|set]Alignment() to use [get|set]DestAlignment()
and [get|set]SourceAlignment() instead. ( rL323886, rL323891, rL324148, rL324273, rL324278,
rL324384, rL324395, rL324402, rL324626, rL324642, rL324653, rL324654, rL324773, rL324774,
rL324781, rL324784, rL324955, rL324960 )
Step 6) Remove the single-alignment IRBuilder API for memcpy/memmove, and the
MemIntrinsicInst::[get|set]Alignment() methods.

Reference
   http://lists.llvm.org/pipermail/llvm-dev/2015-August/089384.html
   http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151109/312083.html

Reviewers: hfinkel, bollu, reames

Reviewed By: reames

Subscribers: reames, llvm-commits

Differential Revision: https://reviews.llvm.org/D43081

llvm-svn: 325816
2018-02-22 18:55:59 +00:00
Luke Cheeseman 6c1e6bbe0c [FunctionAttrs][ArgumentPromotion][GlobalOpt] Disable some optimisations passes for naked functions
- Fix for bug 36078.
- Prevent the functionattrs, function-attrs, globalopt and argpromotion passes
  from changing naked functions.
- These passes can perform some alterations to the functions that should not be
  applied. An example is removing parameters that are seemingly not used because
  they are only referenced in the inline assembly. Another example is marking
  the function as fastcc.

llvm-svn: 325788
2018-02-22 14:42:08 +00:00
Sanjay Patel 92b7371113 [InstCombine] add fmul multi-use test; NFC
Also, rename tests to make their intent clearer.

llvm-svn: 325785
2018-02-22 14:27:16 +00:00
Simon Pilgrim 864949d5e9 [SLPVectorizer][X86] Add load extend tests (PR36091)
llvm-svn: 325772
2018-02-22 12:19:34 +00:00
Sanjay Patel 9befaeb582 [InstCombine] add some random FMF to tests so we know it's not dropped; NFC
llvm-svn: 325734
2018-02-21 22:48:28 +00:00
Sanjay Patel d53da082a0 [AArch64] fix IR names to not be 'tmp' because that gives the CHECK script problems
llvm-svn: 325718
2018-02-21 20:48:14 +00:00
Sanjay Patel ffe51e450f [AArch64] add SLP test for matmul (PR36280); NFC
This is a slight reduction of one of the benchmarks
that suffered with D43079. Cost model changes should
not cause this test to remain scalarized.

llvm-svn: 325717
2018-02-21 20:34:16 +00:00
Alexey Bataev 650f639d33 [LV] Fix test checks, NFC
llvm-svn: 325699
2018-02-21 16:48:23 +00:00
Alexey Bataev cdd0675ddc [SLP] Fix test checks, NFC.
llvm-svn: 325689
2018-02-21 15:32:58 +00:00
Silviu Baranga 10ad93c6bf [SCEV] Temporarily disable loop versioning for the purpose
of turning SCEVUnknowns of PHIs into AddRecExprs.

This feature is now hidden behind the -scev-version-unknown flag.

Fixes PR36032 and PR35432.

llvm-svn: 325687
2018-02-21 15:20:32 +00:00
Vedant Kumar 56492f9177 [BDCE] Salvage debug info from dying insts
This results in 15 additional unique source variables in a stage2 build
of FileCheck (at '-Os -g'), with a negligible increase in the size of
the .debug_loc section.

llvm-svn: 325660
2018-02-21 01:55:33 +00:00
Sanjay Patel e6143904b9 revert r325515: [TTI CostModel] change default cost of FP ops to 1 (PR36280)
There are too many perf regressions resulting from this, so we need to 
investigate (and add tests for) targets like ARM and AArch64 before 
trying to reinstate.

llvm-svn: 325658
2018-02-21 01:42:52 +00:00
Sanjay Patel 6f716a7c5e [InstCombine] C / -X --> -C / X
We already do this in DAGCombiner, but it should
also be good to eliminate the fsub use in IR.

This is similar to rL325648.

llvm-svn: 325649
2018-02-21 00:01:45 +00:00
Sanjay Patel d8dd0151fc [InstCombine] -X / C --> X / -C for FP
We already do this in DAGCombiner, but it should 
also be good to eliminate the fsub use in IR.

llvm-svn: 325648
2018-02-20 23:51:16 +00:00
Sanjay Patel 8357371861 [InstCombine] add tests for fdiv with negated op and constant op; NFC
llvm-svn: 325644
2018-02-20 23:34:43 +00:00
Sanjay Patel 3e569ac0cc [PatternMatch] allow vector matches with m_FNeg
llvm-svn: 325642
2018-02-20 23:29:05 +00:00
Sanjoy Das 737fa40ffa [DSE] Don't DSE stores that subsequent memmove calls read from
Summary:
We used to remove the first memmove in cases like this:

  memmove(p, p+2, 8);
  memmove(p, p+2, 8);

which is incorrect.  Fix this by changing isPossibleSelfRead to what was most
likely the intended behavior.

Historical note: the buggy code was added in https://reviews.llvm.org/rL120974
to address PR8728.

Reviewers: rsmith

Subscribers: mcrosier, llvm-commits, jlebar

Differential Revision: https://reviews.llvm.org/D43425

llvm-svn: 325641
2018-02-20 23:19:34 +00:00
Sanjay Patel 4f65e0d008 [InstCombine] auto-generate full checks; NFC
llvm-svn: 325639
2018-02-20 23:08:47 +00:00
Sanjay Patel 088f4690f5 [InstCombine] add test for vector -X/-Y; NFC
m_FNeg doesn't match vector types.

llvm-svn: 325637
2018-02-20 22:46:38 +00:00
Benjamin Kramer 1516dd70bb Fix broken test from r325630.
llvm-svn: 325634
2018-02-20 22:30:16 +00:00
Benjamin Kramer fd0630665b [MemoryBuiltins] Check nobuiltin status when identifying calls to free.
This is usually not a problem because this code's main purpose is
eliminating unused new/delete pairs. We got deletes of nullptr or
nobuiltin deletes of builtin new wrong though.

llvm-svn: 325630
2018-02-20 22:00:33 +00:00
Sanjay Patel e29caaa9c5 [PatternMatch] enhance m_SignMask() to ignore undef elements in vectors
llvm-svn: 325623
2018-02-20 21:02:40 +00:00
Sanjay Patel ff7b777bbe [InstSimplify] add tests for m_SignMask with undef vector elements; NFC
llvm-svn: 325622
2018-02-20 20:53:35 +00:00
Alexey Bataev 42bcec7d38 [LV] Fix test checks, NFC.
llvm-svn: 325617
2018-02-20 19:49:25 +00:00
Alexey Bataev 47dfd249f0 [SLP] Fix tests checks, NFC.
llvm-svn: 325605
2018-02-20 18:11:50 +00:00
Sanjay Patel 90f4c8ec29 [InstCombine] fold fdiv with non-splat divisor to fmul: X/C --> X * (1/C)
llvm-svn: 325590
2018-02-20 16:08:15 +00:00
Sanjay Patel 1d14779aed [InstCombine] allow fdiv with constant dividend folds with less than full -ffast-math
It's possible that we could allow this either 'arcp' or 'reassoc' alone, but this
should be conservatively better than what we have right now. GCC allows this with
only -freciprocal-math.

The last test is changed to show a case that is expected to fold, but we need D43398.

llvm-svn: 325533
2018-02-19 21:46:52 +00:00
Sanjay Patel e82cc6fcc5 [InstCombine] move fdiv tests; NFC
Also, use vector constants just to prove that already works.

llvm-svn: 325530
2018-02-19 21:13:39 +00:00
Sanjay Patel 3e8a76abfd [TTI CostModel] change default cost of FP ops to 1 (PR36280)
This change was mentioned at least as far back as:
https://bugs.llvm.org/show_bug.cgi?id=26837#c26
...and I found a real program that is harmed by this: 
Himeno running on AMD Jaguar gets 6% slower with SLP vectorization:
https://bugs.llvm.org/show_bug.cgi?id=36280
...but the change here appears to solve that bug only accidentally.

The div/rem costs for x86 look very wrong in some cases, but that's already true, 
so we can fix those in follow-up patches. There's also evidence that more cost model
changes are needed to solve SLP problems as shown in D42981, but that's an independent 
problem (though the solution may be adjusted after this change is made).

Differential Revision: https://reviews.llvm.org/D43079

llvm-svn: 325515
2018-02-19 16:11:44 +00:00
Ivan A. Kosarev f03f579d1d [Transforms] Propagate new-format TBAA tags on simplification of memory-transfer intrinsics
With this patch in place, when a new-format TBAA tag is available
for a memory-transfer intrinsic call, we prefer propagating that
new-format tag. Otherwise, we fallback to the old approach where
we try to construct a proper TBAA access tag from 'tbaa.struct'
metadata.

Differential Revision: https://reviews.llvm.org/D41543

llvm-svn: 325488
2018-02-19 12:10:20 +00:00
Sanjay Patel adf6e88c74 [PatternMatch, InstSimplify] enhance m_AllOnes() to ignore undef elements in vectors
Loosening the matcher definition reveals a subtle bug in InstSimplify (we should not
assume that because an operand constant matches that it's safe to return it as a result).

So I'm making that change here too (that diff could be independent, but I'm not sure how 
to reveal it before the matcher change).

This also seems like a good reason to *not* include matchers that capture the value.
We don't want to encourage the potential misstep of propagating undef values when it's
not allowed/intended.

I didn't include the capture variant option here or in the related rL325437 (m_One), 
but it already exists for other constant matchers.

llvm-svn: 325466
2018-02-18 18:05:08 +00:00
Sanjay Patel 7faceaed31 [InstSimplify] add tests with vector undef elts; NFC
llvm-svn: 325465
2018-02-18 17:39:09 +00:00
Sanjay Patel f569578373 [PatternMatch] enhance m_One() to ignore undef elements in vectors
llvm-svn: 325437
2018-02-17 16:00:42 +00:00
Sanjay Patel a6a1426cf1 [InstSimplify, InstCombine] add tests with vector undef elts; NFC
These would fold if the m_One pattern matcher accounted for undef elts.

llvm-svn: 325436
2018-02-17 15:55:40 +00:00
Sanjay Patel 841ca95219 [InstSimplify] add vector select tests with undef elts in condition; NFC
llvm-svn: 325419
2018-02-17 01:18:53 +00:00
Sanjay Patel 870fbda805 [InstCombine] add FMF to better show current fdiv fold behavior; NFC
llvm-svn: 325365
2018-02-16 17:46:50 +00:00
Eugene Leviant 8c83b9b8c5 [ThinLTO] Fix data race in test #2
Switched to the right option (-thinlto-threads)

llvm-svn: 325362
2018-02-16 17:25:03 +00:00
Eugene Leviant c9724d9149 [ThinLTO] Fix data race in test
llvm-svn: 325361
2018-02-16 16:56:33 +00:00
Brian M. Rzycki f1a7df5ef2 [JumpThreading] PR36133 enable/disable DominatorTree for LVI analysis
Summary:
The LazyValueInfo pass caches a copy of the DominatorTree when available.
Whenever there are pending DominatorTree updates within JumpThreading's
DeferredDominance object we cannot use the cached DT for LVI analysis.
This commit adds the new methods enableDT() and disableDT() to LVI.
JumpThreading also sets the appropriate usage model before calling LVI
analysis methods.

Fixes https://bugs.llvm.org/show_bug.cgi?id=36133

Reviewers: sebpop, dberlin, kuhar

Reviewed by: sebpop, kuhar

Subscribers: uabelho, llvm-commits, aprantl, hiraditya, a.elovikov

Differential Revision: https://reviews.llvm.org/D42717

llvm-svn: 325356
2018-02-16 16:35:17 +00:00
Ivan A. Kosarev 53270d0fa6 [Transforms] Propagate TBAA info in SROA
Now that we have the new TBAA metadata format that is capable of
representing accesses to aggregates, we can propagate TBAA access
tags from memory setting and transferring intrinsics to load and
store instructions and vice versa.

Since SROA produces lots of new loads and stores on optimized
builds, this change significantly decreases the share of
undecorated memory accesses on such builds.

Differential Revision: https://reviews.llvm.org/D41563

llvm-svn: 325329
2018-02-16 10:10:29 +00:00
Eugene Leviant 7331a0bf1c [ThinLTO] Import global variables
Differential revision: https://reviews.llvm.org/D43077

llvm-svn: 325320
2018-02-16 08:11:04 +00:00
Vedant Kumar 3dc6de619a Remove brittle check lines from a test, NFC
llvm-svn: 325310
2018-02-16 01:21:01 +00:00
Vedant Kumar 616fdb00df [GVN] Partially revert debug info salvage change (r325063)
In r325063, we salvaged debug values from dying instructions in
GVN::processBlock() and GVN::performScalarPRE().

The change in performScalarPRE(), while correct, is unhelpful. It
introduced a call to salvageDebugInfo() which was immediately followed
by a RAUW, meaning it prevented the RAUW from efficiently updating
dbg.value intrinsics.  This commit reverts the mistake and tightens up
the affected test case.

llvm-svn: 325308
2018-02-16 01:15:20 +00:00
Vedant Kumar 1df820ecd7 [DCE] Salvage debug info from dead insts
This results in small increases in the size of the .debug_loc section
and the number of unique source variables in a stage2 build of opt.

llvm-svn: 325301
2018-02-15 22:26:18 +00:00
Brian Gesiak a5e3675bd3 [Coroutines] Don't move stores for allocator args
Summary:
The behavior described in Coroutines TS `[dcl.fct.def.coroutine]/7`
allows coroutine parameters to be passed into allocator functions.
The instructions to store values into the alloca'd parameters must not
be moved past the frame allocation, otherwise uninitialized values are
passed to the allocator.

Test Plan: `check-llvm`

Reviewers: rsmith, GorNishanov, eric_niebler

Reviewed By: GorNishanov

Subscribers: compnerd, EricWF, llvm-commits

Differential Revision: https://reviews.llvm.org/D43000

llvm-svn: 325285
2018-02-15 19:31:45 +00:00
Vedant Kumar 24953dc876 [SCCP] Test that constant propagation updates debug info, NFC
This extends an existing test to check that SCCP updates the operands of
relevant dbg.value instructions as it does its work.

llvm-svn: 325281
2018-02-15 19:13:04 +00:00
Alexey Bataev 862c476fc2 [SLP] Fix the test for the reversed stores, NFC.
llvm-svn: 325268
2018-02-15 17:11:50 +00:00
Alexey Bataev ac619599d8 [SLP] Added test for reversed stores, NFC.
llvm-svn: 325265
2018-02-15 16:56:49 +00:00
Sanjay Patel 9174416e89 [InstCombine] test fdiv folds better; NFC
We had redundant tests, but no tests for extra uses or vectors.
'fast' is an overly conservative requirement for these folds.

llvm-svn: 325262
2018-02-15 16:28:15 +00:00
Sanjay Patel 339b4d338d [InstCombine] allow sin/cos transforms with 'reassoc'
The variable name 'AllowReassociate' is a lie at this point because
it's set to 'isFast()' which is more than the 'reassoc' FMF after
rL317488.

In D41286, we showed that this transform may be valid even with strict
math by brute force checking every 32-bit float result.

There's a potential problem here because we're replacing with a tan()
libcall rather than a hypothetical LLVM tan intrinsic. So we might
set errno when we should be guaranteed not to do that. But that's
independent of this change.

llvm-svn: 325247
2018-02-15 15:07:12 +00:00
Sanjay Patel 6a0f667077 [InstCombine] allow X / C -> X * (1.0/C) for vector splat FP constants
llvm-svn: 325237
2018-02-15 13:55:52 +00:00
Max Kazantsev 6e4ce23add [NFC] Fix metadata placement in test
llvm-svn: 325215
2018-02-15 07:13:18 +00:00
Max Kazantsev c5941d12f4 [SCEV] Favor isKnownViaSimpleReasoning over constant ranges check
There is a more powerful but still simple function `isKnownViaSimpleReasoning ` that
does constant range check and few more additional checks. We use it some places (e.g.
when proving implications) and in some other places we only check constant ranges.

Currently, indvar simplifier fails to remove the check in following loop:

  int inc = ...;
  for (int i = inc, j = inc - 1; i < 200; ++i, ++j)
    if (i > j) { ... }

This patch replaces all usages of `isKnownPredicateViaConstantRanges` with
`isKnownViaSimpleReasoning` to have smarter proofs. In particular, it fixes the
case above.

Reviewed-By: sanjoy
Differential Revision: https://reviews.llvm.org/D43175

llvm-svn: 325214
2018-02-15 07:09:00 +00:00
Sanjay Patel af5d499cb9 [InstCombine] add tests and comments for fdiv X, C; NFC
llvm-svn: 325161
2018-02-14 19:54:51 +00:00
Craig Topper 1c19cc1745 [InstCombine] Don't fold select(C, Z, binop(select(C, X, Y), W)) -> select(C, Z, binop(Y, W)) if the binop is rem or div.
The select may have been preventing a division by zero or INT_MIN/-1 so removing it might not be safe.

Fixes PR36362.

Differential Revision: https://reviews.llvm.org/D43276

llvm-svn: 325148
2018-02-14 18:08:33 +00:00
Sanjay Patel 48f671bb27 [InstCombine] regenerate checks; NFC
llvm-svn: 325144
2018-02-14 17:37:32 +00:00
Alexey Bataev 7f246e003a [SLP] Allow vectorization of reversed loads.
Summary:
Reversed loads are handled as gathering. But we can just reshuffle
these values. Patch adds support for vectorization of reversed loads.

Reviewers: RKSimon, spatel, mkuper, hfinkel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D43022

llvm-svn: 325134
2018-02-14 15:29:15 +00:00
Florian Hahn b4e3bad89b Recommit r325001: [CallSiteSplitting] Support splitting of blocks with instrs before call.
For basic blocks with instructions between the beginning of the block
and a call we have to duplicate the instructions before the call in all
split blocks and add PHI nodes for uses of the duplicated instructions
after the call.

Currently, the threshold for the number of instructions before a call
is quite low, to keep the impact on binary size low.

Reviewers: junbuml, mcrosier, davidxl, davide

Reviewed By: junbuml

Differential Revision: https://reviews.llvm.org/D41860

llvm-svn: 325126
2018-02-14 13:59:12 +00:00
Florian Hahn c6296fea3f [LoopInterchange] Incrementally update the dominator tree.
We can use incremental dominator tree updates to avoid re-calculating
the dominator tree after interchanging 2 loops.

Reviewers: dmgreen, kuhar

Reviewed By: kuhar

Differential Revision: https://reviews.llvm.org/D43176

llvm-svn: 325122
2018-02-14 13:13:15 +00:00
Petar Jovanovic 1768957c82 [Utils] Salvage the debug info of DCE'ed 'and' instructions
Preserve debug info from a dead 'and' instruction with a constant.

Patch by Djordje Todorovic.

Differential Revision: https://reviews.llvm.org/D43163

llvm-svn: 325119
2018-02-14 13:10:35 +00:00
Elena Demikhovsky 945b7e5aa6 Adding a width of the GEP index to the Data Layout.
Making a width of GEP Index, which is used for address calculation, to be one of the pointer properties in the Data Layout.
p[address space]:size:memory_size:alignment:pref_alignment:index_size_in_bits.
The index size parameter is optional, if not specified, it is equal to the pointer size.

Till now, the InstCombiner normalized GEPs and extended the Index operand to the pointer width.
It works fine if you can convert pointer to integer for address calculation and all registered targets do this.
But some ISAs have very restricted instruction set for the pointer calculation. During discussions were desided to retrieve information for GEP index from the Data Layout.
http://lists.llvm.org/pipermail/llvm-dev/2018-January/120416.html

I added an interface to the Data Layout and I changed the InstCombiner and some other passes to take the Index width into account.
This change does not affect any in-tree target. I added tests to cover data layouts with explicitly specified index size.

Differential Revision: https://reviews.llvm.org/D42123

llvm-svn: 325102
2018-02-14 06:58:08 +00:00
Sanjay Patel 3ce76ad26f [InstCombine] put tests of mul with neg operand(s) together; NFC
llvm-svn: 325066
2018-02-13 23:02:12 +00:00
Vedant Kumar 1d5d31b706 [GVN] Salvage debug info from dead insts
This preserves an additional 581 unique source variables in a stage2
build of clang (according to `llvm-dwarfdump --statistics`). It
increases the size of the .debug_loc section by 0.1% (or 87139 bytes).

Differential Revision: https://reviews.llvm.org/D43255

llvm-svn: 325063
2018-02-13 22:27:17 +00:00
Sanjay Patel 7558d860af [InstCombine] (lshr X, 31) * Y --> (ashr X, 31) & Y
This replaces the bit-tracking based fold that did the same thing,
but it only worked for scalars and not directly. 

There is no evidence in existing regression tests that the greater 
power of bit-tracking was needed here, but we should be aware of 
this potential loss of optimization.

llvm-svn: 325062
2018-02-13 22:24:37 +00:00
Sanjay Patel fdb3b036cc [InstCombine] add vector tests, fix comments; NFC
The scalar folds are done indirectly and use potentially
expensive value tracking calls. That can be improved
along with the enhancement to support vector types.

llvm-svn: 325051
2018-02-13 21:19:42 +00:00
Sanjay Patel cb8ac00f73 [InstCombine] (bool X) * Y --> X ? Y : 0
This is both a functional improvement for vectors and an
efficiency improvement for scalars. The existing code below
the new folds does the same thing for scalars, but in an 
indirect and expensive way.

llvm-svn: 325048
2018-02-13 20:41:22 +00:00