Commit Graph

1214 Commits

Author SHA1 Message Date
Eli Friedman d0e6ae5678 Revert r300746 (SCEV analysis for or instructions).
There have been multiple reports of this causing problems: a
compile-time explosion on the LLVM testsuite, and a stack
overflow for an opencl kernel.

llvm-svn: 300928
2017-04-20 23:59:05 +00:00
Eli Friedman e77d2b86b4 [SCEV] Make SCEV or modeling more aggressive.
Use haveNoCommonBitsSet to figure out whether an "or" instruction
is equivalent to addition. This handles more cases than just
checking for a constant on the RHS.

Differential Revision: https://reviews.llvm.org/D32239

llvm-svn: 300746
2017-04-19 20:19:58 +00:00
Matt Arsenault 4c1ecded63 AMDGPU: Change DivergenceAnalysis for function arguments
Stop assuming all functions are kernels.

llvm-svn: 300719
2017-04-19 17:42:34 +00:00
Serguei Katkov 2616bbb16d [BPI] Use metadata info before any other heuristics
Metadata potentially is more precise than any heuristics we use, so
it makes sense to use first metadata info if it is available. However it makes
sense to examine it against other strong heuristics like unreachable one.
If edge coming to unreachable block has higher probability then it is expected 
by unreachable heuristic then we use heuristic and remaining probability is
distributed among other reachable blocks equally.

An example where metadata might be more strong then unreachable heuristic is
as follows: it is possible that there are two branches and for the branch A
metadata says that its probability is (0, 2^25). For the branch B
the probability is (1, 2^25).
So the expectation is that first edge of B is hotter than first edge of A
because first edge of A did not executed at least once.
If first edge of A points to the unreachable block then using the unreachable
heuristics we'll set the probability for A to (1, 2^20) and now edge of A
becomes hotter than edge of B.
This is unexpected behavior.

This fixed the biggest part of https://bugs.llvm.org/show_bug.cgi?id=32214

Reviewers: sanjoy, junbuml, vsk, chandlerc

Reviewed By: chandlerc

Subscribers: llvm-commits, reames, davidxl

Differential Revision: https://reviews.llvm.org/D30631

llvm-svn: 300440
2017-04-17 04:33:04 +00:00
Brian Gesiak 0a7894d99c [Analysis] Support bitreverse in -demanded-bits pass
Summary:
* Add a bitreverse case in the demanded bits analysis pass.
* Add tests for the bitreverse (and bswap) intrinsic in the
  demanded bits pass.
* Add a test case to the BDCE tests: that manipulations to
  high-order bits are eliminated once the bits are reversed
  and then right-shifted.

Reviewers: mkuper, jmolloy, hfinkel, trentxintong

Reviewed By: jmolloy

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D31857

llvm-svn: 300215
2017-04-13 16:44:25 +00:00
Piotr Padlewski 04aee46779 Remove readnone from invariant.group.barrier
Summary:
Readnone attribute would cause CSE of two barriers with
the same argument, which is invalid by example:

    struct Base {
          virtual int foo() { return 42; }
    };

    struct Derived1 : Base {
          int foo() override { return 50; }
    };

    struct Derived2 : Base {
          int foo() override { return 100; }
    };

    void foo() {
        Base *x = new Base{};
        new (x) Derived1{};
        int a = std::launder(x)->foo();
        new (x) Derived2{};
        int b = std::launder(x)->foo();
    }

Here 2 calls of std::launder will produce @llvm.invariant.group.barrier,
which would be merged into one call, causing devirtualization
to devirtualize second call into Derived1::foo() instead of
Derived2::foo()

Reviewers: chandlerc, dberlin, hfinkel

Subscribers: llvm-commits, rsmith, amharc

Differential Revision: https://reviews.llvm.org/D31531

llvm-svn: 300101
2017-04-12 20:45:12 +00:00
Renato Golin ab85113a93 [SystemZ] Fix target specific tests
llvm-svn: 300078
2017-04-12 17:14:46 +00:00
Jonas Paulsson 9b4875434e [SystemZ] Updated test fp-cast.ll
This did not get included in the previous commit for SystemZ cost functions.

llvm-svn: 300053
2017-04-12 12:11:41 +00:00
Jonas Paulsson fccc7d66c3 [SystemZ] TargetTransformInfo cost functions implemented.
getArithmeticInstrCost(), getShuffleCost(), getCastInstrCost(),
getCmpSelInstrCost(), getVectorInstrCost(), getMemoryOpCost(),
getInterleavedMemoryOpCost() implemented.

Interleaved access vectorization enabled.

BasicTTIImpl::getCastInstrCost() improved to check for legal extending loads,
in which case the cost of the z/sext instruction becomes 0.

Review: Ulrich Weigand, Renato Golin.
https://reviews.llvm.org/D29631

llvm-svn: 300052
2017-04-12 11:49:08 +00:00
Serguei Katkov ecebc3db72 [BPI] Refactor post domination calculation and simple fix for ColdCall
Collection of PostDominatedByUnreachable and PostDominatedByColdCall have been
split out of heuristics itself. Update of the data happens now for each basic
block (before update for PostDominatedByColdCall might be skipped if
unreachable or matadata heuristic handled this basic block).

This separation allows re-ordering of heuristics without loosing
the post-domination information.

Reviewers: sanjoy, junbuml, vsk, chandlerc, reames

Reviewed By: chandlerc

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D31701

llvm-svn: 300029
2017-04-12 05:42:14 +00:00
Daniel Berlin 554dcd8c89 MemorySSA: Move to Analysis, from Transforms/Utils. It's used as
Analysis, it has Analysis passes, and once NewGVN is made an Analysis,
this removes the cross dependency from Analysis to Transform/Utils.
NFC.

llvm-svn: 299980
2017-04-11 20:06:36 +00:00
Matt Arsenault f10061ec70 Add address space mangling to lifetime intrinsics
In preparation for allowing allocas to have non-0 addrspace.

llvm-svn: 299876
2017-04-10 20:18:21 +00:00
Max Kazantsev 2e44d2969a [ScalarEvolution] Re-enable Predicate implication from operations
The patch rL298481 was reverted due to crash on clang-with-lto-ubuntu build.
The reason of the crash was type mismatch between either a or b and RHS in the following situation:

  LHS = sext(a +nsw b) > RHS.

This is quite rare, but still possible situation. Normally we need to cast all {a, b, RHS} to their widest type.
But we try to avoid creation of new SCEV that are not constants to avoid initiating recursive analysis that
can take a lot of time and/or cache a bad value for iterations number. To deal with this, in this patch we
reject this case and will not try to analyze it if the type of sum doesn't match with the type of RHS. In this
situation we don't need to create any non-constant SCEVs.

This patch also adds an assertion to the method IsProvedViaContext so that we could fail on it and not
go further into range analysis etc (because in some situations these analyzes succeed even when the passed
arguments have wrong types, what should not normally happen).

The patch also contains a fix for a problem with too narrow scope of the analysis caused by wrong
usage of predicates in recursive invocations.

The regression test on the said failure: test/Analysis/ScalarEvolution/implied-via-addition.ll

Reviewers: reames, apilipenko, anna, sanjoy

Reviewed By: sanjoy

Subscribers: mzolotukhin, mehdi_amini, llvm-commits

Differential Revision: https://reviews.llvm.org/D31238

llvm-svn: 299205
2017-03-31 12:05:30 +00:00
Matt Arsenault 79f837c254 AMDGPU: Add all atomicrmw fields to atomic.inc/dec
Add scope, order, isVolatile

llvm-svn: 299122
2017-03-30 22:21:40 +00:00
Max Kazantsev 7696a7edf9 Revert "[ScalarEvolution] Re-enable Predicate implication from operations"
This reverts commit rL298690

Causes failures on clang.

llvm-svn: 298693
2017-03-24 07:04:31 +00:00
Max Kazantsev 89554446e7 [ScalarEvolution] Re-enable Predicate implication from operations
The patch rL298481 was reverted due to crash on clang-with-lto-ubuntu build.
The reason of the crash was type mismatch between either a or b and RHS in the following situation:

  LHS = sext(a +nsw b) > RHS.

This is quite rare, but still possible situation. Normally we need to cast all {a, b, RHS} to their widest type.
But we try to avoid creation of new SCEV that are not constants to avoid initiating recursive analysis that
can take a lot of time and/or cache a bad value for iterations number. To deal with this, in this patch we
reject this case and will not try to analyze it if the type of sum doesn't match with the type of RHS. In this
situation we don't need to create any non-constant SCEVs.

This patch also adds an assertion to the method IsProvedViaContext so that we could fail on it and not
go further into range analysis etc (because in some situations these analyzes succeed even when the passed
arguments have wrong types, what should not normally happen).

The patch also contains a fix for a problem with too narrow scope of the analysis caused by wrong
usage of predicates in recursive invocations.

The regression test on the said failure: test/Analysis/ScalarEvolution/implied-via-addition.ll

llvm-svn: 298690
2017-03-24 06:19:00 +00:00
Zhaoshi Zheng e3c9070f06 Model ashr(shl(x, n), m) as mul(x, 2^(n-m)) when n > m
Given below case:

  %y = shl %x, n
  %z = ashr %y, m

when n = m, SCEV models it as sext(trunc(x)). This patch tries to handle
the case where n > m by using sext(mul(trunc(x), 2^(n-m)))) as the SCEV
expression.

llvm-svn: 298631
2017-03-23 18:06:09 +00:00
Anna Thomas e27b39a976 [LVI] Add an LVI printer pass to capture test LVI cache after transformations
Summary:
Adding a printer pass for printing the LVI cache values after transformations
that use LVI.
This will help us in identifying cases where LVI
invariants are violated, or transforms that leave LVI in an incorrect state.
Right now, I have added two test cases to show that the printer pass is working.
I will be adding more test cases in a later change, once this change is
checked in upstream.

Reviewers: reames, dberlin, sanjoy, apilipenko

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D30790

llvm-svn: 298542
2017-03-22 19:27:12 +00:00
Max Kazantsev c6effaa495 Revert "[ScalarEvolution] Predicate implication from operations"
This reverts commit rL298481

Fails clang-with-lto-ubuntu build.

llvm-svn: 298489
2017-03-22 07:50:33 +00:00
Max Kazantsev 15e76aa0f8 [ScalarEvolution] Predicate implication from operations
This patch allows SCEV predicate analysis to prove implication of some expression predicates
from context predicates related to arguments of those expressions.
It introduces three new rules:

For addition:
  (A >X && B >= 0) || (B >= 0 && A > X) ===> (A + B) > X.

For division:
  (A > X) && (0 < B <= X + 1) ===> (A / B > 0).
  (A > X) && (-B <= X < 0) ===> (A / B >= 0).

Using these rules, SCEV is able to prove facts like "if X > 1 then X / 2 > 0".
They can also be combined with the same context, to prove more complex expressions like
"if X > 1 then X/2 + 1 > 1".

Diffirential Revision: https://reviews.llvm.org/D30887

Reviewed by: sanjoy

llvm-svn: 298481
2017-03-22 04:48:46 +00:00
Matt Arsenault 3dbeefa978 AMDGPU: Mark all unspecified CC functions in tests as amdgpu_kernel
Currently the default C calling convention functions are treated
the same as compute kernels. Make this explicit so the default
calling convention can be changed to a non-kernel.

Converted with perl -pi -e 's/define void/define amdgpu_kernel void/'
on the relevant test directories (and undoing in one place that actually
wanted a non-kernel).

llvm-svn: 298444
2017-03-21 21:39:51 +00:00
David Green da21170c49 [ConstantFolding] Fix to prevent constant folding having to repeatedly scan operands. NFCI
After the loop unroll threshold was increased in r295538, very
large constant expressions can be created. This prevents them
from having to be recursively scanned, leading to a compile
time blow-up.

Differential Revision: https://reviews.llvm.org/D30689

llvm-svn: 298356
2017-03-21 10:17:39 +00:00
Eli Friedman b1578d3612 [SCEV] Fix trip multiple calculation
If loop bound containing calculations like min(a,b), the Scalar
Evolution API getSmallConstantTripMultiple returns 4294967295 "-1"
as the trip multiple. The problem is that, SCEV use -1 * umax to
represent umin. The multiple constant -1 was returned, and the logic
of guarding against huge trip counts was skipped. Because -1 has 32
active bits.

The fix attempt to factor more general cases. First try to get the
greatest power of two divisor of trip count expression. In case
overflow happens, the trip count expression is still divisible by the
greatest power of two divisor returned. Returns 1 if not divisible by 2.

Patch by Huihui Zhang <huihuiz@codeaurora.org>

Differential Revision: https://reviews.llvm.org/D30840

llvm-svn: 298301
2017-03-20 20:25:46 +00:00
Michael Zolotukhin 99de88d1f3 [SCEV] Compute affine range in another way to avoid bitwidth extending.
Summary:
This approach has two major advantages over the existing one:
1. We don't need to extend bitwidth in our computations. Extending
bitwidth is a big issue for compile time as we often end up working with
APInts wider than 64bit, which is a slow case for APInt.
2. When we zero extend a wrapped range, we lose some information (we
replace the range with [0, 1 << src bit width)). Thus, avoiding such
extensions better preserves information.

Correctness testing:
I ran 'ninja check' with assertions that the new implementation of
getRangeForAffineAR gives the same results as the old one (this
functionality is not present in this patch). There were several failures
- I inspected them manually and found out that they all are caused by
the fact that we're returning more accurate results now (see bullet (2)
above).
Without such assertions 'ninja check' works just fine, as well as
SPEC2006.

Compile time testing:
CTMark/Os:
 - mafft/pairlocalalign	-16.98%
 - tramp3d-v4/tramp3d-v4	-12.72%
 - lencod/lencod	-11.51%
 - Bullet/bullet	-4.36%
 - ClamAV/clamscan	-3.66%
 - 7zip/7zip-benchmark	-3.19%
 - sqlite3/sqlite3	-2.95%
 - SPASS/SPASS	-2.74%
 - Average	-5.81%

Performance testing:
The changes are expected to be neutral for runtime performance.

Reviewers: sanjoy, atrick, pete

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D30477

llvm-svn: 297992
2017-03-16 21:07:38 +00:00
Jonas Paulsson facc4c5c7b [BasicTTIImpl] Bugfix in getIntrinsicInstrCost()
Don't call getScalarizationOverhead(RetTy, true, false) if RetTy is void type.

Review: Hal Finkel
https://reviews.llvm.org/D31024

llvm-svn: 297954
2017-03-16 14:05:34 +00:00
Simon Pilgrim 06c70adcf0 [X86] Add missing BITREVERSE costs for SSE2 vectors and i8/i16/i32/i64 scalars
Prep work for PR31810

llvm-svn: 297876
2017-03-15 19:34:55 +00:00
Jonas Paulsson a48ea231c0 [TargetTransformInfo] getIntrinsicInstrCost() scalarization estimation improved
getIntrinsicInstrCost() used to only compute scalarization cost based on types.
This patch improves this so that the actual arguments are checked when they are
available, in order to handle only unique non-constant operands.

Tests updates:

Analysis/CostModel/X86/arith-fp.ll
Transforms/LoopVectorize/AArch64/interleaved_cost.ll
Transforms/LoopVectorize/ARM/interleaved_cost.ll

The improvement in getOperandsScalarizationOverhead() to differentiate on
constants made it necessary to update the interleaved_cost.ll tests even
though they do not relate to intrinsics.

Review: Hal Finkel
https://reviews.llvm.org/D29540

llvm-svn: 297705
2017-03-14 06:35:36 +00:00
Javed Absar 382f98733a [ConstantFold] Fix defect in constant folding computation for GEP
When the array indexes are all determined by GVN to be constants,
a call is made to constant-folding to optimize/simplify the address
computation.

The constant-folding, however, makes a mistake in that it sometimes reads
back stale Idxs instead of NewIdxs, that it re-computed in previous iteration.
This leads to incorrect addresses coming out of constant-folding to GEP.
A test case is included. The error is only triggered when indexes have particular
patterns that the stale/new index updates interplay matters.

Reviewers: Daniel Berlin
Differential Revision: https://reviews.llvm.org/D30642

llvm-svn: 297317
2017-03-08 23:01:50 +00:00
Tobias Grosser 4c384b3eb1 Fix minor typo introduce in r297014
llvm-svn: 297020
2017-03-06 16:03:26 +00:00
Tobias Grosser 6b31b15b5a New Test-Case for Region Analysis
While working on improvements to region info analysis, this test case caused an
incorrect region bb2 => bb3 to be detected.

Reviewers: grosser

Contributed-by: Nandini Singhal <cs15mtech01004@iith.ac.in>

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D30652

llvm-svn: 297014
2017-03-06 15:20:38 +00:00
Tobias Grosser 495ad0b2a0 New Test-Case for Region Analysis
While working on improvements to the region info analysis, this test case caused
an incorrect region 1 => 2 to be detected.  It is incorrect because entry has an
outgoing edge to 3.  This is interesting because 1 dom 2 and 2 pdom 1, which
should have been enough to prevent incoming forward edges into the region and
outgoing forward edges from the region.

Reviewers: grosser

Subscribers: llvm-commits

Contributed-by: Nandini Singhal <cs15mtech01004@iith.ac.in>

Differential Revision: https://reviews.llvm.org/D30603

llvm-svn: 296988
2017-03-05 14:08:28 +00:00
Tobias Grosser f818c3300b Revert "Fix PR 24415 (at least), by making our post-dominator tree behavior sane."
and also "clang-format GenericDomTreeConstruction.h, since the current
formatting makes it look like their is a bug in the loop indentation, and there
is not"

This reverts commit r296535.

There are still some open design questions which I would like to discuss. I
revert this for Daniel (who gave the OK), as he is on vacation.

llvm-svn: 296812
2017-03-02 21:08:37 +00:00
Igor Laevsky 37cba43604 [BasicAA] Take attributes into account when requesting modref info for a call site
Differential Revision: https://reviews.llvm.org/D29989

llvm-svn: 296617
2017-03-01 13:19:51 +00:00
Daniel Berlin 03f6938edc Fix PR 24415 (at least), by making our post-dominator tree behavior sane.
Summary:
Currently, our post-dom tree tries to ignore and remove the effects of
infinite loops.  It fails miserably at this, because it tries to do it
ahead of time, and thus can only detect self-loops, and any other type
of infinite loop, it pretends doesn't exist at all.

This can, in a bunch of cases, lead to wrong answers and a completely
empty post-dom tree.

Wrong answer:

```
declare void foo()
define internal void @f() {
entry:
  br i1 undef, label %bb35, label %bb3.i

bb3.i:
  call void @foo()
  br label %bb3.i

bb35.loopexit3:
  br label %bb35

bb35:
  ret void
}
```
We get:
```
Inorder PostDominator Tree:
  [1]  <<exit node>> {0,7}
    [2] %bb35 {1,6}
      [3] %bb35.loopexit3 {2,3}
      [3] %entry {4,5}
```

This is a trivial modification of the testcase for PR 6047
Note that we pretend bb3.i doesn't exist.
We also pretend that bb35 post-dominates entry.

While it's true that it does not exit in a theoretical sense, it's not
really helpful to try to ignore the effect and pretend that bb35
post-dominates entry.  Worse, we pretend the infinite loop does
nothing (it's usually considered a side-effect), and doesn't even
exist, even when it calls a function.  Sadly, this makes it impossible
to use when you are trying to move code safely.  All compilers also
create virtual or real single exit nodes (including us), and connect
infinite loops there (which this patch does).  In fact, others have
worked around our behavior here, to the point of building their own
post-dom trees:
https://zneak.github.io/fcd/2016/02/17/structuring.html and pointing
out the region infrastructure is near-useless for them with postdom in
this state :(

Completely empty post-dom tree:
```
define void @spam() #0 {
bb:
  br label %bb1

bb1:                                              ; preds = %bb1, %bb
  br label %bb1

bb2:                                              ; No predecessors!
  ret void
}
```
Printing analysis 'Post-Dominator Tree Construction' for function 'foo':
=============================--------------------------------
Inorder PostDominator Tree:
  [1]  <<exit node>> {0,1}

:(

(note that even if you ignore the effects of infinite loops, bb2
should be present as an exit node that post-dominates nothing).

This patch changes post-dom to properly handle infinite loops and does
root finding during calculation to prevent empty tress in such cases.

We match gcc's (and the canonical theoretical) behavior for infinite
loops (find the backedge, connect it to the exit block).

Testcases coming as soon as i finish running this on a ton of random graphs :)

Reviewers: chandlerc, davide

Subscribers: bryant, llvm-commits

Differential Revision: https://reviews.llvm.org/D29705

llvm-svn: 296535
2017-02-28 22:57:50 +00:00
Sanjoy Das 5cd6c5cacf [ValueTracking] Make poison propagation more aggressive
Summary:
Motivation: fix PR31181 without regression (the actual fix is still in
progress).  However, the actual content of PR31181 is not relevant
here.

This change makes poison propagation more aggressive in the following
cases:

 1. poision * Val == poison, for any Val.  In particular, this changes
    existing intentional and documented behavior in these two cases:
     a. Val is 0
     b. Val is 2^k * N
 2. poison << Val == poison, for any Val
 3. getelementptr is poison if any input is poison

I think all of these are justified (and are axiomatically true in the
new poison / undef model):

1a: we need poison * 0 to be poison to allow transforms like these:

  A * (B + C) ==> A * B + A * C

If poison * 0 were 0 then the above transform could not be allowed
since e.g. we could have A = poison, B = 1, C = -1, making the LHS

  poison * (1 + -1) = poison * 0 = 0

and the RHS

  poison * 1 + poison * -1 = poison + poison = poison

1b: we need e.g. poison * 4 to be poison since we want to allow

  A * 4 ==> A + A + A + A

If poison * 4 were a value with all of their bits poison except the
last four; then we'd not be able to do this transform since then if A
were poison the LHS would only be "partially" poison while the RHS
would be "full" poison.

2: Same reasoning as (1b), we'd like have the following kinds
transforms be legal:

  A << 1 ==> A + A

Reviewers: majnemer, efriedma

Subscribers: mcrosier, llvm-commits

Differential Revision: https://reviews.llvm.org/D30185

llvm-svn: 295809
2017-02-22 06:52:32 +00:00
Guozhi Wei 7ec2c72095 [PPC] Give unaligned memory access lower cost on processor that supports it
Newer ppc supports unaligned memory access, it reduces the cost of unaligned memory access significantly. This patch handles this case in PPCTTIImpl::getMemoryOpCost.

This patch fixes pr31492.

Differential Revision: https://reviews.llvm.org/D28630

This is resubmit of r292680, which was reverted by r293092. The internal application failures were actually caused by a source code bug.

llvm-svn: 295506
2017-02-17 22:29:39 +00:00
Matt Arsenault d2c8a337aa AMDGPU: Remove SI_fs_constant and SI_fs_interp intrinsics
Update test uses with expansion in terms of new intrinsics.

llvm-svn: 295269
2017-02-16 02:01:13 +00:00
Igor Laevsky c11c1ed909 [SCEV] Cache results during GetMinTrailingZeros query
Differential Revision: https://reviews.llvm.org/D29759

llvm-svn: 295060
2017-02-14 15:53:12 +00:00
Sanjay Patel 97e4b98749 [ValueTracking] use nonnull argument attribute to eliminate null checks
Enhancing value tracking's analysis of null-ness was suggested in D27855, so here's a first attempt at that.

This is part of solving:
https://llvm.org/bugs/show_bug.cgi?id=28430

Differential Revision: https://reviews.llvm.org/D28204

llvm-svn: 294897
2017-02-12 15:35:34 +00:00
Dorit Nuzman eac89d736c [LV/LoopAccess] Check statically if an unknown dependence distance can be
proven larger than the loop-count

This fixes PR31098: Try to resolve statically data-dependences whose
compile-time-unknown distance can be proven larger than the loop-count, 
instead of resorting to runtime dependence checking (which are not always 
possible).

For vectorization it is sufficient to prove that the dependence distance 
is >= VF; But in some cases we can prune unknown dependence distances early,
and even before selecting the VF, and without a runtime test, by comparing 
the distance against the loop iteration count. Since the vectorized code 
will be executed only if LoopCount >= VF, proving distance >= LoopCount 
also guarantees that distance >= VF. This check is also equivalent to the 
Strong SIV Test.

Reviewers: mkuper, anemet, sanjoy

Differential Revision: https://reviews.llvm.org/D28044

llvm-svn: 294892
2017-02-12 09:32:53 +00:00
Michael Kuperstein e6d59fdca5 [X86] Add costs for non-AVX512 single-source permutation integer shuffles
Differential Revision: https://reviews.llvm.org/D29416

llvm-svn: 293932
2017-02-02 20:27:13 +00:00
Michael Kuperstein cb4ceeda7f [X86] Extend single-source shuffle cost test to test more arches. NFC.
llvm-svn: 293793
2017-02-01 18:09:47 +00:00
Eli Friedman 10d1ff64fe [SCEV] Simplify/generalize howFarToZero solving.
Make SolveLinEquationWithOverflow take the start as a SCEV, so we can
solve more cases. With that implemented, get rid of the special case
for powers of two.

The additional functionality probably isn't particularly useful,
but it might help a little for certain cases involving pointer
arithmetic.

Differential Revision: https://reviews.llvm.org/D28884

llvm-svn: 293576
2017-01-31 00:42:42 +00:00
Matt Arsenault 41c1499504 AMDGPU: Fix atomic_inc/atomic_dec + ds_swizzle not being divergent
llvm-svn: 293504
2017-01-30 17:09:47 +00:00
Mehdi Amini 1726fc698c Fix BasicAA incorrect assumption on GEP
This is fixing pr31761: BasicAA is deducing NoAlias
on the result of the GEP if the base pointer is itself NoAlias.

This is possible only if the NoAlias on the base pointer is
deduced with a non-sized query: this should guarantee that
the pointers are belonging to different memory allocation
and that the GEP can't legally jump from one to another.

Differential Revision: https://reviews.llvm.org/D29216

llvm-svn: 293293
2017-01-27 16:12:22 +00:00
Daniil Fukalov b09dac59fc [SCEV] Introduce add operation inlining limit
Inlining in getAddExpr() can cause abnormal computational time in some cases.
New parameter -scev-addops-inline-threshold is intruduced with default value 500.

Reviewers: sanjoy

Subscribers: mzolotukhin, llvm-commits

Differential Revision: https://reviews.llvm.org/D28812

llvm-svn: 293176
2017-01-26 13:33:17 +00:00
Daniel Jasper 65144c852d Revert "[PPC] Give unaligned memory access lower cost on processor that supports it"
This reverts commit r292680. It is causing significantly worse
performance and test timeouts in our internal builds. I have already
routed reproduction instructions your way.

llvm-svn: 293092
2017-01-25 21:21:08 +00:00
Chandler Carruth d501b18990 This test apparently requires an x86 target and is failing on numerous
bots ever since d0k fixed the CHECK lines so that it did something at
all.

It isn't actually testing SCEV directly but LSR, so move it into LSR and
the x86-specific tree of tests that already exists there. Target
dependence is common and unavoidable with the current design of LSR.

llvm-svn: 292774
2017-01-23 08:33:29 +00:00
Chandler Carruth a504f2b8e8 [PM] Teach LVI to correctly invalidate itself when its dependencies
become unavailable.

The AssumptionCache is now immutable but it still needs to respond to
DomTree invalidation if it ended up caching one.

This lets us remove one of the explicit invalidates of LVI but the
other one continues to avoid hitting a latent bug.

llvm-svn: 292769
2017-01-23 06:35:12 +00:00
Benjamin Kramer 1fd0d44e9b Attempt to fix test in release builds.
llvm-svn: 292762
2017-01-22 21:01:19 +00:00