Commit Graph

13521 Commits

Author SHA1 Message Date
Martin Storsjo dfc1aee25b Revert "[SLP] avoid reduction transform on patterns that the backend can load-combine"
This reverts SVN r373833, as it caused a failed assert "Non-zero loop
cost expected" on building numerous projects, see PR43582 for details
and reproduction samples.

llvm-svn: 373882
2019-10-07 08:21:37 +00:00
Sanjay Patel aab8b3ab9c [InstCombine] fold fneg disguised as select+fmul (PR43497)
Extends rL373230 and solves the motivating bug (although in a narrow way):
https://bugs.llvm.org/show_bug.cgi?id=43497

llvm-svn: 373851
2019-10-06 14:15:48 +00:00
Sanjay Patel 61c22a83de [InstCombine] add fast-math-flags for better test coverage; NFC
llvm-svn: 373848
2019-10-06 13:19:05 +00:00
Sanjay Patel c38881a6b7 [InstCombine] don't assume 'inbounds' for bitcast pointer to GEP transform (PR43501)
https://bugs.llvm.org/show_bug.cgi?id=43501
We can't declare a GEP 'inbounds' in general. But we may salvage that information if
we have known dereferenceable bytes on the source pointer.

Differential Revision: https://reviews.llvm.org/D68244

llvm-svn: 373847
2019-10-06 13:08:08 +00:00
Sanjay Patel e2321bb448 [SLP] avoid reduction transform on patterns that the backend can load-combine
I don't see an ideal solution to these 2 related, potentially large, perf regressions:
https://bugs.llvm.org/show_bug.cgi?id=42708
https://bugs.llvm.org/show_bug.cgi?id=43146

We decided that load combining was unsuitable for IR because it could obscure other
optimizations in IR. So we removed the LoadCombiner pass and deferred to the backend.
Therefore, preventing SLP from destroying load combine opportunities requires that it
recognizes patterns that could be combined later, but not do the optimization itself (
it's not a vector combine anyway, so it's probably out-of-scope for SLP).

Here, we add a scalar cost model adjustment with a conservative pattern match and cost
summation for a multi-instruction sequence that can probably be reduced later.
This should prevent SLP from creating a vector reduction unless that sequence is
extremely cheap.

In the x86 tests shown (and discussed in more detail in the bug reports), SDAG combining
will produce a single instruction on these tests like:

  movbe   rax, qword ptr [rdi]

or:

  mov     rax, qword ptr [rdi]

Not some (half) vector monstrosity as we currently do using SLP:

  vpmovzxbq       ymm0, dword ptr [rdi + 1] # ymm0 = mem[0],zero,zero,..
  vpsllvq ymm0, ymm0, ymmword ptr [rip + .LCPI0_0]
  movzx   eax, byte ptr [rdi]
  movzx   ecx, byte ptr [rdi + 5]
  shl     rcx, 40
  movzx   edx, byte ptr [rdi + 6]
  shl     rdx, 48
  or      rdx, rcx
  movzx   ecx, byte ptr [rdi + 7]
  shl     rcx, 56
  or      rcx, rdx
  or      rcx, rax
  vextracti128    xmm1, ymm0, 1
  vpor    xmm0, xmm0, xmm1
  vpshufd xmm1, xmm0, 78          # xmm1 = xmm0[2,3,0,1]
  vpor    xmm0, xmm0, xmm1
  vmovq   rax, xmm0
  or      rax, rcx
  vzeroupper
  ret

Differential Revision: https://reviews.llvm.org/D67841

llvm-svn: 373833
2019-10-05 18:03:58 +00:00
Aditya Kumar 6a2673605e Invalidate assumption cache before outlining.
Subscribers: llvm-commits

Tags: #llvm

Reviewers: compnerd, vsk, sebpop, fhahn, tejohnson

Reviewed by: vsk

Differential Revision: https://reviews.llvm.org/D68478

llvm-svn: 373807
2019-10-04 22:46:42 +00:00
Roman Lebedev fb5af8b9b9 [InstCombine] Fold 'icmp eq/ne (?trunc (lshr/ashr %x, bitwidth(x)-1)), 0' -> 'icmp sge/slt %x, 0'
We do indeed already get it right in some cases, but only transitively,
with one-use restrictions. Since we only need to produce a single
comparison, it makes sense to match the pattern directly:
  https://rise4fun.com/Alive/kPg

llvm-svn: 373802
2019-10-04 22:16:22 +00:00
Roman Lebedev f304d4d185 [InstCombine] Right-shift shift amount reassociation with truncation (PR43564, PR42391)
Initially (D65380) i believed that if we have rightshift-trunc-rightshift,
we can't do any folding. But as it usually happens, i was wrong.

https://rise4fun.com/Alive/GEw
https://rise4fun.com/Alive/gN2O

In https://bugs.llvm.org/show_bug.cgi?id=43564 we happen to have
this very sequence, of two right shifts separated by trunc.
And "just" so that happens, we apparently can fold the pattern
if the total shift amount is either 0, or it's equal to the bitwidth
of the innermost widest shift - i.e. if we are left with only the
original sign bit. Which is exactly what is wanted there.

llvm-svn: 373801
2019-10-04 22:16:11 +00:00
Roman Lebedev ae738641d5 [NFC][InstCombine] Autogenerate shift.ll test
llvm-svn: 373800
2019-10-04 22:15:57 +00:00
Roman Lebedev 007452532b [NFC][InstCombine] Autogenerate icmp-shr-lt-gt.ll test
llvm-svn: 373799
2019-10-04 22:15:49 +00:00
Roman Lebedev 3c56cc920f [NFC][InstCombine] Tests for bit test via highest sign-bit extract (w/ trunc) (PR43564)
https://rise4fun.com/Alive/x5IS

llvm-svn: 373798
2019-10-04 22:15:41 +00:00
Roman Lebedev 6a954748c8 [NFC][InstCombine] Tests for right-shift shift amount reassociation (w/ trunc) (PR43564, PR42391)
https://rise4fun.com/Alive/GEw

llvm-svn: 373797
2019-10-04 22:15:32 +00:00
Sanjay Patel 6e312388b6 [InstCombine] add tests for fneg disguised as fmul; NFC
llvm-svn: 373788
2019-10-04 20:54:14 +00:00
Kevin P. Neal 68b8052121 [FPEnv] Strict FP tests should use the requisite function attributes.
A set of function attributes is required in any function that uses constrained
floating point intrinsics. None of our tests use these attributes.

This patch fixes this.

These tests have been tested against the IR verifier changes in D68233.

Reviewed by:	andrew.w.kaylor, cameron.mcinally, uweigand
Approved by:	andrew.w.kaylor
Differential Revision:	https://reviews.llvm.org/D67925

llvm-svn: 373761
2019-10-04 17:03:46 +00:00
Peter Collingbourne 71662116fd LowerTypeTests: Rename local functions to avoid collisions with identically named functions in ThinLTO modules.
Without this we can encounter link errors or incorrect behaviour
at runtime as a result of the wrong function being referenced.

Differential Revision: https://reviews.llvm.org/D67945

llvm-svn: 373678
2019-10-03 23:42:44 +00:00
Alina Sbirlea 145cdad119 [MemorySSA] Don't hoist stores if interfering uses (as calls) exist.
llvm-svn: 373674
2019-10-03 22:20:04 +00:00
Roman Lebedev c780645736 [NFC][InstCombine] Some tests for sub-of-negatible pattern
As we have previously estabilished, `sub` is an outcast,
and should be considered non-canonical iff it can be converted to `add`.

It can be converted to `add` if it's second operand can be negated.
So far we mostly only do that for constants and negation itself,
but we should be more through.

llvm-svn: 373597
2019-10-03 13:36:00 +00:00
Roman Lebedev ae3315af07 [InstCombine] Bypass high bit extract before variable sign-extension (PR43523)
https://rise4fun.com/Alive/8BY - valid for lshr+trunc+variable sext
https://rise4fun.com/Alive/7jk - the variable sext can be redundant

https://rise4fun.com/Alive/Qslu - 'exact'-ness of first shift can be preserver

https://rise4fun.com/Alive/IF63 - without trunc we could view this as
                                  more general "drop redundant mask before right-shift",
                                  but let's handle it here for now
https://rise4fun.com/Alive/iip - likewise, without trunc, variable sext can be redundant.

There's more patterns for sure - e.g. we can have 'lshr' as the final shift,
but that might be best handled by some more generic transform, e.g.
"drop redundant masking before right-shift" (PR42456)

I'm singling-out this sext patch because you can only extract
high bits with `*shr` (unlike abstract bit masking),
and i *know* this fold is wanted by existing code.

I don't believe there is much to review here,
so i'm gonna opt into post-review mode here.

https://bugs.llvm.org/show_bug.cgi?id=43523

llvm-svn: 373542
2019-10-02 23:02:12 +00:00
Roman Lebedev 29339149c3 [NFC][InstCombine] Add tests for 'variable sext of variable high bit extract' pattern (PR43523)
https://bugs.llvm.org/show_bug.cgi?id=43523

llvm-svn: 373541
2019-10-02 23:01:58 +00:00
David Bolvansky 6b45029676 [InstCombine] Transform bcopy to memmove
bcopy is still widely used mainly for network apps. Sadly, LLVM has no optimizations for bcopy, but there are some for memmove. 
Since bcopy == memmove, it is profitable to transform bcopy to memmove and use current optimizations for memmove for free here.

llvm-svn: 373537
2019-10-02 22:49:20 +00:00
Sanjay Patel 3f4726b818 [SLP] add test for vectorization of different widths (PR28457); NFC
llvm-svn: 373483
2019-10-02 16:12:42 +00:00
Florian Hahn f2ffa7a1c0 [InstCombine] Precommit tests for D68265
llvm-svn: 373458
2019-10-02 12:32:37 +00:00
Sanjay Patel be21ceb565 [InstSimplify] fold fma/fmuladd with a NaN or undef operand
This is intended to be similar to the constant folding results from
D67446
and earlier, but not all operands are constant in these tests, so the
responsibility for folding is left to InstSimplify.

Differential Revision: https://reviews.llvm.org/D67721

llvm-svn: 373455
2019-10-02 12:12:02 +00:00
Roman Lebedev 053014f8f9 [InstCombine] Deal with -(trunc(X >>u 63)) -> trunc(X >>s 63)
Identical to it's trunc-less variant, just pretent-to hoist
trunc, and everything else still holds:
https://rise4fun.com/Alive/JRU

llvm-svn: 373364
2019-10-01 17:50:20 +00:00
Roman Lebedev 65144149d0 [InstCombine] Preserve 'exact' in -(X >>u 31) -> (X >>s 31) fold
https://rise4fun.com/Alive/yR4

llvm-svn: 373363
2019-10-01 17:50:09 +00:00
Roman Lebedev f273fc793a [NFC][InstCombine] (Better) tests for sign-bit-smearing pattern
https://rise4fun.com/Alive/JRU
https://rise4fun.com/Alive/yR4 <- we can preserve 'exact'

llvm-svn: 373362
2019-10-01 17:49:58 +00:00
Philip Reames 0200626f0b [IndVars] An implementation of loop predication without a need for speculation
This patch implements a variation of a well known techniques for JIT compilers - we have an implementation in tree as LoopPredication - but with an interesting twist. This version does not assume the ability to execute a path which wasn't taken in the original program (such as a guard or widenable.condition intrinsic). The benefit is that this works for arbitrary IR from any frontend (including C/C++/Fortran). The tradeoff is that it's restricted to read only loops without implicit exits.

This builds on SCEV, and can thus eliminate the loop varying portion of the any early exit where all exits are understandable by SCEV. A key advantage is that fixing deficiency exposed in SCEV - already found one while writing test cases - will also benefit all of full redundancy elimination (and most other loop transforms).

I haven't seen anything in the literature which quite matches this. Given that, I'm not entirely sure that keeping the name "loop predication" is helpful. Anyone have suggestions for a better name? This is analogous to partial redundancy elimination - since we remove the condition flowing around the backedge - and has some parallels to our existing transforms which try to make conditions invariant in loops.

Factoring wise, I chose to put this in IndVarSimplify since it's a generally applicable to all workloads. I could split this off into it's own pass, but we'd then probably want to add that new pass every place we use IndVars.  One solid argument for splitting it off into it's own pass is that this transform is "too good". It breaks a huge number of existing IndVars test cases as they tend to be simple read only loops.  At the moment, I've opted it off by default, but if we add this to IndVars and enable, we'll have to update around 20 test files to add side effects or disable this transform.

Near term plan is to fuzz this extensively while off by default, reflect and discuss on the factoring issue mentioned just above, and then enable by default.  I also need to give some though to supporting widenable conditions in this framing.

Differential Revision: https://reviews.llvm.org/D67408

llvm-svn: 373351
2019-10-01 17:03:44 +00:00
David Bolvansky 4037582d6b Revert [InstCombine] sprintf(dest, "%s", str) -> memccpy(dest, str, 0, MAX)
Seems to be slower than memcpy + strlen.

llvm-svn: 373335
2019-10-01 13:19:04 +00:00
David Bolvansky 8fc6a1bf56 [InstCombine] sprintf(dest, "%s", str) -> memccpy(dest, str, 0, MAX)
llvm-svn: 373333
2019-10-01 13:03:10 +00:00
Evandro Menezes 110b1138ba [InstCombine] Expand the simplification of log()
Expand the simplification of special cases of `log()` to include `log2()`
and `log10()` as well as intrinsics and more types.

Differential revision: https://reviews.llvm.org/D67199

llvm-svn: 373261
2019-09-30 20:52:21 +00:00
David Bolvansky a05e671c7e [FunctionAttrs] Added noalias for memccpy/mempcpy arguments
llvm-svn: 373251
2019-09-30 19:43:48 +00:00
Roman Lebedev 0205be8f12 [NFC][InstCombine] Redundant-left-shift-input-masking: add some more undef tests
llvm-svn: 373248
2019-09-30 19:15:51 +00:00
Rong Xu 3674050087 [PGO] Don't group COMDAT variables for compiler generated profile variables in ELF
With this patch, compiler generated profile variables will have its own COMDAT
name for ELF format, which syncs the behavior with COFF. Tested with clang
PGO bootstrap. This shows a modest reduction in object sizes in ELF format.

Differential Revision: https://reviews.llvm.org/D68041

llvm-svn: 373241
2019-09-30 18:11:22 +00:00
Sanjay Patel 712b7c2463 [InstCombine] fold negate disguised as select+mul
Name: negate if true
  %sel = select i1 %cond, i32 -1, i32 1
  %r = mul i32 %sel, %x
  =>
  %m = sub i32 0, %x
  %r = select i1 %cond, i32 %m, i32 %x

  Name: negate if false
  %sel = select i1 %cond, i32 1, i32 -1
  %r = mul i32 %sel, %x
  =>
  %m = sub i32 0, %x
  %r = select i1 %cond, i32 %x, i32 %m

https://rise4fun.com/Alive/Nlh

llvm-svn: 373230
2019-09-30 17:02:26 +00:00
Sanjay Patel 8913882fa2 [InstCombine] add tests for negate disguised as mul; NFC
llvm-svn: 373222
2019-09-30 15:43:27 +00:00
Paul Robinson 14945186c2 [SSP] [1/3] Revert "StackProtector: Use PointerMayBeCaptured"
"Captured" and "relevant to Stack Protector" are not the same thing.

This reverts commit f29366b1f5.
aka r363169.

Differential Revision: https://reviews.llvm.org/D67842

llvm-svn: 373216
2019-09-30 15:01:35 +00:00
Roman Lebedev d30093bb8a [DivRemPairs] Don't assert that we won't ever get expanded-form rem pairs in different BB's (PR43500)
If we happen to have the same div in two basic blocks,
and in one of those we also happen to have the rem part,
we'd match the div-rem pair, but the wrong ones.
So let's drop overly-ambiguous assert.

Fixes https://bugs.llvm.org/show_bug.cgi?id=43500

llvm-svn: 373167
2019-09-29 15:25:24 +00:00
Alexey Bataev 8b1eeafb91 [SLP] Fix for PR31847: Assertion failed: (isLoopInvariant(Operands[i], L) && "SCEVAddRecExpr operand is not loop-invariant!")
Initially SLP vectorizer replaced all going-to-be-vectorized
instructions with Undef values. It may break ScalarEvaluation and may
cause a crash.
Reworked SLP vectorizer so that it does not replace vectorized
instructions by UndefValue anymore. Instead vectorized instructions are
marked for deletion inside if BoUpSLP class and deleted upon class
destruction.

Reviewers: mzolotukhin, mkuper, hfinkel, RKSimon, davide, spatel

Subscribers: RKSimon, Gerolf, anemet, hans, majnemer, llvm-commits, sanjoy

Differential Revision: https://reviews.llvm.org/D29641

llvm-svn: 373166
2019-09-29 14:18:06 +00:00
Wei Mi f0c4e70e95 [SampleFDO] Create a separate flag profile-accurate-for-symsinlist to handle
profile symbol list.

Currently many existing users using profile-sample-accurate want to reduce
code size as much as possible. Their use cases are different from the scenario
profile symbol list tries to handle -- the major motivation of adding profile
symbol list is to get the major memory/code size saving without introduce
performance regression. So to keep the behavior of profile-sample-accurate
unchanged, we think decoupling these two things and using a new flag to
control the handling of profile symbol list may be better.

When profile-sample-accurate and the new flag profile-accurate-for-symsinlist
are both present, since profile-sample-accurate is a user assertion we let it
have a higher precedence.

Differential Revision: https://reviews.llvm.org/D68047

llvm-svn: 373133
2019-09-27 22:33:59 +00:00
Roman Lebedev 9c604a0dd6 [NFC][PhaseOrdering] Add end-to-end tests for the 'two shifts by sext' problem
We start with two separate sext's, but EarlyCSE runs before InstCombine,
so when we get them, they are a single sext, and we just ignore that.
Likewise, if we had a single sext, we don't do anything there.

llvm-svn: 373115
2019-09-27 19:32:43 +00:00
Sanjay Patel 1b40402aa2 [InstSimplify] add tests for fma/fmuladd with undef operand; NFC
llvm-svn: 373109
2019-09-27 18:38:51 +00:00
Roman Lebedev 269f1bea0d [InstCombine] Simplify shift-by-sext to shift-by-zext
Summary:
This is valid for any `sext` bitwidth pair:
```
Processing /tmp/opt.ll..

----------------------------------------
  %signed = sext %y
  %r = shl %x, %signed
  ret %r
=>
  %unsigned = zext %y
  %r = shl %x, %unsigned
  ret %r
  %signed = sext %y

Done: 2016
Optimization is correct!
```

(This isn't so for funnel shifts, there it's illegal for e.g. i6->i7.)

Main motivation is the C++ semantics:
```
int shl(int a, char b) {
    return a << b;
}
```
ends as
```
  %3 = sext i8 %1 to i32
  %4 = shl i32 %0, %3
```
https://godbolt.org/z/0jgqUq
which is, as this shows, too pessimistic.

There is another problem here - we can only do the fold
if sext is one-use. But we can trivially have cases
where several shifts have the same sext shift amount.
This should be resolved, later.

Reviewers: spatel, nikic, RKSimon

Reviewed By: spatel

Subscribers: efriedma, hiraditya, nlopes, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68103

llvm-svn: 373106
2019-09-27 18:12:15 +00:00
Simon Pilgrim 756f5cfc2a [SLPVectorizer][X86] Regenerate arith-fp tests
llvm-svn: 373063
2019-09-27 10:04:25 +00:00
Roman Lebedev 0956480459 [NFC][InstCombine] Revisit shift-by-signext tests
llvm-svn: 373055
2019-09-27 09:09:15 +00:00
Wei Mi 9c8efeda5c Revert "[LoopInfo] Limit the iterations to check whether a loop has dedicated
exits"

Get a better approach in https://reviews.llvm.org/D68107 to solve the problem.
Revert the initial patch and will commit the new one soon.

This reverts commit rL372990.

llvm-svn: 373044
2019-09-27 05:43:30 +00:00
Jordan Rupprecht f98d2c099a Revert [SLP] Fix for PR31847: Assertion failed: (isLoopInvariant(Operands[i], L) && "SCEVAddRecExpr operand is not loop-invariant!")
This reverts r372626 (git commit 6a278d9073)

llvm-svn: 373019
2019-09-26 22:09:17 +00:00
Kit Barton 50bc610460 [LoopFusion] Add ability to fuse guarded loops
Summary:
This patch extends the current capabilities in loop fusion to fuse guarded loops
(as defined in https://reviews.llvm.org/D63885). The patch adds the necessary
safety checks to ensure that it safe to fuse the guarded loops (control flow
equivalent, no intervening code, and same guard conditions). It also provides an
alternative method to perform the actual fusion of guarded loops. The mechanics
to fuse guarded loops are slightly different then fusing non-guarded loops, so I
opted to keep them separate methods. I will be cleaning this up in later
patches, and hope to converge on a single method to fuse both guarded and
non-guarded loops, but for now I think the review will be easier to keep them
separate.

Reviewers: jdoerfert, Meinersbur, dmgreen, etiotto, Whitney

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65464

llvm-svn: 373018
2019-09-26 21:42:45 +00:00
Zhaoshi Zheng 1128fa0924 [Unroll] Do NOT unroll a loop with small runtime upperbound
For a runtime loop if we can compute its trip count upperbound:

Don't unroll if:
1. loop is not guaranteed to run either zero or upperbound iterations; and
2. trip count upperbound is less than UnrollMaxUpperBound
Unless user or TTI asked to do so.

If unrolling, limit unroll factor to loop's trip count upperbound.

Differential Revision: https://reviews.llvm.org/D62989

Change-Id: I6083c46a9d98b2e22cd855e60523fdc5a4929c73
llvm-svn: 373017
2019-09-26 21:40:27 +00:00
Roman Lebedev 86b40b0bbf [InstCombine][NFC] Add tests for shift-by-signext
llvm-svn: 373013
2019-09-26 20:49:30 +00:00
Roman Lebedev d1ef2e48fb [InstCombine][NFC] Regenerate load-cmp.ll test
llvm-svn: 373012
2019-09-26 20:49:21 +00:00