
11262 Commits

Author SHA1 Message Date
Sanjay Patel b1546da0e8 [InstCombine] fix typos in tests; NFC
See D50036.

llvm-svn: 339713
2018-08-14 19:13:07 +00:00
Sanjay Patel 73b7e9f65e [InstCombine] add tests for pow->sqrt; NFC
D50036 should fix the missed optimizations.

llvm-svn: 339711
2018-08-14 19:05:37 +00:00
Anna Thomas 60a1e4dddc [LV] Teach about non header phis that have uses outside the loop
Summary:
This patch teaches the loop vectorizer to vectorize loops with non
header phis that have outside uses. This is legal because the iteration
dependence distance for these phis can be widened up to VF (similar to
how we do for induction/reduction) if they do not have a cyclic
dependence with header phis. When identifying reduction/induction/first
order recurrence header phis, we already identify whether there are any cyclic
dependencies that prevent vectorization.

The vectorizer is taught to extract the last element from the vectorized
phi and update the scalar loop exit block phi to contain this extracted
element from the vector loop.
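
A minimal C model of the idea, assuming a hypothetical VF of 4 and hand-written lane loops standing in for real vector instructions: `t` is a non-header phi with an outside use, each lane computes its own copy, and the value leaving the vector loop is the last lane, which then feeds the exit-block phi alongside the scalar remainder loop. The function names are illustrative only, not LLVM APIs.

```c
#include <assert.h>
#include <stddef.h>

#define VF 4 /* hypothetical vectorization factor */

/* Scalar loop: `t` is defined by a phi in the non-header block where the
   two arms of the if merge, and it is read after the loop exits. */
static int scalar_loop(const int *a, size_t n) {
    int t = 0;
    for (size_t i = 0; i < n; ++i) {
        if (a[i] > 0)
            t = a[i] * 2;
        else
            t = -a[i];
    }
    return t; /* outside use */
}

/* Hand-"vectorized" model: each lane computes its own `t`; the value the
   vector loop hands to the exit phi is the last lane. */
static int vectorized_loop(const int *a, size_t n) {
    assert(a != NULL || n == 0);
    int t = 0;
    size_t i = 0;
    size_t n_vec = n - n % VF;
    if (n_vec > 0) {
        int t_vec[VF] = {0};
        for (; i < n_vec; i += VF)
            for (size_t l = 0; l < VF; ++l) {   /* stands in for one vector op */
                int x = a[i + l];
                t_vec[l] = x > 0 ? x * 2 : -x;  /* the phi becomes a select */
            }
        t = t_vec[VF - 1];                      /* extract the last element */
    }
    for (; i < n; ++i)                          /* scalar remainder loop */
        t = a[i] > 0 ? a[i] * 2 : -a[i];
    return t;
}
```

For example, with `a = {3, -1, 4, -1, 5, -9, 2, 6}` both functions return 12, the value computed in the final lane.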

This patch can be extended to vectorize loops where instructions other
than phis have outside uses.

Reviewers: Ayal, mkuper, mssimpso, efriedma

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D50579

llvm-svn: 339703
2018-08-14 18:22:19 +00:00
David Bolvansky ba74d1c4ea [NFC] Tests for select with binop fold - FP opcodes
llvm-svn: 339692
2018-08-14 17:03:47 +00:00
Sanjay Patel c8e3943e89 [InstCombine] regenerate checks; NFC
llvm-svn: 339683
2018-08-14 15:21:13 +00:00
Sanjay Patel 19c7e7dab4 [InstCombine] regenerate checks; NFC
llvm-svn: 339681
2018-08-14 15:18:52 +00:00
Tomasz Krupa e766e5f636 [X86] Constant folding of adds/subs intrinsics
Summary: This adds constant folding of signed add/sub with saturation intrinsics.
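
For illustration only, the per-lane semantics being folded look like the C below, modelled on the byte forms of the x86 PADDSB/PSUBSB instructions (wider element types saturate the same way). The helper names are hypothetical and this is not the LLVM implementation.

```c
#include <assert.h>
#include <stdint.h>

/* One signed 8-bit lane of a saturating add: clamp instead of wrapping. */
static int8_t sat_add_i8(int8_t a, int8_t b) {
    int wide = (int)a + (int)b;      /* exact: cannot overflow an int */
    if (wide > INT8_MAX) return INT8_MAX;
    if (wide < INT8_MIN) return INT8_MIN;
    return (int8_t)wide;
}

/* One signed 8-bit lane of a saturating subtract. */
static int8_t sat_sub_i8(int8_t a, int8_t b) {
    int wide = (int)a - (int)b;
    if (wide > INT8_MAX) return INT8_MAX;
    if (wide < INT8_MIN) return INT8_MIN;
    return (int8_t)wide;
}

/* Folding a call with two constant vector operands is then lane-wise. */
static void fold_sat_add_vector(const int8_t *x, const int8_t *y,
                                int8_t *out, int lanes) {
    assert(lanes >= 0);
    for (int i = 0; i < lanes; ++i)
        out[i] = sat_add_i8(x[i], y[i]);
}
```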

Reviewers: craig.topper, spatel, RKSimon, chandlerc, efriedma

Reviewed By: craig.topper

Subscribers: rnk, llvm-commits

Differential Revision: https://reviews.llvm.org/D50499

llvm-svn: 339659
2018-08-14 09:04:01 +00:00
Reid Kleckner 40e7663b1f [BasicAA] Don't assume tail calls with byval don't alias allocas
Summary:
Calls marked 'tail' cannot read or write allocas from the current frame
because the current frame might be destroyed by the time they run.
However, a tail call may use an alloca with byval. Calling with byval
copies the contents of the alloca into argument registers or stack
slots, so there is no lifetime issue. Tail calls never modify allocas,
so we can return just ModRefInfo::Ref.

Fixes PR38466, a longstanding bug.

Reviewers: hfinkel, nlewycky, gbiv, george.burgess.iv

Subscribers: hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D50679

llvm-svn: 339636
2018-08-14 01:24:35 +00:00
Roman Lebedev 3534874fbf [InstCombine] Re-land: Optimize redundant 'signed truncation check pattern'.
Summary:
This comes with `Implicit Conversion Sanitizer - integer sign change` (D50250):
```
signed char test(unsigned int x) { return x; }
```
`clang++ -fsanitize=implicit-conversion -S -emit-llvm -o - /tmp/test.cpp -O3`
* Old: {F6904292}
* With this patch: {F6904294}

General pattern:
  X & Y

Where `Y` is checking that all the high bits (covered by a mask `4294967168`)
are uniform, i.e. `%arg & 4294967168` can be either `4294967168` or `0`.
The pattern can be one of:
  %t = add        i32 %arg,    128
  %r = icmp   ult i32 %t,      256
Or
  %t0 = shl       i32 %arg,    24
  %t1 = ashr      i32 %t0,     24
  %r  = icmp  eq  i32 %t1,     %arg
Or
  %t0 = trunc     i32 %arg  to i8
  %t1 = sext      i8  %t0   to i32
  %r  = icmp  eq  i32 %t1,     %arg
This pattern is a signed truncation check.

And `X` is checking that some bit in that same mask is zero,
i.e. it can be one of:
  %r = icmp sgt i32   %arg,    -1
Or
  %t = and      i32   %arg,    2147483648
  %r = icmp eq  i32   %t,      0

Since we are checking that all the bits in that mask are the same,
and a particular bit is zero, what we are really checking is that all the
masked bits are zero.
So this should be transformed to:
  %r = icmp ult i32 %arg, 128
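
As a quick sanity check of the fold (not code from the patch), the C sketch below models `i32` as `uint32_t` and compares the original `X & Y` against `icmp ult %arg, 128` on every value near the interesting boundaries. All function names are hypothetical, and the trunc/sext spelling relies on `(int8_t)` wrapping, which is implementation-defined in C but is what GCC and Clang do.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Y, spelled as add + icmp ult: is %arg in the signed i8 range [-128, 127]? */
static bool trunc_check_add(uint32_t arg) {
    return (uint32_t)(arg + 128u) < 256u;
}

/* Y, spelled as trunc + sext + icmp eq (assumes (int8_t) wraps). */
static bool trunc_check_sext(uint32_t arg) {
    return (uint32_t)(int32_t)(int8_t)arg == arg;
}

/* X, spelled as and + icmp eq 0: the sign bit is clear. */
static bool sign_bit_clear(uint32_t arg) {
    return (arg & 0x80000000u) == 0;
}

/* Original X & Y versus the folded icmp ult i32 %arg, 128. */
static bool original(uint32_t arg) { return sign_bit_clear(arg) && trunc_check_add(arg); }
static bool folded(uint32_t arg) { return arg < 128u; }

/* Compare everything within 512 of each boundary that matters. */
static void check_fold(void) {
    static const uint32_t centers[] = {0u, 128u, 256u, 0x80000000u, 0xFFFFFF80u};
    for (size_t c = 0; c < sizeof centers / sizeof centers[0]; ++c) {
        for (int32_t d = -512; d <= 512; ++d) {
            uint32_t arg = centers[c] + (uint32_t)d;
            assert(trunc_check_add(arg) == trunc_check_sext(arg));
            assert(original(arg) == folded(arg));
        }
    }
}
```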

The transform itself ended up being rather horrible, even though I omitted some cases.
Surely there is some infrastructure that can help clean this up that I missed?

https://rise4fun.com/Alive/3Ou

The initial commit (rL339610)
was reverted, since the first assert was being triggered.
The @positive_with_extra_and test now has coverage for that case.

Reviewers: spatel, craig.topper

Reviewed By: spatel

Subscribers: RKSimon, erichkeane, vsk, llvm-commits

Differential Revision: https://reviews.llvm.org/D50465

llvm-svn: 339621
2018-08-13 21:54:37 +00:00
Roman Lebedev 93f7e7f03e [NFC][InstCombine] Add a test for D50465 that used to assert
This is valid to fold, too.
https://rise4fun.com/Alive/0lz

llvm-svn: 339619
2018-08-13 21:49:33 +00:00
Sanjay Patel 15bff18c6f [SimplifyLibCalls] don't drop fast-math-flags on trig reflection folds (retry r339608)
Even though this code is below a function called optimizeFloatingPointLibCall(),
we apparently can't guarantee that we're dealing with FPMathOperators, so bail
out immediately if that's not true.

llvm-svn: 339618
2018-08-13 21:49:19 +00:00
Roman Lebedev 28a42c7706 Revert "[InstCombine] Optimize redundant 'signed truncation check pattern'."
At least one buildbot was able to actually trigger that assert
at the top of the function. Will investigate.

This reverts commit r339610.

llvm-svn: 339612
2018-08-13 20:46:22 +00:00
Roman Lebedev 4c4750771f [InstCombine] Optimize redundant 'signed truncation check pattern'.
Summary:
This comes with `Implicit Conversion Sanitizer - integer sign change` (D50250):
```
signed char test(unsigned int x) { return x; }
```
`clang++ -fsanitize=implicit-conversion -S -emit-llvm -o - /tmp/test.cpp -O3`
* Old: {F6904292}
* With this patch: {F6904294}

General pattern:
  X & Y

Where `Y` is checking that all the high bits (covered by a mask `4294967168`)
are uniform, i.e. `%arg & 4294967168` can be either `4294967168` or `0`.
The pattern can be one of:
  %t = add        i32 %arg,    128
  %r = icmp   ult i32 %t,      256
Or
  %t0 = shl       i32 %arg,    24
  %t1 = ashr      i32 %t0,     24
  %r  = icmp  eq  i32 %t1,     %arg
Or
  %t0 = trunc     i32 %arg  to i8
  %t1 = sext      i8  %t0   to i32
  %r  = icmp  eq  i32 %t1,     %arg
This pattern is a signed truncation check.

And `X` is checking that some bit in that same mask is zero,
i.e. it can be one of:
  %r = icmp sgt i32   %arg,    -1
Or
  %t = and      i32   %arg,    2147483648
  %r = icmp eq  i32   %t,      0

Since we are checking that all the bits in that mask are the same,
and a particular bit is zero, what we are really checking is that all the
masked bits are zero.
So this should be transformed to:
  %r = icmp ult i32 %arg, 128

https://rise4fun.com/Alive/3Ou

Reviewers: spatel, craig.topper

Reviewed By: spatel

Subscribers: RKSimon, erichkeane, vsk, llvm-commits

Differential Revision: https://reviews.llvm.org/D50465

llvm-svn: 339610
2018-08-13 20:33:08 +00:00
Sanjay Patel 66c6fe6534 revert r339608 - [SimplifyLibCalls] don't drop fast-math-flags on trig reflection folds
Can't set the builder flags without knowing this is an FPMathOperator. I'll add a test
for that and try again.

llvm-svn: 339609
2018-08-13 20:20:38 +00:00
Sanjay Patel 981f50919e [SimplifyLibCalls] don't drop fast-math-flags on trig reflection folds
llvm-svn: 339608
2018-08-13 20:14:27 +00:00
Anna Thomas cce7c24af1 NFC: Add a test to LV showing that reduction is not possible when reduction var is reset in the loop
Added a reduction test case showing where it's illegal to vectorize
a loop. Resetting the reduction var during loop iterations prevents us
from widening the dependency cycle to VF, thereby making it illegal to
vectorize the loop.
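
A hypothetical loop of this shape, sketched in C with VF assumed to be 4 (this is not the test added by the commit): treating `sum` as an ordinary reduction and keeping independent per-lane partial sums gives the right answer without a reset, but a different answer once a reset occurs.

```c
#include <assert.h>
#include <stddef.h>

#define VF 4 /* hypothetical vectorization factor */

/* Scalar loop whose "reduction" variable is reset inside the loop. */
static int sum_with_reset(const int *a, size_t n) {
    int sum = 0;
    for (size_t i = 0; i < n; ++i) {
        if (a[i] < 0)
            sum = 0;  /* reset: the result depends on where this last happened */
        sum += a[i];
    }
    return sum;
}

/* What widening `sum` like a normal reduction would compute: VF partial
   sums, each reset independently, added together at the end. */
static int naive_widened(const int *a, size_t n) {
    assert(n % VF == 0);  /* keep the model free of a remainder loop */
    int part[VF] = {0};
    for (size_t i = 0; i < n; i += VF)
        for (size_t l = 0; l < VF; ++l) {
            if (a[i + l] < 0)
                part[l] = 0;
            part[l] += a[i + l];
        }
    int sum = 0;
    for (size_t l = 0; l < VF; ++l)
        sum += part[l];
    return sum;
}
```

With `{1, 2, 3, 4, -1, 5, 6, 7}` the scalar loop returns 17 while the naive widening returns 26.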

llvm-svn: 339605
2018-08-13 19:55:25 +00:00
Sanjay Patel e45a83d447 [SimplifyLibCalls] add reflection fold for -sin(-x) (PR38458)
This is a very partial fix for the reported problem. I suspect
we do not get this fold in most motivating cases because most of
the time, the libcall would have been replaced by an intrinsic,
and that optimization is handled elsewhere...but maybe it should
be handled here?

llvm-svn: 339604
2018-08-13 19:24:41 +00:00
Roman Lebedev 2da1ef5b9e [InstCombine][NFC] Tests for 'signed truncation check' optimization
See D50465 for the actual opt itself.

Differential Revision: https://reviews.llvm.org/D50464

llvm-svn: 339602
2018-08-13 18:51:09 +00:00
Sanjay Patel e33062369e [InstCombine] add more tests for trig reflections; NFC (PR38458)
llvm-svn: 339598
2018-08-13 18:34:32 +00:00
Simon Pilgrim 82edf8d329 [InstCombine] Limit simplifyAllocaArraySize constant folding to values that fit into a uint64_t
Fixes OSS-Fuzz: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=5223

llvm-svn: 339584
2018-08-13 16:50:20 +00:00
Sanjay Patel d379f39e18 [InstCombine] auto-generate full checks and add cos intrinsic test; NFC
llvm-svn: 339579
2018-08-13 16:29:01 +00:00
Evandro Menezes 5ecd6c1a46 [SLC] Expand simplification of pow() for vector types
Also consider vector constants when simplifying `pow()`.

Differential revision: https://reviews.llvm.org/D50035

llvm-svn: 339578
2018-08-13 16:12:37 +00:00
Max Kazantsev 5c490b49c3 [GuardWidening] Widen very likely non-taken br instructions
This is the second part of D49974. It handles widening of conditional branches
whose `false` branch is very likely to be taken.

Differential Revision: https://reviews.llvm.org/D50040
Reviewed By: reames

llvm-svn: 339537
2018-08-13 07:58:19 +00:00
Craig Topper 484b342c68 [X86] Add constant folding for AVX512 versions of scalar floating point to integer conversion intrinsics.
Summary:
We've supported constant folding for the SSE versions for many years. This patch adds support for the AVX512 versions, including unsigned, with the default rounding mode. We could probably do more with other rounding modes and SAE in the future.

The test cases are largely based on the sse.ll test cases, but I did add some test cases to ensure the unsigned versions don't accept negative values. I also checked the bounds of f64->i32 conversions to make sure unsigned has a larger positive range than signed.
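
A sketch of the range check such a fold needs before replacing a call with a constant. It uses truncating conversions for simplicity, whereas the patch covers the default rounding mode; the helpers are hypothetical. It shows why 3e9 folds for the unsigned form but not the signed one, and why -1.0 is rejected for the unsigned form.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Fold a truncating f64 -> i32 conversion only if the result fits.
   NaN fails both comparisons, so it is never folded. */
static bool fold_f64_to_i32_trunc(double d, int32_t *out) {
    assert(out);
    if (!(d > -2147483649.0 && d < 2147483648.0))
        return false;
    *out = (int32_t)d;  /* C casts truncate toward zero */
    return true;
}

/* Unsigned variant: anything that truncates to a negative integer is out
   of range, but the positive range reaches 2^32 - 1. */
static bool fold_f64_to_u32_trunc(double d, uint32_t *out) {
    assert(out);
    if (!(d > -1.0 && d < 4294967296.0))
        return false;
    *out = (uint32_t)d;
    return true;
}
```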

Reviewers: RKSimon, spatel, chandlerc

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D50553

llvm-svn: 339529
2018-08-12 22:09:54 +00:00
David Bolvansky cd57242587 [NFC] Fixed build, updated tests
llvm-svn: 339524
2018-08-12 18:32:53 +00:00
David Bolvansky ddfe408f9a [NFC] Renamed test file
llvm-svn: 339523
2018-08-12 17:43:27 +00:00
David Bolvansky 01d98cc03f [InstCombine] Fold Select with binary op - non-commutative opcodes
Summary:
Basic version was merged - https://reviews.llvm.org/D49954

This adds support for FP & non-commutative opcodes
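
One instance of the kind of fold being extended, written as C with hypothetical function names (the exact IR patterns are in D49954 and D50190): when the compare pins `x` to the operation's identity constant, `y op x` in that arm is just `y`. For non-commutative opcodes the identity only works on one side, e.g. `y - 0 == y` but `0 - y != y`, and for `udiv` the right identity is 1.

```c
#include <assert.h>
#include <stdint.h>

/* 0 is a right identity of sub and shl: in the x == 0 arm, y op x == y. */
static uint32_t before_sub(uint32_t x, uint32_t y, uint32_t z) { return x == 0 ? y - x : z; }
static uint32_t after_sub(uint32_t x, uint32_t y, uint32_t z)  { return x == 0 ? y : z; }

static uint32_t before_shl(uint32_t x, uint32_t y, uint32_t z) { return x == 0 ? y << x : z; }
static uint32_t after_shl(uint32_t x, uint32_t y, uint32_t z)  { return x == 0 ? y : z; }

/* 1 is the right identity of udiv. */
static uint32_t before_udiv(uint32_t x, uint32_t y, uint32_t z) { return x == 1 ? y / x : z; }
static uint32_t after_udiv(uint32_t x, uint32_t y, uint32_t z)  { return x == 1 ? y : z; }

/* Check that each "before" form matches its folded form on small inputs. */
static void check_select_binop_identity(void) {
    const uint32_t z = 0xDEADBEEFu;
    for (uint32_t x = 0; x < 8; ++x)
        for (uint32_t y = 0; y < 64; ++y) {
            assert(before_sub(x, y, z) == after_sub(x, y, z));
            assert(before_shl(x, y, z) == after_shl(x, y, z));
            assert(before_udiv(x, y, z) == after_udiv(x, y, z));
        }
}
```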

Precommitted tests: https://reviews.llvm.org/rL338727

Reviewers: spatel, lebedev.ri

Reviewed By: spatel

Subscribers: jfb

Differential Revision: https://reviews.llvm.org/D50190

llvm-svn: 339520
2018-08-12 17:30:07 +00:00
Sanjay Patel dc185ee275 [InstCombine] fix/enhance fadd/fsub factorization
  (X * Z) + (Y * Z) --> (X + Y) * Z
  (X * Z) - (Y * Z) --> (X - Y) * Z
  (X / Z) + (Y / Z) --> (X + Y) / Z
  (X / Z) - (Y / Z) --> (X - Y) / Z

The existing code that implemented these folds failed to
optimize vectors, and it transformed code with multiple
uses when it should not have.
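
The integer version of this algebra holds exactly because `uint32_t` arithmetic wraps modulo 2^32, while the floating-point version can change results, which is why rewrites like this are normally only done under fast-math reassociation flags. The C sketch below (hypothetical names) checks the former and shows a concrete overflow counterexample for the latter.

```c
#include <assert.h>
#include <float.h>
#include <stdint.h>

/* Integer model: wraparound arithmetic is a ring, so this is always exact. */
static uint32_t u_add_of_muls(uint32_t x, uint32_t y, uint32_t z) { return x * z + y * z; }
static uint32_t u_mul_of_add(uint32_t x, uint32_t y, uint32_t z)  { return (x + y) * z; }

/* Floating point: the same rewrite can change the result. */
static double d_add_of_muls(double x, double y, double z) { return x * z + y * z; }
static double d_mul_of_add(double x, double y, double z)  { return (x + y) * z; }

static void check_factorization(void) {
    static const uint32_t v[] = {0u, 1u, 7u, 0x7FFFFFFFu, 0x80000000u, 0xFFFFFFFFu};
    for (int i = 0; i < 6; ++i)
        for (int j = 0; j < 6; ++j)
            for (int k = 0; k < 6; ++k)
                assert(u_add_of_muls(v[i], v[j], v[k]) == u_mul_of_add(v[i], v[j], v[k]));

    /* x + y overflows to +inf before the multiply; x*z + y*z stays finite. */
    assert(d_add_of_muls(1e308, 1e308, 0.5) <= DBL_MAX);
    assert(d_mul_of_add(1e308, 1e308, 0.5) > DBL_MAX);
}
```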

llvm-svn: 339519
2018-08-12 15:48:26 +00:00
Sanjay Patel ce104b6c16 [InstCombine] move/add tests for fadd/fsub factorization; NFC
llvm-svn: 339518
2018-08-12 15:06:15 +00:00
David Green f7111d1ece [UnJ] Improve explicit loop count checks
Try to improve the computed counts when they have been explicitly set by a pragma
or command-line option. This moves the code around so that we first call
computeUnrollCount to get a sensible count, then override it if explicit unroll
and jam counts are specified.

Also added some extra debug messages for when unroll and jamming is disabled.

Differential Revision: https://reviews.llvm.org/D50075

llvm-svn: 339501
2018-08-11 07:37:31 +00:00
Philip Reames 85afd1a9a0 [LICM] Hoist assumes out of loops
If we have an assume which is known to execute and whose operand is invariant, we can lift that into the pre-header. So long as we don't change which paths the assume executes on, this is a legal transformation. It's likely to be a useful canonicalization as other transforms only look for dominating assumes.

Differential Revision: https://reviews.llvm.org/D50364

llvm-svn: 339481
2018-08-10 22:21:56 +00:00
Sanjay Patel 0b62b01129 [InstCombine] add tests for fsub factorization; NFC
The tests show that:
1. The fold doesn't fire for vectors, but it should.
2. The fold fires regardless of uses, but it shouldn't.

llvm-svn: 339470
2018-08-10 21:00:27 +00:00
Sanjay Patel 3950095edf [InstCombine] add tests to show disabling of libcall/intrinsic shrinking; NFC
llvm-svn: 339467
2018-08-10 20:12:36 +00:00
Matt Arsenault d35f46caf1 AMDGPU: Turn class x, p_zero|n_zero into fcmp oeq x, 0
The library does use this for some reason.
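
A C model of the equivalence, assuming IEEE-754 binary64 doubles and that denormals are not flushed to zero: selecting only the positive-zero and negative-zero classes is the same predicate as an ordered compare with 0.0, since `-0.0 == 0.0` and NaN compares false. Function names are hypothetical; the real fold operates on the AMDGPU class intrinsic in IR.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* "class x, p_zero|n_zero": exponent and mantissa bits all zero, either sign. */
static bool is_class_zero(double x) {
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);  /* assumes a 64-bit IEEE double */
    return (bits & 0x7FFFFFFFFFFFFFFFull) == 0;
}

/* "fcmp oeq x, 0.0": ordered, so NaN gives false; -0.0 == 0.0 gives true. */
static bool fcmp_oeq_zero(double x) {
    return x == 0.0;
}

/* Assumes denormals are not flushed, so 5e-324 compares as nonzero. */
static void check_class_zero(void) {
    const double samples[] = {0.0, -0.0, 1.0, -1.0, 5e-324, -5e-324,
                              INFINITY, -INFINITY, NAN};
    for (size_t i = 0; i < sizeof samples / sizeof samples[0]; ++i)
        assert(is_class_zero(samples[i]) == fcmp_oeq_zero(samples[i]));
}
```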

llvm-svn: 339461
2018-08-10 18:58:49 +00:00
Sanjay Patel 12a2911f62 [InstCombine] add/update tests for selectBinOpIdentity; NFC
This includes a test that would have exposed the bug in rL339439
which was reverted at rL339446. The compare can be integer while
the binop is FP or vice-versa, so we need to use the binop type
when we ask for the identity constant.

llvm-svn: 339453
2018-08-10 17:20:24 +00:00
David Bolvansky 5099835541 [InstCombine][NFC] Added tests for select with binop fold
llvm-svn: 339441
2018-08-10 15:29:09 +00:00
Max Kazantsev 4e9def57c7 [NFC] Add tests that demonstrate that MustExecute is fundamentally broken
llvm-svn: 339417
2018-08-10 09:20:46 +00:00
George Burgess IV ff08c80efc [MemorySSA] "Fix" lifetime intrinsic handling
MemorySSA currently creates MemoryAccesses for lifetime intrinsics, and
sometimes treats them as clobbers. This may/may not be the best way
forward, but while we're doing it, we should consider
MayAlias/PartialAlias to be clobbers.

The ideal fix here is probably to remove all of this reasoning about
lifetimes from MemorySSA + put it into the passes that need to care. But
that's a wayyy broader fix that needs some consensus, and we have
miscompiles + a release branch today, and this should solve the
miscompiles just as well.

Differential revision is D43269. Landing without an explicit LGTM (and
without using the special please-autoclose-this syntax) so we can still
use that revision as a place to decide what the right fix here is.

llvm-svn: 339411
2018-08-10 05:14:43 +00:00
David Bolvansky 909889b2cb [InstCombine] Transform str(n)cmp to memcmp
Summary:
Motivating examples:
int strcmp_memcmp() {
    char buf[12];
    return strcmp(buf, "key") == 0;
}

int strcmp_memcmp2() {
    char buf[12];
    return strcmp(buf, "key") != 0;
}

int strncmp_memcmp() {
    char buf[12];
    return strncmp(buf, "key", 3) == 0;
}

can be turned into memcmp.

See test file for more cases.
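
Why the equality forms can use memcmp, sketched in C with hypothetical helper names (the motivating examples above leave `buf` uninitialized, so the sketch fills it in): `strcmp(buf, "key") == 0` holds exactly when the first four bytes of `buf` are `k`, `e`, `y`, `\0`, so a 4-byte memcmp gives the same answer as long as reading 4 bytes of `buf` is safe, which the 12-byte array guarantees.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* strcmp(...) == 0 versus memcmp over strlen("key") + 1 bytes. memcmp may
   read past a '\0' in buf, which is fine only because buf has 12 bytes. */
static bool eq_strcmp(const char buf[12])  { return strcmp(buf, "key") == 0; }
static bool eq_memcmp4(const char buf[12]) { return memcmp(buf, "key", 4) == 0; }

/* strncmp with n == 3: "key" has no '\0' in its first 3 bytes. */
static bool eq_strncmp3(const char buf[12]) { return strncmp(buf, "key", 3) == 0; }
static bool eq_memcmp3(const char buf[12])  { return memcmp(buf, "key", 3) == 0; }

static void check_str_to_mem(void) {
    static const char inputs[][12] = {"key", "key\0junk", "keys", "kez", "ke", "", "KEY"};
    for (size_t i = 0; i < sizeof inputs / sizeof inputs[0]; ++i) {
        assert(eq_strcmp(inputs[i]) == eq_memcmp4(inputs[i]));
        assert(eq_strncmp3(inputs[i]) == eq_memcmp3(inputs[i]));
    }
}
```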

Reviewers: efriedma

Reviewed By: efriedma

Subscribers: spatel, llvm-commits

Differential Revision: https://reviews.llvm.org/D50233

llvm-svn: 339410
2018-08-10 04:32:54 +00:00
Matt Arsenault d54b7f0592 ValueTracking: Start enhancing isKnownNeverNaN
llvm-svn: 339399
2018-08-09 22:40:08 +00:00
Sanjay Patel c6944f795d [InstSimplify] move minnum/maxnum with Inf folds from instcombine
llvm-svn: 339396
2018-08-09 22:20:44 +00:00
Philip Reames ca256d93fb [LICM] hoist fences out of loops w/o memory operations
The motivating case is an otherwise dead loop with a fence in it. At the moment, this goes all the way through the optimizer and we end up emitting an entirely pointless loop on x86. This case may seem a bit contrived, but we've seen it in real code as the result of otherwise reasonable lowering strategies combined w/thread local memory optimizations (such as escape analysis).

To handle this simple case, we can teach LICM to hoist must execute fences when there is no other memory operation within the loop.

Differential Revision: https://reviews.llvm.org/D50489

llvm-svn: 339378
2018-08-09 20:18:42 +00:00
Sanjay Patel 55accd7dd3 [InstCombine] allow fsub+fmul FMF folds for vectors
llvm-svn: 339368
2018-08-09 18:42:12 +00:00
Alina Sbirlea bf9fe79397 SCEV should forget all loops containing a deleted block.
Summary:
LoopSimplifyCFG should update SCEV for all loops after a block is deleted.
If the deleted block "Succ" is part of L, then it is part of all parent loops, so forget the topmost loop.

Reviewers: greened, mkazantsev, sanjoy

Subscribers: jlebar, javed.absar, uabelho, llvm-commits

Differential Revision: https://reviews.llvm.org/D50422

llvm-svn: 339363
2018-08-09 17:53:26 +00:00
Sanjay Patel 373790293e [InstCombine] add vector tests for fsub+fmul; NFC
llvm-svn: 339361
2018-08-09 17:40:27 +00:00
Reid Kleckner 80c6ec11d9 [GlobalOpt] Don't apply fastcc if it would break inalloca invariants
The inalloca parameter has to be the only parameter passed in memory.
Changing the convention to fastcc can break that.

At some point we should teach global opt how to optimize ABI attributes
like inalloca and maybe byval. These attributes are mainly used to match
C ABIs. They are harder for LLVM to optimize and they don't always
generate the best code.

Fixes PR38487

llvm-svn: 339360
2018-08-09 17:29:26 +00:00
Philip Reames 954eab1087 [LICM] Add tests for future hoisting of fence instructions [NFC]
The main interesting case is a fence in an otherwise dead loop or one containing only arithmetic. This can happen as a result of DSE or other transforms from seemingly reasonable initial IR.

llvm-svn: 339310
2018-08-09 04:21:02 +00:00
Sanjay Patel fe839695a8 [InstCombine] fold fadd+fsub with common operand
This is a sibling to the simplify from:
https://reviews.llvm.org/rL339174

llvm-svn: 339267
2018-08-08 16:19:22 +00:00
Sanjay Patel 2054dd79c2 [InstCombine] fold fsub+fsub with common operand
This is a sibling to the simplify from:
rL339171

llvm-svn: 339266
2018-08-08 16:04:48 +00:00
Sanjay Patel abd4767a0d [InstCombine] add tests for fsub folds; NFC
The scalar cases are handled in instcombine's internal
reassociation pass for FP ops, but it misses the vector types.

These patterns are similar to what was handled in InstSimplify in:
https://reviews.llvm.org/rL339171
https://reviews.llvm.org/rL339174
https://reviews.llvm.org/rL339176
...but we can't use instsimplify on these because we require negation
of the original operand.

llvm-svn: 339263
2018-08-08 15:44:56 +00:00