Commit Graph

12530 Commits

Author SHA1 Message Date
Sanjay Patel 0dd67ed462 [InstCombine] add tests for uaddsat using min; NFC
llvm-svn: 357005
2019-03-26 16:19:13 +00:00
Sanjay Patel 418ee7b7bb [InstCombine] update tests to use FileCheck; NFC
llvm-svn: 357004
2019-03-26 15:58:33 +00:00
Simon Pilgrim 6f96795b88 [SLPVectorizer] Merge reorderAltShuffleOperands into reorderInputsAccordingToOpcode
As discussed on D59738, this generalizes reorderInputsAccordingToOpcode to handle multiple + non-commutative instructions so we can get rid of reorderAltShuffleOperands and make use of the extra canonicalizations that reorderInputsAccordingToOpcode brings.

Differential Revision: https://reviews.llvm.org/D59784

llvm-svn: 356939
2019-03-25 20:05:27 +00:00
Simon Pilgrim 77749567a1 [SLPVectorizer] Update file missed in rL356913
Differential Revision: https://reviews.llvm.org/D59738

llvm-svn: 356915
2019-03-25 16:14:21 +00:00
Simon Pilgrim ff3abef395 [SLPVectorizer] reorderInputsAccordingToOpcode - remove non-Instruction canonicalization
Remove attempts to commute non-Instructions to the LHS - the codegen changes appear to rely on chance more than anything else and also have a tendency to fight existing instcombine canonicalization which moves constants to the RHS of commutable binary ops.

This is prep work towards:
(a) reusing reorderInputsAccordingToOpcode for alt-shuffles and removing the similar reorderAltShuffleOperands
(b) improving reordering to optimized cases with commutable and non-commutable instructions to still find splat/consecutive ops.

Differential Revision: https://reviews.llvm.org/D59738

llvm-svn: 356913
2019-03-25 15:53:55 +00:00
Simon Pilgrim 9eb0de8573 [X86][SLP] Show example of failure to uniformly commute splats for 'alt' shuffles.
If either the main/alt opcodes isn't commutable we may end up with the splats not correctly commuted to the same side.

llvm-svn: 356837
2019-03-23 16:14:04 +00:00
Daniel Sanders ef8761fd3b Fix non-determinism in Reassociate caused by address coincidences
Summary:
Between building the pair map and querying it there are a few places that
erase and create Values. It's rare but the address of these newly created
Values is occasionally the same as a just-erased Value that we already
have in the pair map. These coincidences should be accounted for to avoid
non-determinism.

Thanks to Roman Tereshin for the test case.

Reviewers: rtereshin, bogner

Reviewed By: rtereshin

Subscribers: mgrang, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59401

llvm-svn: 356803
2019-03-22 20:16:35 +00:00
Sanjay Patel a0aaa11afc [SLP] fix variables names in test; NFC
'tmpXXX' conflicts with the auto-generated script regex names.
That could cause mask a bug or fail if the output changes.

llvm-svn: 356790
2019-03-22 18:33:11 +00:00
James Y Knight c0e6b8ac3a IR: Support parsing numeric block ids, and emit them in textual output.
Just as as llvm IR supports explicitly specifying numeric value ids
for instructions, and emits them by default in textual output, now do
the same for blocks.

This is a slightly incompatible change in the textual IR format.

Previously, llvm would parse numeric labels as string names. E.g.
  define void @f() {
    br label %"55"
  55:
    ret void
  }
defined a label *named* "55", even without needing to be quoted, while
the reference required quoting. Now, if you intend a block label which
looks like a value number to be a name, you must quote it in the
definition too (e.g. `"55":`).

Previously, llvm would print nameless blocks only as a comment, and
would omit it if there was no predecessor. This could cause confusion
for readers of the IR, just as unnamed instructions did prior to the
addition of "%5 = " syntax, back in 2008 (PR2480).

Now, it will always print a label for an unnamed block, with the
exception of the entry block. (IMO it may be better to print it for
the entry-block as well. However, that requires updating many more
tests.)

Thus, the following is supported, and is the canonical printing:
  define i32 @f(i32, i32) {
    %3 = add i32 %0, %1
    br label %4

  4:
    ret i32 %3
  }

New test cases covering this behavior are added, and other tests
updated as required.

Differential Revision: https://reviews.llvm.org/D58548

llvm-svn: 356789
2019-03-22 18:27:13 +00:00
Philip Reames d627048c07 [Tests] Add masked.gather tests for non-constant masks + speculation possibilities
llvm-svn: 356782
2019-03-22 16:39:04 +00:00
Bixia Zheng bdf0230cff [ConstantFolding] Fix GetConstantFoldFPValue to avoid cast overflow.
Summary:
In C++, the behavior of casting a double value that is beyond the range
of a single precision floating-point to a float value is undefined. This
change replaces such a cast with APFloat::convert to convert the value,
which is consistent with how we convert a double value to a half value.

Reviewers: sanjoy

Subscribers: lebedev.ri, sanjoy, jlebar, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59500

llvm-svn: 356781
2019-03-22 16:37:37 +00:00
Philip Reames f032e85d64 [tests] Add a generic masked.gather test to show sometimes we can't transform
llvm-svn: 356779
2019-03-22 16:30:56 +00:00
Philip Reames e234fd6118 [tests] Add tests for converting masked.load to load speculatively
llvm-svn: 356778
2019-03-22 16:26:57 +00:00
Philip Reames 4a518c7055 [Tests] Use valid alignment in masked.gather tests
llvm-svn: 356775
2019-03-22 16:20:24 +00:00
Tim Renouf 94c163c34e InstCombineSimplifyDemanded: Allow v3 results for AMDGCN buffer and image intrinsics
This helps to avoid the situation where RA spots that only 3 of the
v4f32 result of a load are used, and immediately reallocates the 4th
register for something else, requiring a stall waiting for the load.

Differential Revision: https://reviews.llvm.org/D58906

Change-Id: I947661edfd5715f62361a02b100f14aeeada29aa
llvm-svn: 356768
2019-03-22 15:53:50 +00:00
Dinar Temirbulatov f95351b918 [SLPVectorizer] Add test related to SLP Throttling support, NFCI.
llvm-svn: 356754
2019-03-22 14:50:53 +00:00
Nikita Popov b86576a5b9 [InstSimplify] Add tests for signed icmp of and/or; NFC
Even if a signed predicate is used, the ranges computed for and/or
are unsigned, resulting in missed simplifications.

llvm-svn: 356720
2019-03-21 21:13:08 +00:00
Akira Hatanaka b576c77a9e Don't add a tail keyword to calls to ObjC runtime functions if the calls
are annotated with notail.

r356705 annotated calls to objc_retainAutoreleasedReturnValue with
notail on x86-64. This commit teaches ARC optimizer to check the notail
marker on the call before turning it into a tail call.

rdar://problem/38675807

llvm-svn: 356707
2019-03-21 20:16:09 +00:00
Craig Topper 16dc165046 [InstCombine] Don't transform ((C1 OP zext(X)) & C2) -> zext((C1 OP X) & C2) if either zext or OP has another use.
If they have other users we'll just end up increasing the instruction count.

We might be able to weaken this to only one of them having a single use if we can prove that the and will be removed.

Fixes PR41164.

Differential Revision: https://reviews.llvm.org/D59630

llvm-svn: 356690
2019-03-21 17:50:49 +00:00
Craig Topper 9f0b17a248 [ScalarizeMaskedMemIntrin] Add support for scalarizing expandload and compressstore intrinsics.
This adds support for scalarizing these intrinsics as well the X86TargetTransformInfo support to avoid scalarizing them in the cases X86 can handle.

I've omitted handling special cases for constant masks for this first pass. Though CodeGenPrepare can constant fold the branch conditions and remove some of the control flow anyway.

Fixes PR40994 and is covers most of PR3666. Might want to implement constant masks to close that.

Differential Revision: https://reviews.llvm.org/D59180

llvm-svn: 356687
2019-03-21 17:38:52 +00:00
Nikita Popov 3af5b28f47 [ValueTracking] Use ConstantRange based overflow check for signed sub
This is D59450, but for signed sub. This case is not NFC, because
the overflow logic in ConstantRange is more powerful than the existing
check. This resolves the TODO in the function.

I've added two tests to show that this indeed catches more cases than
the previous logic, but the main correctness test coverage here is in
the existing ConstantRange unit tests.

Differential Revision: https://reviews.llvm.org/D59617

llvm-svn: 356685
2019-03-21 17:23:51 +00:00
Sanjay Patel d47eac59ef [CodeGenPrepare] limit formation of overflow intrinsics (PR41129)
This is probably a bigger limitation than necessary, but since we don't have any evidence yet
that this transform led to real-world perf improvements rather than regressions, I'm making a
quick, blunt fix.

In the motivating x86 example from:
https://bugs.llvm.org/show_bug.cgi?id=41129
...and shown in the regression test, we want to avoid an extra instruction in the dominating
block because that could be costly.

The x86 LSR test diff is reversing the changes from D57789. There's no evidence that 1 version
is any better than the other yet.

Differential Revision: https://reviews.llvm.org/D59602

llvm-svn: 356665
2019-03-21 13:57:07 +00:00
Craig Topper 72d888ba9f [InstCombine] Add test case for PR41164. NFC
llvm-svn: 356645
2019-03-21 05:33:10 +00:00
Nikita Popov 03dbfc2eef [InstCombine] Add additional sub nsw inference tests; NFC
nsw can be determined based on known bits here, but currently
isn't.

llvm-svn: 356620
2019-03-20 21:42:17 +00:00
Philip Reames e4588bbf80 Simplify operands of masked stores and scatters based on demanded elements
If we know we're not storing a lane, we don't need to compute the lane. This could be improved by using the undef element result to further prune the mask, but I want to separate that into its own change since it's relatively likely to expose other problems.

Differential Revision: https://reviews.llvm.org/D57247

llvm-svn: 356590
2019-03-20 18:44:58 +00:00
Alina Sbirlea 5baa72ea74 [LICM & MemorySSA] Don't sink/hoist stores in the presence of ordered loads.
Summary:
Before this patch, if any Use existed in the loop, with a defining
access in the loop, we conservatively decide to not move the store.
What this approach was missing, is that ordered loads are not Uses, they're Defs
in MemorySSA. So, even when the clobbering walker does not find that
volatile load to interfere, we still cannot hoist a store past a
volatile load.
Resolves PR41140.

Reviewers: george.burgess.iv

Subscribers: sanjoy, jlebar, Prazek, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59564

llvm-svn: 356588
2019-03-20 18:33:37 +00:00
Nikita Popov 00b5ecab5d [ValueTracking] Compute range for abs without nsw
This is a small followup to D59511. The code that was moved into
computeConstantRange() there is a bit overly conversative: If the
abs is not nsw, it does not compute any range. However, abs without
nsw still has a well-defined contiguous unsigned range from 0 to
SIGNED_MIN. This is a lot less useful than the usual 0 to SIGNED_MAX
range, but if we're already here we might as well specify it...

Differential Revision: https://reviews.llvm.org/D59563

llvm-svn: 356586
2019-03-20 18:16:02 +00:00
Nikita Popov 37cf25c3c6 [InstCombine] Fold add nuw + uadd.with.overflow
Fold add nuw and uadd.with.overflow with constants if the
addition does not overflow.

Part of https://bugs.llvm.org/show_bug.cgi?id=38146.

Patch by Dan Robertson.

Differential Revision: https://reviews.llvm.org/D59471

llvm-svn: 356584
2019-03-20 18:00:27 +00:00
Sanjay Patel fb44f99b73 [CGP][x86] add tests for usubo regression (PR41129); NFC
llvm-svn: 356559
2019-03-20 15:02:35 +00:00
Nikita Popov 2dd1566e8b [InstSimplify] Add additional cmp of abs without nsw tests; NFC
llvm-svn: 356520
2019-03-19 21:12:21 +00:00
Philip Reames 70537abe52 Demanded elements support for masked.load and masked.gather
Teach instcombine to propagate demanded elements through a masked load or masked gather instruction. This is in the broader context of improving vector pointer instcombine under https://reviews.llvm.org/D57140.

Differential Revision: https://reviews.llvm.org/D57372

llvm-svn: 356510
2019-03-19 20:10:00 +00:00
Nikita Popov 208381953b [ValueTracking] Use computeConstantRange() for unsigned add/sub overflow
Improve computeOverflowForUnsignedAdd/Sub in ValueTracking by
intersecting the computeConstantRange() result into the ConstantRange
created from computeKnownBits(). This allows us to detect some
additional never/always overflows conditions that can't be determined
from known bits.

This revision also adds basic handling for constants to
computeConstantRange(). Non-splat vectors will be handled in a followup.

The signed case will also be handled in a followup, as it needs some
more groundwork.

Differential Revision: https://reviews.llvm.org/D59386

llvm-svn: 356489
2019-03-19 17:53:56 +00:00
Sanjay Patel 5b820323ca [InstCombine] fold logic-of-nan-fcmps (PR41069)
Combine 2 fcmps that are checking for nan-ness:
   and (fcmp ord X, 0), (and (fcmp ord Y, 0), Z) --> and (fcmp ord X, Y), Z
   or  (fcmp uno X, 0), (or  (fcmp uno Y, 0), Z) --> or  (fcmp uno X, Y), Z

This is an exact match for a minimal reassociation pattern.
If we want to handle this more generally that should go in
the reassociate pass and allow removing this code.

This should fix:
https://bugs.llvm.org/show_bug.cgi?id=41069

llvm-svn: 356471
2019-03-19 16:39:17 +00:00
Teresa Johnson bda581b831 [InstCombine] Add missing test for icmp transformation (NFC)
This was split out of D59378. There was no testing for the EQ case in
foldICmpWithDominatingICmp, add one here.

llvm-svn: 356463
2019-03-19 15:43:56 +00:00
Simon Pilgrim 8ee477a2ab [InstSimplify] SimplifyICmpInst - icmp eq/ne %X, undef -> undef
As discussed on PR41125 and D59363, we have a mismatch between icmp eq/ne cases with an undef operand:

When the other operand is constant we fold to undef (handled in ConstantFoldCompareInstruction)
When the other operand is non-constant we fold to a bool constant based on isTrueWhenEqual (handled in SimplifyICmpInst).

Neither is really wrong, but this patch changes the logic in SimplifyICmpInst to consistently fold to undef.

The NewGVN test change is annoying (as with most heavily reduced tests) but AFAICT I have kept the purpose of the test based on rL291968.

Differential Revision: https://reviews.llvm.org/D59541

llvm-svn: 356456
2019-03-19 14:08:23 +00:00
Sanjay Patel 423b958306 [InstCombine] add FMF to tests for extra coverage; NFC
ninf is probably the only relevant possible flag here
(nnan allows simplification and nsz never makes a difference).

llvm-svn: 356453
2019-03-19 13:39:29 +00:00
Markus Lavin b86ce219f4 [DebugInfo] Introduce DW_OP_LLVM_convert
Introduce a DW_OP_LLVM_convert Dwarf expression pseudo op that allows
for a convenient way to perform type conversions on the Dwarf expression
stack. As an additional bonus it paves the way for using other Dwarf
v5 ops that need to reference a base_type.

The new DW_OP_LLVM_convert is used from lib/Transforms/Utils/Local.cpp
to perform sext/zext on debug values but mainly the patch is about
preparing terrain for adding other Dwarf v5 ops that need to reference a
base_type.

For Dwarf v5 the op maps to DW_OP_convert and for earlier versions a
complex shift & mask pattern is generated to emulate sext/zext.

This is a recommit of r356442 with trivial fixes for the failing tests.

Differential Revision: https://reviews.llvm.org/D56587

llvm-svn: 356451
2019-03-19 13:16:28 +00:00
Simon Pilgrim 9497b2b2f7 [InstCombine] Regenerate + add icmp with undef tests
Better test coverage for PR41125 and D59363

llvm-svn: 356448
2019-03-19 11:44:22 +00:00
Markus Lavin ad78768d59 Revert "[DebugInfo] Introduce DW_OP_LLVM_convert"
This reverts commit 1cf4b593a7ebd666fc6775f3bd38196e8e65fafe.

Build bots found failing tests not detected locally.

Failing Tests (3):
  LLVM :: DebugInfo/Generic/convert-debugloc.ll
  LLVM :: DebugInfo/Generic/convert-inlined.ll
  LLVM :: DebugInfo/Generic/convert-linked.ll

llvm-svn: 356444
2019-03-19 09:17:28 +00:00
Markus Lavin cd8a940b37 [DebugInfo] Introduce DW_OP_LLVM_convert
Introduce a DW_OP_LLVM_convert Dwarf expression pseudo op that allows
for a convenient way to perform type conversions on the Dwarf expression
stack. As an additional bonus it paves the way for using other Dwarf
v5 ops that need to reference a base_type.

The new DW_OP_LLVM_convert is used from lib/Transforms/Utils/Local.cpp
to perform sext/zext on debug values but mainly the patch is about
preparing terrain for adding other Dwarf v5 ops that need to reference a
base_type.

For Dwarf v5 the op maps to DW_OP_convert and for earlier versions a
complex shift & mask pattern is generated to emulate sext/zext.

Differential Revision: https://reviews.llvm.org/D56587

llvm-svn: 356442
2019-03-19 08:48:19 +00:00
Nikita Popov 3e9770d2dc Revert "[ValueTracking][InstSimplify] Support min/max selects in computeConstantRange()"
This reverts commit 106f0cdefb.

This change impacts the AMDGPU smed3.ll and umed3.ll codegen tests.

llvm-svn: 356424
2019-03-18 22:26:27 +00:00
Nikita Popov 106f0cdefb [ValueTracking][InstSimplify] Support min/max selects in computeConstantRange()
Add support for min/max flavor selects in computeConstantRange(),
which allows us to fold comparisons of a min/max against a constant
in InstSimplify. This was suggested by spatel as an alternative
approach to D59378. I've also added the infinite looping test from
that revision here.

Differential Revision: https://reviews.llvm.org/D59506

llvm-svn: 356415
2019-03-18 21:35:19 +00:00
Nikita Popov 930341ba30 [InstCombine] Add tests for add nuw + uaddo; NFC
Baseline tests for D59471 (InstCombine of `add nuw` and `uaddo` with
constants).

Patch by Dan Robertson.

Differential Revision: https://reviews.llvm.org/D59472

llvm-svn: 356414
2019-03-18 21:35:09 +00:00
Nikita Popov 05baa9ee1a [InstSimplify] Add additional icmp of min/max tests; NFC
These are baseline tests for D59506.

llvm-svn: 356408
2019-03-18 21:19:56 +00:00
Nikita Popov c1d4fc8a62 [InstCombine] Improve with.overflow intrinsic tests; NFC
- Do not use unnamed values in saddo tests
- Add tests for canonicalization of a constant arg0

Patch by Dan Robertson.

Differential Revision: https://reviews.llvm.org/D59476

llvm-svn: 356403
2019-03-18 20:08:35 +00:00
Warren Ristow ad7d0ded2e [SCEV] Guard movement of insertion point for loop-invariants
This reinstates r347934, along with a tweak to address a problem with
PHI node ordering that that commit created (or exposed). (That commit
was reverted at r348426, due to the PHI node issue.)

Original commit message:

r320789 suppressed moving the insertion point of SCEV expressions with
dev/rem operations to the loop header in non-loop-invariant situations.
This, and similar, hoisting is also unsafe in the loop-invariant case,
since there may be a guard against a zero denominator. This is an
adjustment to the fix of r320789 to suppress the movement even in the
loop-invariant case.

This fixes PR30806.

Differential Revision: https://reviews.llvm.org/D57428

llvm-svn: 356392
2019-03-18 18:52:35 +00:00
Sanjay Patel 08b5e68ef6 [InstCombine] add/adjust test for NaN checks; NFC
llvm-svn: 356383
2019-03-18 17:37:05 +00:00
Sanjay Patel 6063393536 [InstCombine] allow general vector constants for funnel shift to shift transforms
Follow-up to:
rL356338
rL356369

We can calculate an arbitrary vector constant minus the bitwidth, so there's
no need to limit this transform to scalars and splats.

llvm-svn: 356372
2019-03-18 14:27:51 +00:00
Sanjay Patel 84de8a30a0 [InstCombine] extend rotate-left-by-constant canonicalization to funnel shift
Follow-up to:
rL356338

Rotates are a special case of funnel shift where the 2 input operands
are the same value, but that does not need to be a restriction for the
canonicalization when the shift amount is a constant.

llvm-svn: 356369
2019-03-18 14:10:11 +00:00
Sanjay Patel d7f1539322 [InstCombine] add funnel shift tests with arbitrary constants; NFC
llvm-svn: 356367
2019-03-18 13:35:51 +00:00
Matt Arsenault 4873056ced Remove immarg from llvm.expect
The LangRef claimed this was required to be a constant, but this
appears to be wrong.

Fixes bug 41079.

llvm-svn: 356353
2019-03-17 23:16:18 +00:00
Sanjay Patel b3bcd95771 [InstCombine] canonicalize rotate right by constant to rotate left
This was noted as a backend problem:
https://bugs.llvm.org/show_bug.cgi?id=41057
...and subsequently fixed for x86:
rL356121
But we should canonicalize these in IR for the benefit of all targets
and improve IR analysis such as CSE.

llvm-svn: 356338
2019-03-17 19:08:00 +00:00
Sanjay Patel a3a2f9424e [InstCombine] add tests for rotate by constant using funnel intrinsics; NFC
llvm-svn: 356337
2019-03-17 18:50:39 +00:00
Philip Reames 68a2e4d48b [SimplifyDemandedVec] Strengthen handling all undef lanes (particularly GEPs)
A change of two parts:
1) A generic enhancement for all callers of SDVE to exploit the fact that if all lanes are undef, the result is undef.
2) A GEP specific piece to strengthen/fix the vector index undef element handling, and call into the generic infrastructure when visiting the GEP.

The result is that we replace a vector gep with at least one undef in each lane with a undef.  We can also do the same for vector intrinsics.  Once the masked.load patch (D57372) has landed, I'll update to include call tests as well.

Differential Revision: https://reviews.llvm.org/D57468

llvm-svn: 356293
2019-03-15 19:54:06 +00:00
Sanjay Patel 052d1b7b66 [InstCombine] add tests for logic of NaN fcmps; NFC
llvm-svn: 356287
2019-03-15 18:14:25 +00:00
Philip Reames 6867b1f7de [tests] Add a test for constexpr mask as requested in D57372
llvm-svn: 356285
2019-03-15 18:06:32 +00:00
Sanjay Patel a70c9d49af [InstCombine] add tests for masked store/scatter; NFC
Baseline tests for D57247

llvm-svn: 356283
2019-03-15 18:00:28 +00:00
Florian Hahn 728293ac87 [LSR] Update test from rL356256 after rebase.
llvm-svn: 356257
2019-03-15 12:37:50 +00:00
Florian Hahn d9e88f7b7f [LSR] Check for signed overflow in NarrowSearchSpaceByDetectingSupersets.
We are adding a sign extended IR value to an int64_t, which can cause
signed overflows, as in the attached test case, where we have a formula
with BaseOffset = -1 and a constant with numeric_limits<int64_t>::min().

If the addition would overflow, skip the simplification for this
formula. Note that the target triple is required to trigger the failure.

Reviewers: qcolombet, gilr, kparzysz, efriedma

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D59211

llvm-svn: 356256
2019-03-15 12:17:36 +00:00
Matt Arsenault 1d83670dbd AMDGPU: Remove intrinsic operand assert
Before r355981, this was under LLVM_DEBUG. I don't think the assert is
quite right, but this really should be a verifier check. Instcombine
should not be asserting on this sort of thing.

llvm-svn: 356219
2019-03-14 23:45:09 +00:00
Sanjay Patel 2c9275a790 [CGP] add another bailout for degenerate code (PR41064)
This is almost the same as:
rL355345
...and should prevent any potential crashing from examples like:
https://bugs.llvm.org/show_bug.cgi?id=41064
...although the bug was masked by:
rL355823
...and I'm not sure how to repro the problem after that change.

llvm-svn: 356218
2019-03-14 23:14:31 +00:00
Paul Robinson 96c1f2cd6c Tighten up tests that use -debugify as a shortcut. NFC
These now verify that a given instruction has a specific source
location, rather than any old location. We want to make sure we
propagate the correct locations from one instruction to another.

llvm-svn: 356217
2019-03-14 23:09:17 +00:00
Nikita Popov 48eb21ee5f [InstCombine] Add tests for range-based saturing math overflow; NFC
Tests for cases where overflow can be determined, but not based on
known bits.

llvm-svn: 356203
2019-03-14 21:06:46 +00:00
Sanjay Patel 38f07b1966 [InstCombine] remove duplicate tests
These got accidentally doubled with rL356191.

llvm-svn: 356195
2019-03-14 19:41:21 +00:00
Sanjay Patel de1d5d3675 [InstCombine] canonicalize funnel shift constant shift amount to be modulo bitwidth
The shift argument is defined to be modulo the bitwidth, so if that argument
is a constant, we can always reduce the constant to its minimal form to allow
better CSE and other follow-on transforms.

We need to be careful to ignore constant expressions here, or we will likely
infinite loop. I'm adding a general vector constant query for that case.

Differential Revision: https://reviews.llvm.org/D59374

llvm-svn: 356192
2019-03-14 19:22:08 +00:00
Sanjay Patel 6e86216531 [InstCombine] add tests for funnel shift constant shift amount mod bitwidth; NFC
llvm-svn: 356191
2019-03-14 19:22:00 +00:00
Sanjay Patel 43570a0a62 [InstCombine] add tests for funnel shift constant shift amount mod bitwidth; NFC
llvm-svn: 356175
2019-03-14 17:39:40 +00:00
Sam Parker 3b2ba20afd [ARM] Run ARMParallelDSP in the IRPasses phase
Run EarlyCSE before ParallelDSP and do this in the backend IR opt
phase.

Differential Revision: https://reviews.llvm.org/D59257

llvm-svn: 356130
2019-03-14 10:57:40 +00:00
Matt Arsenault 0253620f89 Verifier: Make sure masked load/store alignment is a power of 2
The same should also be done for scatter/gather, but the verifier
doesn't check those at all now.

llvm-svn: 356094
2019-03-13 19:46:34 +00:00
Craig Topper 9bae5ba076 [X86] Add ImmArg markings to intrinsics.
Remove test cases that checked for not crashing when immediate operands were passed not an immediate. These are now considered ill-formed in IR.

This was done by manually scanning the intrinsic file for llvm_i32_ty and llvm_i8_ty which are the predominant types we use for immediates. Most of them are on vector intrinsics. I might have missed some other intrinsics.

Differential Revision: https://reviews.llvm.org/D58302

llvm-svn: 355993
2019-03-12 23:48:07 +00:00
Matt Arsenault caf1316f71 IR: Add immarg attribute
This indicates an intrinsic parameter is required to be a constant,
and should not be replaced with a non-constant value.

Add the attribute to all AMDGPU and generic intrinsics that comments
indicate it should apply to. I scanned other target intrinsics, but I
don't see any obvious comments indicating which arguments are intended
to be only immediates.

This breaks one questionable testcase for the autoupgrade. I'm unclear
on whether the autoupgrade is supposed to really handle declarations
which were never valid. The verifier fails because the attributes now
refer to a parameter past the end of the argument list.

llvm-svn: 355981
2019-03-12 21:02:54 +00:00
Philip Reames 9b6b4fac83 [SROA] Fix a crash when trying to convert a memset to an non-integral pointer type
The included test case currently crashes on tip of tree. Rather than adding a bailout, I chose to restructure the code so that the existing helper function could be used. Given that, the majority of the diff is NFC-ish, but the key difference is that canConvertValue returns false when only one side is a non-integral pointer.

Thanks to Cherry Zhang for the test case.

Differential Revision: https://reviews.llvm.org/D59000

llvm-svn: 355962
2019-03-12 20:15:05 +00:00
Tim Northover 8935aca9c7 CodeGenPrep: preserve inbounds attribute when sinking GEPs.
Targets can potentially emit more efficient code if they know address
computations never overflow. For example ILP32 code on AArch64 (which only has
64-bit address computation) can ignore the possibility of overflow with this
extra information.

llvm-svn: 355926
2019-03-12 15:22:23 +00:00
Fangrui Song f260967055 [SimplifyLibCalls] Fix comments about fputs, memchr, and s[n]printf. NFC
llvm-svn: 355905
2019-03-12 10:31:52 +00:00
Sanjoy Das 3f5ce18658 Reland "Relax constraints for reduction vectorization"
Change from original commit: move test (that uses an X86 triple) into the X86
subdirectory.

Original description:
Gating vectorizing reductions on *all* fastmath flags seems unnecessary;
`reassoc` should be sufficient.

Reviewers: tvvikram, mkuper, kristof.beyls, sdesmalen, Ayal

Reviewed By: sdesmalen

Subscribers: dcaballe, huntergr, jmolloy, mcrosier, jlebar, bixia, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D57728

llvm-svn: 355889
2019-03-12 01:31:44 +00:00
Sanjoy Das 2136a5bc49 Revert "Relax constraints for reduction vectorization"
This reverts commit r355868.  Breaks hexagon.

llvm-svn: 355873
2019-03-11 22:37:31 +00:00
Sanjoy Das 93f8cc186a Relax constraints for reduction vectorization
Summary:
Gating vectorizing reductions on *all* fastmath flags seems unnecessary;
`reassoc` should be sufficient.

Reviewers: tvvikram, mkuper, kristof.beyls, sdesmalen, Ayal

Reviewed By: sdesmalen

Subscribers: dcaballe, huntergr, jmolloy, mcrosier, jlebar, bixia, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D57728

llvm-svn: 355868
2019-03-11 21:36:41 +00:00
Brian Gesiak d7b68132d8 [coroutines][PR40979] Ignore unreachable uses across suspend points
Summary:
Depends on https://reviews.llvm.org/D59069.

https://bugs.llvm.org/show_bug.cgi?id=40979 describes a bug in which the
-coro-split pass would assert that a use was across a suspend point from
a definition. Normally this would mean that a value would "spill" across
a suspend point and thus need to be stored in the coroutine frame. However,
in this case the use was unreachable, and so it would not be necessary
to store the definition on the frame.

To prevent the assert, simply remove unreachable basic blocks from a
coroutine function before computing spills. This avoids the assert
reported in PR40979.

Reviewers: GorNishanov, tks2103

Reviewed By: GorNishanov

Subscribers: EricWF, jdoerfert, llvm-commits, lewissbaker

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59068

llvm-svn: 355852
2019-03-11 18:31:28 +00:00
Jeremy Morse 90ede5f4bf [SimplifyCFG] Retain debug info when threading jumps with critical edges
Fixes bug 38023: https://bugs.llvm.org/show_bug.cgi?id=38023

The SimplifyCFG pass will perform jump threading in some cases where
doing so is trivial and would simplify the CFG. When folding a series
of blocks with redundant conditional branches into an unconditional "critical
edge" block, it does not keep the debug location associated with the previous
conditional branch.

This patch fixes the bug described by copying the debug info from the
old conditional branch to the new unconditional branch instruction, and
adds a regression test for the SimplifyCFG pass that covers this case.

Patch by Stephen Tozer!

Differential Revision: https://reviews.llvm.org/D59206

llvm-svn: 355833
2019-03-11 16:23:59 +00:00
Sam Parker 52760bf435 [CGP] Limit distance between overflow math and cmp
Inserting an overflowing arithmetic intrinsic can increase register
pressure by producing two values at a point where only one is needed,
while the second use maybe several blocks away. This increase in
pressure is likely to be more detrimental on performance than
rematerialising one of the original instructions.
    
So, check that the arithmetic and compare instructions are no further
apart than their immediate successor/predecessor.

Differential Revision: https://reviews.llvm.org/D59024

llvm-svn: 355823
2019-03-11 13:19:46 +00:00
Jeremy Morse b60aea4131 [JumpThreading] Retain debug info when replacing branch instructions
Fixes bug 37966: https://bugs.llvm.org/show_bug.cgi?id=37966

The Jump Threading pass will replace certain conditional branch
instructions with unconditional branches when it can prove that only one
branch can occur. Prior to this patch, it would not carry the debug
info from the old instruction to the new one.

This patch fixes the bug described by copying the debug info from the
conditional branch instruction to the new unconditional branch
instruction, and adds a regression test for the Jump Threading pass that
covers this case.

Patch by Stephen Tozer!

Differential Revision: https://reviews.llvm.org/D58963

llvm-svn: 355822
2019-03-11 11:48:57 +00:00
Craig Topper 69f8c1653d [ScalarizeMaskedMemIntrin] Use IRBuilder functions that take uint32_t/uint64_t for getelementptr, extractelement, and insertelement.
This saves needing to call getInt32 ourselves. Making the code a little shorter.

The test changes are because insert/extract use getInt64 internally. Shouldn't be a functional issue.

This cleanup because I plan to write similar code for expandload/compressstore.

llvm-svn: 355767
2019-03-09 02:08:41 +00:00
Rong Xu ce3be45cac [CodeGenPrepare] Fix ModifiedDT flag in optimizeSelectInst
r44412 fixed a huge compile time regression but it needed ModifiedDT flag to be
maintained correctly in optimizations in optimizeBlock() and optimizeInst().
Function optimizeSelectInst() does not update the flag.
This patch propagates the flag in optimizeSelectInst() back to
optimizeBlock().

This patch also removes ModifiedDT in CodeGenPrepare class (which is not used).
The property of ModifiedDT is now recorded in a ref parameter.

Differential Revision: https://reviews.llvm.org/D59139

llvm-svn: 355751
2019-03-08 22:46:18 +00:00
Clement Courbet 8e16d73346 [SelectionDAG] Allow the user to specify a memeq function.
Summary:
Right now, when we encounter a string equality check,
e.g. `if (memcmp(a, b, s) == 0)`, we try to expand to a comparison if `s` is a
small compile-time constant, and fall back on calling `memcmp()` else.

This is sub-optimal because memcmp has to compute much more than
equality.

This patch replaces `memcmp(a, b, s) == 0` by `bcmp(a, b, s) == 0` on platforms
that support `bcmp`.

`bcmp` can be made much more efficient than `memcmp` because equality
compare is trivially parallel while lexicographic ordering has a chain
dependency.

Subscribers: fedor.sergeev, jyknight, ckennelly, gchatelet, llvm-commits

Differential Revision: https://reviews.llvm.org/D56593

llvm-svn: 355672
2019-03-08 09:07:45 +00:00
Florian Hahn 6ca0985aa5 [InterleavedAccessAnalysis] Fix integer overflow in insertMember.
Without checking for integer overflow, invalid members can be added
 e.g. if the calculated key overflows, becomes positive and the largest key.

This fixes
      https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=7560
      https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=13128
      https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=13229

Reviewers: Ayal, anna, hsaito, efriedma

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D55538

llvm-svn: 355613
2019-03-07 17:50:16 +00:00
David Green ffc922ec35 [LSR] Attempt to increase the accuracy of LSR's setup cost
In some loops, we end up generating loop induction variables that look like:
  {(-1 * (zext i16 (%i0 * %i1) to i32))<nsw>,+,1}
As opposed to the simpler:
  {(zext i16 (%i0 * %i1) to i32),+,-1}
i.e we count up from -limit to 0, not the simpler counting down from limit to
0. This is because the scores, as LSR calculates them, are the same and the
second is filtered in place of the first. We end up with a redundant SUB from 0
in the code.

This patch tries to make the calculation of the setup cost a little more
thoroughly, recursing into the scev members to better approximate the setup
required. The cost function for comparing LSR costs is:

return std::tie(C1.NumRegs, C1.AddRecCost, C1.NumIVMuls, C1.NumBaseAdds,
                C1.ScaleCost, C1.ImmCost, C1.SetupCost) <
       std::tie(C2.NumRegs, C2.AddRecCost, C2.NumIVMuls, C2.NumBaseAdds,
                C2.ScaleCost, C2.ImmCost, C2.SetupCost);
So this will only alter results if none of the other variables turn out to be
different.

Differential Revision: https://reviews.llvm.org/D58770

llvm-svn: 355597
2019-03-07 13:44:40 +00:00
Nick Desaulniers 212c8ac23f [LoopRotate] fix crash encountered with callbr
Summary:
While implementing inlining support for callbr
(https://bugs.llvm.org/show_bug.cgi?id=40722), I hit a crash in Loop
Rotation when trying to build the entire x86 Linux kernel
(drivers/char/random.c). This is a small fix up to r353563.

Test case is drivers/char/random.c (with callbr's inlined), then ran
through creduce, then `opt -opt-bisect-limit=<limit>`, then bugpoint.

Thanks to Craig Topper for immediately spotting the fix, and teaching me
how to fish.

Reviewers: craig.topper, jyknight

Reviewed By: craig.topper

Subscribers: hiraditya, llvm-commits, srhines

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58929

llvm-svn: 355564
2019-03-06 23:04:40 +00:00
Rong Xu 3ee1524afc [PGO] Fix hexagon buildbot errors in r355541
Add "REQUIRES: x86-registered-target" to thinlto test cases.

llvm-svn: 355556
2019-03-06 22:16:47 +00:00
Rong Xu 05c0afe842 [PGO] Context sensitive PGO (part 4)
Part 4 of CSPGO changes:
(1) add support in cmake for cspgo build.
(2) fix an issue in big endian.
(3) test cases.

Differential Revision: https://reviews.llvm.org/D54175

llvm-svn: 355541
2019-03-06 19:31:37 +00:00
Philip Reames 9549f7560f [AtomicExpand] Allow libcall expansion for non-zero address spaces (try 2)
Restore a reverted commit, with the silly mistake fixed.  Sorry for the previous breakage.  

Be consistent about how we treat atomics in non-zero address spaces.  If we get to the backend, we tend to lower them as if in address space 0.  Do the same if we need to insert a libcall instead.

Differential Revision: https://reviews.llvm.org/D58760

llvm-svn: 355540
2019-03-06 19:27:13 +00:00
Nikita Popov 884feb1b69 [InstCombine] Fold add nsw + sadd.with.overflow
Fold `add nsw` and `sadd.with.overflow` with constants if the addition
does not overflow.

Part of https://bugs.llvm.org/show_bug.cgi?id=38146.

Patch by Dan Robertson.

Differential Revision: https://reviews.llvm.org/D58881

llvm-svn: 355530
2019-03-06 18:30:00 +00:00
Mitch Phillips f0c21e2ff5 Revert "[AtomicExpand] Allow libcall expansion for non-zero address spaces" for buildbot failures.
llvm-svn: 355461
2019-03-06 00:25:40 +00:00
Florian Hahn 13bbcb3264 [ARM] Sink zext/sext operands for add and sub to enable vsubl generation.
This uses the infrastructure added in rL353152 to sink zext and sexts to
sub/add users, to enable vsubl/vaddl generation when NEON is available.

See https://bugs.llvm.org/show_bug.cgi?id=40025.

Reviewers: SjoerdMeijer, t.p.northover, samparker, efriedma

Reviewed By: samparker

Differential Revision: https://reviews.llvm.org/D58063

llvm-svn: 355460
2019-03-06 00:10:03 +00:00
Philip Reames 1e4c5d3611 [AtomicExpand] Allow libcall expansion for non-zero address spaces
Be consistent about how we treat atomics in non-zero address spaces.  If we get to the backend, we tend to lower them as if in address space 0.  Do the same if we need to insert a libcall instead.

Differential Revision: https://reviews.llvm.org/D58760

llvm-svn: 355453
2019-03-05 23:00:14 +00:00
Florian Hahn add2d2e304 [SLP] Fix invalid triple in X86 tests
x86-64 is an invalid architecture in triples. Changing it to the correct
triple (x86_64) changes some tests, because SLP is not deemed profitable
any more.

Reviewers: ABataev, RKSimon, spatel

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D58931

llvm-svn: 355420
2019-03-05 17:56:35 +00:00
David Green 4511f3fa86 [SCEV] Ensure that isHighCostExpansion takes into account what is being divided
A SCEV is not low-cost just because you can divide it by a power of 2. We need to also
check what we are dividing to make sure it too is not a high-code expansion. This helps
to not expand the exit value of certain loops, helping not to bloat the code.

The change in no-iv-rewrite.ll is reverting back to what it was testing before rL194116,
and looks a lot like the other tests in replace-loop-exit-folds.ll.

Differential Revision: https://reviews.llvm.org/D58435

llvm-svn: 355393
2019-03-05 12:12:18 +00:00
David Green 3bcb0aa7f9 [SCEV] Add some extra tests for IndVarSimplifys loop exit values. NFC.
Add some tests for various loops of the form:
  while(S >= 32) {
    S -= 32;
    something();
  };
  return S;

llvm-svn: 355389
2019-03-05 11:18:55 +00:00
Florian Hahn fd2d89f98b Fix invalid target triples in tests. (NFC)
llvm-svn: 355349
2019-03-04 23:37:41 +00:00
Sanjay Patel 3b2d0bc7c2 [CodeGenPrepare] avoid crashing on non-canonical/degenerate code
The test is reduced from an example in the post-commit thread for:
rL354746
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20190304/632396.html

While we must avoid dying here, the real question should be:
Why is non-canonical and/or degenerate code making it to CGP when
using the new pass manager?

llvm-svn: 355345
2019-03-04 22:47:13 +00:00
Sanjay Patel 6e32b46b1d [ConstantHoisting] avoid hang/crash from unreachable blocks (PR40930)
I'm not too familiar with this pass, so there might be a better
solution, but this appears to fix the degenerate:
PR40930
PR40931
PR40932
PR40934
...without affecting any real-world code.

As we've seen in several other passes, when we have unreachable blocks,
they can contain semi-bogus IR and/or cause unexpected conditions. We
would not typically expect these patterns to make it this far, but we
have to guard against them anyway.

llvm-svn: 355337
2019-03-04 20:57:14 +00:00
Nikita Popov 8670faf939 [InstCombine] Add tests for add nsw + sadd.with.overflow; NFC
Baseline tests for D58881, which fixes part of PR38146.

Patch by Dan Robertson.

llvm-svn: 355328
2019-03-04 19:35:46 +00:00
Davide Italiano 672bec223d [InstCombine] Mark debug values as unavailable after DCE.
Fixes PR40838.

llvm-svn: 355301
2019-03-04 04:38:58 +00:00
Sanjay Patel e076491759 [InstCombine] remove stale FIXME comment from test; NFC
llvm-svn: 355293
2019-03-03 19:08:54 +00:00
Sanjay Patel 2a70703770 [ValueTracking] do not try to peek through bitcasts in computeKnownBitsFromAssume()
There are no tests for this case, and I'm not sure how it could ever work,
so I'm just removing this option from the matcher. This should fix PR40940:
https://bugs.llvm.org/show_bug.cgi?id=40940

llvm-svn: 355292
2019-03-03 18:59:33 +00:00
Amaury Sechet d341a94261 Add extra ops in add to sub transform test in order to enforce proper operand ordering. NFC
llvm-svn: 355291
2019-03-03 15:11:13 +00:00
Amaury Sechet 315d0bbb9c Add test case for add to sub transformation. NFC
llvm-svn: 355277
2019-03-02 20:12:25 +00:00
Sanjay Patel 1f65903dc1 [InstCombine] move add after smin/smax
Follow-up to rL355221.
This isn't specifically called for within PR14613,
but we'll get there eventually if it's not already
requested in some other bug report.

https://rise4fun.com/Alive/5b0

  Name: smax
  Pre: WillNotOverflowSignedSub(C1,C0)
  %a = add nsw i8 %x, C0
  %cond = icmp sgt i8 %a, C1
  %r = select i1 %cond, i8 %a, i8 C1
  =>
  %c2 = icmp sgt i8 %x, C1-C0
  %u2 = select i1 %c2, i8 %x, i8 C1-C0
  %r = add nsw i8 %u2, C0

  Name: smin
  Pre: WillNotOverflowSignedSub(C1,C0)
  %a = add nsw i32 %x, C0
  %cond = icmp slt i32 %a, C1
  %r = select i1 %cond, i32 %a, i32 C1
  =>
  %c2 = icmp slt i32 %x, C1-C0
  %u2 = select i1 %c2, i32 %x, i32 C1-C0
  %r = add nsw i32 %u2, C0

llvm-svn: 355272
2019-03-02 16:45:10 +00:00
Sanjay Patel 42ad8685c6 [InstCombine] add tests for add+smin/smax; NFC
llvm-svn: 355271
2019-03-02 16:45:05 +00:00
Xing GUO 23b1dfe675 [Transforms] fix typo in test case. NFC.
llvm-svn: 355265
2019-03-02 08:32:32 +00:00
Florian Hahn 98f11a7d75 [SCEV] Handle case where MaxBECount is less precise than ExactBECount for OR.
In some cases, MaxBECount can be less precise than ExactBECount for AND
and OR (the AND case was PR26207). In the OR test case, both ExactBECounts are
undef, but MaxBECount are different, so we hit the assertion below. This
patch uses the same solution the AND case already uses.

Assertion failed:
   ((isa<SCEVCouldNotCompute>(ExactNotTaken) || !isa<SCEVCouldNotCompute>(MaxNotTaken))
     && "Exact is not allowed to be less precise than Max"), function ExitLimit

This patch also consolidates test cases for both AND and OR in a single
test case.

Fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=13245

Reviewers: sanjoy, efriedma, mkazantsev

Reviewed By: sanjoy

Differential Revision: https://reviews.llvm.org/D58853

llvm-svn: 355259
2019-03-02 02:31:44 +00:00
Craig Topper 35f55d72f6 [X86] Remove IntrArgMemOnly from target specific gather/scatter intrinsics
IntrArgMemOnly implies that only memory pointed to by pointer typed arguments will be accessed. But these intrinsics allow you to pass null to the pointer argument and put the full address into the index argument. Other passes won't be able to understand this.

A colleague found that ISPC was creating gathers like this and then dead store elimination removed some stores because it didn't understand what the gather was doing since the pointer argument was null.

Differential Revision: https://reviews.llvm.org/D58805

llvm-svn: 355228
2019-03-01 21:02:40 +00:00
Craig Topper e6bfb0919c [X86] Add test case for D58805. NFC
This demonstrates dead store elimination removing a store that may alias a gather that uses null as its base.

llvm-svn: 355227
2019-03-01 21:02:34 +00:00
Philip Reames cf0a978e1f [InstCombine] Extend saturating idempotent atomicrmw transform to FP
I'm assuming that the nan propogation logic for InstructonSimplify's handling of fadd and fsub is correct, and applying the same to atomicrmw.

Differential Revision: https://reviews.llvm.org/D58836

llvm-svn: 355222
2019-03-01 19:50:36 +00:00
Sanjay Patel 6e1e7e1c3e [InstCombine] move add after umin/umax
In the motivating cases from PR14613:
https://bugs.llvm.org/show_bug.cgi?id=14613
...moving the add enables us to narrow the
min/max which eliminates zext/trunc which
enables signficantly better vectorization.
But that bug is still not completely fixed.

https://rise4fun.com/Alive/5KQ

  Name: umax
  Pre: C1 u>= C0
  %a = add nuw i8 %x, C0
  %cond = icmp ugt i8 %a, C1
  %r = select i1 %cond, i8 %a, i8 C1
  =>
  %c2 = icmp ugt i8 %x, C1-C0
  %u2 = select i1 %c2, i8 %x, i8 C1-C0
  %r = add nuw i8 %u2, C0

  Name: umin
  Pre: C1 u>= C0
  %a = add nuw i32 %x, C0
  %cond = icmp ult i32 %a, C1
  %r = select i1 %cond, i32 %a, i32 C1
  =>
  %c2 = icmp ult i32 %x, C1-C0
  %u2 = select i1 %c2, i32 %x, i32 C1-C0
  %r = add nuw i32 %u2, C0

llvm-svn: 355221
2019-03-01 19:42:40 +00:00
Sanjay Patel 20292a0526 [InstCombine] add tests for umin/umax narrowing (PR14613); NFC
llvm-svn: 355220
2019-03-01 19:42:34 +00:00
Philip Reames 2226e9a745 [LICM] Infer proper alignment from loads during scalar promotion
This patch fixes an issue where we would compute an unnecessarily small alignment during scalar promotion when no store is not to be guaranteed to execute, but we've proven load speculation safety. Since speculating a load requires proving the existing alignment is valid at the new location (see Loads.cpp), we can use the alignment fact from the load.

For non-atomics, this is a performance problem. For atomics, this is a correctness issue, though an *incredibly* rare one to see in practice. For atomics, we might not be able to lower an improperly aligned load or store (i.e. i32 align 1). If such an instruction makes it all the way to codegen, we *may* fail to codegen the operation, or we may simply generate a slow call to a library function. The part that makes this super hard to see in practice is that the memory location actually *is* well aligned, and instcombine knows that. So, to see a failure, you have to have a) hit the bug in LICM, b) somehow hit a depth limit in InstCombine/ValueTracking to avoid fixing the alignment, and c) then have generated an instruction which fails codegen rather than simply emitting a slow libcall. All around, pretty hard to hit.

Differential Revision: https://reviews.llvm.org/D58809

llvm-svn: 355217
2019-03-01 18:45:05 +00:00
Philip Reames 1648f95eb1 [Tests] More missing atomicrmw combines
llvm-svn: 355215
2019-03-01 18:24:05 +00:00
Philip Reames 21f7c35df1 [Tests] Add tests for missed optimizations of saturating and idempotent FP atomicrmws
llvm-svn: 355212
2019-03-01 18:10:37 +00:00
Philip Reames 77982868c5 [InstCombine] Extend "idempotent" atomicrmw optimizations to floating point
An idempotent atomicrmw is one that does not change memory in the process of execution.  We have already added handling for the various integer operations; this patch extends the same handling to floating point operations which were recently added to IR.

Note: At the moment, we canonicalize idempotent fsub to fadd when ordering requirements prevent us from using a load.  As discussed in the review, I will be replacing this with canonicalizing both floating point ops to integer ops in the near future.

Differential Revision: https://reviews.llvm.org/D58251

llvm-svn: 355210
2019-03-01 18:00:07 +00:00
Sanjay Patel 12b1f2418d [InstCombine] add tests for add+umin/umax canonicalization; NFC
Fixing this should solve the biggest part of the vector problems seen in:
https://bugs.llvm.org/show_bug.cgi?id=14613

llvm-svn: 355206
2019-03-01 17:29:10 +00:00
Fangrui Song f4b25f700a [ConstantHoisting] Call cleanup() in ConstantHoistingPass::runImpl to avoid dangling elements in ConstIntInfoVec for new PM
Summary:
ConstIntInfoVec contains elements extracted from the previous function.
In new PM, releaseMemory() is not called and the dangling elements can
cause segfault in findConstantInsertionPoint.

Rename releaseMemory() to cleanup() to deliver the idea that it is
mandatory and call cleanup() in ConstantHoistingPass::runImpl to fix
this.

Reviewers: ormris, zzheng, dmgreen, wmi

Reviewed By: ormris, wmi

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58589

llvm-svn: 355174
2019-03-01 05:27:01 +00:00
Sanjay Patel 4a47f5f550 [InstCombine] fold adds of constants separated by sext/zext
This is part of a transform that may be done in the backend:
D13757
...but it should always be beneficial to fold this sooner in IR
for all targets.

https://rise4fun.com/Alive/vaiW

  Name: sext add nsw
  %add = add nsw i8 %i, C0
  %ext = sext i8 %add to i32
  %r = add i32 %ext, C1
  =>
  %s = sext i8 %i to i32
  %r = add i32 %s, sext(C0)+C1

  Name: zext add nuw
  %add = add nuw i8 %i, C0
  %ext = zext i8 %add to i16
  %r = add i16 %ext, C1
  =>
  %s = zext i8 %i to i16
  %r = add i16 %s, zext(C0)+C1

llvm-svn: 355118
2019-02-28 19:05:26 +00:00
Philip Reames 9915b1fa4a [Tests] Strengthen LICM test corpus to show alignment striping. (part 2)
This should have been part of r355110, but my brain isn't quite awake yet, despite the coffee.  Per the original submit comment... Doing scalar promotion w/o being able to prove the alignment of the hoisted load or sunk store is a bug.  Update tests to actually show the alignment so that impact of the patch which fixes this can be seen.

llvm-svn: 355111
2019-02-28 18:17:51 +00:00
Philip Reames 63a67527a4 [Tests] Strengthen LICM test corpus to show alignment striping
Doing scalar promotion w/o being able to prove the alignment of the hoisted load or sunk store is a bug.  Update tests to actually show the alignment so that impact of the patch which fixes this can be seen.

llvm-svn: 355110
2019-02-28 18:08:04 +00:00
Nikita Popov af2b0bef43 [ValueTracking] More accurate unsigned sub overflow detection
Second part of D58593.

Compute precise overflow conditions based on all known bits, rather
than just the sign bits. Unsigned a - b overflows iff a < b, and we
can determine whether this always/never happens based on the minimal
and maximal values achievable for a and b subject to the known bits
constraint.

llvm-svn: 355109
2019-02-28 18:04:20 +00:00
Nikita Popov 6c57395fb4 [ValueTracking] More accurate unsigned add overflow detection
Part of D58593.

Compute precise overflow conditions based on all known bits, rather
than just the sign bits. Unsigned a + b overflows iff a > ~b, and we
can determine whether this always/never happens based on the minimal
and maximal values achievable for a and ~b subject to the known bits
constraint.

llvm-svn: 355072
2019-02-28 08:11:20 +00:00
Eric Christopher 07944353fc Temporarily revert "ArgumentPromotion should copy all metadata to new Function" and the dependent patch "Refine ArgPromotion metadata handling" as they're causing segfaults in argument promotion.
This reverts commits r354032 and r353537.

llvm-svn: 355060
2019-02-28 01:11:12 +00:00
Sanjay Patel ac96a92d82 [InstCombine] add tests for add+ext+add; NFC
llvm-svn: 355020
2019-02-27 19:27:45 +00:00
Nikita Popov a54fe15610 [InstCombine] Add additional add.sat overflow tests; NFC
Baseline for D58593.

llvm-svn: 354996
2019-02-27 16:18:29 +00:00
Sanjay Patel e375074dff [InstCombine] regenerate complete checks; NFC
llvm-svn: 354993
2019-02-27 15:59:30 +00:00
Vedant Kumar 73522d1678 [HotColdSplit] Disable splitting for sanitized functions
Splitting can make sanitizer errors harder to understand, as the
trapping instruction may not be in the function where the bug was
detected.

rdar://48142697

llvm-svn: 354931
2019-02-26 22:55:46 +00:00
Sanjay Patel 9dada83d6c [InstSimplify] remove zero-shift-guard fold for general funnel shift
As discussed on llvm-dev:
http://lists.llvm.org/pipermail/llvm-dev/2019-February/130491.html

We can't remove the compare+select in the general case because
we are treating funnel shift like a standard instruction (as
opposed to a special instruction like select/phi).

That means that if one of the operands of the funnel shift is
poison, the result is poison regardless of whether we know that
the operand is actually unused based on the instruction's
particular semantics.

The motivating case for this transform is the more specific
rotate op (rather than funnel shift), and we are preserving the
fold for that case because there is no chance of introducing
extra poison when there is no anonymous extra operand to the
funnel shift.

llvm-svn: 354905
2019-02-26 18:26:56 +00:00
Sanjay Patel 421c6e6864 [InstSimplify] add tests for rotate; NFC
Rotate is a special-case of funnel shift that has different
poison constraints than the general case. That's not visible
yet in the existing tests, but it needs to be corrected.

llvm-svn: 354894
2019-02-26 16:44:08 +00:00
Sanjay Patel 840f5d6dce [InstCombine] remove duplicate (but not updated) tests; NFC
Not sure how it happened, but rL354886 was a duplicate of rL354881,
but not updated with rL354887.

llvm-svn: 354889
2019-02-26 15:25:42 +00:00
Sanjay Patel e8bf0f79bd [InstCombine] canonicalize more unsigned saturated add with 'not'
Yet another pattern variation suggested by:
https://bugs.llvm.org/show_bug.cgi?id=14613

There are 8 more potential commuted patterns here on top of the
8 that were already handled (rL354221, rL354276, rL354393).
We have the obvious commute of the 'add' + commute of the cmp
predicate/operands (ugt/ult) + commute of the select operands:

Name: base
%notx = xor i32 %x, -1
%a = add i32 %notx, %y
%c = icmp ult i32 %x, %y
%r = select i1 %c, i32 -1, i32 %a
=>
%c2 = icmp ult i32 %a, %y
%r = select i1 %c2, i32 -1, i32 %a

Name: ugt
%notx = xor i32 %x, -1
%a = add i32 %notx, %y
%c = icmp ugt i32 %y, %x
%r = select i1 %c, i32 -1, i32 %a
=>
%c2 = icmp ult i32 %a, %y
%r = select i1 %c2, i32 -1, i32 %a

Name: commute select
%notx = xor i32 %x, -1
%a = add i32 %notx, %y
%c = icmp ult i32 %y, %x
%r = select i1 %c, i32 %a, i32 -1
=>
%c2 = icmp ult i32 %a, %y
%r = select i1 %c2, i32 -1, i32 %a

Name: ugt + commute select
%notx = xor i32 %x, -1
%a = add i32 %notx, %y
%c = icmp ugt i32 %x, %y
%r = select i1 %c, i32 %a, i32 -1
=>
%c2 = icmp ult i32 %a, %y
%r = select i1 %c2, i32 -1, i32 %a

https://rise4fun.com/Alive/den

llvm-svn: 354887
2019-02-26 15:18:49 +00:00
Sanjay Patel c9af54bb55 [InstCombine] add more tests for saturated add; NFC
llvm-svn: 354886
2019-02-26 15:18:44 +00:00
Sanjay Patel 0d4f9216aa [InstCombine] add more tests for saturated add; NFC
llvm-svn: 354881
2019-02-26 14:40:23 +00:00
Matt Arsenault f97ace5639 AMDGPU: Remove IntrReadMem from memtime/memrealtime intrinsics
EarlyCSE with MemorySSA was able to use this to merge multiple calls
with no intervening store.

llvm-svn: 354814
2019-02-25 20:16:11 +00:00
Simon Pilgrim a066f1f9e6 [Vectorizer] Add vectorization support for fixed smul/umul intrinsics
This requires a couple of tweaks to existing vectorization functions as they were assuming that only the second call argument (ctlz/cttz/powi) could ever be the 'always scalar' argument, but for smul.fix + umul.fix its the third argument.

Differential Revision: https://reviews.llvm.org/D58616

llvm-svn: 354790
2019-02-25 15:42:02 +00:00
Simon Pilgrim f54186abb6 [SLPVectorizer][X86] Add fixed smul/umul tests
Baseline tests - fixed mul intrinsics aren't flagged as vectorizable yet

llvm-svn: 354783
2019-02-25 13:26:30 +00:00
Nikita Popov b7918f3c14 [InstCombine] Add tests for PR40846; NFC
The icmps are the same as the overflow result of the intrinsic.

llvm-svn: 354760
2019-02-24 21:55:37 +00:00
Nikita Popov bdefe47857 [InstCombine] Move with.overflow tests to separate file; NFC
And regenerate checks. I had to rename some variables, because
update_test_checks can't deal with the same variable names used
in lower and upper case. I've also dropped the result type aliases,
as just using the type directly gives a cleaner result.

llvm-svn: 354759
2019-02-24 21:55:31 +00:00
Sanjay Patel 26aa702463 [InstCombine] add test for icmp+add fold; NFC
llvm-svn: 354750
2019-02-24 17:31:15 +00:00
Sanjay Patel 9907d3c8b4 [InstCombine] canonicalize add/sub with bool
add A, sext(B) --> sub A, zext(B)

We have to choose 1 of these forms, so I'm opting for the
zext because that's easier for value tracking.

The backend should be prepared for this change after:
D57401
rL353433

This is also a preliminary step towards reducing the amount
of bit hackery that we do in IR to optimize icmp/select.
That should be waiting to happen at a later optimization stage.

The seeming regression in the fuzzer test was discussed in:
D58359

We were only managing that fold in instcombine by luck, and
other passes should be able to deal with that better anyway.

llvm-svn: 354748
2019-02-24 16:57:45 +00:00
Sanjay Patel 986a024c19 [InstCombine] regenerate checks; NFC
llvm-svn: 354747
2019-02-24 16:11:58 +00:00
Sanjay Patel cb04ba032f [CGP] add special-cases to form unsigned add with overflow (PR40486)
There's likely a missed IR canonicalization for at least 1 of these
patterns. Otherwise, we wouldn't have needed the pattern-matching
enhancement in D57516.

Note that -- unlike usubo added with D57789 -- the TLI hook for
this transform defaults to 'on'. So if there's any perf fallout
from this, targets should look at how they're lowering the uaddo
node in SDAG and/or override that hook.

The x86 diffs suggest that there's some missing pattern-matching
for forming inc/dec.

This should fix the remaining known problems in:
https://bugs.llvm.org/show_bug.cgi?id=40486
https://bugs.llvm.org/show_bug.cgi?id=31754

llvm-svn: 354746
2019-02-24 15:31:27 +00:00
Sanjay Patel 973143ab79 [CGP] add tests for uaddo increment/decrement; NFC
llvm-svn: 354699
2019-02-22 23:19:34 +00:00
Roman Tereshin 99a6672bba [LowerSwitch][AMDGPU] Do not handle impossible values
This patch adds LazyValueInfo to LowerSwitch to compute the range of the
value being switched over and reduce the size of the tree LowerSwitch
builds to lower a switch.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D58096

llvm-svn: 354670
2019-02-22 14:33:46 +00:00
Alina Sbirlea d2d3244363 [LoopSimplifyCFG] Update MemorySSA after r353911.
Summary:
MemorySSA is not properly updated in LoopSimplifyCFG after recent changes. Use SplitBlock utility to resolve that and clear all updates once handleDeadExits is finished.
All updates that follow are removal of edges which are safe to handle via the removeEdge() API.
Also, deleting dead blocks is done correctly as is, i.e. delete from MemorySSA before updating the CFG and DT.

Reviewers: mkazantsev, rtereshin

Subscribers: sanjoy, jlebar, Prazek, george.burgess.iv, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58524

llvm-svn: 354613
2019-02-21 19:54:05 +00:00
Joey Gouly 92af1360f3 [InferAddressSpaces] Fix crash on select of non-ptr operands
Check the operands of a select are pointers, to determine if it is an address
expression or not.

https://reviews.llvm.org/D58226

llvm-svn: 354576
2019-02-21 12:31:36 +00:00
Max Kazantsev 8ff2e86994 [NFC] Replace EOL in test file
llvm-svn: 354562
2019-02-21 09:56:23 +00:00
Max Kazantsev b672602f9e [TEST] Add failing test that shows problems with MSSA update in LoopSimplifyCFG
llvm-svn: 354559
2019-02-21 09:40:24 +00:00
Max Kazantsev 10489d76f6 [LoopSimplifyCFG] Add missing MSSA edge deletion
When we create fictive switch in preheader, we should take
care about MSSA and delete edge between old preheader and
header.

llvm-svn: 354547
2019-02-21 05:51:29 +00:00
Sanjay Patel 198cc305e9 [CGP] match a special-case of unsigned subtract overflow
This is the 'sub0' (negate) pattern from PR31754:
https://bugs.llvm.org/show_bug.cgi?id=31754

llvm-svn: 354519
2019-02-20 21:23:04 +00:00
Sanjay Patel 8d91faa481 [CGP][x86] add tests for usubo special-case; NFC
This is another example from PR31754:
https://bugs.llvm.org/show_bug.cgi?id=31754

llvm-svn: 354475
2019-02-20 15:40:58 +00:00
Sanjay Patel 68171e3cd6 [InstSimplify] use any-zero matcher for fcmp folds
The m_APFloat matcher does not work with anything but strict
splat vector constants, so we could miss these folds and then
trigger an assertion in instcombine:
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=13201

The previous attempt at this in rL354406 had a logic bug that
actually triggered a regression test failure, but I failed to
notice it the first time.

llvm-svn: 354467
2019-02-20 14:34:00 +00:00
Simon Pilgrim 9921e73d95 [SLPVectorizer][X86] Add add/sub/mul overflow tests
Baseline tests - overflow intrinsics aren't flagged as vectorizable yet

llvm-svn: 354454
2019-02-20 12:04:54 +00:00
Eric Christopher 2534592b9f Temporarily Revert "[X86][SLP] Enable SLP vectorization for 128-bit horizontal X86 instructions (add, sub)"
As this has broken the lto bootstrap build for 3 days and is
showing a significant regression on the Dither_benchmark results (from
the LLVM benchmark suite) -- specifically, on the
BENCHMARK_FLOYD_DITHER_128, BENCHMARK_FLOYD_DITHER_256, and
BENCHMARK_FLOYD_DITHER_512; the others are unchanged.  These have
regressed by about 28% on Skylake, 34% on Haswell, and over 40% on
Sandybridge.

This reverts commit r353923.

llvm-svn: 354434
2019-02-20 04:42:07 +00:00
Sanjay Patel 5fefb02e27 [InstCombine] regenerate test checks; NFC
llvm-svn: 354420
2019-02-20 01:24:59 +00:00
Philip Reames 79d5e16f51 [GVN] Small tweaks to comments, style, and missed vector handling
Noticed these while doing a final sweep of the code to make sure I hadn't missed anything in my last couple of patches.  The (minor) missed optimization was noticed because of the stylistic fix to avoid an overly specific cast.

llvm-svn: 354412
2019-02-20 00:31:28 +00:00
Sanjay Patel 49f97395ab Revert "[InstSimplify] use any-zero matcher for fcmp folds"
This reverts commit 058bb83513.
Forgot to update another test affected by this change.

llvm-svn: 354408
2019-02-20 00:20:38 +00:00
Philip Reames a259dc3263 [GVN] Fix last crasher w/non-integral pointers
Same case as for memset and memcpy, but this time for clobbering stores and loads.  We still can't allow coercion to or from non-integrals, regardless of the transform.

Now that I'm done the whole little sequence, it seems apparent that we'd entirely missed reasoning about clobbers in the original GVN support for non-integral pointers.

My appologies, I thought we'd upstreamed all of this, but it turns out we were still carrying a downstream hack which hid all of these issues.  My chanks to Cherry Zhang for helping debug.

llvm-svn: 354407
2019-02-20 00:15:54 +00:00
Sanjay Patel 058bb83513 [InstSimplify] use any-zero matcher for fcmp folds
The m_APFloat matcher does not work with anything but strict
splat vector constants, so we could miss these folds and then
trigger an assertion in instcombine:
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=13201

llvm-svn: 354406
2019-02-20 00:09:50 +00:00
Sanjay Patel 9cf04addf3 [InstSimplify] add vector tests for fcmp+fabs; NFC
llvm-svn: 354404
2019-02-19 23:58:02 +00:00
Philip Reames 952d234d00 [GVN] Fix a crash bug w/non-integral pointers and memtransfers
Problem is very similiar to the one fixed for memsets in r354399, we try to coerce a value to non-integral type, and then crash while try to do so.  Since we shouldn't be doing such coercions to start with, easy fix.  From inspection, I see two other cases which look to be similiar and will follow up with most test cases and fixes if confirmed.

llvm-svn: 354403
2019-02-19 23:49:38 +00:00
Philip Reames 322eb7660e [GVN] Fix a non-integral pointer bug w/vector types
GVN generally doesn't forward structs or array types, but it *will* forward vector types to non-vectors and vice versa.  As demonstrated in tests, we need to inhibit the same set of transforms for vector of non-integral pointers as for non-integral pointers themselves.

llvm-svn: 354401
2019-02-19 23:19:51 +00:00
Philip Reames 92756a80e7 [GVN] Fix a crash bug around non-integral pointers
If we encountered a location where we tried to forward the value of a memset to a load of a non-integral pointer, we crashed.  Such a forward is not legal in general, but we can forward null pointers.  Test for both cases are included.

llvm-svn: 354399
2019-02-19 23:07:15 +00:00
Philip Reames 979587d91d [Test] Autogenerate existing tests before adding more
llvm-svn: 354398
2019-02-19 22:57:30 +00:00
Sanjay Patel c1e0184317 [InstCombine] reduce even more unsigned saturated add with 'not' op
We want to use the sum in the icmp to allow matching with
m_UAddWithOverflow and eliminate the 'not'. This is discussed
in D51929 and is another step towards solving PR14613:
https://bugs.llvm.org/show_bug.cgi?id=14613

  Name: uaddsat, -1 fval
  %notx = xor i32 %x, -1
  %a = add i32 %x, %y
  %c = icmp ugt i32 %notx, %y
  %r = select i1 %c, i32 %a, i32 -1
  =>
  %a = add i32 %x, %y
  %c2 = icmp ugt i32 %y, %a
  %r = select i1 %c2, i32 -1, i32 %a

  Name: uaddsat, -1 fval + ult
  %notx = xor i32 %x, -1
  %a = add i32 %x, %y
  %c = icmp ult i32 %y, %notx
  %r = select i1 %c, i32 %a, i32 -1
  =>
  %a = add i32 %x, %y
  %c2 = icmp ugt i32 %y, %a
  %r = select i1 %c2, i32 -1, i32 %a

https://rise4fun.com/Alive/nTp

llvm-svn: 354393
2019-02-19 22:14:21 +00:00
Craig Topper fdc71aca8b [ArgumentPromotion] Add a lit.local.cfg to disable X86 specific tests if the X86 target doesn't exist.
Hopefully this fixes some buildbot failure after r354376

llvm-svn: 354388
2019-02-19 21:58:23 +00:00
Sanjay Patel dcb93c0dda [InstCombine] rearrange saturated add folds; NFC
This is no-functional-change-intended, but that was also
true when it was part of rL354276, and I managed to lose
2 predicates for the fold with constant...causing much bot
distress. So this time I'm adding a couple of negative tests
to avoid that.

llvm-svn: 354384
2019-02-19 21:46:13 +00:00
Andrew Scheidecker 8ca3f3863e [ConstantFold] Fix misfolding fcmp of a ConstantExpr NaN with itself.
The code incorrectly inferred that the relationship of a constant expression
to itself is FCMP_OEQ (ordered and equal), when it's actually FCMP_UEQ
(unordered *or* equal). This change corrects that, and adds some more limited
folds that can be done in this case.

Differential revision: https://reviews.llvm.org/D51216

llvm-svn: 354381
2019-02-19 21:21:54 +00:00
Andrew Scheidecker bddf892a6d [ConstantFold] Fix misfolding of icmp with a bitcast FP second operand.
In the process of trying to eliminate the bitcast, this was producing a
malformed icmp with FP operands.

Differential revision: https://reviews.llvm.org/D51215

llvm-svn: 354380
2019-02-19 21:03:20 +00:00
Craig Topper 6d0190f21c [X86] Don't consider functions ABI compatible for ArgumentPromotion pass if they view 512-bit vectors differently.
The use of the -mprefer-vector-width=256 command line option mixed with functions
using vector intrinsics can create situations where one function thinks 512 vectors
are legal, but another fucntion does not.

If a 512 bit vector is passed between them via a pointer, its possible ArgumentPromotion
might try to pass by value instead. This will result in type legalization for the two
functions handling the 512 bit vector differently leading to runtime failures.

Had the 512 bit vector been passed by value from clang codegen, both functions would
have been tagged with a min-legal-vector-width=512 function attribute. That would
make them be legalized the same way.

I observed this issue in 32-bit mode where a union containing a 512 bit vector was
being passed by a function that used intrinsics to one that did not. The caller
ended up passing in zmm0 and the callee tried to read it from ymm0 and ymm1.

The fix implemented here is just to consider it a mismatch if two functions
would handle 512 bit differently without looking at the types that are being
considered. This is the easist and safest fix, but it can be improved in the future.

Differential Revision: https://reviews.llvm.org/D58390

llvm-svn: 354376
2019-02-19 20:12:20 +00:00
Craig Topper 236e1ce1d9 [X86] Filter out tuning feature flags and a few ISA feature flags when checking for function inline compatibility.
Tuning flags don't have any effect on the available instructions so aren't a good reason to prevent inlining.

There are also some ISA flags that don't have any intrinsics our ABI requirements that we can exclude. I've put only the most basic ones like cmpxchg16b and lahfsahf. These are interesting because they aren't present in all 64-bit CPUs, but we have codegen workarounds when they aren't present.

Loosening these checks can help with scenarios where a caller has a more specific CPU than a callee. The default tuning flags on our generic 'x86-64' CPU can currently make it inline compatible with other CPUs. I've also added an example test for 'nocona' and 'prescott' where 'nocona' is just a 64-bit capable version of 'prescott' but in 32-bit mode they should be completely compatible.

I've based the implementation here of the similar code in AMDGPU.

Differential Revision: https://reviews.llvm.org/D58371

llvm-svn: 354355
2019-02-19 17:05:11 +00:00
Sanjay Patel d8b4efcb6b [CGP] form usub with overflow from sub+icmp
The motivating x86 cases for forming the intrinsic are shown in PR31754 and PR40487:
https://bugs.llvm.org/show_bug.cgi?id=31754
https://bugs.llvm.org/show_bug.cgi?id=40487
..and those are shown in the IR test file and x86 codegen file.

Matching the usubo pattern is harder than uaddo because we have 2 independent values rather than a def-use.

This adds a TLI hook that should preserve the existing behavior for uaddo formation, but disables usubo
formation by default. Only x86 overrides that setting for now although other targets will likely benefit
by forming usbuo too.

Differential Revision: https://reviews.llvm.org/D57789

llvm-svn: 354298
2019-02-18 23:33:05 +00:00
Sanjay Patel 8a35d339c9 Revert "[InstCombine] reduce even more unsigned saturated add with 'not' op"
This reverts commit 079b610c29.
Bots are failing after this change on a stage 2 compile of clang.

llvm-svn: 354277
2019-02-18 16:04:22 +00:00
Sanjay Patel 079b610c29 [InstCombine] reduce even more unsigned saturated add with 'not' op
We want to use the sum in the icmp to allow matching with
m_UAddWithOverflow and eliminate the 'not'. This is discussed
in D51929 and is another step towards solving PR14613:
https://bugs.llvm.org/show_bug.cgi?id=14613

  Name: uaddsat, -1 fval
  %notx = xor i32 %x, -1
  %a = add i32 %x, %y
  %c = icmp ugt i32 %notx, %y
  %r = select i1 %c, i32 %a, i32 -1
  =>
  %a = add i32 %x, %y
  %c2 = icmp ugt i32 %y, %a
  %r = select i1 %c2, i32 -1, i32 %a

  Name: uaddsat, -1 fval + ult
  %notx = xor i32 %x, -1
  %a = add i32 %x, %y
  %c = icmp ult i32 %y, %notx
  %r = select i1 %c, i32 %a, i32 -1
  =>
  %a = add i32 %x, %y
  %c2 = icmp ugt i32 %y, %a
  %r = select i1 %c2, i32 -1, i32 %a

https://rise4fun.com/Alive/nTp

llvm-svn: 354276
2019-02-18 15:21:39 +00:00
Sanjay Patel 92b5b195db [InstCombine] add even more tests for unsigned saturated add; NFC
The pattern-matching from rL354221 / rL354224 doesn't cover
these, so we're up to 8 different commuted possibilities.

There may still be 1 more variant of this pattern.

llvm-svn: 354236
2019-02-17 20:01:59 +00:00
Max Kazantsev 635b988578 [TEST] Remove 2>&1 from tests
Avoid confusing CHECKS with debug dumps of sets that can be printed differently.

llvm-svn: 354229
2019-02-17 18:22:00 +00:00
Sanjay Patel b341ee7071 [InstCombine] reduce more unsigned saturated add with 'not' op
We want to use the sum in the icmp to allow matching with
m_UAddWithOverflow and eliminate the 'not'. This is discussed
in D51929 and is another step towards solving PR14613:
https://bugs.llvm.org/show_bug.cgi?id=14613

  Name: not op
  %notx = xor i32 %x, -1
  %a = add i32 %x, %y
  %c = icmp ult i32 %notx, %y
  %r = select i1 %c, i32 -1, i32 %a
  =>
  %a = add i32 %x, %y
  %c2 = icmp ult i32 %a, %y
  %r = select i1 %c2, i32 -1, i32 %a

  Name: not op ugt
  %notx = xor i32 %x, -1
  %a = add i32 %x, %y
  %c = icmp ugt i32 %y, %notx
  %r = select i1 %c, i32 -1, i32 %a
  =>
  %a = add i32 %x, %y
  %c2 = icmp ult i32 %a, %y
  %r = select i1 %c2, i32 -1, i32 %a

https://rise4fun.com/Alive/niom

(The matching here is still incomplete.)

llvm-svn: 354224
2019-02-17 16:48:50 +00:00
Sanjay Patel db02293d9d [InstCombine] add more tests for unsigned saturated add; NFC
Extend the pattern-matching from rL354219 / rL354221.

llvm-svn: 354223
2019-02-17 16:44:11 +00:00
Sanjay Patel bee2073542 [InstCombine] reduce unsigned saturated add with 'not' op
We want to use the sum in the icmp to allow matching with
m_UAddWithOverflow and eliminate the 'not'. This is discussed
in D51929 and is another step towards solving PR14613:
https://bugs.llvm.org/show_bug.cgi?id=14613

(The matching here is incomplete. Trying to take minimal steps
to make sure we don't induce infinite looping from existing
canonicalizations of the 'select'.)

llvm-svn: 354221
2019-02-17 15:58:48 +00:00
Sanjay Patel 3e1193743c [InstCombine] add tests for unsigned saturated add; NFC
We're missing IR canonicalizations for this op as shown in D51929.

llvm-svn: 354219
2019-02-17 15:09:41 +00:00
Philip Reames cae6c767e8 [InstCombine] Convert atomicrmws to xchg or store where legal
Implement two more transforms of atomicrmw:
1) We can convert an atomicrmw which produces a known value in memory into an xchg instead.
2) We can convert an atomicrmw xchg w/o users into a store for some orderings.

Differential Revision: https://reviews.llvm.org/D58290

llvm-svn: 354170
2019-02-15 21:23:51 +00:00
Vedant Kumar 5f5cac3ae2 [CodeExtractor] Do not lift lifetime.end markers for region inputs
If a lifetime.end marker occurs along one path through the extraction
region, but not another, then it's still incorrect to lift the marker,
because there is some path through the extracted function which would
ordinarily not reach the marker. If the call to the extracted function
is in a loop, unrolling can cause inputs to the function to become
optimized out as undef after the first iteration.

To prevent incorrect stack slot merging in the calling function, it
should be sufficient to lift lifetime.start markers for region inputs.
I've tested this theory out by doing a stage2 check-all with randomized
splitting enabled.

This is a follow-up to r353973, and there's additional context for this
change in https://reviews.llvm.org/D57834.

rdar://47896986

Differential Revision: https://reviews.llvm.org/D58253

llvm-svn: 354159
2019-02-15 18:46:58 +00:00
Philip Reames d9c4646d05 [Tests] Demonstrate more missing atomicrmw transforms
llvm-svn: 354146
2019-02-15 17:11:30 +00:00
Sanjay Patel 8a2b543a13 [InstCombine] fix crash while trying to narrow a binop of shuffles (PR40734)
https://bugs.llvm.org/show_bug.cgi?id=40734

llvm-svn: 354144
2019-02-15 16:31:55 +00:00
Max Kazantsev 184bd7a0d8 [TEST] Update test comments, refactor checks with update_test_checks.py
This patch changes messages in guards-related tests to adequately reflect
reality.

llvm-svn: 354101
2019-02-15 07:06:52 +00:00
Philip Reames 485474208e Canonicalize all integer "idempotent" atomicrmw ops
For "idempotent" atomicrmw instructions which we can't simply turn into load, canonicalize the operation and constant. This reduces the matching needed elsewhere in the optimizer, but doesn't directly impact codegen.

For any architecture where OR/Zero is not a good default choice, you can extend the AtomicExpand lowerIdempotentRMWIntoFencedLoad mechanism. I reviewed X86 to make sure this works well, haven't audited other backends.

Differential Revision: https://reviews.llvm.org/D58244

llvm-svn: 354058
2019-02-14 20:41:17 +00:00
Philip Reames 97067d3c73 Teach instcombine about remaining "idempotent" atomicrmw types
Expand on Quentin's r353471 patch which converts some atomicrmws into loads. Handle remaining operation types, and fix a slight bug. Atomic loads are required to have alignment. Since this was within the InstCombine fixed point, somewhere else in InstCombine was adding alignment before the verifier saw it, but still, we should fix.

Terminology wise, I'm using the "idempotent" naming that is used for the same operations in AtomicExpand and X86ISelLoweringInfo. Once this lands, I'll add similar tests for AtomicExpand, and move the pattern match function to a common location. In the review, there was seemingly consensus that "idempotent" was slightly incorrect for this context.  Once we setle on a better name, I'll update all uses at once.

Differential Revision: https://reviews.llvm.org/D58242

llvm-svn: 354046
2019-02-14 18:39:14 +00:00
Philip Reames 617cd10bf7 [Tests] Add tests for all idemptotent atomicrmws
Base for a followup patch to strengthen the InstCombine transform, and then integrate the ExpandAtomics logic.

llvm-svn: 354036
2019-02-14 17:01:15 +00:00
Teresa Johnson c374a800e7 Refine ArgPromotion metadata handling
Summary:
In r353537 we now copy all metadata to the new function, with the old
being removed when the old function is eliminated. In some cases the old
function is dropped to a declaration (seems to only occur with the old
PM). Go ahead and clear all metadata from the old function to handle that
case, since verification will complain otherwise. This is consistent
with what was being done for debug metadata before r353537.

Reviewers: davidxl, uabelho

Subscribers: jdoerfert, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58215

llvm-svn: 354032
2019-02-14 14:14:24 +00:00
Florian Hahn 6ab83b7db6 [LoopUnrollPeel] Add case where we should forget the peeled loop from SCEV.
The test case requires the peeled loop to be forgotten after peeling,
even though it does not have a parent. When called via the unroller,
SE->forgetTopmostLoop is also called, so the test case would also pass
without any SCEV invalidation, but peelLoop is exposed as utility
function. Also, in the test case, simplifyLoop will make changes,
removing the loop from SCEV, but it is better to not rely on this
behavior.

Reviewers: sanjoy, mkazantsev

Reviewed By: mkazantsev

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58192

llvm-svn: 354031
2019-02-14 13:59:39 +00:00
Sanjay Patel d05ba496bc [ConstProp] add IR tests to show miscompiles; NFC
A fix for these is proposed in D51216.

llvm-svn: 353992
2019-02-13 23:27:31 +00:00
Vedant Kumar 4b0cc9a7c8 [CodeExtractor] Only lift lifetime markers present in the extraction region
When CodeExtractor finds liftime markers referencing inputs to the
extraction region, it lifts these markers out of the region and inserts
them around the call to the extracted function (see r350420, PR39671).

However, it should *only* lift lifetime markers that are actually
present in the extraction region. I.e., if a start marker is present in
the extraction region but a corresponding end marker isn't (or vice
versa), only the start marker (or end marker, resp.) should be lifted.

Differential Revision: https://reviews.llvm.org/D57834

llvm-svn: 353973
2019-02-13 19:53:38 +00:00
Jeremy Morse f10af3f134 [DebugInfo][InstCombine] Prefer to salvage debuginfo over sinking it
When instcombine sinks an instruction between two basic blocks, it sinks any
dbg.value users in the source block with it, to prevent debug use-before-free.
However we can do better by attempting to salvage the debug users, which would
avoid moving where the variable location changes. If we successfully salvage,
still sink a (cloned) dbg.value with the sunk instruction, as the sunk
instruction is more likely to be "live" later in the compilation process.

If we can't salvage dbg.value users of a sunk instruction, mark the dbg.values
in the original block as being undef. This terminates any earlier variable
location range, and represents the fact that we've optimized out the variable
location for a portion of the program.

Differential Revision: https://reviews.llvm.org/D56788

llvm-svn: 353936
2019-02-13 10:54:53 +00:00
Max Kazantsev 2bb95e7c76 [GuardWidening] Support widening of explicitly expressed guards
This patch adds support of guards expressed in explicit form via
`widenable_condition` in Guard Widening pass.

Differential Revision: https://reviews.llvm.org/D56075
Reviewed By: reames

llvm-svn: 353932
2019-02-13 09:56:30 +00:00
Anton Afanasyev ca9aff9353 [X86][SLP] Enable SLP vectorization for 128-bit horizontal X86 instructions (add, sub)
Try to use 64-bit SLP vectorization. In addition to horizontal instrs
this change triggers optimizations for partial vector operations (for instance,
using low halfs of 128-bit registers xmm0 and xmm1 to multiply <2 x float> by
<2 x float>).

Fixes llvm.org/PR32433

llvm-svn: 353923
2019-02-13 08:26:43 +00:00
Matt Arsenault d24296e282 AMDGPU: Ignore CodeObjectV3 when inlining
This was inhibiting inlining of library functions when clang was
invoking the inliner directly. This is covering a bit of a mess with
subtarget feature handling, and this shouldn't be a subtarget
feature. The behavior is different depending on whether you are using
a -mattr flag in clang, or llc, opt.

llvm-svn: 353899
2019-02-12 23:30:11 +00:00
Sanjay Patel cf3a906fb4 [ConstProp] add test for miscompile from bitcast transform; NFC
This problem goes with the fix in D51215.

llvm-svn: 353883
2019-02-12 21:49:56 +00:00
Sam McCall a860219c5e [LoopSimplifyCFG] Fix test broken in release mode in r353813
llvm-svn: 353842
2019-02-12 14:43:30 +00:00
Jeremy Morse b33a5c7347 [DebugInfo] Don't salvage load operations (PR40628).
Salvaging a redundant load instruction into a debug expression hides a
memory read from optimisation passes. Passes that alter memory behaviour
(such as LICM promoting memory to a register) aren't aware of these debug
memory reads and leave them unaltered, making the debug variable location
point somewhere unsafe.

Teaching passes to know about these debug memory reads would be challenging
and probably incomplete. Finding dbg.value instructions that need to be fixed
would likely be computationally expensive too, as more analysis would be
required. It's better to not generate debug-memory-reads instead, alas.

Changed tests:
 * DeadStoreElim: test for salvaging of intermediate operations contributing
   to the dead store, instead of salvaging of the redundant load,
 * GVN: remove debuginfo behaviour checks completely, this behaviour is still
   covered by other tests,
 * InstCombine: don't test for salvaged loads, we're removing that behaviour.

Differential Revision: https://reviews.llvm.org/D57962

llvm-svn: 353824
2019-02-12 10:54:30 +00:00
Max Kazantsev 2a184af221 [IndVars] Fix corner case with unreachable Phi inputs. PR40454
Logic in `getInsertPointForUses` doesn't account for a corner case when `Def`
only comes to a Phi user from unreachable blocks. In this case, the incoming
value may be arbitrary (and not even available in the input block) and break
the loop-related invariants that are asserted below.

In fact, if we encounter this situation, no IR modification is needed. This
Phi will be simplified away with nearest cleanup.

Differential Revision: https://reviews.llvm.org/D58045
Reviewed By: spatel

llvm-svn: 353816
2019-02-12 09:59:44 +00:00
Max Kazantsev bf6af8fbf0 [LoopSimplifyCFG] Change logic of dead loops removal to avoid hitting asserts
The function `LI.erase` has some invariants that need to be preserved when it
tries to remove a loop which is not the top-level loop. In particular, it
requires loop's preheader to be strictly in loop's parent. Our current logic
of deletion of dead blocks may erase the information about preheader before we
handle the loop, and therefore we may hit this assertion.

This patch changes the logic of loop deletion: we make them top-level loops
before we actually erase them. This allows us to trigger the simple branch of
`erase` logic which just detatches blocks from the loop and does not try to do
some complex stuff that need this invariant.

Thanks to @uabelho for reporting this!

Differential Revision: https://reviews.llvm.org/D57221
Reviewed By: fedor.sergeev

llvm-svn: 353813
2019-02-12 09:37:00 +00:00
Max Kazantsev 6bf861597c [LoopSimplifyCFG] Pay respect to LCSSA when removing dead blocks
Utility function that we use for blocks deletion always unconditionally removes
one-input Phis. In LoopSimplifyCFG, it can lead to breach of LCSSA form.
This patch alters this function to keep them if needed.

Differential Revision: https://reviews.llvm.org/D57231
Reviewed By: fedor.sergeev

llvm-svn: 353803
2019-02-12 07:48:07 +00:00
Eli Friedman 806136f8ef [LoopReroll] Fix reroll root legality checking.
The code checked that the first root was an appropriate distance from
the base value, but skipped checking the other roots. This could lead to
rerolling a loop that can't be legally rerolled (at least, not without
rewriting the loop in a non-trivial way).

Differential Revision: https://reviews.llvm.org/D56812

llvm-svn: 353779
2019-02-12 00:33:25 +00:00
Evandro Menezes f4a369596f [TargetLibraryInfo] Update run time support for Windows
It seems that, since VC19, the `float` C99 math functions are supported for all
targets, unlike the C89 ones.

According to the discussion at https://reviews.llvm.org/D57625.

llvm-svn: 353758
2019-02-11 22:12:01 +00:00
Michael Kruse 77a614a6e1 Refactor setAlreadyUnrolled() and setAlreadyVectorized().
Loop::setAlreadyUnrolled() and
LoopVectorizeHints::setLoopAlreadyUnrolled() both add loop metadata that
stops the same loop from being transformed multiple times. This patch
merges both implementations.

In doing so we fix 3 potential issues:

 * setLoopAlreadyUnrolled() kept the llvm.loop.vectorize/interleave.*
   metadata even though it will not be used anymore. This already caused
   problems such as http://llvm.org/PR40546. Change the behavior to the
   one of setAlreadyUnrolled which deletes this loop metadata.

 * setAlreadyUnrolled() used to create a new LoopID by calling
   MDNode::get with nullptr as the first operand, then replacing it by
   the returned references using replaceOperandWith. It is possible
   that MDNode::get would instead return an existing node (due to
   de-duplication) that then gets modified. To avoid, use a fresh
   TempMDNode that does not get uniqued with anything else before
   replacing it with replaceOperandWith.

 * LoopVectorizeHints::matchesHintMetadataName() only compares the
   suffix of the attribute to set the new value for. That is, when
   called with "enable", would erase attributes such as
   "llvm.loop.unroll.enable", "llvm.loop.vectorize.enable" and
   "llvm.loop.distribute.enable" instead of the one to replace.
   Fortunately, function was only called with "isvectorized".

Differential Revision: https://reviews.llvm.org/D57566

llvm-svn: 353738
2019-02-11 19:45:44 +00:00
Sanjay Patel 587fd849f0 [InstCombine] Fix matchRotate bug when one operand is a ConstantExpr shift
This bug seems to be harmless in release builds, but will cause an error in UBSAN
builds or an assertion failure in debug builds.

When it gets to this opcode comparison, it assumes both of the operands are BinaryOperators,
but the prior m_LogicalShift will also match a ConstantExpr. The cast<BinaryOperator> will
assert in a debug build, or reading an invalid value for BinaryOp from memory with
((BinaryOperator*)constantExpr)->getOpcode() will cause an error in a UBSAN build.

The test I added will fail without this change in debug/UBSAN builds, but not in release.

Patch by: @AndrewScheidecker (Andrew Scheidecker)

Differential Revision: https://reviews.llvm.org/D58049

llvm-svn: 353736
2019-02-11 19:26:27 +00:00
Alina Sbirlea 605b21739d [LICM&MSSA] Limit store hoisting.
Summary:
If there is no clobbering access for a store inside the loop, that store
can only be hoisted if there are no interfearing loads.
A more general verification introduced here: there are no loads that are
not optimized to an access outside the loop.
Addresses PR40586.

Reviewers: george.burgess.iv

Subscribers: sanjoy, jlebar, Prazek, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D57967

llvm-svn: 353734
2019-02-11 19:07:15 +00:00
Evandro Menezes 4b86c474ff [TargetLibraryInfo] Update run time support for Windows
It seems that the run time for Windows has changed and supports more math
functions than it used to, especially on AArch64, ARM, and AMD64.

Fixes PR40541.

Differential revision: https://reviews.llvm.org/D57625

llvm-svn: 353733
2019-02-11 19:02:28 +00:00
Max Kazantsev 0136e7a246 [TEST] Add missing opportunity test for PR39673
llvm-svn: 353693
2019-02-11 12:58:18 +00:00
Max Kazantsev 8ec0c5e02f [TEST] Add failing test from PR40454
llvm-svn: 353688
2019-02-11 10:44:57 +00:00
Carlos Alberto Enciso e848d426a7 [DWARF] LLVM ERROR: Broken function found, while removing Debug Intrinsics.
Check that when SimplifyCFG is flattening a 'br', all their debug intrinsic instructions are removed, including any dbg.label referencing a label associated with the basic blocks being removed.

As the test case involves a CFG transformation, move it to the correct location.

Differential Revision: https://reviews.llvm.org/D57444

llvm-svn: 353682
2019-02-11 10:16:38 +00:00
Fangrui Song 6e679f8ba5 [GlobalOpt] Simplify __cxa_atexit elimination
cxxDtorIsEmpty checks callers recursively to determine if the
__cxa_atexit-registered function is empty, and eliminates the
__cxa_atexit call accordingly.

This recursive check is unnecessary as redundant instructions and
function calls can be removed by early-cse and inliner. In addition,
cxxDtorIsEmpty does not mark visited function and it may visit a
function exponential times (multiplication principle).

llvm-svn: 353603
2019-02-09 09:18:37 +00:00
Gabor Buella 53980b24b7 Extra processing for BitCast + PHI in InstCombine
For some specific cases with bitcast A->B->A with intervening PHI nodes InstCombiner::optimizeBitCastFromPhi transformation creates extra PHI nodes, which are actually a copy of already created PHI or in another words, they are redundant. These extra PHI nodes could lead to extra move instructions generated after DeSSA transformation. This happens when several conditions are met

 - SROA kicks in and creates new alloca;
 - there is a simple assignment L = R, which falls under 'canonicalize loads' done by combineLoadToOperationType (this transformation is by default). Exactly this transformation is the reason of bitcasts generated;
 - the alloca is then used in A->B->A + PHI chain;
 - there is a loop unrolling.

As a result optimizeBitCastFromPhi creates as many of PHI nodes for each new SROA alloca as loop unrolling factor is. These new extra PHI nodes are redundant actually except of one and should not be created. Moreover the idea of optimizeBitCastFromPhi is to get rid of the cast (when possible) but that doesn't happen in these conditions.

The proposed fix is to do the cast replacement for the whole calculated/accumulated PHI closure not for one cast only, which is an argument to the optimizeBitCastFromPhi. These will help to accomplish several things: 1) avoid extra PHI nodes generated as all casts which may trigger optimizeBitCastFromPhi transformation will be replaced, 3) bitcasts will be replaced, and 3) create more opportunities to remove dead code, which appears after the replacement.

A new test case shows that it's possible to get rid of all bitcasts completely and get quite good code reduction.

Author: Igor Tsimbalist <igor.v.tsimbalist@intel.com>

Reviewed By: Carrot

Differential Revision: https://reviews.llvm.org/D57053

llvm-svn: 353595
2019-02-09 01:44:28 +00:00
Craig Topper 784929d045 Implementation of asm-goto support in LLVM
This patch accompanies the RFC posted here:
http://lists.llvm.org/pipermail/llvm-dev/2018-October/127239.html

This patch adds a new CallBr IR instruction to support asm-goto
inline assembly like gcc as used by the linux kernel. This
instruction is both a call instruction and a terminator
instruction with multiple successors. Only inline assembly
usage is supported today.

This also adds a new INLINEASM_BR opcode to SelectionDAG and
MachineIR to represent an INLINEASM block that is also
considered a terminator instruction.

There will likely be more bug fixes and optimizations to follow
this, but we felt it had reached a point where we would like to
switch to an incremental development model.

Patch by Craig Topper, Alexander Ivchenko, Mikhail Dvoretckii

Differential Revision: https://reviews.llvm.org/D53765

llvm-svn: 353563
2019-02-08 20:48:56 +00:00
Teresa Johnson 3ce8112dad ArgumentPromotion should copy all metadata to new Function
Summary:
ArgumentPromotion had code to specifically move the dbg metadata over to
the new function, but other metadata such as the function_entry_count
!prof metadata was not. Replace code that moved dbg metadata with a call
to copyMetadata. The old metadata is automatically removed when the old
Function is removed.

Reviewers: davidxl

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D57846

llvm-svn: 353537
2019-02-08 17:08:27 +00:00
Max Kazantsev 6b63d3a277 [LoopSimplifyCFG] Use DTU.applyUpdates instead of insert/deleteEdge
`insert/deleteEdge` methods in DTU can make updates incorrectly in some cases
(see https://bugs.llvm.org/show_bug.cgi?id=40528), and it is recommended to
use `applyUpdates` methods instead when it is needed to make a mass update in CFG.

Differential Revision: https://reviews.llvm.org/D57316
Reviewed By: kuhar

llvm-svn: 353502
2019-02-08 08:12:41 +00:00
Sergey Dmitriev 807960e6ef [CodeExtractor] Update function's assumption cache after extracting blocks from it
Summary: Assumption cache's self-updating mechanism does not correctly handle the case when blocks are extracted from the function by the CodeExtractor. As a result function's assumption cache may have stale references to the llvm.assume calls that were moved to the outlined function. This patch fixes this problem by removing extracted llvm.assume calls from the function’s assumption cache.

Reviewers: hfinkel, vsk, fhahn, davidxl, sanjoy

Reviewed By: hfinkel, vsk

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D57215

llvm-svn: 353500
2019-02-08 06:55:18 +00:00
Quentin Colombet 96f54de8ff [InstCombine] Optimize `atomicrmw <op>, 0` into `load atomic` when possible
This commit teaches InstCombine how to replace an atomicrmw operation
into a simple load atomic.
For a given `atomicrmw <op>`, this is possible when:
1. The ordering of that operation is compatible with a load (i.e.,
   anything that doesn't have a release semantic).
2. <op> does not modify the value being stored

Differential Revision: https://reviews.llvm.org/D57854

llvm-svn: 353471
2019-02-07 21:27:23 +00:00
Sanjay Patel 781d883862 [InstCombine] Fix crashing from (icmp (bitcast ([su]itofp X)), Y)
This fixes a class of bugs introduced by D44367, 
which transforms various cases of icmp (bitcast ([su]itofp X)), Y to icmp X, Y. 
If the bitcast is between vector types with a different number of elements, 
the current code will produce bad IR along the lines of: icmp <N x i32> ..., <M x i32> <...>.

This patch suppresses the transform if the bitcast changes the number of vector elements.

Patch by: @AndrewScheidecker (Andrew Scheidecker)

Differential Revision: https://reviews.llvm.org/D57871

llvm-svn: 353467
2019-02-07 21:12:01 +00:00
Florian Hahn ba5acbc4fe [LV] Prevent interleaving if computeMaxVF returned None.
As discussed in D57382, interleaving should be avoided if computeMaxVF
returns None, same as we currently do for vectorization.

Fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=6477

Reviewers: Ayal, dcaballe, hsaito, mkuper, rengolin

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D57837

llvm-svn: 353461
2019-02-07 20:49:10 +00:00
Teresa Johnson c36c10ddfb [HotColdSplit] With PGO add profile entry metadata to split cold function
Summary:
When compiling with profile data, ensure the split cold function gets
cold function_entry_count metadata (just use 0 since it should be cold).
Otherwise with function sections it will not be placed in the unlikely
text section with other cold code.

Reviewers: vsk

Subscribers: sebpop, hiraditya, davidxl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D57900

llvm-svn: 353434
2019-02-07 17:50:35 +00:00
Sam Parker 67756c09f2 [LSR] Generate cross iteration indexes
Modify GenerateConstantOffsetsImpl to create offsets that can be used
by indexed addressing modes. If formulae can be generated which
result in the constant offset being the same size as the recurrence,
we can generate a pre-indexed access. This allows the pointer to be
updated via the single pre-indexed access so that (hopefully) no
add/subs are required to update it for the next iteration. For small
cores, this can significantly improve performance DSP-like loops.

Differential Revision: https://reviews.llvm.org/D55373

llvm-svn: 353403
2019-02-07 13:32:54 +00:00
Sanjay Patel 68bc5fb0ad [InstCombine] X | C == C --> (X & ~C) == 0
We should canonicalize to one of these forms,
and compare-with-zero could be more conducive
to follow-on transforms. This also leads to
generally better codegen as shown in PR40611:
https://bugs.llvm.org/show_bug.cgi?id=40611

llvm-svn: 353313
2019-02-06 16:43:54 +00:00
Sanjay Patel 51abb86f09 [InstCombine] add tests for PR40611 and regenerate checks; NFC
Lots of unrelated diffs here from the newer version of the script.

llvm-svn: 353312
2019-02-06 16:17:05 +00:00
Max Kazantsev a4ccfc1841 [LoopSimplifyCFG] Do not count dead exit blocks twice, make CFG simpler
llvm-svn: 353276
2019-02-06 07:49:17 +00:00
Vedant Kumar bd94b4287c [HotColdSplit] Do not split out `resume` instructions
Resumes that are not reachable from a cleanup landing pad are considered
to be unreachable. It’s not safe to split them out.

rdar://47808235

llvm-svn: 353242
2019-02-05 23:39:02 +00:00
Sanjay Patel cddb1e5469 [InstCombine] limit extracting shuffle transform based on uses
As discussed in D53037, this can lead to worse codegen, and we
don't generally expect the backend to be able to optimize
arbitrary shuffles. If there's only one use of the 1st shuffle,
that means it's getting removed, so that should always be
safe.

llvm-svn: 353235
2019-02-05 22:58:45 +00:00
Sanjay Patel 0272b44ea2 [InstCombine] split shuffle test to show extra use constraint; NFC
As discussed in D53037, this transform can cause codegen problems
if the 1st shuffle has multiple uses.

llvm-svn: 353233
2019-02-05 22:46:13 +00:00
Sanjay Patel b696b771bf [CGP] add test for unsigned subtract of 1 with overflow; NFC
llvm-svn: 353179
2019-02-05 15:27:40 +00:00
Florian Hahn 3b251963c3 [CGP] Add support for sinking operands to their users, if they are free.
This patch improves code generation for some AArch64 ACLE intrinsics. It adds
support to CGP to duplicate and sink operands to their user, if they can be
folded into a target instruction, like zexts and sub into usubl. It adds a
TargetLowering hook shouldSinkOperands, which looks at the operands of
instructions to see if sinking is profitable.

I decided to add a new target hook, as for the sinking to be profitable,
at least on AArch64, we have to look at multiple operands of an
instruction, instead of looking at the users of a zext for example.

The sinking is done in CGP, because it works around an instruction
selection limitation. If instruction selection is not limited to a
single basic block, this patch should not be needed any longer.

Alternatively this could be done in the LoopSink pass, which tries to
undo LICM for instructions in blocks that are not executed frequently.

Note that we do not force the operands to sink to have a single user,
because we duplicate them before sinking. Therefore this is only
desirable if they really can be done for free. Additionally we could
consider the impact on live ranges later on.

This should fix https://bugs.llvm.org/show_bug.cgi?id=40025.

As for performance, we have internal code that uses intrinsics and can
be speed up by 10% by this change.

Reviewers: SjoerdMeijer, t.p.northover, samparker, efriedma, RKSimon, spatel

Reviewed By: samparker

Differential Revision: https://reviews.llvm.org/D57377

llvm-svn: 353152
2019-02-05 10:27:40 +00:00
Max Kazantsev d5e595b7a6 [LSR] Check SCEV on isZero() after extend. PR40514
When LSR first adds SCEVs to BaseRegs, it only does it if `isZero()` has
returned false. In the end, in invocation of `InsertFormula`, it asserts that
all values there are still not zero constants. However between these two
points, it makes some transformations, in particular extends them to wider
type.

SCEV does not give us guarantee that if `S` is not a constant zero, then
`sext(S)` is also not a constant zero. It might have missed some optimizing
transforms when it was calculating `S` and then made them when it took `sext`.
For example, it may happen if previously optimizing transforms were limited
by depth or somehow else.

This patch adds a bailout when we may end up with a zero SCEV after extension.

Differential Revision: https://reviews.llvm.org/D57565
Reviewed By: samparker

llvm-svn: 353136
2019-02-05 04:30:37 +00:00
Evandro Menezes 98f356cd74 Revert "[PATCH] [TargetLibraryInfo] Update run time support for Windows"
This reverts accidental commit ff5527718d.

llvm-svn: 353118
2019-02-04 23:34:50 +00:00
Evandro Menezes ff5527718d [PATCH] [TargetLibraryInfo] Update run time support for Windows
It seems that the run time for Windows has changed and supports more math
functions than before.  Since LLVM requires at least VS2015, I assume that
this is the run time that would be redistributed with programs built with
Clang.  Thus, I based this update on the header file `math.h` that
accompanies it.

This patch addresses the PR40541.  Unfortunately, I have no access to a
Windows development environment to validate it.

llvm-svn: 353114
2019-02-04 23:29:41 +00:00
Sanjay Patel 738e17b5df [CGP] fix bogus test names/comments; NFC
Inverted operand 0 and operand 1.

llvm-svn: 353106
2019-02-04 22:37:05 +00:00
Sanjay Patel 1f9e23e3cc [CGP] add tests for usubo; NFC
llvm-svn: 353103
2019-02-04 22:27:08 +00:00
Nicolai Haehnle a69146e67e [InstCombine] Cleanup the TFE/LWE check in AMDGPU SimplifyDemanded
Summary:
The fix added in r352904 is not quite correct, or rather misleading:

1. When the texfailctrl (TFC) argument was non-constant, the fix assumed
   non-TFE/LWE, which is incorrect.

2. Regardless, this code path cannot even be hit for correct
   TFE/LWE-enabled calls, because those return a struct. Added
   a test case for those for completeness.

Change-Id: I92d314dbc67a2670f6d7adaab765ef45f56a49cf

Reviewers: hliao, dstuttard, arsenm

Subscribers: kzhuravl, jvesely, wdng, yaxunl, tpr, t-tye, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D57681

llvm-svn: 353097
2019-02-04 21:24:19 +00:00
Michael Kruse 70560a0a2c [WarnMissedTransforms] Do not warn about already vectorized loops.
LoopVectorize adds llvm.loop.isvectorized, but leaves
llvm.loop.vectorize.enable. Do not consider such a loop for user-forced
vectorization since vectorization already happened -- by prioritizing
llvm.loop.isvectorized except for TM_SuppressedByUser.

Fixes http://llvm.org/PR40546

Differential Revision: https://reviews.llvm.org/D57542

llvm-svn: 353082
2019-02-04 19:55:59 +00:00
Sanjay Patel c00bdab4c8 [CGP] use IRBuilder to simplify code
This is no-functional-change-intended although there could
be intermediate variations caused by a difference in the
debug info produced by setting that from the builder's 
insertion point. 

I'm updating the IR test file associated with this code just
to show that the naming differences from using the builder
are visible.

The motivation for adding a helper function is that we are
likely to extend this code to deal with other overflow ops.

llvm-svn: 353056
2019-02-04 16:30:46 +00:00
Dmitry Venikov 3643cbbf9c Commit tests for changes in revision D41608
llvm-svn: 353037
2019-02-04 10:32:07 +00:00
Davide Italiano 73929c4d24 [LoopIdiomRecognize] @llvm.dbg values shouldn't affect the transformation.
Summary: PR40564

Reviewers: aprantl, rnk

Subscribers: llvm-commits, hiraditya

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D57629

llvm-svn: 353007
2019-02-03 20:33:20 +00:00
Sanjay Patel 84ceae6048 [CGP] adjust target constraints for forming uaddo
There are 2 changes visible here:
1. There's no reason to limit this transform based on number
   of condition registers. That diff allows PPC to produce 
   slightly better (dot-instructions should be generally good) 
   code.
   Note: someone that cares about PPC codegen might want to 
   look closer at that output because it seems like we could
   still improve this.

2. We (probably?) should not bother trying to form uaddo (or
   other overflow ops) when there's no target support for such
   an op. This goes beyond checking whether the op is expanded
   because both PPC and AArch64 show better codegen for standard
   types regardless of whether the op is legal/custom.

llvm-svn: 353001
2019-02-03 17:53:09 +00:00
Sanjay Patel 837552fe9f [PatternMatch] add special-case uaddo matching for increment-by-one (2nd try)
This is the most important uaddo problem mentioned in PR31754:
https://bugs.llvm.org/show_bug.cgi?id=31754
...but that was overcome in x86 codegen with D57637.

That patch also corrects the inc vs. add regressions seen with the  previous attempt at this.

Still, we want to make this matcher complete, so we can potentially canonicalize the pattern 
even if it's an 'add 1' operation.
Pattern matching, however, shouldn't assume that we have canonicalized IR, so we match 4 
commuted variants of uaddo.

There's also a test with a crazy type to show that the existing CGP transform based on this 
matcher is not limited by target legality checks.

I'm not sure if the Hexagon diff means the test is no longer testing what it intended to
test, but that should be solvable in a follow-up.

Differential Revision: https://reviews.llvm.org/D57516

llvm-svn: 352998
2019-02-03 16:16:48 +00:00
Sanjay Patel b961bd68f0 [CGP] move test file to prevent bot failures
The test specifiies the triple, so it needs to be in the
x86 directory in case a bot has been configured without
the x86 target.

llvm-svn: 352992
2019-02-03 14:19:45 +00:00
Dmitry Venikov aaa709f2ec [InstSimplify] Missed optimization in math expression: log10(pow(10.0,x)) == x, log2(pow(2.0,x)) == x
Summary: This patch enables folding following instructions under -ffast-math flag: log10(pow(10.0,x)) -> x, log2(pow(2.0,x)) -> x

Reviewers: hfinkel, spatel, efriedma, craig.topper, zvi, majnemer, lebedev.ri

Reviewed By: spatel, lebedev.ri

Subscribers: lebedev.ri, llvm-commits

Differential Revision: https://reviews.llvm.org/D41940

llvm-svn: 352981
2019-02-03 03:48:30 +00:00
Florian Hahn dd2ef0af46 [LCSSA] Handle case with single new PHI faster.
If there is only a single available value, all uses must be dominated by
the single value and there is no need to search for a reaching
definition.

This drastically speeds up LCSSA in some cases. For the test case
from PR37202, it speeds up LCSSA construction by 4 times.

Time-passes without this patch for test case from PR37202:

    Total Execution Time: 29.9285 seconds (29.9276 wall clock)

    ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
    5.2786 ( 17.7%)   0.0021 (  1.2%)   5.2806 ( 17.6%)   5.2808 ( 17.6%)  Unswitch loops
    4.3739 ( 14.7%)   0.0303 ( 18.1%)   4.4042 ( 14.7%)   4.4042 ( 14.7%)  Loop-Closed SSA Form Pass
    4.2658 ( 14.3%)   0.0192 ( 11.5%)   4.2850 ( 14.3%)   4.2851 ( 14.3%)  Loop-Closed SSA Form Pass #2
    2.2307 (  7.5%)   0.0013 (  0.8%)   2.2320 (  7.5%)   2.2318 (  7.5%)  Loop Invariant Code Motion
    2.0888 (  7.0%)   0.0012 (  0.7%)   2.0900 (  7.0%)   2.0897 (  7.0%)  Unroll loops
    1.6761 (  5.6%)   0.0013 (  0.8%)   1.6774 (  5.6%)   1.6774 (  5.6%)  Value Propagation
    1.3686 (  4.6%)   0.0029 (  1.8%)   1.3716 (  4.6%)   1.3714 (  4.6%)  Induction Variable Simplification
    1.1457 (  3.8%)   0.0010 (  0.6%)   1.1468 (  3.8%)   1.1468 (  3.8%)  Loop-Closed SSA Form Pass #4
    1.1384 (  3.8%)   0.0005 (  0.3%)   1.1389 (  3.8%)   1.1389 (  3.8%)  Loop-Closed SSA Form Pass #6
    1.1360 (  3.8%)   0.0027 (  1.6%)   1.1387 (  3.8%)   1.1387 (  3.8%)  Loop-Closed SSA Form Pass #5
    1.1331 (  3.8%)   0.0010 (  0.6%)   1.1341 (  3.8%)   1.1340 (  3.8%)  Loop-Closed SSA Form Pass #3

Time passes with this patch

  Total Execution Time: 19.2802 seconds (19.2813 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
   4.4234 ( 23.2%)   0.0038 (  2.0%)   4.4272 ( 23.0%)   4.4273 ( 23.0%)  Unswitch loops
   2.3828 ( 12.5%)   0.0020 (  1.1%)   2.3848 ( 12.4%)   2.3847 ( 12.4%)  Unroll loops
   1.8714 (  9.8%)   0.0020 (  1.1%)   1.8734 (  9.7%)   1.8735 (  9.7%)  Loop Invariant Code Motion
   1.7973 (  9.4%)   0.0022 (  1.2%)   1.7995 (  9.3%)   1.8003 (  9.3%)  Value Propagation
   1.4010 (  7.3%)   0.0033 (  1.8%)   1.4043 (  7.3%)   1.4044 (  7.3%)  Induction Variable Simplification
   0.9978 (  5.2%)   0.0244 ( 13.1%)   1.0222 (  5.3%)   1.0224 (  5.3%)  Loop-Closed SSA Form Pass #2
   0.9611 (  5.0%)   0.0257 ( 13.8%)   0.9868 (  5.1%)   0.9868 (  5.1%)  Loop-Closed SSA Form Pass
   0.5856 (  3.1%)   0.0015 (  0.8%)   0.5871 (  3.0%)   0.5869 (  3.0%)  Unroll loops #2
   0.4132 (  2.2%)   0.0012 (  0.7%)   0.4145 (  2.1%)   0.4143 (  2.1%)  Loop Invariant Code Motion #3

Reviewers: efriedma, davide, mzolotukhin

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D57033

llvm-svn: 352960
2019-02-02 15:26:05 +00:00
Evandro Menezes 39ad187ec9 [InstCombine] Refactor test checks (NFC)
llvm-svn: 352935
2019-02-01 22:52:05 +00:00