Commit Graph

12050 Commits

Author SHA1 Message Date
John Brawn 39ac159c24 [LICM] Adjust how moving the re-hoist point works
In some cases the order that we hoist instructions in means that when rehoisting
(which uses the same order as hoisting) we can rehoist to a block A, then a
block B, then block A again. This currently causes an assertion failure as it
expects that when changing the hoist point it only ever moves to a block that
dominates the hoist point being moved from.

Fix this by moving the re-hoist point when it doesn't dominate the dominator of
hoisted instruction, or in other words when it wouldn't dominate the uses of
the instruction being rehoisted.

Differential Revision: https://reviews.llvm.org/D55266

llvm-svn: 350408
2019-01-04 17:12:09 +00:00
Simon Pilgrim c2aadfaaad [SLPVectorizer] Flag ADD/SUB SSAT/USAT intrinsics trivially vectorizable (PR40123)
Enables SLP vectorization for the SSE2 PADDS/PADDUS/PSUBS/PSUBUS style intrinsics

llvm-svn: 350300
2019-01-03 12:18:23 +00:00
Simon Pilgrim c5e22b29e4 [SLPVectorizer][X86] Add ADD/SUB SSAT/USAT tests (PR40123)
llvm-svn: 350297
2019-01-03 12:02:14 +00:00
Pete Cooper 697281df42 Teach ObjCARC optimizer about equivalent PHIs when eliminating autoreleaseRV/retainRV pairs
OptimizeAutoreleaseRVCall skips optimizing llvm.objc.autoreleaseReturnValue if it
sees a user which is llvm.objc.retainAutoreleasedReturnValue, and if they have
equivalent arguments (either identical or equivalent PHIs). It then assumes that
ObjCARCOpt::OptimizeRetainRVCall will optimize the pair instead.

Trouble is, ObjCARCOpt::OptimizeRetainRVCall doesn't know about equivalent PHIs
so optimizes in a different way and we are left with an unoptimized llvm.objc.autoreleaseReturnValue.

This teaches ObjCARCOpt::OptimizeRetainRVCall to also understand PHI equivalence.

rdar://problem/47005143

Reviewed By: ahatanak

Differential Revision: https://reviews.llvm.org/D56235

llvm-svn: 350284
2019-01-03 01:38:08 +00:00
Nikita Popov 41f5710328 [BDCE] Fix typo in test; NFC
shl by 32 is undefined. This was intended to be a shl by 31 as part
of a rotate sequence.

llvm-svn: 350265
2019-01-02 22:34:32 +00:00
Pete Cooper 8d58048024 Fix assert in ObjCARC optimizer when deleting retainBlock of null or undef.
The caller to EraseInstruction had this conditional:

    // ARC calls with null are no-ops. Delete them.
    if (IsNullOrUndef(Arg))

but the assert inside EraseInstruction only allowed ConstantPointerNull and not
undef or bitcasts.

This adds support for both of these cases.

rdar://problem/47003805

llvm-svn: 350261
2019-01-02 21:00:02 +00:00
Nikita Popov cc6ef7f153 [BDCE] Remove instructions without demanded bits
If an instruction has no demanded bits, remove it directly during BDCE,
instead of leaving it for something else to clean up.

Differential Revision: https://reviews.llvm.org/D56185

llvm-svn: 350257
2019-01-02 20:02:14 +00:00
Sanjay Patel 654e6aabb9 [InstCombine] canonicalize raw IR rotate patterns to funnel shift
The final piece of IR-level analysis to allow this was committed with:
rL350188

Using the intrinsics should improve transforms based on cost models
like vectorization and inlining.

The backend should be prepared too, so we can now canonicalize more
sequences of shift/logic to the intrinsics and know that the end
result should be equal or better to the original code even if the
target does not have an actual rotate instruction.

llvm-svn: 350199
2019-01-01 21:51:39 +00:00
Nikita Popov c5a023b624 [BDCE] Regenerate test checks; NFC
llvm-svn: 350190
2019-01-01 12:27:23 +00:00
Nikita Popov d4bf57be6b [BDCE] Remove -instsimplify from BDCE test; NFC
To make it more obvious which part of the transformation is carried
out by BDCE. Also drop the CHECK-IO lines which only run -instsimplify
as they don't really seem meaningful if the main check doesn't run
-instsimplify either.

llvm-svn: 350189
2019-01-01 10:17:35 +00:00
Nikita Popov bc9986e9ad Reapply "[BDCE][DemandedBits] Detect dead uses of undead instructions"
This (mostly) fixes https://bugs.llvm.org/show_bug.cgi?id=39771.

BDCE currently detects instructions that don't have any demanded bits
and replaces their uses with zero. However, if an instruction has
multiple uses, then some of the uses may be dead (have no demanded bits)
even though the instruction itself is still live. This patch extends
DemandedBits/BDCE to detect such uses and replace them with zero.
While this will not immediately render any instructions dead, it may
lead to simplifications (in the motivating case, by converting a rotate
into a simple shift), break dependencies, etc.

The implementation tries to strike a balance between analysis power and
complexity/memory usage. Originally I wanted to track demanded bits on
a per-use level, but ultimately we're only really interested in whether
a use is entirely dead or not. I'm using an extra set to track which uses
are dead. However, as initially all uses are dead, I'm not storing uses
those user is also dead. This case is checked separately instead.

The previous attempt to land this lead to miscompiles, because cases
where uses were initially dead but were later found to be live during
further analysis were not always correctly removed from the DeadUses
set. This is fixed now and the added test case demanstrates such an
instance.

Differential Revision: https://reviews.llvm.org/D55563

llvm-svn: 350188
2019-01-01 10:05:26 +00:00
Chen Zheng 4952e668f8 [InstCombine] canonicalize MUL with NEG operand
-X * Y --> -(X * Y)
X * -Y --> -(X * Y)

Differential Revision: https://reviews.llvm.org/D55961

llvm-svn: 350185
2019-01-01 01:09:20 +00:00
Chen Zheng 763c8973bf [InstCombine] [NFC] update testcases for canonicalize MUL with NEG operand
llvm-svn: 350154
2018-12-29 12:18:15 +00:00
Anna Thomas bae11e7999 [UnrollRuntime] NFC: Updated exiting tests and added more tests
Added more tests for multiple exiting blocks to the LatchExit.
Today these cases are not supported. Patch to follow soon.

llvm-svn: 350135
2018-12-28 19:21:50 +00:00
Anna Thomas 98743fa77a [UnrollRuntime] NFC: Add comment and verify LCSSA
Added -verify-loop-lcssa to test cases.
Updated comments in ConnectProlog.

llvm-svn: 350131
2018-12-28 18:52:16 +00:00
Max Kazantsev 8f70c9bf49 [NFC] Add failing test on LCSSA form preservation of LoopSimplifyCFG
llvm-svn: 350119
2018-12-28 10:43:37 +00:00
Max Kazantsev 530ff8f3cc Temporarily disable term folding in LoopSimplifyCFG, add tests
llvm-svn: 350117
2018-12-28 06:22:39 +00:00
Craig Topper c9a6000755 [LoopIdiomRecognize] Add CTTZ support
Summary:
Existing LIR recognizes CTLZ where shifting input variable right until it is zero. (Shift-Until-Zero idiom)

This commit:
1. Augments Shift-Until-Zero idiom to recognize CTTZ where input variable is shifted left.
2. Prepare for BitScan idiom recognition.

Patch by Yuanfang Chen (tabloid.adroit)

Reviewers: craig.topper, evstupac

Reviewed By: craig.topper

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D55876

llvm-svn: 350074
2018-12-26 21:59:48 +00:00
Max Kazantsev edabb9ae56 [LoopSimplifyCFG] Delete dead exiting edges
This patch teaches LoopSimplifyCFG to remove dead exiting edges
from loops.

Differential Revision: https://reviews.llvm.org/D54025
Reviewed By: fedor.sergeev

llvm-svn: 350049
2018-12-24 07:41:33 +00:00
Max Kazantsev 347c583772 Return "[LoopSimplifyCFG] Delete dead in-loop blocks"
The underlying bug that caused the revert should be fixed by rL348567.

Differential Revision: https://reviews.llvm.org/D54023

llvm-svn: 350045
2018-12-24 06:06:17 +00:00
Reid Kleckner 84a2d29681 [BasicAA] Fix AA bug on dynamic allocas and stackrestore
Summary:
BasicAA has special logic for unescaped allocas, which normally applies
equally well to dynamic and static allocas. However, llvm.stackrestore
has the power to end the lifetime of dynamic allocas, without referring
to them directly.

stackrestore is already marked with the most conservative memory
modification attributes, but because the alloca is not escaped, the
normal logic produces incorrect results. I think BasicAA needs a special
case here to teach it about the relationship between dynamic allocas and
stackrestore.

Fixes PR40118

Reviewers: gbiv, efriedma, george.burgess.iv

Subscribers: hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D55969

llvm-svn: 349945
2018-12-21 19:59:03 +00:00
Chen Zheng ddfaf07526 [InstCombine] [NFC] testcases for canonicalize MUL with NEG operand
llvm-svn: 349847
2018-12-20 22:37:05 +00:00
Nikita Popov f17421e595 [ConstantFolding] Consolidate and extend bitcount intrinsic tests; NFC
Move constant folding tests into ConstantFolding/bitcount.ll and drop
various tests in other places. Add coverage for undefs.

llvm-svn: 349806
2018-12-20 19:46:52 +00:00
Nikita Popov 6d79e43208 [ConstantFolding] Add undef tests for overflow intrinsics; NFC
llvm-svn: 349805
2018-12-20 19:46:46 +00:00
Nikita Popov da1beca0ac [ConstantFolding] Regenerate test checks; NFC
Bring overflow-ops.ll into current format. Remove redundant entry
blocks.

llvm-svn: 349804
2018-12-20 19:46:43 +00:00
Florian Hahn ef307b8c26 [LAA] Avoid generating RT checks for known deps preventing vectorization.
If we found unsafe dependences other than 'unknown', we already know at
compile time that they are unsafe and the runtime checks should always
fail. So we can avoid generating them in those cases.

This should have no negative impact on performance as the runtime checks
that would be created previously should always fail. As a sanity check,
I measured the test-suite, spec2k and spec2k6 and there were no regressions.

Reviewers: Ayal, anemet, hsaito

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D55798

llvm-svn: 349794
2018-12-20 18:49:09 +00:00
Michael Kruse 199427100b [InstCombine] Preserve access-group metadata.
Preserve llvm.access.group metadata when combining store instructions.
This was forgotten in r349725.

Fixes llvm.org/PR40117

llvm-svn: 349774
2018-12-20 17:11:02 +00:00
Simon Pilgrim 8c4abd20eb [InstCombine] Make x86 PADDS/PSUBS constant folding tests generic
As discussed on D55894, this replaces the existing PADDS/PSUBUS intrinsics with the the sadd/ssub.sat generic intrinsics and moves the tests out of the x86 subfolder.

PR40110 has been raised to fix the regression with constant folding vectors containing undef elements.

llvm-svn: 349759
2018-12-20 13:50:12 +00:00
Clement Courbet 36a3480385 Re-land r349731 "[CodeGen][ExpandMemcmp] Add an option for allowing overlapping loads.
Update PPC ir following GEP->bitcat to bitcat->GEP->bitcat change.

llvm-svn: 349747
2018-12-20 13:01:04 +00:00
Piotr Sobczak deaacc17fe [InstCombine][AMDGPU] Handle more buffer intrinsics
Summary:
Include the following intrinsics in the InsctCombine
simplification:

* amdgcn_raw_buffer_load
* amdgcn_raw_buffer_load_format
* amdgcn_struct_buffer_load
* amdgcn_struct_buffer_load_format

Change-Id: I14deceff74bcb21179baf6aa6e94bf39e7d63d5d

Reviewers: arsenm

Reviewed By: arsenm

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D55882

llvm-svn: 349735
2018-12-20 10:08:18 +00:00
Clement Courbet e22cf4d7cb Revert r349731 "[CodeGen][ExpandMemcmp] Add an option for allowing overlapping loads."
Forgot to update PowerPC tests for the GEP->bitcast change.

llvm-svn: 349733
2018-12-20 09:58:33 +00:00
Clement Courbet 1bb6e1b0f2 [CodeGen][ExpandMemcmp] Add an option for allowing overlapping loads.
Summary:
This allows expanding {7,11,13,14,15,21,22,23,25,26,27,28,29,30,31}-byte memcmp
in just two loads on X86. These were previously calling memcmp.

Reviewers: spatel, gchatelet

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D55263

llvm-svn: 349731
2018-12-20 09:13:47 +00:00
Michael Kruse 978ba61536 Introduce llvm.loop.parallel_accesses and llvm.access.group metadata.
The current llvm.mem.parallel_loop_access metadata has a problem in that
it uses LoopIDs. LoopID unfortunately is not loop identifier. It is
neither unique (there's even a regression test assigning the some LoopID
to multiple loops; can otherwise happen if passes such as LoopVersioning
make copies of entire loops) nor persistent (every time a property is
removed/added from a LoopID's MDNode, it will also receive a new LoopID;
this happens e.g. when calling Loop::setLoopAlreadyUnrolled()).
Since most loop transformation passes change the loop attributes (even
if it just to mark that a loop should not be processed again as
llvm.loop.isvectorized does, for the versioned and unversioned loop),
the parallel access information is lost for any subsequent pass.

This patch unlinks LoopIDs and parallel accesses.
llvm.mem.parallel_loop_access metadata on instruction is replaced by
llvm.access.group metadata. llvm.access.group points to a distinct
MDNode with no operands (avoiding the problem to ever need to add/remove
operands), called "access group". Alternatively, it can point to a list
of access groups. The LoopID then has an attribute
llvm.loop.parallel_accesses with all the access groups that are parallel
(no dependencies carries by this loop).

This intentionally avoid any kind of "ID". Loops that are clones/have
their attributes modifies retain the llvm.loop.parallel_accesses
attribute. Access instructions that a cloned point to the same access
group. It is not necessary for each access to have it's own "ID" MDNode,
but those memory access instructions with the same behavior can be
grouped together.

The behavior of llvm.mem.parallel_loop_access is not changed by this
patch, but should be considered deprecated.

Differential Revision: https://reviews.llvm.org/D52116

llvm-svn: 349725
2018-12-20 04:58:07 +00:00
Eli Friedman a69084ffa8 [CodeGenPrepare] Fix bad IR created by large offset GEP splitting.
Creating the IR builder, then modifying the CFG, leads to an IRBuilder
where the BB and insertion point are inconsistent, so new instructions
have the wrong parent.

Modified an existing test because the test wasn't covering anything
useful (the "invoke" was not actually an invoke by the time we hit the
code in question).

Differential Revision: https://reviews.llvm.org/D55729

llvm-svn: 349693
2018-12-19 22:52:04 +00:00
Nikita Popov 3817ee7908 Revert "[BDCE][DemandedBits] Detect dead uses of undead instructions"
This reverts commit r349674. It causes a failure in
test-suite enc-3des.execution_time.

llvm-svn: 349684
2018-12-19 22:09:02 +00:00
Nikita Popov 649e125451 [BDCE][DemandedBits] Detect dead uses of undead instructions
This (mostly) fixes https://bugs.llvm.org/show_bug.cgi?id=39771.

BDCE currently detects instructions that don't have any demanded bits
and replaces their uses with zero. However, if an instruction has
multiple uses, then some of the uses may be dead (have no demanded bits)
even though the instruction itself is still live. This patch extends
DemandedBits/BDCE to detect such uses and replace them with zero.
While this will not immediately render any instructions dead, it may
lead to simplifications (in the motivating case, by converting a rotate
into a simple shift), break dependencies, etc.

The implementation tries to strike a balance between analysis power and
complexity/memory usage. Originally I wanted to track demanded bits on
a per-use level, but ultimately we're only really interested in whether
a use is entirely dead or not. I'm using an extra set to track which uses
are dead. However, as initially all uses are dead, I'm not storing uses
those user is also dead. This case is checked separately instead.

The test case has a couple of cases that are not simplified yet. In
particular, we're only looking at uses of instructions right now. I think
it would make sense to also extend this to arguments. Furthermore
DemandedBits doesn't yet know some of the tricks that InstCombine does
for the demanded bits or bitwise or/and/xor in combination with known
bits information.

Differential Revision: https://reviews.llvm.org/D55563

llvm-svn: 349674
2018-12-19 19:56:21 +00:00
Simon Pilgrim 420d1e12b6 Regenerate test
llvm-svn: 349646
2018-12-19 17:24:34 +00:00
Sanjay Patel bb3d3a8f2e [InstCombine] add tests for extract of vector load; NFC
There's a mismatch internally about how we are handling these patterns. 
We count loads as cheapToScalarize(), but then we don't actually 
scalarize them, so that can leave extra instructions compared to where 
we started when scalarizing other ops. If it's cheapToScalarize, then 
we should be scalarizing.

llvm-svn: 349560
2018-12-18 22:51:06 +00:00
Pete Cooper a3e0be109c Preserve the linkage for objc* intrinsics as clang will set them to weak_external in some cases
Clang uses weak linkage for objc runtime functions when they are not available on the platform.

The intrinsic has this linkage so we just need to pass that on to the runtime call.

llvm-svn: 349559
2018-12-18 22:42:08 +00:00
Pete Cooper d0ffdf8782 Add nonlazybind to objc_retain/objc_release when converting from intrinsics.
For performance reasons, clang set nonlazybind on these functions.  Now that we
are using intrinsics instead of runtime calls, we should set this attribute when
creating the runtime functions.

llvm-svn: 349558
2018-12-18 22:31:34 +00:00
Florian Hahn 485f2826ba [LAA] Introduce enum for vectorization safety status (NFC).
This patch adds a VectorizationSafetyStatus enum, which will be extended
in a follow up patch to distinguish between 'safe with runtime checks'
and 'known unsafe' dependences.

Reviewers: anemet, anna, Ayal, hsaito

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D54892

llvm-svn: 349556
2018-12-18 22:25:11 +00:00
Sanjay Patel 608d128c42 [LoopVectorize] auto-generate complete checks; NFC
The first test claims to show that the vectorizer will
generate a vector load/loop, but then this file runs
other passes which might scalarize that op. I'm removing 
instcombine from the RUN line here to break that dependency.
Also, I'm generating full checks to make it clear exactly 
what the vectorizer has done.

llvm-svn: 349554
2018-12-18 22:23:04 +00:00
Pete Cooper f86db5ce9e Rewrite objc intrinsics to runtime methods in PreISelIntrinsicLowering instead of SDAG.
SelectionDAG currently changes these intrinsics to function calls, but that won't work
for other ISel's.  Also we want to eventually support nonlazybind and weak linkage coming
from the front-end which we can't do in SelectionDAG.

llvm-svn: 349552
2018-12-18 22:20:03 +00:00
Sanjay Patel 84758ead22 [InstCombine] auto-generate complete checks; NFC
llvm-svn: 349548
2018-12-18 22:09:15 +00:00
Pete Cooper be4f571107 Change the objc ARC optimizer to use the new objc.* intrinsics
We're moving ARC optimisation and ARC emission in clang away from runtime methods
and towards intrinsics.  This is the part which actually uses the intrinsics in the ARC
optimizer when both analyzing the existing calls and emitting new ones.

Differential Revision: https://reviews.llvm.org/D55348

Reviewers: ahatanak
llvm-svn: 349534
2018-12-18 20:32:49 +00:00
Nikita Popov 20853a7807 [InstCombine] Simplify cttz/ctlz + icmp eq/ne into mask check
Checking whether a number has a certain number of trailing / leading
zeros means checking whether it is of the form XXXX1000 / 0001XXXX,
which can be done with an and+icmp.

Related to https://bugs.llvm.org/show_bug.cgi?id=28668. As a next
step, this can be extended to non-equality predicates.

Differential Revision: https://reviews.llvm.org/D55745

llvm-svn: 349530
2018-12-18 19:59:50 +00:00
Sanjay Patel e0afd278b1 [InstCombine] add tests for scalarization; NFC
We miss pattern matching a splat constant if it has undef elements.

llvm-svn: 349515
2018-12-18 17:56:59 +00:00
Michael Kruse 3284775b70 [LoopUnroll] Honor '#pragma unroll' even with -fno-unroll-loops.
When using clang with `-fno-unroll-loops` (implicitly added with `-O1`),
the LoopUnrollPass is not not added to the (legacy) pass pipeline. This
also means that it will not process any loop metadata such as
llvm.loop.unroll.enable (which is generated by #pragma unroll or
WarnMissedTransformationsPass emits a warning that a forced
transformation has not been applied (see
https://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20181210/610833.html).
Such explicit transformations should take precedence over disabling
heuristics.

This patch unconditionally adds LoopUnrollPass to the optimizing
pipeline (that is, it is still not added with `-O0`), but passes a flag
indicating whether automatic unrolling is dis-/enabled. This is the same
approach as LoopVectorize uses.

The new pass manager's pipeline builder has no option to disable
unrolling, hence the problem does not apply.

Differential Revision: https://reviews.llvm.org/D55716

llvm-svn: 349509
2018-12-18 17:16:05 +00:00
Dylan McKay f920da009e [IPO][AVR] Create new Functions in the default address space specified in the data layout
This modifies the IPO pass so that it respects any explicit function
address space specified in the data layout.

In targets with nonzero program address spaces, all functions should, by
default, be placed into the default program address space.

This is required for Harvard architectures like AVR. Without this, the
functions will be marked as residing in data space, and thus not be
callable.

This has no effect to any in-tree official backends, as none use an
explicit program address space in their data layouts.

Patch by Tim Neumann.

llvm-svn: 349469
2018-12-18 09:52:52 +00:00
Tim Northover 856628f707 SROA: preserve alignment tags on loads and stores.
When splitting up an alloca's uses we were dropping any explicit
alignment tags, which means they default to the ABI-required default
alignment and this can cause miscompiles if the real value was smaller.

Also refactor the TBAA metadata into a parent class since it's shared by
both children anyway.

llvm-svn: 349465
2018-12-18 09:29:39 +00:00
Sanjay Patel 200885e654 [AggressiveInstCombine] convert rotate with guard branch into funnel shift (PR34924)
Now, that we have funnel shift intrinsics, it should be safe to convert this form of rotate to it. 
In the worst case (a target that doesn't have rotate instructions), we will expand this into a 
branch-less sequence of ALU ops (neg/and/and/lshr/shl/or) in the backend, so it's still very 
likely to be a perf improvement over the original code.

The motivating source code pattern for this is shown in:
https://bugs.llvm.org/show_bug.cgi?id=34924

Background:
I looked at several different options before deciding where to try this - instcombine, simplifycfg, 
CGP - because it doesn't fit cleanly anywhere AFAIK.

The backend (CGP, SDAG, GlobalIsel?) is too late for what we're trying to accomplish. We want to 
have the IR converted before we reach things like vectorization because the reduced code can make a 
loop much simpler to transform.

Technically, this could be included in instcombine, but it's a large pattern match that includes 
control-flow, so it just felt wrong to stuff into there (although I have a draft of that patch). 
Similarly, this could be part of simplifycfg, but all of this pattern matching is a stretch.

So we're left with our relatively new dumping ground for homeless transforms: aggressive-instcombine. 
This only runs at -O3, but that seems like a reasonable limitation given that source code has many 
options to avoid this pattern (including the recently added clang intrinsics for rotates).

I'm including a PhaseOrdering test because we require the teamwork of 3 passes (aggressive-instcombine, 
instcombine, simplifycfg) to get this into the minimal IR form that we want. That test shows a bug
with the new pass manager that's independent of this change (but it will be masked if we canonicalize
harder to funnel shift intrinsics in instcombine).

Differential Revision: https://reviews.llvm.org/D55604

llvm-svn: 349396
2018-12-17 21:14:51 +00:00
Sanjay Patel 1a6e9ec434 [InstCombine] don't widen an arbitrary sequence of vector ops (PR40032)
The problem is shown specifically for a case with vector multiply here:
https://bugs.llvm.org/show_bug.cgi?id=40032
...and this might mask the original backend bug for ARM shown in:
https://bugs.llvm.org/show_bug.cgi?id=39967

As the test diffs here show, we were (and probably still aren't) doing 
these kinds of transforms in a principled way. We are producing more or 
equal wide instructions than we started with in some cases, so we still 
need to restrict/correct other transforms from overstepping.

If there are perf regressions from this change, we can either carve out 
exceptions to the general IR rules, or improve the backend to do these 
transforms when we know the transform is profitable. That's probably 
similar to a change like D55448.

Differential Revision: https://reviews.llvm.org/D55744

llvm-svn: 349389
2018-12-17 20:27:43 +00:00
Nikita Popov 221f3fc750 [InstSimplify] Simplify saturating add/sub + icmp
If a saturating add/sub has one constant operand, then we can
determine the possible range of outputs it can produce, and simplify
an icmp comparison based on that.

The implementation is based on a similar existing mechanism for
simplifying binary operator + icmps.

Differential Revision: https://reviews.llvm.org/D55735

llvm-svn: 349369
2018-12-17 17:45:18 +00:00
Sanjay Patel beb7bb6192 [AggressiveInstCombine] add test for rotate insertion point; NFC
As noted in D55604 - we need a test to make sure that the new intrinsic
is inserted into a valid position.

llvm-svn: 349347
2018-12-17 12:36:35 +00:00
Davide Italiano e41e1d015f [EarlyCSE] If DI can't be salvaged, mark it as unavailable.
Fixes PR39874.

llvm-svn: 349323
2018-12-17 01:42:39 +00:00
Nikita Popov d03c27f837 [InstCombine] Add cttz/ctlz + select non-bitwidth tests; NFC
llvm-svn: 349322
2018-12-16 23:48:18 +00:00
Nikita Popov b4133791c8 [InstCombine] Regenerate test checks; NFC
Also drop unnecessary entry blocks and avoid use of anonymous
variables.

llvm-svn: 349321
2018-12-16 23:48:11 +00:00
Nikita Popov 8be3314c3f [InstCombine] Make cttz/ctlz knownbits tests more robust; NFC
Tests checking for the addition of !range metadata should be
preserved if cttz/ctlz + icmp is optimized.

llvm-svn: 349318
2018-12-16 19:12:08 +00:00
Nikita Popov 3b5d224bde Revert "[InstCombine] Regenerate test checks; NFC"
This reverts commit r349311.

Didn't check this carefully enough...

llvm-svn: 349312
2018-12-16 18:27:37 +00:00
Nikita Popov db2ef76ab7 [InstCombine] Regenerate test checks; NFC
llvm-svn: 349311
2018-12-16 18:22:57 +00:00
Nikita Popov bb69aaa681 [InstCombined] Add more tests for cttz/ctlz + icmp; NFC
Test cases other than icmp with the bitwidth.

llvm-svn: 349310
2018-12-16 17:51:32 +00:00
Nikita Popov 32083bd072 [InstCombine] Add additional saturating add/sub + icmp tests; NFC
These test comparisons with saturating add/sub in non-canonical
form.

llvm-svn: 349309
2018-12-16 17:45:25 +00:00
Sanjay Patel e49db58616 [InstCombine] regenerate test checks; NFC
llvm-svn: 349307
2018-12-16 16:14:42 +00:00
Sanjay Patel db84980851 [InstCombine] add tests for vector widening transforms (PR40032); NFC
llvm-svn: 349306
2018-12-16 15:50:50 +00:00
Nikita Popov f61567f42a [InstSimplify] Add tests for saturating add/sub + icmp; NFC
If a saturating add/sub with a constant operand is compared to
another constant, we should be able to determine that the condition
is always true/false in some cases (but currently don't).

llvm-svn: 349261
2018-12-15 10:37:01 +00:00
Florian Hahn c214bc2b8d [NewGVN] Update use counts for SSA copies when replacing them by their operands.
The current code relies on LeaderUseCount to determine if we can remove
an SSA copy, but in that the LeaderUseCount does not refer to the SSA
copy. If a SSA copy is a dominating leader, we use the operand as dominating
leader instead. This means we removed a user of a ssa copy and we should
decrement its use count, so we can remove the ssa copy once it becomes dead.

Fixes PR38804.

Reviewers: efriedma, davide

Reviewed By: davide

Differential Revision: https://reviews.llvm.org/D51595

llvm-svn: 349217
2018-12-15 00:32:38 +00:00
Vedant Kumar 9d1827331f [Util] Refer to [s|z]exts of args when converting dbg.declares (fix PR35400)
When converting dbg.declares, if the described value is a [s|z]ext,
refer to the ext directly instead of referring to its operand.

This fixes a narrowing bug (the debugger got the sign of a variable
wrong, see llvm.org/PR35400).

The main reason to refer to the ext's operand was that an optimization
may remove the ext itself, leading to a dropped variable. Now that
InstCombine has been taught to use replaceAllDbgUsesWith (r336451), this
is less of a concern. Other passes can/should adopt this API as needed
to fix dropped variable bugs.

Differential Revision: https://reviews.llvm.org/D51813

llvm-svn: 349214
2018-12-15 00:03:33 +00:00
Michael Kruse ea9ef34558 [TransformWarning] Do not warn missed transformations in optnone functions.
Optimization transformations are intentionally disabled by the 'optnone'
function attribute. Therefore do not warn if transformation metadata is
still present.

Using the legacy pass manager structure, the `skipFunction` method takes
care for the optnone attribute (already called before this patch). For
the new pass manager, there is no equivalent, so we check for the
'optnone' attribute manually.

Differential Revision: https://reviews.llvm.org/D55690

llvm-svn: 349184
2018-12-14 19:45:43 +00:00
Michael Kruse 5948b7f30f [Transforms] Preserve metadata when converting invoke to call.
The `changeToCall` function did not preserve the invoke's metadata.
Currently, there is probably no metadata that depends on being applied
on a CallInst or InvokeInst. Therefore we can replace the instruction's
metadata.

This fixes http://llvm.org/PR39994

Suggested-by: Moritz Kreutzer <moritz.kreutzer@siemens.com>

Differential Revision: https://reviews.llvm.org/D55666

llvm-svn: 349170
2018-12-14 18:15:11 +00:00
Wei Mi 66c6c5abea [SampleFDO] handle ProfileSampleAccurate when initializing function entry count
ProfileSampleAccurate is used to indicate the profile has exact match to the
code to be optimized.

Previously ProfileSampleAccurate is handled in ProfileSummaryInfo::isColdCallSite
and ProfileSummaryInfo::isColdBlock. A better solution is to initialize function
entry count to 0 when ProfileSampleAccurate is true, so we don't have to handle
ProfileSampleAccurate in multiple places.

Differential Revision: https://reviews.llvm.org/D55660

llvm-svn: 349088
2018-12-13 21:51:42 +00:00
Nikita Popov dc73a6edde Reapply "[MemCpyOpt] memset->memcpy forwarding with undef tail"
Currently memcpyopt optimizes cases like

    memset(a, byte, N);
    memcpy(b, a, M);

to

    memset(a, byte, N);
    memset(b, byte, M);

if M <= N. Often this allows further simplifications down the line,
which drop the first memset entirely.

This patch extends this optimization for the case where M > N, but we
know that the bytes a[N..M] are undef due to alloca/lifetime.start.

This situation arises relatively often for Rust code, because Rust does
not initialize trailing structure padding and loves to insert redundant
memcpys. This also fixes https://bugs.llvm.org/show_bug.cgi?id=39844.

The previous version of this patch did not perform dependency checking
properly: While the dependency is checked at the position of the memset,
the used size must be that of the memcpy. Previously the size of the
memset was used, which missed modification in the region
MemSetSize..CopySize, resulting in miscompiles. The added tests cover
variations of this issue.

Differential Revision: https://reviews.llvm.org/D55120

llvm-svn: 349078
2018-12-13 20:04:27 +00:00
Davide Italiano 9737096bb1 [LoopUtils] Use i32 instead of `void`.
The actual type of the first argument of the @dbg intrinsic
doesn't really matter as we're setting it to `undef`, but the
bitcode reader is picky about `void` types.

llvm-svn: 349069
2018-12-13 18:37:23 +00:00
David L. Jones 54c01ad6a9 Revert r348645 - "[MemCpyOpt] memset->memcpy forwarding with undef tail"
This revision caused trucated memsets for structs with padding. See:
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20181210/610520.html

llvm-svn: 349002
2018-12-13 03:15:11 +00:00
Philip Reames 04afb4a17d [test] Add a set of test for constant folding deopt operands with CVP
For anyone curious, the first test example is illustrative of a real code idiom produced by branching on the result of a three way comparison.

llvm-svn: 348997
2018-12-13 00:54:05 +00:00
Davide Italiano 744c3c327f [LoopDeletion] Update debug values after loop deletion.
When loops are deleted, we don't keep track of variables modified inside
the loops, so the DI will contain the wrong value for these.

e.g.

int b() {

int i;
for (i = 0; i < 2; i++)
  ;
patatino();
return a;
-> 6 patatino();

7     return a;
8   }
9   int main() { b(); }
(lldb) frame var i
(int) i = 0

We mark instead these values as unavailable inserting a
@llvm.dbg.value(undef to make sure we don't end up printing an incorrect
value in the debugger. We could consider doing something fancier,
for, e.g. constants, in the future.

PR39868.
rdar://problem/46418795)

Differential Revision: https://reviews.llvm.org/D55299

llvm-svn: 348988
2018-12-12 23:32:35 +00:00
Nikita Popov 36e03ac6ee [InstCombine] Fix negative GEP offset evaluation for 32-bit pointers
This fixes https://bugs.llvm.org/show_bug.cgi?id=39908.

The evaluateGEPOffsetExpression() function simplifies GEP offsets for
use in comparisons against zero, basically by converting X*Scale+Offset==0
to X+Offset/Scale==0 if Scale divides Offset. However, before this is done,
Offset is masked down to the pointer size. This results in incorrect
results for negative Offsets, because we basically end up dividing the
32-bit offset *zero* extended to 64-bit bits (rather than sign extended).

Fix this by explicitly sign extending the truncated value.

Differential Revision: https://reviews.llvm.org/D55449

llvm-svn: 348987
2018-12-12 23:19:03 +00:00
Sanjay Patel eb741c29c1 [PhaseOrdering] add test for funnel shift (rotate); NFC
As mentioned in D55604, there are 2 bugs here:
1. The new pass manager is speculating wildly by default.
2. The old pass manager is not converting this to funnel shift.

llvm-svn: 348980
2018-12-12 22:11:05 +00:00
Florian Hahn 81a22d32f7 [ConstantFold] Use getMinSignedBits for APInt in isIndexInRangeOfArrayType.
Indices for getelementptr can be signed so we should use
getMinSignedBits instead of getActiveBits here. The function later calls
getSExtValue to get the int64_t value, which also checks
getMinSignedBits.

This fixes  https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=11647.

Reviewers: mssimpso, efriedma, davide

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D55536

llvm-svn: 348957
2018-12-12 18:55:14 +00:00
Michael Kruse 7244852557 [Unroll/UnrollAndJam/Vectorizer/Distribute] Add followup loop attributes.
When multiple loop transformation are defined in a loop's metadata, their order of execution is defined by the order of their respective passes in the pass pipeline. For instance, e.g.

    #pragma clang loop unroll_and_jam(enable)
    #pragma clang loop distribute(enable)

is the same as

    #pragma clang loop distribute(enable)
    #pragma clang loop unroll_and_jam(enable)

and will try to loop-distribute before Unroll-And-Jam because the LoopDistribute pass is scheduled after UnrollAndJam pass. UnrollAndJamPass only supports one inner loop, i.e. it will necessarily fail after loop distribution. It is not possible to specify another execution order. Also,t the order of passes in the pipeline is subject to change between versions of LLVM, optimization options and which pass manager is used.

This patch adds 'followup' attributes to various loop transformation passes. These attributes define which attributes the resulting loop of a transformation should have. For instance,

    !0 = !{!0, !1, !2}
    !1 = !{!"llvm.loop.unroll_and_jam.enable"}
    !2 = !{!"llvm.loop.unroll_and_jam.followup_inner", !3}
    !3 = !{!"llvm.loop.distribute.enable"}

defines a loop ID (!0) to be unrolled-and-jammed (!1) and then the attribute !3 to be added to the jammed inner loop, which contains the instruction to distribute the inner loop.

Currently, in both pass managers, pass execution is in a fixed order and UnrollAndJamPass will not execute again after LoopDistribute. We hope to fix this in the future by allowing pass managers to run passes until a fixpoint is reached, use Polly to perform these transformations, or add a loop transformation pass which takes the order issue into account.

For mandatory/forced transformations (e.g. by having been declared by #pragma omp simd), the user must be notified when a transformation could not be performed. It is not possible that the responsible pass emits such a warning because the transformation might be 'hidden' in a followup attribute when it is executed, or it is not present in the pipeline at all. For this reason, this patche introduces a WarnMissedTransformations pass, to warn about orphaned transformations.

Since this changes the user-visible diagnostic message when a transformation is applied, two test cases in the clang repository need to be updated.

To ensure that no other transformation is executed before the intended one, the attribute `llvm.loop.disable_nonforced` can be added which should disable transformation heuristics before the intended transformation is applied. E.g. it would be surprising if a loop is distributed before a #pragma unroll_and_jam is applied.

With more supported code transformations (loop fusion, interchange, stripmining, offloading, etc.), transformations can be used as building blocks for more complex transformations (e.g. stripmining+stripmining+interchange -> tiling).

Reviewed By: hfinkel, dmgreen

Differential Revision: https://reviews.llvm.org/D49281
Differential Revision: https://reviews.llvm.org/D55288

llvm-svn: 348944
2018-12-12 17:32:52 +00:00
Wei Mi 7da5a08e1a [SampleFDO] Extend profile-sample-accurate option to cover isFunctionColdInCallGraph
For SampleFDO, when a callsite doesn't appear in the profile, it will not be marked as cold callsite unless the option -profile-sample-accurate is specified.

But profile-sample-accurate doesn't cover function isFunctionColdInCallGraph which is used to decide whether a function should be put into text.unlikely section, so even if the user knows the profile is accurate and specifies profile-sample-accurate, those functions not appearing in the sample profile are still not be put into text.unlikely section right now.

The patch fixes that.

Differential Revision: https://reviews.llvm.org/D55567

llvm-svn: 348940
2018-12-12 17:09:27 +00:00
Sanjay Patel d8ccc0e3e4 [AggressiveInstCombine] add tests for rotates with branch; NFC
llvm-svn: 348933
2018-12-12 15:28:21 +00:00
Florian Hahn cc419ad7df [ConstantInt] Check active bits before calling getZExtValue.
Without this check, we hit an assertion in getZExtValue, if the constant
value does not fit into an uint64_t.

As getZExtValue returns an uint64_t, should we update
getAggregateElement to take an uin64_t as well?

This fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=6109.

Reviewers: efriedma, craig.topper, spatel

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D55547

llvm-svn: 348906
2018-12-12 02:22:12 +00:00
Gor Nishanov 20d833d5e3 [coroutines] Improve suspend point simplification
Summary:
Enable suspend point simplification for cases where:
* coro.save and coro.suspend are in different basic blocks
* where there are intervening intrinsics

Reviewers: modocache, tks2103, lewissbaker

Reviewed By: modocache

Subscribers: EricWF, llvm-commits

Differential Revision: https://reviews.llvm.org/D55160

llvm-svn: 348897
2018-12-11 21:23:09 +00:00
Nikita Popov 79c994d976 [ConstantFolding] Handle leading zero-size elements in load folding
Struct types may have leading zero-size elements like [0 x i32], in
which case the "real" element at offset 0 will not necessarily coincide
with the 0th element of the aggregate. ConstantFoldLoadThroughBitcast()
wants to drill down the element at offset 0, but currently always picks
the 0th aggregate element to do so. This patch changes the code to find
the first non-zero-size element instead, for the struct case.

The motivation behind this change is https://github.com/rust-lang/rust/issues/48627.
Rust is fond of emitting [0 x iN] separators between struct elements to
enforce alignment, which prevents constant folding in this particular case.

The additional tests with [4294967295 x [0 x i32]] check that we don't
end up unnecessarily looping over a large number of zero-size elements
of a zero-size array.

Differential Revision: https://reviews.llvm.org/D55169

llvm-svn: 348895
2018-12-11 20:29:16 +00:00
Vedant Kumar b3a7cae045 [HotColdSplitting] Disable outlining landingpad instructions (PR39917)
It's currently not safe to outline landingpad instructions (see
llvm.org/PR39917). Like @llvm.eh.typeid.for, the order and content of
previous landingpad instructions in a function alters the lowering of
subsequent landingpads by renumbering type info ID's. Outlining a
landingpad therefore breaks exception handling & unwinding.

llvm-svn: 348870
2018-12-11 18:05:31 +00:00
Sanjay Patel 2aa2dc76c2 [InstCombine] try to convert x86 movmsk intrinsic to generic IR (PR39927)
call iM movmsk(sext <N x i1> X) --> zext (bitcast <N x i1> X to iN) to iM

This has the potential to create less-than-8-bit scalar types as shown in 
some of the test diffs, but it looks like the backend knows how to deal 
with that in these patterns. This is the simple part of the fix suggested in:
https://bugs.llvm.org/show_bug.cgi?id=39927

Differential Revision: https://reviews.llvm.org/D55529

llvm-svn: 348862
2018-12-11 16:38:03 +00:00
Nikita Popov a5b82d9447 [BDCE] Add tests for PR39771; NFC
These involve cases where certain uses are dead by means of having
no demanded bits, even though the used instruction still has demanded
bits when other uses are taken into account. BDCE currently does not
simplify such cases.

llvm-svn: 348861
2018-12-11 16:37:26 +00:00
David Stenberg 2474ce5862 [DeadArgElim] Fixes for dbg.values using dead arg/return values
Summary:
When eliminating a dead argument or return value in a function with
local linkage, all uses, including in dbg.value intrinsics, would be
replaced with null constants. This would mean that, for example for an
integer argument, the debug info would incorrectly express that the
value is 0. Instead, replace all uses with undef to indicate that the
argument/return value is optimized out.

Also, make sure that metadata uses of return values are rewritten even
if there are no non-metadata uses of the value.

As a bit of historical curiosity, the code that emitted null constants
was introduced in the initial check-in of the pass in 2003, before
'undef' values even existed in LLVM.

This fixes PR23260.

Reviewers: dblaikie, aprantl, vsk, djtodoro

Reviewed By: aprantl

Subscribers: llvm-commits

Tags: #debug-info

Differential Revision: https://reviews.llvm.org/D55513

llvm-svn: 348837
2018-12-11 10:33:38 +00:00
Matt Arsenault 9ccde61f81 InstCombine: Scalarize single use icmp/fcmp
llvm-svn: 348801
2018-12-10 21:50:54 +00:00
Sanjay Patel 5c00519c9a [InstCombine] add tests for movmsk (PR39927) NFC
llvm-svn: 348800
2018-12-10 21:44:20 +00:00
Craig Topper 84d62ce6c1 [CostModel][X86][AArch64] Adjust cost of the scalarization part of min/max reduction.
Summary: The comment says we need 3 extracts and a select at the end. But didn't we just account for the select in the vector cost above. Aren't we just extracting the single element after taking the min/max in the vector register?

Reviewers: RKSimon, spatel, ABataev

Reviewed By: RKSimon

Subscribers: javed.absar, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D55480

llvm-svn: 348739
2018-12-10 06:58:58 +00:00
Nikita Popov 94b8e2ea4e [MemCpyOpt] memset->memcpy forwarding with undef tail
Currently memcpyopt optimizes cases like

    memset(a, byte, N);
    memcpy(b, a, M);

to

    memset(a, byte, N);
    memset(b, byte, M);

if M <= N. Often this allows further simplifications down the line,
which drop the first memset entirely.

This patch extends this optimization for the case where M > N, but we
know that the bytes a[N..M] are undef due to alloca/lifetime.start.

This situation arises relatively often for Rust code, because Rust does
not initialize trailing structure padding and loves to insert redundant
memcpys. This also fixes https://bugs.llvm.org/show_bug.cgi?id=39844.

For the implementation, I'm reusing a bit of code for a similar existing
optimization (direct memcpy of undef). I've also added memset support to
MemDepAnalysis GetLocation -- Instead, getPointerDependencyFrom could be
used, but it seems to make more sense to add this to GetLocation and thus
make the computation cachable.

Differential Revision: https://reviews.llvm.org/D55120

llvm-svn: 348645
2018-12-07 21:16:58 +00:00
Nikita Popov 4ca00df571 [MemCpyOpt] Add tests for memset->memcpy forwaring with undef tail; NFC
These are baseline tests for D55120.

llvm-svn: 348644
2018-12-07 21:16:52 +00:00
Vedant Kumar 03f9f15b16 [HotColdSplitting] Refine definition of unlikelyExecuted
The splitting pass uses its 'unlikelyExecuted' predicate to statically
decide which blocks are cold.

- Do not treat noreturn calls as if they are cold unless they are actually
  marked cold. This is motivated by functions like exit() and longjmp(), which
  are not beneficial to outline.

- Do not treat inline asm as an outlining barrier. In practice asm("") is
  frequently used to inhibit basic block merging; enabling outlining in this case
  results in substantial memory savings.

- Treat invokes of cold functions as cold.

As a drive-by, remove the 'exceptionHandlingFunctions' predicate, because it's
no longer needed. The pass can identify & outline blocks dominated by EH pads,
so there's no need to special-case __cxa_begin_catch etc.

Differential Revision: https://reviews.llvm.org/D54244

llvm-svn: 348640
2018-12-07 20:24:04 +00:00
Vedant Kumar 03aaa3e2aa [HotColdSplitting] Outline more than once per function
Algorithm: Identify maximal cold regions and put them in a worklist. If
a candidate region overlaps with another, discard it. While the worklist
is full, remove a single-entry sub-region from the worklist and attempt
to outline it. By the non-overlap property, this should not invalidate
parts of the domtree pertaining to other outlining regions.

Testing: LNT results on X86 are clean. With test-suite + externals, llvm
outlines 134KB pre-patch, and 352KB post-patch (+ ~2.6x). The file
483.xalancbmk/src/Constants.cpp stands out as an extreme case where llvm
outlines over 100 times in some functions (mostly EH paths). There was
not a significant performance impact pre vs. post-patch.

Differential Revision: https://reviews.llvm.org/D53887

llvm-svn: 348639
2018-12-07 20:23:52 +00:00
Craig Topper d1498ed8df [CostModel][X86] Fix overcounting arithmetic cost in illegal types in getArithmeticReductionCost/getMinMaxReductionCost
We were overcounting the number of arithmetic operations needed at each level before we reach a legal type. We were using the full vector type for that level, but we are going to split the input vector at that level in half. So the effective arithmetic operation cost at that level is half the width.

So for example on 8i32 on an sse target. Were were calculating the cost of an 8i32 op which is likely 2 for basic integer. Then after the loop we count 2 more v4i32 ops. For a total arith cost of 4. But if you look at the assembly there would only be 3 arithmetic ops.

There are still more bugs in this code that I'm going to work on next. The non pairwise code shouldn't count extract subvectors in the loop. There are no extracts, the types are split in registers. For pairwise we need to use 2 two src permute shuffles.

Differential Revision: https://reviews.llvm.org/D55397

llvm-svn: 348621
2018-12-07 18:20:56 +00:00
Nikita Popov 110cf05203 Reapply "[DemandedBits][BDCE] Support vectors of integers"
DemandedBits and BDCE currently only support scalar integers. This
patch extends them to also handle vector integer operations. In this
case bits are not tracked for individual vector elements, instead a
bit is demanded if it is demanded for any of the elements. This matches
the behavior of computeKnownBits in ValueTracking and
SimplifyDemandedBits in InstCombine.

Unlike the previous iteration of this patch, getDemandedBits() can now
again be called on arbirary (sized) instructions, even if they don't
have integer or vector of integer type. (For vector types the size of the
returned mask will now be the scalar size in bits though.)

The added LoopVectorize test case shows a case which triggered an
assertion failure with the previous attempt, because getDemandedBits()
was called on a pointer-typed instruction.

Differential Revision: https://reviews.llvm.org/D55297

llvm-svn: 348602
2018-12-07 15:38:13 +00:00
Max Kazantsev b9e65cbddf Introduce llvm.experimental.widenable_condition intrinsic
This patch introduces a new instinsic `@llvm.experimental.widenable_condition`
that allows explicit representation for guards. It is an alternative to using
`@llvm.experimental.guard` intrinsic that does not contain implicit control flow.

We keep finding places where `@llvm.experimental.guard` is not supported or
treated too conservatively, and there are 2 reasons to that:

- `@llvm.experimental.guard` has memory write side effect to model implicit control flow,
  and this sometimes confuses passes and analyzes that work with memory;
- Not all passes and analysis are aware of the semantics of guards. These passes treat them
  as regular throwing call and have no idea that the condition of guard may be used to prove
  something. One well-known place which had caused us troubles in the past is explicit loop
  iteration count calculation in SCEV. Another example is new loop unswitching which is not
  aware of guards. Whenever a new pass appears, we potentially have this problem there.

Rather than go and fix all these places (and commit to keep track of them and add support
in future), it seems more reasonable to leverage the existing optimizer's logic as much as possible.
The only significant difference between guards and regular explicit branches is that guard's condition
can be widened. It means that a guard contains (explicitly or implicitly) a `deopt` block successor,
and it is always legal to go there no matter what the guard condition is. The other successor is
a guarded block, and it is only legal to go there if the condition is true.

This patch introduces a new explicit form of guards alternative to `@llvm.experimental.guard`
intrinsic. Now a widenable guard can be represented in the CFG explicitly like this:


    %widenable_condition = call i1 @llvm.experimental.widenable.condition()
    %new_condition = and i1 %cond, %widenable_condition
    br i1 %new_condition, label %guarded, label %deopt

  guarded:
    ; Guarded instructions

  deopt:
    call type @llvm.experimental.deoptimize(<args...>) [ "deopt"(<deopt_args...>) ]

The new intrinsic `@llvm.experimental.widenable.condition` has semantics of an
`undef`, but the intrinsic prevents the optimizer from folding it early. This form
should exploit all optimization boons provided to `br` instuction, and it still can be
widened by replacing the result of `@llvm.experimental.widenable.condition()`
with `and` with any arbitrary boolean value (as long as the branch that is taken when
it is `false` has a deopt and has no side-effects).

For more motivation, please check llvm-dev discussion "[llvm-dev] Giving up using
implicit control flow in guards".

This patch introduces this new intrinsic with respective LangRef changes and a pass
that converts old-style guards (expressed as intrinsics) into the new form.

The naming discussion is still ungoing. Merging this to unblock further items. We can
later change the name of this intrinsic.

Reviewed By: reames, fedor.sergeev, sanjoy
Differential Revision: https://reviews.llvm.org/D51207

llvm-svn: 348593
2018-12-07 14:39:46 +00:00
Markus Lavin 4dc4ebd606 [PM] Port LoadStoreVectorizer to the new pass manager.
Differential Revision: https://reviews.llvm.org/D54848

llvm-svn: 348570
2018-12-07 08:23:37 +00:00
Max Kazantsev a523a21175 [LoopSimplifyCFG] Do not deal with loops with irreducible CFG inside
The current algorithm that collects live/dead/inloop blocks relies on some invariants
related to RPO and PO traversals. In particular, the important fact it requires is that
the only loop's latch is the first block in PO traversal. It also relies on fact that during
RPO we visit all prececessors of a block before we visit this block (backedges ignored).

If a loop has irreducible non-loop cycle inside, both these assumptions may break.
This patch adds detection for this situation and prohibits the terminator folding
for loops with irreducible CFG.

We can in theory support this later, for this some algorithmic changes are needed.
Besides, irreducible CFG is not a frequent situation and we can just don't bother.

Thanks @uabelho for finding this!

Differential Revision: https://reviews.llvm.org/D55357
Reviewed By: skatkov

llvm-svn: 348567
2018-12-07 05:44:45 +00:00
Nikita Popov 14ca9a8355 Revert "[DemandedBits][BDCE] Support vectors of integers"
This reverts commit r348549. Causing assertion failures during
clang build.

llvm-svn: 348558
2018-12-07 00:42:03 +00:00
Nikita Popov cf65b9207b [DemandedBits][BDCE] Support vectors of integers
DemandedBits and BDCE currently only support scalar integers. This
patch extends them to also handle vector integer operations. In this
case bits are not tracked for individual vector elements, instead a
bit is demanded if it is demanded for any of the elements. This matches
the behavior of computeKnownBits in ValueTracking and
SimplifyDemandedBits in InstCombine.

The getDemandedBits() method can now only be called on instructions that
have integer or vector of integer type. Previously it could be called on
any sized instruction (even if it was not particularly useful). The size
of the return value is now always the scalar size in bits (while
previously it was the type size in bits).

Differential Revision: https://reviews.llvm.org/D55297

llvm-svn: 348549
2018-12-06 23:50:32 +00:00
Nikita Popov d7b6b62deb [BDCE] Add tests for BDCE applied to vector instructions; NFC
These are baseline tests for D55297.

llvm-svn: 348548
2018-12-06 23:50:19 +00:00
Alexandros Lamprineas e4c91f5c4c [GVN] Don't perform scalar PRE on GEPs
Partial Redundancy Elimination of GEPs prevents CodeGenPrepare from
sinking the addressing mode computation of memory instructions back
to its uses. The problem comes from the insertion of PHIs, which
confuse CGP and make it bail.

I've autogenerated the check lines of an existing test and added a
store instruction to demonstrate the motivation behind this change.
The store is now using the gep instead of a phi.

Differential Revision: https://reviews.llvm.org/D55009

llvm-svn: 348496
2018-12-06 16:11:58 +00:00
Ilya Biryukov cb5331eb93 Revert "[LoopSimplifyCFG] Delete dead in-loop blocks"
This reverts commit r348457.
The original commit causes clang to crash when doing an instrumented
build with a new pass manager. Reverting to unbreak our integrate.

llvm-svn: 348484
2018-12-06 13:21:01 +00:00
Roman Lebedev 98cb1216a6 [InstCombine] foldICmpWithLowBitMaskedVal(): don't miscompile -1 vector elts
I was finally able to quantify what i thought was missing in the fix,
it was vector constants. If we have a scalar (and %x, -1),
it will be instsimplified before we reach this code,
but if it is a vector, we may still have a -1 element.

Thus, we want to avoid the fold if *at least one* element is -1.
Or in other words, ignoring the undef elements, no sign bits
should be set. Thus, m_NonNegative().

A follow-up for rL348181
https://bugs.llvm.org/show_bug.cgi?id=39861

llvm-svn: 348462
2018-12-06 08:14:24 +00:00
Roman Lebedev d9941fa270 [NFC][InstCombine] Add more miscompile tests for foldICmpWithLowBitMaskedVal()
We also have to me aware of vector constants. If at least one element
is -1, we can't transform.

llvm-svn: 348461
2018-12-06 08:11:20 +00:00
Max Kazantsev 0b1d069d64 [LoopSimplifyCFG] Delete dead in-loop blocks
This patch teaches LoopSimplifyCFG to delete loop blocks that have
become unreachable after terminator folding has been done.

Differential Revision: https://reviews.llvm.org/D54023
Reviewed By: anna

llvm-svn: 348457
2018-12-06 05:45:02 +00:00
Matt Arsenault ca8631ba6e InstCombine: Add some missing tests for scalarization
llvm-svn: 348456
2018-12-06 03:32:50 +00:00
David L. Jones 5ff7b8a04a Revert r347934 "[SCEV] Guard movement of insertion point for loop-invariants"
This change caused SEGVs in instcombine. (The r347934 change seems to me to be a
precipitating cause, not a root cause. Details are on the llvm-commits thread
for r347934.)

llvm-svn: 348426
2018-12-05 23:13:50 +00:00
Sanjay Patel 998ececef0 [InstCombine] remove dead code from visitExtractElement
Extracting from a splat constant is always handled by InstSimplify.
Move the test for this from InstCombine to InstSimplify to make
sure that stays true.

llvm-svn: 348423
2018-12-05 23:09:33 +00:00
Sanjay Patel de3db684b7 [InstCombine] add/move tests for extractelement; NFC
llvm-svn: 348417
2018-12-05 21:56:13 +00:00
Vedant Kumar 09415a850e [CodeExtractor] Do not marked outlined calls which may resume EH as noreturn
Treat terminators which resume exception propagation as returning instructions
(at least, for the purposes of marking outlined functions `noreturn`). This is
to avoid inserting traps after calls to outlined functions which unwind.

rdar://46129950

llvm-svn: 348404
2018-12-05 19:35:37 +00:00
Christian Bruel 4ead99b3ac Allow norecurse attribute on functions that have debug infos.
Summary: debug intrinsics might be marked norecurse to enable the caller function to be norecurse and optimized if needed. This avoids code gen optimisation differences when -g is used, as in globalOpt.cpp:processInternalGlobal checks.

Reviewers: chandlerc, jmolloy, aprantl

Reviewed By: aprantl

Subscribers: aprantl, llvm-commits

Differential Revision: https://reviews.llvm.org/D55187

llvm-svn: 348381
2018-12-05 16:48:00 +00:00
Sanjay Patel baffae91b2 [InstCombine] simplify icmps with same operands based on dominating cmp
The tests here are based on the motivating cases from D54827.

More background:
1. We don't get these cases in general with SimplifyCFG because the root
   of the pattern match is an icmp, not a branch. I'm not sure how often
   we encounter this pattern vs. the seemingly more likely case with 
   branches, but I don't see evidence to leave the minimal pattern
   unoptimized.

2. This has a chance of increasing compile-time because we're using a
   ValueTracking call to handle the match. The motivating cases could be
   handled with a simpler pair of calls to isImpliedTrueByMatchingCmp/
   isImpliedFalseByMatchingCmp, but I saw that we have a more 
   comprehensive wrapper around those, so we might as well use it here
   unless there's evidence that it's significantly slower.

3. Ideally, we'd handle the fold to constants in InstSimplify, but as
   with the existing code here, we could extend this to handle cases
   where the result is not a constant, but a new combined predicate.
   That would mean splitting the logic across the 2 passes and possibly
   duplicating the pattern-matching cost.

4. As mentioned in D54827, this seems like the kind of thing that should
   be handled in Correlated Value Propagation, but that pass is currently
   limited to dealing with instructions with constant operands, so extending
   this bit of InstCombine is the smallest/easiest way to get these patterns 
   optimized.

llvm-svn: 348367
2018-12-05 15:04:00 +00:00
Max Kazantsev 594cb55686 [NFC] Verify memoryssa in test for PR39783
llvm-svn: 348333
2018-12-05 05:20:08 +00:00
Sanjay Patel 1df1facae2 [InstCombine] add tests for implied simplifications; NFC
Ideally, we would fold all of these in InstSimplify in a
similar way to rL347896, but this is a bit awkward when
we're trying to simplify a compare directly because the
ValueTracking API expects the compare as an input, but
in InstSimplify, we just have the operands of the compare.

Given that we can do transforms besides just simplifications,
we might as well just extend the code in InstCombine (which 
already does simplifications with constant operands).

llvm-svn: 348312
2018-12-04 22:25:33 +00:00
Sanjay Patel 882555628b [InstCombine] auto-generate full checks for icmp overflow tests; NFC
llvm-svn: 348274
2018-12-04 15:41:34 +00:00
Sanjay Patel 320cf5dde5 [InstCombine] auto-generate full checks for icmp dominator tests; NFC
llvm-svn: 348270
2018-12-04 15:00:35 +00:00
Simon Pilgrim 924f98e579 Add common check prefix. NFCI.
llvm-svn: 348265
2018-12-04 14:32:42 +00:00
Alina Sbirlea a2eebb828e Update MemorySSA in SimpleLoopUnswitch.
Summary:
Teach SimpleLoopUnswitch to preserve MemorySSA.

Subscribers: sanjoy, jlebar, Prazek, george.burgess.iv, llvm-commits

Differential Revision: https://reviews.llvm.org/D47022

llvm-svn: 348263
2018-12-04 14:23:37 +00:00
Vedant Kumar d129569e34 [CodeExtractor] Split PHI nodes with incoming values from outlined region (PR39433)
If a PHI node out of extracted region has multiple incoming values from it,
split this PHI on two parts. First PHI has incomings only from region and
extracts with it (they are placed to the separate basic block that added to the
list of outlined), and incoming values in original PHI are replaced by first
PHI. Similar solution is already used in CodeExtractor for PHIs in entry block
(severSplitPHINodes method). It covers PR39433 bug.

Patch by Sergei Kachkov!

Differential Revision: https://reviews.llvm.org/D55018

llvm-svn: 348205
2018-12-03 22:40:21 +00:00
Sanjay Patel 8c65515082 [InstCombine] fix undef propagation bug with shuffle+binop
When we have a shuffle that extends a source vector with undefs
and then do some binop on that, we must make sure that the extra
elements remain undef with that binop if we reverse the order of
the binop and shuffle.

'or' is probably the easiest example to show the bug because
'or C, undef --> -1' (not undef). But there are other 
opcode/constant combinations where this is true as shown by 
the 'shl' test.

llvm-svn: 348191
2018-12-03 21:15:17 +00:00
Roman Lebedev 7bf2fed167 [InstCombine] foldICmpWithLowBitMaskedVal(): disable 2 faulty folds.
These two folds are invalid for this non-constant pattern
when the mask ends up being all-ones:
https://rise4fun.com/Alive/9au
https://rise4fun.com/Alive/UcQM

Fixes https://bugs.llvm.org/show_bug.cgi?id=39861

llvm-svn: 348181
2018-12-03 20:07:58 +00:00
Sanjay Patel 3e66d81ec6 [InstCombine] add tests for shuffle+binop fold; NFC
llvm-svn: 348173
2018-12-03 19:41:21 +00:00
Sanjay Patel 8918a511a1 [SimplifyCFG] add tests for cross block compare folding; NFC
These are the baseline tests for D54827.
Patch based on code originally written by: @yinyuefengyi (luo xionghu)

Differential Revision: https://reviews.llvm.org/D54994

llvm-svn: 348151
2018-12-03 16:55:29 +00:00
Michal Gorny ff13c24cfe [test] Fix use of 'sort -b' in SimpleLoopUnswitch on NetBSD
Add '-k 1' to 'sort -b' calls in SimpleLoopUnswitch tests, as required
for sort implementation on NetBSD.  The '-b' modifier is ineffective
if specified without any key.  Per the manpage:

  Note that the -b option has no effect unless key fields are specified.

Differential Revision: https://reviews.llvm.org/D55168

llvm-svn: 348097
2018-12-02 16:49:33 +00:00
Simon Pilgrim 102854f4d4 [TTI] Reduction costs only need to include a single extract element cost (REAPPLIED)
We were adding the entire scalarization extraction cost for reductions, which returns the total cost of extracting every element of a vector type.

For reductions we don't need to do this - we just need to extract the 0'th element after the reduction pattern has completed.

Fixes PR37731

Rebased and reapplied after being reverted in rL347541 due to PR39774 - which was fixed by D54955/rL347759 and D55017/rL347997

Differential Revision: https://reviews.llvm.org/D54585

llvm-svn: 348076
2018-12-01 14:18:31 +00:00
Nikita Popov 0c5d6ccbfc [InstCombine] Support ssub.sat canonicalization for non-splats
Extend ssub.sat(X, C) -> sadd.sat(X, -C) canonicalization to also
support non-splat vector constants. This is done by generalizing
the implementation of the isNotMinSignedValue() helper to return
true for constants that are non-splat, but don't contain any
signed min elements.

Differential Revision: https://reviews.llvm.org/D55011

llvm-svn: 348072
2018-12-01 10:58:34 +00:00
Teresa Johnson 5b8ff375c8 [ThinLTO] Allow importing of functions with var args
Summary:
Follow up to D54270, which allowed importing of var args functions
unless they called va_start. As pointed out in the post-commit comments
on that patch, the inliner can handle functions that call va_start in
certain situations as well. Go ahead and enable importing of all var
args functions. Measurements on a large binary show that this increases
imports and binary size by an insignificant amount.

Reviewers: davidxl

Subscribers: mehdi_amini, inglorion, eraman, steven_wu, dexonsmith, llvm-commits

Differential Revision: https://reviews.llvm.org/D54607

llvm-svn: 348068
2018-12-01 05:11:46 +00:00
Craig Topper 88270231f8 [X86][LoopVectorize] Replace -mcpu=skylake-avx512 with -mattr=avx512f in some tests that failed when experimenting with defaulting to -mprefer-vector-width=256 for skylake-avx512.
llvm-svn: 348063
2018-12-01 01:38:44 +00:00
Craig Topper 502fc1bdd5 [X86] Split skylake-avx512 run lines in SLP vectorizer tests to cover -mprefer=vector-width=256 and -mprefer-vector-width=512.
This will make these tests immune if we ever change the default behavior of -march=skylake-avx512 to prefer 256 bit vectors.

llvm-svn: 348046
2018-11-30 22:53:21 +00:00
Sanjay Patel 398728732e [InstSimplify] add tests for undef + partial undef constant folding; NFC
These tests should probably go under a separate test file because they
should fold with just -constprop, but they're similar to the scalar
tests already in here.

llvm-svn: 348045
2018-11-30 22:51:34 +00:00
Joseph Tremoulet 27b1e3bd4f [Mem2Reg] Fix nondeterministic corner case
Summary:
When mem2reg inserts phi nodes in blocks with unreachable predecessors,
it adds undef operands for those incoming edges.  When there are
multiple such predecessors, the order is currently based on the address
of the BasicBlocks.  This change fixes that by using the BBNumbers in
the sort/search predicates, as is done elsewhere in mem2reg to ensure
determinism.

Also adds a testcase with a bunch of unreachable preds, which
(nodeterministically) fails without the fix.


Reviewers: majnemer

Reviewed By: majnemer

Subscribers: mgrang, llvm-commits

Differential Revision: https://reviews.llvm.org/D55077

llvm-svn: 348024
2018-11-30 19:20:02 +00:00
Alexey Bataev 3689747619 [SLP]PR39774: Update references of the replaced external instructions.
Summary:
An additional fix for PR39774. Need to update the references for the
RedcutionRoot instruction when it is replaced during the vectorization
phase to avoid compiler crash on reduction vectorization.

Reviewers: RKSimon, spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D55017

llvm-svn: 347997
2018-11-30 15:14:20 +00:00
Renato Golin 135e72e1b9 Add a new reduction pattern match
Adding a new reduction pattern match for vectorizing code similar
to TSVC s3111:

for (int i = 0; i < N; i++)
  if (a[i] > b)
    sum += a[i];

This patch adds support for fadd, fsub and fmull, as well as multiple
branches and different (but compatible) instructions (ex. add+sub) in
different branches.

The difference from the previous patch(https://reviews.llvm.org/D49168)
is as follows:
 - Added check of fast-math property of fp-instruction to the
   previous patch
 - Fix/add some pattern for if-reduction.ll


Differential Revision: https://reviews.llvm.org/D54464

Patch by Takahiro Miyoshi <takahiro.miyoshi@linaro.org>
     and Masakazu Ueno <masakazu.ueno@linaro.org>

llvm-svn: 347989
2018-11-30 13:40:10 +00:00
Max Kazantsev 9cf417db78 [LoopSimplifyCFG] Update MemorySSA in terminator folding. PR39783
Terminator folding transform lacks MemorySSA update for memory Phis,
while they exist within MemorySSA analysis. They need exactly the same
type of updates as regular Phis. Failing to update them properly ends up
with inconsistent MemorySSA and manifests in various assertion failures.

This patch adds Memory Phi updates to this transform.

Thanks to @jonpa for finding this!

Differential Revision: https://reviews.llvm.org/D55050
Reviewed By: asbirlea

llvm-svn: 347979
2018-11-30 10:06:23 +00:00
Max Kazantsev deaa3e2068 [NFC] Simplify and reduce tests for PR39783
llvm-svn: 347976
2018-11-30 09:51:25 +00:00
Warren Ristow 72d1f3a285 [SCEV] Guard movement of insertion point for loop-invariants
r320789 suppressed moving the insertion point of SCEV expressions with
dev/rem operations to the loop header in non-loop-invariant situations.
This, and similar, hoisting is also unsafe in the loop-invariant case,
since there may be a guard against a zero denominator. This is an
adjustment to the fix of r320789 to suppress the movement even in the
loop-invariant case.

This fixes PR30806.

Differential Revision: https://reviews.llvm.org/D54713

llvm-svn: 347934
2018-11-30 00:02:54 +00:00
David Stuttard c6603861d8 Revert r347871 "Fix: Add support for TFE/LWE in image intrinsic"
Also revert fix r347876

One of the buildbots was reporting a failure in some relevant tests that I can't
repro or explain at present, so reverting until I can isolate.

llvm-svn: 347911
2018-11-29 20:14:17 +00:00
Sanjay Patel d802270808 [InstSimplify] fold select with implied condition
This is an almost direct move of the functionality from InstCombine to 
InstSimplify. There's no reason not to do this in InstSimplify because 
we never create a new value with this transform.

(There's a question of whether any dominance-based transform belongs in
either of these passes, but that's a separate issue.)

I've changed 1 of the conditions for the fold (1 of the blocks for the 
branch must be the block we started with) into an assert because I'm not 
sure how that could ever be false.

We need 1 extra check to make sure that the instruction itself is in a
basic block because passes other than InstCombine may be using InstSimplify
as an analysis on values that are not wired up yet.

The 3-way compare changes show that InstCombine has some kind of 
phase-ordering hole. Otherwise, we would have already gotten the intended
final result that we now show here.

llvm-svn: 347896
2018-11-29 18:44:39 +00:00
John Brawn a7eb2c863f [LICM] Reapply r347776 "Make LICM able to hoist phis" with fix
This commit caused a large compile-time slowdown in some cases when NDEBUG is
off due to the dominator tree verification it added. Fix this by only doing
dominator tree and loop info verification when something has been hoisted.

Differential Revision: https://reviews.llvm.org/D52827

llvm-svn: 347889
2018-11-29 17:10:00 +00:00
Sanjay Patel 515b91cdef [SimplifyCFG] auto-generate complete checks; NFC
llvm-svn: 347882
2018-11-29 16:28:37 +00:00
Sanjay Patel 81449c6b0e [InstCombine] auto-generate complete checks; NFC
llvm-svn: 347881
2018-11-29 16:26:03 +00:00
Joseph Tremoulet 926ee459c4 [CallSiteSplitting] Report edge deletion to DomTreeUpdater
Summary:
When splitting musttail calls, the split blocks' original terminators
get removed; inform the DTU when this happens.

Also add a testcase that fails an assertion in the DTU without this fix.


Reviewers: fhahn, junbuml

Reviewed By: fhahn

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D55027

llvm-svn: 347872
2018-11-29 15:27:04 +00:00
David Stuttard de02e4b1cc Add support for TFE/LWE in image intrinsics
TFE and LWE support requires extra result registers that are written in the
event of a failure in order to detect that failure case.
The specific use-case that initiated these changes is sparse texture support.

This means that if image intrinsics are used with either option turned on, the
programmer must ensure that the return type can contain all of the expected
results. This can result in redundant registers since the vector size must be a
power-of-2.

This change takes roughly 6 parts:
1. Modify the instruction defs in tablegen to add new instruction variants that
can accomodate the extra return values.
2. Updates to lowerImage in SIISelLowering.cpp to accomodate setting TFE or LWE
(where the bulk of the work for these instruction types is now done)
3. Extra verification code to catch cases where intrinsics have been used but
insufficient return registers are used.
4. Modification to the adjustWritemask optimisation to account for TFE/LWE being
enabled (requires extra registers to be maintained for error return value).
5. An extra pass to zero initialize the error value return - this is because if
the error does not occur, the register is not written and thus must be zeroed
before use. Also added a new (on by default) option to ensure ALL return values
are zero-initialized that is required for sparse texture support.
6. Disable the inst_combine optimization in the presence of tfe/lwe (later TODO
for this to re-enable and handle correctly).

There's an additional fix now to avoid a dmask=0

For an image intrinsic with tfe where all result channels except tfe
were unused, I was getting an image instruction with dmask=0 and only a
single vgpr result for tfe. That is incorrect because the hardware
assumes there is at least one vgpr result, plus the one for tfe.

Fixed by forcing dmask to 1, which gives the desired two vgpr result
with tfe in the second one.

The TFE or LWE result is returned from the intrinsics using an aggregate
type. Look in the test code provided to see how this works, but in essence IR
code to invoke the intrinsic looks as follows:

%v = call {<4 x float>,i32} @llvm.amdgcn.image.load.1d.v4f32i32.i32(i32 15,
                                      i32 %s, <8 x i32> %rsrc, i32 1, i32 0)
%v.vec = extractvalue {<4 x float>, i32} %v, 0
%v.err = extractvalue {<4 x float>, i32} %v, 1

Differential revision: https://reviews.llvm.org/D48826

Change-Id: If222bc03642e76cf98059a6bef5d5bffeda38dda
llvm-svn: 347871
2018-11-29 15:21:13 +00:00
Martin Storsjo bfd1d27585 Revert "[LICM] Enable control flow hoisting by default" and "[LICM] Reapply r347190 "Make LICM able to hoist phis" with fix"
This reverts commits r347776 and r347778.

The first one, r347776, caused significant compile time regressions
for certain input files, see PR39836 for details.

llvm-svn: 347867
2018-11-29 14:39:39 +00:00
Sanjay Patel 83d1d3f167 [CVP] auto-generate complete test checks; NFC
llvm-svn: 347866
2018-11-29 14:28:47 +00:00
Max Kazantsev a63b275285 [NFC] Add two XFAIL tests from PR39783
llvm-svn: 347845
2018-11-29 09:38:22 +00:00
Sam Parker d6ebf0108e [LoopStrengthReduce] ComplexityLimit as an option
Convert ComplexityLimit into a command line value.

Differential Revision: https://reviews.llvm.org/D54899

llvm-svn: 347843
2018-11-29 08:34:22 +00:00