Commit Graph

7841 Commits

Author SHA1 Message Date
Justin Lebar 96e2915574 [StructurizeCFG] Fix infinite loop in rebuildSSA.
Michel Dänzer reported that r288051, "[StructurizeCFG] Use range-based
for loops", introduced a bug into rebuildSSA, wherein we were iterating
over an instruction's use list while modifying it, without taking care
to do this correctly.

llvm-svn: 288200
2016-11-29 21:49:02 +00:00
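The hazard named in the [StructurizeCFG] message above, walking a use list while modifying it, can be shown outside LLVM. This is a minimal Python sketch, not the rebuildSSA code: the lists stand in for use lists, and `move_uses_unsafe` and `move_uses_safe` are hypothetical names. The safe variant iterates over a snapshot, which is one common way to do this correctly.

```python
def move_uses_unsafe(old_uses, new_uses):
    """Move uses while iterating the live list (buggy on purpose)."""
    for use in old_uses:          # iterates the list that is being mutated
        old_uses.remove(use)      # shifts later elements under the iterator
        new_uses.append(use)


def move_uses_safe(old_uses, new_uses):
    """Move uses by iterating over a snapshot of the list."""
    for use in list(old_uses):    # copy taken before any mutation
        old_uses.remove(use)
        new_uses.append(use)


# Hypothetical use lists, modelled as plain Python lists.
unsafe_old, unsafe_new = ["a", "b", "c", "d"], []
move_uses_unsafe(unsafe_old, unsafe_new)   # skips every other element

safe_old, safe_new = ["a", "b", "c", "d"], []
move_uses_safe(safe_old, safe_new)         # moves all four uses
```

With the unsafe variant, two uses are left behind on the old list; the snapshot variant moves all of them.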
Adam Nemet c2ed4b35b4 Revert "[GVN] Basic optimization remark support"
This reverts commit r288046.

Trying to see if the revert fixes a compiler crash during a stage2 LTO
build with a GVN backtrace.

llvm-svn: 288179
2016-11-29 18:32:04 +00:00
Adam Nemet 91d4d93f94 Revert "[GVN, OptDiag] Include the value that is forwarded in load elimination"
This reverts commit r288047.

Trying to see if the revert fixes a compiler crash during a stage2 LTO
build with a GVN backtrace.

llvm-svn: 288178
2016-11-29 18:32:00 +00:00
Adam Nemet a4d3d44ec2 Revert "[GVN, OptDiag] Print the interesting instructions involved in missed load-elimination"
This reverts commit r288090.

Trying to see if the revert fixes a compiler crash during a stage2 LTO
build with a GVN backtrace.

llvm-svn: 288177
2016-11-29 18:31:53 +00:00
Artur Pilipenko 5365746723 [CVP] Remove use of removed flag (-cvp-dont-process-adds) from the test
The flag was removed by r288154.

llvm-svn: 288161
2016-11-29 16:43:30 +00:00
Alexey Bataev e951e5eb7b [SLP] Add a new test for tree vectorization starting from insertelement
instruction.

llvm-svn: 288148
2016-11-29 15:37:52 +00:00
Alexey Bataev 4fa063ebc9 [SLPVectorizer] Improved support of partial tree vectorization.
Currently the SLP vectorizer tries to vectorize a binary operation and gives up
immediately after the first unsuccessful attempt. The patch
tries to improve the situation by trying to vectorize all binary
operations of all children nodes in the binop tree.

Differential Revision: https://reviews.llvm.org/D25517

llvm-svn: 288115
2016-11-29 08:21:14 +00:00
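The idea in the [SLPVectorizer] message above, continuing into child nodes instead of stopping at the first failure, might be sketched as a worklist walk. This is an illustrative Python model under assumptions: `BinOp`, `vectorize_tree`, and the `try_vectorize` callback are hypothetical, and skipping the subtree of a successfully vectorized node is a simplification of mine, not something the commit states.

```python
from collections import deque


class BinOp:
    """Hypothetical binary-operation node; operands may be BinOps or leaves."""

    def __init__(self, name, left=None, right=None):
        self.name = name
        self.left = left
        self.right = right


def vectorize_tree(root, try_vectorize):
    """Try every binop in the tree, descending whenever an attempt fails."""
    vectorized = []
    worklist = deque([root])
    while worklist:
        node = worklist.popleft()
        if try_vectorize(node):
            vectorized.append(node.name)
            continue  # assumption: a vectorized subtree needs no further work
        # The attempt failed: keep going with the children instead of stopping.
        for child in (node.left, node.right):
            if isinstance(child, BinOp):
                worklist.append(child)
    return vectorized


tree = BinOp("root", BinOp("a", BinOp("c"), "x"), BinOp("b"))
result = vectorize_tree(tree, lambda node: node.name in {"a", "c"})
```

Here the root attempt fails, but the walk still reaches and vectorizes node `a`.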
Adam Nemet b9e53c9056 [GVN, OptDiag] Print the interesting instructions involved in missed load-elimination
This includes the intervening store and the load/store that we're trying
to forward from in the optimization remark for the missed load
elimination.

This is hooked up under a new mode in ORE that allows for compile-time
budget for a bit more analysis to print more insightful messages.  This
mode is currently enabled for -fsave-optimization-record (-Rpass is
trickier since it is controlled in the front-end).

With this we can now print the red remark in http://lab.llvm.org:8080/artifacts/opt-view_test-suite/build/SingleSource/Benchmarks/Dhrystone/CMakeFiles/dry.dir/html/_org_test-suite_SingleSource_Benchmarks_Dhrystone_dry.c.html#L446

Differential Revision: https://reviews.llvm.org/D26490

llvm-svn: 288090
2016-11-29 00:09:22 +00:00
Eli Friedman 5096775393 [SROA] Drop lifetime.start/end intrinsics when they block promotion.
Preserving lifetime markers isn't as important as allowing promotion,
so just drop the lifetime markers if necessary.

This also fixes an assertion failure where other parts of SROA assumed
that lifetime markers never block promotion.

Fixes https://llvm.org/bugs/show_bug.cgi?id=29139.

Differential Revision: https://reviews.llvm.org/D24854

llvm-svn: 288074
2016-11-28 21:50:34 +00:00
Joerg Sonnenberger caaa82d90d Revert r287553: [CodeGenPrep] Skip merging empty case blocks
It results in assertions in lib/Analysis/BlockFrequencyInfoImpl.cpp line
670 ("Expected irreducible CFG").

llvm-svn: 288052
2016-11-28 18:56:54 +00:00
Adam Nemet a415a9bde6 [GVN, OptDiag] Include the value that is forwarded in load elimination
This requires some changes to the opt-diag API.  Hal and I have
discussed this at the Dev Meeting and came up with a streaming delimiter
(setExtraArgs) to solve this.

Arguments after this delimiter are only included in the optimization
records and not in the remarks printed in the compiler output.  (Note
how, in the test, the content of the YAML file changes but the remarks in
the compiler output don't.)

This implements the green GVN message with a bug fix at line
http://lab.llvm.org:8080/artifacts/opt-view_test-suite/build/SingleSource/Benchmarks/Dhrystone/CMakeFiles/dry.dir/html/_org_test-suite_SingleSource_Benchmarks_Dhrystone_dry.c.html#L446

The fix is that now we properly include the constant value in the
message: "load of type i32 eliminated in favor of 7"

Differential Revision: https://reviews.llvm.org/D26489

llvm-svn: 288047
2016-11-28 17:45:34 +00:00
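The streaming delimiter described in the [GVN, OptDiag] message above can be pictured with a small model. This Python sketch is not the opt-diag API: `Remark`, `EXTRA_ARGS`, `message`, and `record` are hypothetical names chosen to mirror the description, where arguments streamed after the delimiter reach the record but not the printed remark.

```python
EXTRA_ARGS = object()  # hypothetical stand-in for the setExtraArgs delimiter


class Remark:
    """Collects streamed arguments and remembers where the extras begin."""

    def __init__(self, name):
        self.name = name
        self.args = []
        self.first_extra = None  # index of the first record-only argument

    def add(self, arg):
        if arg is EXTRA_ARGS:
            if self.first_extra is None:
                self.first_extra = len(self.args)
        else:
            self.args.append(str(arg))
        return self  # allow chained streaming

    def message(self):
        """Text for the compiler output: arguments before the delimiter."""
        end = len(self.args) if self.first_extra is None else self.first_extra
        return "".join(self.args[:end])

    def record(self):
        """Entry for the optimization record: every argument."""
        return {"name": self.name, "args": list(self.args)}


remark = (
    Remark("LoadElim")
    .add("load of type ")
    .add("i32")
    .add(" eliminated")
    .add(EXTRA_ARGS)
    .add(" in favor of ")
    .add(7)
)
```

The printed message stops at the delimiter, while the record keeps the forwarded value.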
Adam Nemet e5112b14b9 [GVN] Basic optimization remark support
Follow-on patches will add more interesting cases.

The goal of this patch-set is to get the GVN messages printed in
opt-viewer from Dhrystone as was presented in my Dev Meeting talk.  This
is the optimization view for the function (the last remark in the
function has a bug which is fixed in this series):
http://lab.llvm.org:8080/artifacts/opt-view_test-suite/build/SingleSource/Benchmarks/Dhrystone/CMakeFiles/dry.dir/html/_org_test-suite_SingleSource_Benchmarks_Dhrystone_dry.c.html#L430

Differential Revision: https://reviews.llvm.org/D26488

llvm-svn: 288046
2016-11-28 17:45:28 +00:00
James Molloy 6bed13c551 [InlineCost] Reduce inline thresholds to compensate for cost changes
In r286814, the algorithm for calculating inline costs changed. This
caused more inlining to take place which is especially apparent
in optsize and minsize modes.

As the cost calculation removed a skewed behaviour (we were inconsistent
about the cost of calls) it isn't possible to update the thresholds to
get exactly the same behaviour as before. However, this threshold change
accounts for the very common case where an inline candidate has no
calls within it. In this case, r286814 would inline around 5-6 more (IR)
instructions.

The changes to -Oz have been heavily benchmarked. The "obvious" value
for the inline threshold at -Oz is zero, but due to inaccuracies in the
inline heuristics this can actually cause code size increases due to
not inlining key thunk functions (that then disappear). Experimentally,
5 was the sweet spot for code size over the test-suite.

For -Os, this change removes the outlier results shown up by green dragon
(http://104.154.54.203/db_default/v4/nts/13248).

Fixes D26848.

llvm-svn: 288024
2016-11-28 11:07:37 +00:00
Mohammad Shahid 2f5cb60b07 [SLP] Add new and update existing lit tests to provide more context for the incoming patch for vectorization of jumbled loads
Change-Id: Ifb9091bb0f84c1937c2c8bd2fc345734f250d2f9
llvm-svn: 287992
2016-11-27 03:35:31 +00:00
Sanjay Patel 12a2af447b [InstCombine] add test to show missing vector optimization; NFC
llvm-svn: 287982
2016-11-26 16:13:23 +00:00
Sanjay Patel 8bd69b7ed9 [InstCombine] don't drop metadata in FoldOpIntoSelect()
llvm-svn: 287980
2016-11-26 15:23:20 +00:00
Sanjay Patel 534e270ae5 [SimplifyCFG] auto-generate better checks; NFC
llvm-svn: 287954
2016-11-25 21:12:39 +00:00
Sanjay Patel d1a147f9f4 [SimplifyCFG] auto-generate better checks; NFC
llvm-svn: 287953
2016-11-25 21:07:13 +00:00
Abhilash Bhandari 54e5a1a4da [Loop Unswitch] Patch to selectively unswitch only the reachable branch instructions.
Summary:
The iterative algorithm for Loop Unswitching may render some of the branches unreachable in the unswitched loops.
Given the exponential nature of the algorithm, this is quite an overhead.
This patch fixes this problem by selectively unswitching only those branches within a loop that are reachable from the loop header.

Reviewers: Michael Zolotukhin, Anna Thomas, Weiming Zhao.
Subscribers: llvm-commits.

Differential Revision: http://reviews.llvm.org/D26299

llvm-svn: 287925
2016-11-25 14:07:44 +00:00
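The selection rule in the [Loop Unswitch] message above, considering only branches reachable from the loop header, amounts to a graph reachability filter. This is a minimal Python sketch under assumptions: the CFG is a plain dictionary of successor lists, and `reachable_from` and `unswitch_candidates` are hypothetical helpers, not the pass's functions.

```python
def reachable_from(header, successors):
    """Return the set of blocks reachable from `header` (depth-first walk)."""
    seen = {header}
    stack = [header]
    while stack:
        block = stack.pop()
        for succ in successors.get(block, ()):
            if succ not in seen:
                seen.add(succ)
                stack.append(succ)
    return seen


def unswitch_candidates(header, successors, branch_blocks):
    """Keep only the branch blocks that the loop header can still reach."""
    reachable = reachable_from(header, successors)
    return [block for block in branch_blocks if block in reachable]


# Hypothetical loop: "dead" was cut off by an earlier unswitching step.
cfg = {
    "header": ["a"],
    "a": ["latch"],
    "dead": ["latch"],
    "latch": ["header"],
}
candidates = unswitch_candidates("header", cfg, ["a", "dead"])
```

Filtering out `dead` avoids spending the exponential unswitching budget on a branch that can never execute.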
Simon Pilgrim 841d7ca463 [X86][AVX512] Add support for v2i64 fptosi/fptoui/sitofp/uitofp on AVX512DQ-only targets
Use 512-bit instructions with subvector insertion/extraction like we do in a number of similar circumstances

llvm-svn: 287882
2016-11-24 14:46:55 +00:00
Alexey Bataev 2eaacda53e [SLP] Add more tests for SLP Vectorizer.
llvm-svn: 287801
2016-11-23 20:10:32 +00:00
Alina Sbirlea a3d2f703a5 [LoadStoreVectorizer] Enable vectorization of stores in the presence of an aliasing load
Summary:
The "getVectorizablePrefix" method would give up if it found an aliasing load for a store chain.
In practice, the aliasing load can be treated as a memory barrier and all stores that precede it
are a valid vectorizable prefix.
Issue found by volkan in D26962. Testcase is a pruned version of the one in the original patch.

Reviewers: jlebar, arsenm, tstellarAMD

Subscribers: mzolotukhin, wdng, nhaehnle, anna, volkan, llvm-commits

Differential Revision: https://reviews.llvm.org/D27008

llvm-svn: 287781
2016-11-23 17:43:15 +00:00
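The barrier treatment described in the [LoadStoreVectorizer] message above can be sketched as a prefix scan. This Python model is illustrative only: instructions are strings, and `vectorizable_prefix` and the `is_aliasing_load` predicate are hypothetical stand-ins for the real alias queries in getVectorizablePrefix.

```python
def vectorizable_prefix(instructions, is_aliasing_load):
    """Return the stores that precede the first aliasing load.

    Instead of giving up on the whole chain, the aliasing load is treated
    as a barrier and everything before it is kept.
    """
    prefix = []
    for inst in instructions:
        if is_aliasing_load(inst):
            break  # barrier: later stores are not part of the prefix
        if inst.startswith("store"):
            prefix.append(inst)
    return prefix


# Hypothetical store chain in program order.
chain = ["store a0", "store a1", "load a1", "store a2"]
prefix = vectorizable_prefix(chain, lambda inst: inst == "load a1")
```

The two stores ahead of the load form the prefix; the store after it is left out.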
Simon Pilgrim 4e9b9cbee9 [X86][AVX512] Add support for v4i64 fptosi/fptoui/sitofp/uitofp on AVX512DQ-only targets
Use 512-bit instructions with subvector insertion/extraction like we do in a number of similar circumstances

llvm-svn: 287762
2016-11-23 14:01:18 +00:00
Simon Pilgrim 03cd8f887c [CostModel][X86] Add missing AVX512DQ v8i64 fptosi/sitofp costs
llvm-svn: 287760
2016-11-23 13:42:09 +00:00
Davide Italiano f6fbe21bef [SCCP] Add a test for switches on undef.
Without this test, you can just remove the code fixing the
switch to the first constant in ResolvedUndefsIn and everything
passes. This test, instead, fails with an assertion if the code
is removed. Found while refactoring SCCP to integrate undef in
the solver.

llvm-svn: 287731
2016-11-23 01:42:39 +00:00
Dehao Chen 554f500ae2 Before sample pgo annotation, do not inline a function that has no debug info. (NFC)
If there is no debug info in the callee, inlining it will not help the annotator. This avoids the infinite loop reported in PR/31119.

llvm-svn: 287710
2016-11-22 22:50:01 +00:00
Davide Italiano e7ffae9dea [SCCP] Remove code in visitBinaryOperator (and add tests).
When we visit and/or, we try to derive a lattice value for the
instruction even if one of the operands is overdefined.
If the non-overdefined value is still 'unknown', just return and wait
for ResolvedUndefsIn to "plug in" the correct value. This simplifies
the logic a bit. While I'm here, add tests for missing cases.

llvm-svn: 287709
2016-11-22 22:11:25 +00:00
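The and/or handling described in the [SCCP] message above can be modelled with a three-level lattice. This Python sketch is an assumption-laden illustration, not the visitBinaryOperator code: `UNKNOWN`, `OVERDEFINED`, and `visit_and_or` are hypothetical names, constants are plain integers, and `-1` stands for an all-ones value.

```python
UNKNOWN = "unknown"
OVERDEFINED = "overdefined"


def visit_and_or(opcode, lhs, rhs):
    """Return the lattice value for `lhs <opcode> rhs`.

    Returns None when the decision is deferred, modelling "just return and
    wait" until the unknown operand is resolved.
    """
    if lhs != OVERDEFINED and rhs != OVERDEFINED:
        if UNKNOWN in (lhs, rhs):
            return None
        return lhs & rhs if opcode == "and" else lhs | rhs
    if lhs == OVERDEFINED and rhs == OVERDEFINED:
        return OVERDEFINED
    known = rhs if lhs == OVERDEFINED else lhs
    if known == UNKNOWN:
        return None  # wait for the unknown operand to be plugged in
    if opcode == "and" and known == 0:
        return 0     # X & 0 is 0 whatever X is
    if opcode == "or" and known == -1:
        return -1    # X | all-ones is all-ones whatever X is
    return OVERDEFINED
```

For example, `visit_and_or("and", OVERDEFINED, 0)` yields the constant `0`, while an unknown second operand defers the decision.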
Sanjay Patel e359eaaf70 [InstCombine] change bitwise logic type to eliminate bitcasts
In PR27925:
https://llvm.org/bugs/show_bug.cgi?id=27925

...we proposed adding this fold to eliminate a bitcast. In D20774, there was 
some concern about changing the type of a bitwise op as well as creating 
bitcasts that might not be free for a target. However, if we're strictly 
eliminating an instruction (by limiting this to one-use ops), then we should 
be able to do this in InstCombine.

But we're cautiously restricting the transform for now to vector types to
avoid possible backend problems. A transform to make sure the logic op is
legal for the target should be added to reverse this transform and improve
codegen.

Differential Revision: https://reviews.llvm.org/D26641

llvm-svn: 287707
2016-11-22 22:05:48 +00:00
Vyacheslav Klochkov 9a630dfb57 Fixed the lost FastMathFlags in GVN (Global Value Numbering).
Reviewer: Hal Finkel.
Differential Revision: https://reviews.llvm.org/D26952

llvm-svn: 287700
2016-11-22 20:52:53 +00:00
Vyacheslav Klochkov 68a677ae5b Fixed the lost FastMathFlags in Reassociate optimization.
Reviewer: Hal Finkel.
Differential Revision: https://reviews.llvm.org/D26957

llvm-svn: 287695
2016-11-22 20:23:04 +00:00
Justin Lebar 3e50a5be8f [CodeGenPrepare] Don't sink non-cheap addrspacecasts.
Summary:
Previously, CGP would unconditionally sink addrspacecast instructions,
even going so far as to sink them into a loop.

Now we check that the cast is "cheap", as defined by TLI.

We introduce a new "is-cheap" function to TLI rather than using
isNopAddrSpaceCast because some GPU platforms want the ability to ask
for non-nop casts to be sunk.

Reviewers: arsenm, tra

Subscribers: jholewinski, wdng, llvm-commits

Differential Revision: https://reviews.llvm.org/D26923

llvm-svn: 287591
2016-11-21 22:49:15 +00:00
Eli Friedman c0bba1a96d [LoopReroll] Make root-finding more aggressive.
Allow using an instruction other than a mul or phi as the base for
root-finding. For example, the included testcase includes a loop
which requires using a getelementptr as the base for root-finding.

Differential Revision: https://reviews.llvm.org/D26529

llvm-svn: 287588
2016-11-21 22:35:34 +00:00
Sanjay Patel 3b0bafee63 [InstCombine] canonicalize min/max constant to select's false value
This is a first step towards canonicalization and improved folding/codegen
for integer min/max as discussed here:
http://lists.llvm.org/pipermail/llvm-dev/2016-November/106868.html

Here, we're just matching the simplest min/max patterns and adjusting the
icmp predicate while swapping the select operands.

I've included FIXME tests in test/Transforms/InstCombine/select_meta.ll
so it's easier to see how this might be extended (corresponds to the TODO
comment in the code). That's also why I'm using matchSelectPattern()
rather than a simpler check; once the backend is patched, we can just 
remove some of the restrictions to allow the obfuscated min/max patterns
in the FIXME tests to be matched.

Differential Revision: https://reviews.llvm.org/D26525

llvm-svn: 287585
2016-11-21 22:04:14 +00:00
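The operand swap described in the [InstCombine] min/max message above preserves the computed value. This Python sketch checks that for a signed max, as an illustration only: the two functions are hypothetical models of a select before and after canonicalization, and the exact patterns matched by the patch are not reproduced here.

```python
def smax_constant_true(x, c):
    """(x < c) ? c : x, with the constant as the select's true value."""
    return c if x < c else x


def smax_constant_false(x, c):
    """(x > c) ? x : c, predicate swapped so the constant is the false value."""
    return x if x > c else c


# Exhaustive check over a small range, including the x == c boundary.
mismatches = [
    (x, c)
    for x in range(-8, 9)
    for c in range(-8, 9)
    if smax_constant_true(x, c) != smax_constant_false(x, c)
]
```

Both forms compute the same maximum, so swapping the operands only requires adjusting the compare predicate.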
Hubert Tong 1e5677649c reassociate-deadinst.ll: avoid accidental match on path
Pipe from stdin to avoid accidentally matching on the path.

llvm-svn: 287583
2016-11-21 21:53:01 +00:00
Mandeep Singh Grang 17e3f9b79d [MemorySSA] Fix unit tests broken by D26704
Summary:
D26704 fixed the non-determinism in codegen by sorting basic blocks before
iteration so as to have a defined iteration order. As a result we need to fix
the names (numbers) of the temporaries in the following unit tests:
  test/Transforms/Util/MemorySSA/multi-edges.ll
  test/Transforms/Util/MemorySSA/multiple-backedges-hal.ll

Reviewers: dberlin, david2050, mgrang

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D26926

llvm-svn: 287575
2016-11-21 20:39:08 +00:00
Mandeep Singh Grang 73f0095d71 [MemorySSA] Fix for non-determinism in codegen
This patch fixes the non-determinism caused by iterating over SmallPtrSets,
which was uncovered by the experimental "reverse iteration order" patch:
https://reviews.llvm.org/D26718

The following unit tests failed because of the undefined order of iteration.
LLVM :: Transforms/Util/MemorySSA/cyclicphi.ll
LLVM :: Transforms/Util/MemorySSA/many-dom-backedge.ll
LLVM :: Transforms/Util/MemorySSA/many-doms.ll
LLVM :: Transforms/Util/MemorySSA/phi-translation.ll

Reviewers: dberlin, mgrang

Subscribers: dberlin, llvm-commits, david2050

Differential Revision: https://reviews.llvm.org/D26704

llvm-svn: 287563
2016-11-21 19:33:02 +00:00
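The remedy mentioned for the [MemorySSA] non-determinism, sorting before iteration so the order is defined, has a direct analogue in any language with unordered sets. This is a minimal Python sketch under assumptions: `number_blocks` and the `order_key` mapping are hypothetical, and the numbering only stands in for how temporaries receive names in iteration order.

```python
def number_blocks(blocks, order_key):
    """Assign consecutive numbers to blocks in a defined, sorted order.

    `blocks` is an unordered set; iterating it directly would make the
    numbering depend on the container's internal layout.
    """
    ordered = sorted(blocks, key=order_key)
    return {block: number for number, block in enumerate(ordered, start=1)}


# Hypothetical blocks and a stable position for each of them.
position = {"entry": 0, "loop": 1, "if.then": 2}
numbering = number_blocks({"if.then", "entry", "loop"}, position.get)
```

Because the sort key does not depend on set layout, every run produces the same numbering.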
Jun Bum Lim 82f55c5446 [CodeGenPrep] Skip merging empty case blocks
Summary: Merging an empty case block into the header block of switch could cause
ISel to add COPY instructions in the header of switch, instead of the case
block, if the case block is used as an incoming block of a PHI. This could
potentially increase dynamic instructions, especially when the switch is in a
loop. I added a test case which was reduced from the benchmark I was targeting.

Reviewers: t.p.northover, mcrosier, manmanren, wmi, davidxl

Subscribers: qcolombet, danielcdh, hfinkel, mcrosier, llvm-commits

Differential Revision: https://reviews.llvm.org/D22696

llvm-svn: 287553
2016-11-21 16:47:28 +00:00
Yaxun Liu 02f75f31e0 Fix known zero bits for addrspacecast.
Currently LLVM assumes that a pointer addrspacecasted to a different addr space is equivalent to trunc or zext bitwise, which is not true. For example, in amdgcn target, when a null pointer is addrspacecasted from addr space 4 to 0, its value is changed from i64 0 to i32 -1.

This patch teaches LLVM not to carry the known bits of an addrspacecast instruction's operand over to its result.

Differential Revision: https://reviews.llvm.org/D26803

llvm-svn: 287545
2016-11-21 15:42:31 +00:00
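The amdgcn example in the addrspacecast message above shows why truncation is the wrong model. This Python sketch encodes only that one stated fact, null `i64 0` becoming `i32 -1`; `trunc_to_i32` and `cast_as4_to_as0` are hypothetical helpers, and the handling of non-null pointers is an assumption made to keep the model total.

```python
I32_MASK = 0xFFFFFFFF


def trunc_to_i32(value):
    """What a plain truncation of an i64 to i32 would produce."""
    return value & I32_MASK


def cast_as4_to_as0(pointer):
    """Model of the cast from addr space 4 to 0 described in the message."""
    if pointer == 0:
        return I32_MASK  # null maps to i32 -1, not to 0
    # Assumption for illustration: non-null pointers keep their low bits.
    return trunc_to_i32(pointer)


null_truncated = trunc_to_i32(0)  # every bit known zero
null_cast = cast_as4_to_as0(0)    # every bit set
```

Every bit of the operand is known to be zero, yet every bit of the cast result is one, so no known bits can be carried over.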
Davide Italiano 2ae76dd239 [GlobalSplit] Port to the new pass manager.
llvm-svn: 287511
2016-11-21 00:28:23 +00:00
Sanjay Patel 47e577eb92 [InstCombine] add tests to show likely unwanted select widening; NFC
This is a prerequisite patch for D26556:
https://reviews.llvm.org/D26556

...because there was no direct coverage for these folds (which in some cases are adding instructions).

llvm-svn: 287400
2016-11-18 23:22:00 +00:00
Michael Zolotukhin 5020c9971b [LoopSimplify] Preserve LCSSA when removing edges from unreachable blocks.
This fixes PR30454.

llvm-svn: 287379
2016-11-18 21:01:12 +00:00
Florian Hahn 77382be56b [simplifycfg][loop-simplify] Preserve loop metadata in 2 transformations.
insertUniqueBackedgeBlock in lib/Transforms/Utils/LoopSimplify.cpp now
propagates existing llvm.loop metadata to the newly added backedge.

llvm::TryToSimplifyUncondBranchFromEmptyBlock in lib/Transforms/Utils/Local.cpp
now propagates existing llvm.loop metadata to the branch instructions in the
predecessor blocks of the empty block that is removed.

Differential Revision: https://reviews.llvm.org/D26495

llvm-svn: 287341
2016-11-18 13:12:07 +00:00
Craig Topper 1de753f7f5 [InstCombine][AVX-512] Teach InstCombineCalls how to handle the intrinsics for variable shift with 16-bit elements.
This is a straightforward extension of the existing support for 32/64-bit element types. Just needed to add the additional intrinsics to the switches.

llvm-svn: 287316
2016-11-18 06:04:33 +00:00
Craig Topper 07f1c15995 [AVX-512] Support FCOPYSIGN for v16f32 and v8f64
Summary:
This extends FCOPYSIGN support to 512-bit vectors.

I've also added tests to show what the 128-bit and 256-bit cases look like with broadcast loads.

Reviewers: delena, zvi, RKSimon, spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D26791

llvm-svn: 287298
2016-11-18 02:25:34 +00:00
Dehao Chen 41d72a8632 Use profile info to adjust loop unroll threshold.
Summary:
For a flat loop, even if it is hot, it is not a good idea to unroll at runtime, so we set a lower partial unroll threshold.
For a hot loop, we set a higher unroll threshold and allow expensive trip count computation to enable more aggressive unrolling.

Reviewers: davidxl, mzolotukhin

Subscribers: sanjoy, mehdi_amini, llvm-commits

Differential Revision: https://reviews.llvm.org/D26527

llvm-svn: 287186
2016-11-17 01:17:02 +00:00
Peter Collingbourne f72a8d4e08 Introduce GlobalSplit pass.
This pass splits globals into elements using inrange annotations on
getelementptr indices.

Differential Revision: https://reviews.llvm.org/D22295

llvm-svn: 287178
2016-11-16 23:40:26 +00:00
Craig Topper 6910fa0ef4 [X86] Remove the scalar intrinsics for fadd/fsub/fdiv/fmul
Summary: These intrinsics have been unused by clang for a while. This patch removes them. We auto-upgrade them to extractelements, a scalar operation, and then an insertelement. This matches the sequence used by clang's intrinsic file.

Reviewers: zvi, delena, RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D26660

llvm-svn: 287083
2016-11-16 05:24:10 +00:00
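The upgrade sequence named in the [X86] message above (extractelements, a scalar operation, then an insertelement) can be traced on plain lists. This Python sketch is a model, not the auto-upgrade code: `upgrade_scalar_add` is a hypothetical name, and keeping the upper lanes of the first operand follows the usual semantics of the SSE scalar add.

```python
def upgrade_scalar_add(a, b):
    """Model a scalar fadd intrinsic as extract, add, insert on lane 0."""
    lhs = a[0]           # extractelement a, 0
    rhs = b[0]           # extractelement b, 0
    scalar = lhs + rhs   # scalar fadd
    result = list(a)     # upper lanes come from the first operand
    result[0] = scalar   # insertelement a, scalar, 0
    return result


lanes = upgrade_scalar_add([1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0])
```

Only lane 0 is combined; the remaining lanes pass through from the first vector.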
Vyacheslav Klochkov b3dc774a99 Fixed the lost FastMathFlags for CALL operations in SLPVectorizer.
Reviewer: Michael Zolotukhin.
Differential Revision: https://reviews.llvm.org/D26575

llvm-svn: 287064
2016-11-16 00:55:50 +00:00
Justin Lebar 2860573529 [BypassSlowDivision] Handle division by constant numerators better.
Summary:
We don't do BypassSlowDivision when the denominator is a constant, but
we do do it when the numerator is a constant.

This patch makes two related changes to BypassSlowDivision when the
numerator is a constant:

 * If the numerator is too large to fit into the bypass width, don't
   bypass slow division (because we'll never run the smaller-width
   code).

 * If we bypass slow division where the numerator is a constant, don't
   OR together the numerator and denominator when determining whether
   both operands fit within the bypass width.  We need to check only the
   denominator.

Reviewers: tra

Subscribers: llvm-commits, jholewinski

Differential Revision: https://reviews.llvm.org/D26699

llvm-svn: 287062
2016-11-16 00:44:47 +00:00
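The two rules in the [BypassSlowDivision] message above can be written as a small decision function. This Python sketch works on unsigned values under assumptions: `fits` and `plan_bypass` are hypothetical names, and the returned callable only models the runtime width check, not the code the pass emits.

```python
def fits(value, bits):
    """True if an unsigned value is representable in `bits` bits."""
    return 0 <= value < (1 << bits)


def plan_bypass(numerator_const, denominator_is_const, bypass_bits):
    """Return a runtime check `(n, d) -> bool`, or None for no bypass."""
    if denominator_is_const:
        return None  # constant denominators are not bypassed
    if numerator_const is not None:
        if not fits(numerator_const, bypass_bits):
            return None  # the narrow path could never run
        # The numerator is known to fit, so only the denominator is checked.
        return lambda n, d: fits(d, bypass_bits)
    # General case: OR the operands together and test the combined width.
    return lambda n, d: fits(n | d, bypass_bits)


too_wide = plan_bypass(1 << 40, False, 32)
const_check = plan_bypass(100, False, 32)
general_check = plan_bypass(None, False, 32)
```

With a small constant numerator, `const_check(100, 7)` passes without ever looking at the numerator, and a numerator wider than the bypass width disables the bypass entirely.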
Wei Mi 37c4aaaf52 Revert r286999 which caused buildbot test failures. Some testcases need to be made target specific.
llvm-svn: 287014
2016-11-15 19:42:05 +00:00