Commit Graph

467 Commits

Author SHA1 Message Date
Markus Lavin 06758c6a61 [DebugInfo] Improve dbg preservation in LSR.
Use SCEV to salvage additional @llvm.dbg.value intrinsics that end up
referencing undef after the transformation (and after traditional
salvageDebugInfo). Before the transformation, compute the SCEV for each
@llvm.dbg.value in the loop body and store it (alongside its current
DIExpression). After the transformation, update each @llvm.dbg.value that now
references undef by comparing its stored SCEV to the SCEVs of the
current loop-header PHI nodes. Allow a match with an offset by inserting
compensation code in the DIExpression.

Includes fix for the nullptr deref that caused the original commit
to be reverted in 9d63029770.

Fixes : PR38815

Differential Revision: https://reviews.llvm.org/D87494
2020-10-08 13:16:43 +02:00
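The offset matching described in 06758c6a61 can be illustrated with a toy model. This is a minimal sketch, not LLVM's implementation: it assumes both the stored SCEV and the surviving loop-header PHI are affine recurrences `{Start,+,Step}` with constant operands and equal steps, and `AddRec`, `matchWithOffset`, and `compensationExpr` are hypothetical names. When the starts differ by a constant, the variable equals the PHI plus that offset on every iteration, and a DIExpression can re-apply it.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>

// Toy affine add-recurrence {Start,+,Step} with constant operands, standing
// in for a SCEVAddRecExpr (hypothetical; real SCEVs can be symbolic).
struct AddRec {
  int64_t Start;
  int64_t Step;
};

// If the variable's recorded recurrence differs from a surviving header PHI
// only by a constant, return that constant, so that Var == Phi + Offset on
// every iteration. The subtraction wraps, like IR integer arithmetic.
std::optional<int64_t> matchWithOffset(const AddRec &Var, const AddRec &Phi) {
  if (Var.Step != Phi.Step)
    return std::nullopt;  // different strides: no constant offset works
  return static_cast<int64_t>(static_cast<uint64_t>(Var.Start) -
                              static_cast<uint64_t>(Phi.Start));
}

// Builds a textual DIExpression that re-applies Offset to the PHI's value.
// DW_OP_minus computes (second-from-top) - (top), i.e. Phi - |Offset|.
std::string compensationExpr(int64_t Offset) {
  if (Offset == 0)
    return "!DIExpression()";
  if (Offset > 0)
    return "!DIExpression(DW_OP_plus_uconst, " + std::to_string(Offset) + ")";
  uint64_t Magnitude = 0 - static_cast<uint64_t>(Offset);  // safe for INT64_MIN
  return "!DIExpression(DW_OP_constu, " + std::to_string(Magnitude) +
         ", DW_OP_minus)";
}
```

For example, a variable recorded as `{10,+,4}` and a header PHI `{2,+,4}` match with offset 8, giving `!DIExpression(DW_OP_plus_uconst, 8)`.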
Chen Zheng f05608707c [PowerPC] implement target hook getTgtMemIntrinsic
This patch lets passes recognize PowerPC-related memory intrinsics.

Reviewed By: steven.zhang

Differential Revision: https://reviews.llvm.org/D88373
2020-10-07 00:02:44 -04:00
Nikita Popov 9d63029770 Revert "[DebugInfo] Improve dbg preservation in LSR."
This reverts commit a3caf7f610.

The ReleaseLTO-g test-suite configuration has been failing
to build since this commit, because clang segfaults while
building 7zip.
2020-10-05 19:02:30 +02:00
Markus Lavin a3caf7f610 [DebugInfo] Improve dbg preservation in LSR.
Use SCEV to salvage additional @llvm.dbg.value intrinsics that end up
referencing undef after the transformation (and after traditional
salvageDebugInfo). Before the transformation, compute the SCEV for each
@llvm.dbg.value in the loop body and store it (alongside its current
DIExpression). After the transformation, update each @llvm.dbg.value that now
references undef by comparing its stored SCEV to the SCEVs of the
current loop-header PHI nodes. Allow a match with an offset by inserting
compensation code in the DIExpression.

Fixes : PR38815

Differential Revision: https://reviews.llvm.org/D87494
2020-10-05 09:55:16 +02:00
Fangrui Song 8fdac7cb7a Revert D71539 "Recommit "[SCEV] Look through single value PHIs.""
This reverts commit 11dccf8d3a.

A bootstrapped clang crashes (due to ArrayRef::front called on an empty
ArrayRef) when compiling some files.  Very strangely, this only reproduces with
modules.

```
13 0x0000564d3349e968 llvm::ArrayRef<llvm::BasicBlock*>::front() const /proc/self/cwd/llvm/include/llvm/ADT/ArrayRef.h:160:7
14 0x0000564d3349e896 llvm::LoopBase<llvm::BasicBlock, llvm::Loop>::getHeader() const /proc/self/cwd/llvm/include/llvm/Analysis/LoopInfo.h:104:50
15 0x0000564d3349fd9d llvm::LoopBase<llvm::BasicBlock, llvm::Loop>::getLoopLatch() const /proc/self/cwd/llvm/include/llvm/Analysis/LoopInfoImpl.h:210:11
16 0x0000564d33593c8a llvm::ScalarEvolution::computeBackedgeTakenCount(llvm::Loop const*, bool) /proc/self/cwd/llvm/lib/Analysis/ScalarEvolution.cpp:6933:15
17 0x0000564d33592ebc llvm::ScalarEvolution::getBackedgeTakenInfo(llvm::Loop const*) /proc/self/cwd/llvm/lib/Analysis/ScalarEvolution.cpp:0:30
18 0x0000564d33593a54 llvm::ScalarEvolution::getBackedgeTakenCount(llvm::Loop const*, llvm::ScalarEvolution::ExitCountKind) /proc/self/cwd/llvm/lib/Analysis/ScalarEvolution.cpp:6487:36
19 0x0000564d32be2402 llvm::ScalarEvolution::getConstantMaxBackedgeTakenCount(llvm::Loop const*) /proc/self/cwd/llvm/include/llvm/Analysis/ScalarEvolution.h:768:5
20 0x0000564d33590807 llvm::ScalarEvolution::getRangeRef(llvm::SCEV const*, llvm::ScalarEvolution::RangeSignHint) /proc/self/cwd/llvm/lib/Analysis/ScalarEvolution.cpp:5495:19
21 0x0000564d320abab7 llvm::ScalarEvolution::getSignedRange(llvm::SCEV const*) /proc/self/cwd/llvm/include/llvm/Analysis/ScalarEvolution.h:840:12
22 0x0000564d335a03aa llvm::ScalarEvolution::isKnownPredicateViaConstantRanges(llvm::CmpInst::Predicate, llvm::SCEV const*, llvm::SCEV const*) /proc/self/cwd/llvm/lib/Analysis/ScalarEvolution.cpp:9239:60
23 0x0000564d33586a80 llvm::ScalarEvolution::isKnownViaNonRecursiveReasoning(llvm::CmpInst::Predicate, llvm::SCEV const*, llvm::SCEV const*) /proc/self/cwd/llvm/lib/Analysis/ScalarEvolution.cpp:10284:60
```
2020-09-21 17:21:43 -07:00
Florian Hahn 11dccf8d3a Recommit "[SCEV] Look through single value PHIs."
This commit was originally reverted because it was suspected to cause a crash,
but a reproducer did not surface.

A crash that was exposed by this change was fixed in 1d8f2e5292.

This reverts the revert commit 0581c0b0ee.
2020-09-21 11:59:50 +01:00
Owen Anderson 3d9c85e4d8 Mark FMOV constant materialization as being as cheap as a move.
This prevents us from doing things like LICM'ing it out of a loop,
which is usually a net loss because we end up having to spill a
callee-saved FPR to accommodate it.

This does perturb instruction scheduling around this instruction,
so a number of tests had to be updated to account for it.

Reviewed By: t.p.northover

Differential Revision: https://reviews.llvm.org/D87316
2020-09-10 16:38:59 +00:00
Andrew Wei 78071fb524 [LSR] Canonicalize a formula before inserting it into the list
In GenerateConstantOffsetsImpl, we may generate a non-canonical Formula
if the BaseRegs of that Formula are updated to include a recurrent expression
register related to the current loop while its ScaledReg is not.

Patched by: mdchen
Reviewed By: qcolombet

Differential Revision: https://reviews.llvm.org/D86939
2020-09-08 13:14:53 +08:00
Florian Hahn 88818491b9 [LoopIdiom,LSR] Add additional tests for SCEVExpander cleanups. 2020-08-21 13:48:31 +01:00
Ali Tamur 0581c0b0ee Revert "[SCEV] Look through single value PHIs."
This reverts commit e441b7a7a0.

This patch causes a compiler crash when building the TensorFlow open-source project. The stack trace looks like:

Point of crash:
llvm/include/llvm/Analysis/LoopInfoImpl.h : line 35

(gdb) ptype *this
type = const class llvm::LoopBase<llvm::BasicBlock, llvm::Loop> [with BlockT = llvm::BasicBlock, LoopT = llvm::Loop]

(gdb) p *this
$1 = {ParentLoop = 0x0, SubLoops = std::vector of length 0, capacity 0, Blocks = std::vector of length 0, capacity 1,
  DenseBlockSet = {<llvm::SmallPtrSetImpl<llvm::BasicBlock const*>> = {<llvm::SmallPtrSetImplBase> = {<llvm::DebugEpochBase> = {Epoch = 3}, SmallArray = 0x1b2bf6c8, CurArray = 0x1b2bf6c8,
        CurArraySize = 8, NumNonEmpty = 0, NumTombstones = 0}, <No data fields>}, SmallStorage = {0xfffffffffffffffe, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}}, IsInvalid = true}

(gdb) p *this->DenseBlockSet->CurArray
$2 = (const void *) 0xfffffffffffffffe

I will try to get a case from tensorflow or use creduce to get a small case.
2020-08-12 23:13:24 -07:00
Florian Hahn e441b7a7a0 [SCEV] Look through single value PHIs.
Now that SCEVExpander can preserve LCSSA form,
we do not have to worry about LCSSA form when
trying to look through PHIs. SCEVExpander will take
care of inserting LCSSA PHI nodes as required.

This increases precision of the analysis in some cases.

Reviewed By: mkazantsev, bmahjour

Differential Revision: https://reviews.llvm.org/D71539
2020-08-12 10:03:42 +01:00
Florian Hahn 23817cbd0b [SCEVExpander] Make sure cast properly dominates Builder's IP.
The selected cast must properly dominate the Builder's IP, so we cannot
re-use the cast if it matches the Builder's IP.
2020-08-09 16:51:19 +01:00
Florian Hahn c70f0b9d4a [SCEVExpander] Avoid re-using existing casts if it means updating users.
Currently the SCEVExpander tries to re-use existing casts, even if they
are not exactly at the insertion point it was asked to create the cast.
To do so, in some cases it creates a new cast at the insertion point and
updates all users to use the new cast.

This behavior is problematic, because it changes the IR outside of the
instructions created during the expansion. Therefore we cannot
completely undo all changes made during expansion.

This re-use should be only an extra optimization, so only using the new
cast in the expanded instructions should not be a correctness issue.
There are many cases where equivalent instructions are created during
expansion.

This patch also adjusts findInsertPointAfter to skip instructions
inserted during expansion. This enables re-using existing casts without
renaming any uses, by picking a better insertion point.

Reviewed By: efriedma, lebedev.ri

Differential Revision: https://reviews.llvm.org/D84399
2020-08-09 13:25:17 +01:00
Chen Zheng 8c5edf5023 [SCEV] don't query getSCEV() for incomplete phis
Querying getSCEV() for incomplete phis leads to a wrong cached value in `ExprToIVMap`,
because incomplete phis may be simplified to the same value before the SCEV expression is computed.

Reviewed By: lebedev.ri, mkazantsev

Differential Revision: https://reviews.llvm.org/D77560
2020-08-01 02:38:54 -04:00
Florian Hahn be2ea29ee1 [SCEV] Add additional tests.
Increase test coverage for upcoming changes to how SCEV deals with LCSSA
phis.
2020-07-28 16:15:57 +01:00
Florian Hahn 82e35197e6 [LSR] Re-generate check lines for test.
The test is quite fragile, as the check lines match IR numbers and it is
not obvious why only a very small subset is checked.

Re-generate check lines, so further changes are more obvious.
2020-07-23 13:53:53 +01:00
Arthur Eubanks 3d12e79094 [NewPM][LSR] Rename strength-reduce -> loop-reduce
The legacy pass was called "loop-reduce".

This lowers the number of check-llvm failures under NPM by 83.

Reviewed By: ychen

Differential Revision: https://reviews.llvm.org/D82925
2020-07-02 11:15:29 -07:00
Huihui Zhang 42048ff972 [NFC] Move test vscale-factor-out-constant.ll to AArch64 sub-directory.
Vscale scalable vectors are specific to the AArch64 target.

Bring back 'uglygep' check.
2020-06-04 12:55:28 -07:00
Huihui Zhang f7f1abdb88 [NFC] Temporarily disable check for 'uglygep' while investigating some buildbot failure.
The purpose of vscale-factor-out-constant.ll is to check that we no longer crash
when blindly casting 'Factor' in a MulExpr to SCEVConstant.
2020-06-04 10:54:02 -07:00
Huihui Zhang bd43f78c76 [LSR][SCEVExpander] Avoid blind cast 'Factor' to SCEVConstant in FactorOutConstant.
Summary:
In SCEVExpander FactorOutConstant(), when a GEP indexes into/over a scalable vector,
it is legal for the 'Factor' in a MulExpr to be the size of a scalable vector
instead of a compile-time constant.

Current upstream crashes on the attached test.

Reviewers: efriedma, sdesmalen, sanjoy.google, mkazantsev

Reviewed By: efriedma

Subscribers: hiraditya, javed.absar, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D80973
2020-06-04 10:33:39 -07:00
Pierre-vh 2668775f66 [LSR][ARM] Add new TTI hook to mark some LSR chains as profitable
This patch adds a new TTI hook to allow targets to tell LSR that
a chain including some instruction is already profitable and
should not be optimized. This patch also adds an implementation
of this TTI hook for ARM so LSR doesn't optimize chains that include
the VCTP intrinsic.

Differential Revision: https://reviews.llvm.org/D79418
2020-05-13 14:18:28 +01:00
Simon Pilgrim 396b1ee0e0 [LoopStrengthReduce] Fix test checks to fix issue reported on D77227 2020-04-03 18:10:33 +01:00
shchenz e344f8b9db Revert "[LSR] re-add testcase for wrongly phi node elimination - NFC"
This reverts commit f25a1b4f58.
ARM and Hexagon fail on the newly added case.
2020-04-01 12:58:06 +00:00
shchenz f25a1b4f58 [LSR] re-add testcase for wrongly phi node elimination - NFC
Retest the case on X86/SystemZ/AArch64/PowerPC
2020-04-01 11:11:17 +00:00
shchenz 8b8cd150a4 Revert "[LSR] add testcase for wrongly phi node elimination - NFC"
This reverts commit dbf5e4f6c7.
The testcase has different behaviour on PowerPC and X86.
2020-04-01 10:28:43 +00:00
shchenz dbf5e4f6c7 [LSR] add testcase for wrongly phi node elimination - NFC 2020-04-01 09:58:58 +00:00
Fangrui Song a36ddf0aa9 Migrate function attribute "no-frame-pointer-elim"="false" to "frame-pointer"="none" as cleanups after D56351 2019-12-24 16:27:51 -08:00
Fangrui Song 502a77f125 Migrate function attribute "no-frame-pointer-elim" to "frame-pointer"="all" as cleanups after D56351 2019-12-24 15:57:33 -08:00
Hubert Tong e9983eed5a [NFC][LSR] Avoid undefined grep in pr2570.ll
The greater-than sign is not a BRE special character.

POSIX.1-2017 XBD Section 9.3.2 indicates that the interpretation of `\>`
is undefined. This patch replaces the pattern.

llvm-svn: 363828
2019-06-19 16:02:54 +00:00
Sam Parker 60d6fb2a63 [SCEV] Use NoWrapFlags when expanding a simple mul
Second functional change following on from rL362687. Pass the
NoWrapFlags from the MulExpr to InsertBinop when we're generating a
shl or mul.

Differential Revision: https://reviews.llvm.org/D61934

llvm-svn: 363540
2019-06-17 10:05:18 +00:00
Fangrui Song ac14f7b10c [lit] Delete empty lines at the end of lit.local.cfg NFC
llvm-svn: 363538
2019-06-17 09:51:07 +00:00
Sam Parker 0cf9639a9c [SCEV] Pass NoWrapFlags when expanding an AddExpr
InsertBinop now accepts NoWrapFlags, so pass them through when
expanding a simple add expression.

This is the first re-commit of the functional changes from rL362687,
which was previously reverted.

Differential Revision: https://reviews.llvm.org/D61934

llvm-svn: 363364
2019-06-14 09:19:41 +00:00
David Bolvansky a9d8388e80 [NFC] Updated testcase for D54411/rL363284
llvm-svn: 363285
2019-06-13 18:13:03 +00:00
Benjamin Kramer f1249442cf Revert "[SCEV] Use wrap flags in InsertBinop"
This reverts commit r362687. Miscompiles llvm-profdata during selfhost.

llvm-svn: 362699
2019-06-06 12:35:46 +00:00
Sam Parker 7cc580f5e9 [SCEV] Use wrap flags in InsertBinop
If the given SCEVExpr has no-(un)signed-wrap flags attached to it, transfer
these to the resulting instruction or use them to find an existing
instruction.

Differential Revision: https://reviews.llvm.org/D61934

llvm-svn: 362687
2019-06-06 08:56:26 +00:00
Craig Topper 46e5052b8e [X86FixupLEAs] Turn optIncDec into a generic two address LEA optimizer. Support LEA64_32r properly.
INC/DEC is really a special case of a more generic issue. We should also turn LEAs into add reg/reg or add reg/imm regardless of the slow LEA flags.

This also supports LEA64_32 which has 64 bit input registers and 32 bit output registers. So we need to convert the 64 bit inputs to their 32 bit equivalents to check if they are equal to base reg.

One thing to note: the original code preserved the kill flags by adding operands to the new instruction instead of using addReg. But I think tied operands aren't supposed to have the kill flag set. I dropped the kill flags, but I could probably try to preserve them in the add reg/reg case if we think it's important. I'm not sure which operand the flag is supposed to go on for the LEA64_32r instruction due to the super-reg implicit uses, though I'm also not sure those are needed, since they were probably just created by an INSERT_SUBREG from a 32-bit input.

Differential Revision: https://reviews.llvm.org/D61472

llvm-svn: 361691
2019-05-25 06:17:47 +00:00
Keno Fischer a1a4adf4b9 [SCEV] Add explicit representations of umin/smin
Summary:
Currently we express umin as `~umax(~x, ~y)`. However, this becomes
a problem for operands in non-integral pointer spaces, because `~x`
is not something we can compute for `x` non-integral. However, since
comparisons are generally still allowed, we are actually able to
express `umin(x, y)` directly as long as we don't try to express it
as a umax. Support this by adding an explicit umin/smin representation
to SCEV. We do this by factoring the existing getUMax/getSMax functions
into a new function that does all four. The previous two functions were
largely identical.

Reviewed By: sanjoy
Differential Revision: https://reviews.llvm.org/D50167

llvm-svn: 360159
2019-05-07 15:28:47 +00:00
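The identity behind the old encoding, umin(x, y) == ~umax(~x, ~y), holds for fixed-width unsigned integers because bitwise NOT reverses unsigned order. A minimal sketch on 8-bit values, so every pair can be checked exhaustively; the helper names are hypothetical, not SCEV APIs. The encoding needs `~x`, which is exactly what cannot be computed for a non-integral pointer, which is why a1a4adf4b9 adds explicit umin/smin nodes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical helper modelling umax on 8-bit values.
static uint8_t umax8(uint8_t a, uint8_t b) { return std::max(a, b); }

// The old encoding: umin(x, y) == ~umax(~x, ~y). Bitwise NOT maps v to
// 255 - v, which reverses unsigned order, so the max of the complements is
// the complement of the min.
uint8_t uminViaUmax(uint8_t x, uint8_t y) {
  uint8_t notX = static_cast<uint8_t>(~x);
  uint8_t notY = static_cast<uint8_t>(~y);
  return static_cast<uint8_t>(~umax8(notX, notY));
}

// Exhaustively checks the identity for every pair of 8-bit values.
bool identityHoldsForAllBytes() {
  for (unsigned x = 0; x < 256; ++x)
    for (unsigned y = 0; y < 256; ++y) {
      uint8_t a = static_cast<uint8_t>(x), b = static_cast<uint8_t>(y);
      if (uminViaUmax(a, b) != std::min(a, b))
        return false;
    }
  return true;
}
```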
David Green 63a2aa715a [LSR] Limit the recursion for setup cost
In some circumstances we can end up with setup costs that are very complex to
compute, even though the SCEVs are not very complex to create. This can also
lead to setup costs that are calculated to be exactly -1, which LSR treats as an
invalid cost. This patch puts a limit on the recursion depth for setup cost to
prevent them from taking too long.

Thanks to @reames for the report and test case.

Differential Revision: https://reviews.llvm.org/D60944

llvm-svn: 358958
2019-04-23 08:52:21 +00:00
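A depth limit of the kind described in 63a2aa715a can be sketched on a toy expression tree. Everything here is an assumption for illustration: `Expr` and `setupCost` are hypothetical, the cost is simply a leaf count, and returning 0 once the depth budget is exhausted is this sketch's choice, not necessarily LLVM's.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Toy expression node (hypothetical, not LLVM's SCEV class): leaves stand
// for constants or opaque values, inner nodes for add/mul/cast expressions.
struct Expr {
  std::vector<std::shared_ptr<Expr>> Ops;  // empty => leaf
};

// Approximates setup cost by counting leaves, but stops descending after
// `Depth` levels so a pathological expression cannot make this expensive.
unsigned setupCost(const Expr &E, unsigned Depth) {
  if (E.Ops.empty())
    return 1;  // a leaf costs one unit to materialize
  if (Depth == 0)
    return 0;  // budget exhausted: stop recursing
  unsigned Cost = 0;
  for (const auto &Op : E.Ops)
    Cost += setupCost(*Op, Depth - 1);
  return Cost;
}
```

With a generous depth every leaf is counted; with a small one, subtrees below the limit simply contribute nothing.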
Eric Christopher cee313d288 Revert "Temporarily Revert "Add basic loop fusion pass.""
The reversion apparently deleted the test/Transforms directory.

Will be re-reverting again.

llvm-svn: 358552
2019-04-17 04:52:47 +00:00
Eric Christopher a863435128 Temporarily Revert "Add basic loop fusion pass."
As it's causing some bot failures (and per request from kbarton).

This reverts commit r358543/ab70da07286e618016e78247e4a24fcb84077fda.

llvm-svn: 358546
2019-04-17 02:12:23 +00:00
Quentin Colombet fda0426888 [LSR] Rewrite misses some fixup locations if it splits critical edge
If LSR splits a critical edge while rewriting phi operands and the
phi node has other pending fixup operands, we need to
update those pending fixups. Otherwise formulae will not be
implemented completely and some instructions will not be eliminated.

llvm.org/PR41445

Differential Revision: https://reviews.llvm.org/D60645

Patch by: Denis Bakhvalov <denis.bakhvalov@intel.com>

llvm-svn: 358457
2019-04-15 22:23:46 +00:00
Florian Hahn 45682fd633 [LSR] Fix signed overflow in GenerateCrossUseConstantOffsets.
For the attached test case, unchecked addition of immediate starts and
ends overflows, as they can be arbitrary i64 constants.

Proof: https://rise4fun.com/Alive/Plqc

Reviewers: qcolombet, gilr, efriedma

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D59218

llvm-svn: 357217
2019-03-28 22:17:29 +00:00
Clement Courbet 699dc025a6 [X86MacroFusion] Handle branch fusion (AMD CPUs).
Summary:
This adds a BranchFusion feature to replace the usage of the MacroFusion
for AMD CPUs.

See D59688 for context.

Reviewers: andreadb, lebedev.ri

Subscribers: hiraditya, jdoerfert, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59872

llvm-svn: 357171
2019-03-28 14:12:46 +00:00
Clement Courbet f8666b0649 [X86MacroFusion][NFC] Add a bulldozer test.
llvm-svn: 357099
2019-03-27 17:44:16 +00:00
Florian Hahn 728293ac87 [LSR] Update test from rL356256 after rebase.
llvm-svn: 356257
2019-03-15 12:37:50 +00:00
Florian Hahn d9e88f7b7f [LSR] Check for signed overflow in NarrowSearchSpaceByDetectingSupersets.
We are adding a sign extended IR value to an int64_t, which can cause
signed overflows, as in the attached test case, where we have a formula
with BaseOffset = -1 and a constant with numeric_limits<int64_t>::min().

If the addition would overflow, skip the simplification for this
formula. Note that the target triple is required to trigger the failure.

Reviewers: qcolombet, gilr, kparzysz, efriedma

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D59211

llvm-svn: 356256
2019-03-15 12:17:36 +00:00
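Both signed-overflow fixes above (45682fd633 and d9e88f7b7f) guard an `int64_t` addition whose operands can be arbitrary i64 constants. A minimal sketch of such a guard, assuming a GCC or Clang toolchain for `__builtin_add_overflow`; LLVM may use a different helper, and `tryFoldOffset` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical helper: adds a formula's BaseOffset to a constant, returning
// std::nullopt instead of overflowing when the int64_t sum is not
// representable. Signed overflow is undefined behaviour in C++, so the
// check must replace the plain `a + b`, not follow it.
std::optional<int64_t> tryFoldOffset(int64_t baseOffset, int64_t constant) {
  int64_t sum;
  if (__builtin_add_overflow(baseOffset, constant, &sum))
    return std::nullopt;  // caller skips the simplification for this formula
  return sum;
}
```

With the values from the commit message, BaseOffset = -1 and a constant of `INT64_MIN`, the sum is not representable, so the sketch returns `std::nullopt`.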
Sam Parker 3b2ba20afd [ARM] Run ARMParallelDSP in the IRPasses phase
Run EarlyCSE before ParallelDSP and do this in the backend IR opt
phase.

Differential Revision: https://reviews.llvm.org/D59257

llvm-svn: 356130
2019-03-14 10:57:40 +00:00
David Green ffc922ec35 [LSR] Attempt to increase the accuracy of LSR's setup cost
In some loops, we end up generating loop induction variables that look like:
  {(-1 * (zext i16 (%i0 * %i1) to i32))<nsw>,+,1}
As opposed to the simpler:
  {(zext i16 (%i0 * %i1) to i32),+,-1}
i.e. we count up from -limit to 0 instead of the simpler count down from limit to
0. This is because the scores, as LSR calculates them, are the same and the
second is filtered in place of the first. We end up with a redundant SUB from 0
in the code.

This patch tries to make the calculation of the setup cost a little more
thorough, recursing into the SCEV members to better approximate the setup
required. The cost function for comparing LSR costs is:

```
return std::tie(C1.NumRegs, C1.AddRecCost, C1.NumIVMuls, C1.NumBaseAdds,
                C1.ScaleCost, C1.ImmCost, C1.SetupCost) <
       std::tie(C2.NumRegs, C2.AddRecCost, C2.NumIVMuls, C2.NumBaseAdds,
                C2.ScaleCost, C2.ImmCost, C2.SetupCost);
```
So this will only alter results if none of the other variables turn out to be
different.

Differential Revision: https://reviews.llvm.org/D58770

llvm-svn: 355597
2019-03-07 13:44:40 +00:00
Florian Hahn fd2d89f98b Fix invalid target triples in tests. (NFC)
llvm-svn: 355349
2019-03-04 23:37:41 +00:00
Sam Parker 67756c09f2 [LSR] Generate cross iteration indexes
Modify GenerateConstantOffsetsImpl to create offsets that can be used
by indexed addressing modes. If formulae can be generated which
result in the constant offset being the same size as the recurrence,
we can generate a pre-indexed access. This allows the pointer to be
updated via the single pre-indexed access so that (hopefully) no
add/subs are required to update it for the next iteration. For small
cores, this can significantly improve the performance of DSP-like loops.

Differential Revision: https://reviews.llvm.org/D55373

llvm-svn: 353403
2019-02-07 13:32:54 +00:00
Max Kazantsev d5e595b7a6 [LSR] Check SCEV on isZero() after extend. PR40514
When LSR first adds SCEVs to BaseRegs, it only does it if `isZero()` has
returned false. In the end, when invoking `InsertFormula`, it asserts that
all values there are still not zero constants. However, between these two
points it makes some transformations; in particular, it extends them to a wider
type.

SCEV does not give us a guarantee that if `S` is not a constant zero, then
`sext(S)` is also not a constant zero. It might have missed some optimizing
transforms when it was calculating `S` and then made them when it took `sext`.
For example, this may happen if earlier optimizing transforms were limited
by depth or in some other way.

This patch adds a bailout when we may end up with a zero SCEV after extension.

Differential Revision: https://reviews.llvm.org/D57565
Reviewed By: samparker

llvm-svn: 353136
2019-02-05 04:30:37 +00:00
Max Kazantsev b37419ef66 [SCEV] Prohibit SCEV transformations for huge SCEVs
Currently SCEV attempts to limit transformations so that they do not work with
big SCEVs (that may take almost infinite compile time). But for this, it uses heuristics
such as recursion depth and number of operands, which do not give us a guarantee
that we don't actually have big SCEVs. This situation is still possible, though it is not
likely to happen. However, the bug PR33494 showed a bunch of simple corner-case
tests where we still produce huge SCEVs, even without reaching a large recursion depth.

This patch introduces a concept of 'huge' SCEVs. A SCEV is huge if its expression
size (introduced in D35989) exceeds some threshold value. We prohibit optimizing
transformations if any of SCEVs we are dealing with is huge. This gives us a reliable
check that we don't spend too much time working with them.

As the next step, we can possibly get rid of old limiting mechanisms, such as recursion
depth thresholds.

Differential Revision: https://reviews.llvm.org/D35990
Reviewed By: reames

llvm-svn: 352728
2019-01-31 06:19:25 +00:00
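One way to make a "huge SCEV" check cheap is to record each expression's size when it is built, so the check is a single comparison rather than a tree walk. This is an illustration under stated assumptions: `SizedExpr`, `isHuge`, `mayTransform`, and the threshold of 1000 are hypothetical, and size is counted as one per node plus the sizes of its operands.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical expression node that records its size at construction:
// 1 for the node itself plus the (already computed) sizes of its operands.
struct SizedExpr {
  std::vector<std::shared_ptr<const SizedExpr>> Ops;
  std::size_t Size;

  explicit SizedExpr(std::vector<std::shared_ptr<const SizedExpr>> Operands = {})
      : Ops(std::move(Operands)), Size(1) {
    for (const auto &Op : Ops)
      Size += Op->Size;
  }
};

// Illustrative threshold; the value is an assumption of this sketch.
constexpr std::size_t HugeExprThreshold = 1000;

bool isHuge(const SizedExpr &E) { return E.Size > HugeExprThreshold; }

// Transformations bail out if any participating expression is huge.
bool mayTransform(const std::vector<const SizedExpr *> &Exprs) {
  for (const SizedExpr *E : Exprs)
    if (isHuge(*E))
      return false;
  return true;
}
```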
Sam Parker d6ebf0108e [LoopStrengthReduce] ComplexityLimit as an option
Convert ComplexityLimit into a command line value.

Differential Revision: https://reviews.llvm.org/D54899

llvm-svn: 347843
2018-11-29 08:34:22 +00:00
Gil Rapaport 7b88bab386 [LSR] Combine unfolded offset into invariant register
LSR reassociates constants as unfolded offsets when the constants fit as
immediate add operands, which currently prevents such constants from being
combined later with loop invariant registers.
This patch modifies GenerateCombinations() to generate a second formula which
includes the unfolded offset in the combined loop-invariant register.

This commit fixes a bug in the original patch (committed at r345114, reverted
at r345123).

Differential Revision: https://reviews.llvm.org/D51861

llvm-svn: 346390
2018-11-08 09:01:19 +00:00
Gil Rapaport c523036fd2 Revert r345114
Investigating fails.

llvm-svn: 345123
2018-10-24 08:41:22 +00:00
Gil Rapaport 5012e7f6ac [LSR] Combine unfolded offset into invariant register
LSR reassociates constants as unfolded offsets when the constants fit as
immediate add operands, which currently prevents such constants from being
combined later with loop invariant registers.
This patch modifies GenerateCombinations() to generate a second formula which
includes the unfolded offset in the combined loop-invariant register.

Differential Revision: https://reviews.llvm.org/D51861

llvm-svn: 345114
2018-10-24 07:08:38 +00:00
Matt Arsenault 9de2fb58fa AMDGPU: Fix some outdated datalayouts in tests
llvm-svn: 342131
2018-09-13 11:56:28 +00:00
Gil Rapaport d874c3a480 [LSR] Add tests for small constants; NFC
LSR reassociates small constants that fit into add immediate operands as
unfolded offset. Since unfolded offset is not combined with loop-invariant
registers, LSR does not consider solutions that bump invariant registers by
these constants outside the loop.

llvm-svn: 341835
2018-09-10 14:56:24 +00:00
Roman Lebedev 8d081b78e4 SCEVExpander::expandAddRecExprLiterally(): check before casting as Instruction
Summary:
An alternative to D48597.
Fixes [[ https://bugs.llvm.org/show_bug.cgi?id=37936 | PR37936 ]].

The problem is as follows:
1. `indvars` marks `%dec` as `NUW`.
2. `loop-instsimplify` runs `instsimplify`, which constant-folds `%dec` to -1 (D47908)
3. `loop-reduce` tries to do some further modification, but crashes
    with a type assertion in cast, because `%dec` is no longer an `Instruction`.

If the runline is split into two, i.e. you first run `-indvars -loop-instsimplify`,
store that into a file, and then run `-loop-reduce`, there is no crash.

So it looks like the problem is due to `-loop-instsimplify` not discarding SCEV.
But in this case we can just not crash if it's not an `Instruction`.
This is just a local fix, unlike D48597, so there may very well be other problems.

Reviewers: mkazantsev, uabelho, sanjoy, silviu.baranga, wmi

Reviewed By: mkazantsev

Subscribers: evstupac, javed.absar, spatel, llvm-commits

Differential Revision: https://reviews.llvm.org/D48599

llvm-svn: 335950
2018-06-29 07:44:20 +00:00
Alina Sbirlea dfd14adeb0 Generalize MergeBlockIntoPredecessor. Replace uses of MergeBasicBlockIntoOnlyPred.
Summary:
Two utility methods have essentially the same functionality. This is an attempt to merge them into one.
1. lib/Transforms/Utils/Local.cpp : MergeBasicBlockIntoOnlyPred
2. lib/Transforms/Utils/BasicBlockUtils.cpp : MergeBlockIntoPredecessor

Prior to the patch:
1. MergeBasicBlockIntoOnlyPred
Updates either DomTree or DeferredDominance
Moves all instructions from Pred to BB, deletes Pred
Asserts BB has single predecessor
If address was taken, replace the block address with constant 1 (?)

2. MergeBlockIntoPredecessor
Updates DomTree, LoopInfo and MemoryDependenceResults
Moves all instructions from BB to Pred, deletes BB
Returns if BB doesn't have a single predecessor
Returns if BB's address was taken

After the patch:
Method 2. MergeBlockIntoPredecessor is attempting to become the new default:
Updates DomTree or DeferredDominance, and LoopInfo and MemoryDependenceResults
Moves all instructions from BB to Pred, deletes BB
Returns if BB doesn't have a single predecessor
Returns if BB's address was taken

Uses of MergeBasicBlockIntoOnlyPred that need to be replaced:

1. lib/Transforms/Scalar/LoopSimplifyCFG.cpp
Updated in this patch. No challenges.

2. lib/CodeGen/CodeGenPrepare.cpp
Updated in this patch.
  i. eliminateFallThrough is straightforward, but I added a temporary array to avoid the iterator invalidation.
  ii. eliminateMostlyEmptyBlock(s) methods also now use a temporary array for blocks
Some interesting aspects:
  - Since Pred is not deleted (BB is), the entry block does not need updating.
  - The entry block was being updated with the deleted block in eliminateMostlyEmptyBlock. Added assert to make obvious that BB=SinglePred.
  - isMergingEmptyBlockProfitable assumes BB is the one to be deleted.
  - eliminateMostlyEmptyBlock(BB) does not delete BB on one path, it deletes its unique predecessor instead.
  - adding some test owner as subscribers for the interesting tests modified:
    test/CodeGen/X86/avx-cmp.ll
    test/CodeGen/AMDGPU/nested-loop-conditions.ll
    test/CodeGen/AMDGPU/si-annotate-cf.ll
    test/CodeGen/X86/hoist-spill.ll
    test/CodeGen/X86/2006-11-17-IllegalMove.ll

3. lib/Transforms/Scalar/JumpThreading.cpp
Not covered in this patch. It is the only use case using the DeferredDominance.
I would defer to Brian Rzycki to make this replacement.

Reviewers: chandlerc, spatel, davide, brzycki, bkramer, javed.absar

Subscribers: qcolombet, sanjoy, nemanjai, nhaehnle, jlebar, tpr, kbarton, RKSimon, wmi, arsenm, llvm-commits

Differential Revision: https://reviews.llvm.org/D48202

llvm-svn: 335183
2018-06-20 22:01:04 +00:00
Daniil Fukalov 37433dc2e1 reapply r334209 with fixes for harfbuzz in Chromium
r334209 description:
[LSR] Check yet more intrinsic pointer operands

The patch fixes another assertion in isLegalUse()

Differential Revision: https://reviews.llvm.org/D47794

llvm-svn: 334300
2018-06-08 16:22:52 +00:00
Reid Kleckner a3609f75b2 Revert r334209 "[LSR] Check yet more intrinsic pointer operands"
This causes cast failures when compiling harfbuzz in Chromium.
Reproducer on the way.

llvm-svn: 334254
2018-06-08 00:43:27 +00:00
Daniil Fukalov 12c0663a25 [LSR] Check yet more intrinsic pointer operands
The patch fixes another assertion in isLegalUse()

Differential Revision: https://reviews.llvm.org/D47794

llvm-svn: 334209
2018-06-07 17:30:58 +00:00
Stanislav Mekhanoshin 595fdcf43b [AMDGPU] Move lsr test. NFC.
llvm-svn: 332562
2018-05-17 01:30:51 +00:00
Evgeny Stupachenko bff9302c3d Fix LSR compile time hang.
Summary:
Limit number of reassociations in GenerateReassociationsImpl.

Reviewers: qcolombet, mkazantsev

Differential Revision: https://reviews.llvm.org/D46039

From: Evgeny Stupachenko <evstupac@gmail.com>
                         <evgeny.v.stupachenko@intel.com>
llvm-svn: 332426
2018-05-16 02:48:50 +00:00
Stefan Pintilie ef7c4976bb Revert "[PowerPC] LSR tunings for PowerPC"
Revert the rest of the LSR tune commit.
It seems that the LSR tune commit breaks internal tests.
Reverting the commit.

llvm-svn: 327143
2018-03-09 16:08:55 +00:00
Stefan Pintilie 7f879a8467 Revert "[PowerPC] Move test to correct location."
Revert part of the LSR tune commit.

llvm-svn: 327142
2018-03-09 16:08:48 +00:00
Stefan Pintilie f8c2dce236 [PowerPC] Move test to correct location.
Test was added in r326906 to an incorrect location.
Moving the test to PPC CodeGen directory as the test is PPC specific.

llvm-svn: 326923
2018-03-07 18:27:10 +00:00
Stefan Pintilie f8438e8e59 [PowerPC] LSR tunings for PowerPC
The purpose of this patch is to have LSR generate better code on Power.
This is done by overriding isLSRCostLess.

Differential Revision: https://reviews.llvm.org/D40855

llvm-svn: 326906
2018-03-07 16:53:09 +00:00
Sanjay Patel d7c702b451 [LoopStrengthReduce, x86] don't add cost for a cmp that will be macro-fused (PR35681)
In the motivating case from PR35681 and represented by the macro-fuse-cmp test:
https://bugs.llvm.org/show_bug.cgi?id=35681
...there's a 37 -> 31 byte size win for the loop because we eliminate the big base 
address offsets.

SPEC2017 on Ryzen shows no significant perf difference.

Differential Revision: https://reviews.llvm.org/D42607

llvm-svn: 324289
2018-02-05 23:43:05 +00:00
Yaxun Liu 2a22c5deff [AMDGPU] Switch to the new addr space mapping by default
This requires corresponding clang change.

Differential Revision: https://reviews.llvm.org/D40955

llvm-svn: 324101
2018-02-02 16:07:16 +00:00
Mikael Holmen 6d06976e74 [LSR] Don't force bases of foldable formulae to the final type.
Summary:
Before emitting code for scaled registers, we prevent
SCEVExpander from hoisting any scaled addressing mode
by emitting all the bases first. However, these bases
are being forced to the final type, resulting in some
odd code.

For example, if the type of the base is an integer and
the final type is a pointer, we will emit an inttoptr
for the base, a ptrtoint for the scale, and then a
'reverse' GEP where the GEP pointer is actually the base
integer and the index is the pointer. It's more intuitive
to use the pointer as a pointer and the integer as index.

Patch by: Bevin Hansson

Reviewers: atrick, qcolombet, sanjoy

Reviewed By: qcolombet

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D42103

llvm-svn: 323946
2018-02-01 06:38:34 +00:00
Puyan Lotfi 43e94b15ea Followup on Proposal to move MIR physical register namespace to '$' sigil.
Discussed here:

http://lists.llvm.org/pipermail/llvm-dev/2018-January/120320.html

In preparation for adding support for named vregs we are changing the sigil for
physical registers in MIR to '$' from '%'. This will prevent name clashes of
named physical registers with named vregs.

llvm-svn: 323922
2018-01-31 22:04:26 +00:00
Sanjay Patel ffb37a29d1 [LoopStrengthReduce] add test to show potential macro-fusion-based diff (PR35681); NFC
This is the baseline output for the test proposed with D42607.

llvm-svn: 323806
2018-01-30 19:17:38 +00:00
Sanjay Patel 5bce08ddff [x86] auto-generate complete checks; NFC
llvm-svn: 323571
2018-01-26 22:06:07 +00:00
Serguei Katkov 6a7a4c6a55 [SCEV] Do not cache S -> V if S is not equivalent of V
SCEV tracks the correspondence of created SCEV to original instruction.
However during creation of SCEV it is possible that nuw/nsw/exact flags are
lost.

As a result, during expansion of the SCEV the cached instruction with nuw/nsw/exact
may be reused where an instruction without those flags was expected, and we incorrectly produce poison.

Reviewers: sanjoy, mkazantsev, sebpop, jbhateja
Reviewed By: sanjoy
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D41578

llvm-svn: 322058
2018-01-09 06:47:14 +00:00
Matt Arsenault 3e268cc0dd LSR: Check more intrinsic pointer operands
llvm-svn: 320424
2017-12-11 21:38:43 +00:00
Matt Morehouse 9e658c974b Revert "[X86] Improvement in CodeGen instruction selection for LEAs."
This reverts r319543, due to ASan bot breakage.

llvm-svn: 319591
2017-12-01 22:20:26 +00:00
Jatin Bhateja 328199ec26 [X86] Improvement in CodeGen instruction selection for LEAs.
Summary:
1/  Operand folding during complex pattern matching for LEAs has been extended, such that it promotes Scale to
     accommodate similar operand appearing in the DAG  e.g.
                 T1 = A + B
                 T2 = T1 + 10
                 T3 = T2 + A
    For above DAG rooted at T3, X86AddressMode will now look like
                Base = B , Index = A , Scale = 2 , Disp = 10

2/  During OptimizeLEAPass down the pipeline factorization is now performed over LEAs so that if there is an opportunity
     then complex LEAs (having 3 operands) could be factored out  e.g.
                 leal 1(%rax,%rcx,1), %rdx
                 leal 1(%rax,%rcx,2), %rcx
     will be factored as following
                 leal 1(%rax,%rcx,1), %rdx
                 leal (%rdx,%rcx)   , %edx

3/ Aggressive operand folding for AM based selection for LEAs is sensitive to loops, thus avoiding creation of any complex LEAs within a loop.

4/ Simplify LEA converts lea (BASE,1,INDEX,0) --> add (BASE, INDEX), which offers better throughput.
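The folding and factorization examples above are plain integer identities, so they can be checked numerically. This is a minimal C++ sketch assuming a simplified addressing-mode record: `AddrModeSketch`, `leaValue`, `dagT3`, `foldT3` and `factoredMatches` are hypothetical names, not LLVM's `X86AddressMode` or the OptimizeLEAPass code, and the 32-bit truncation performed by `leal` is ignored.

```cpp
// Hypothetical, simplified x86 addressing mode (not LLVM's X86AddressMode).
struct AddrModeSketch {
  long long base;
  long long index;
  long long scale;  // x86 encodes 1, 2, 4 or 8
  long long disp;
};

// Value produced by "lea disp(base,index,scale)", without 32-bit truncation.
long long leaValue(const AddrModeSketch& am) {
  return am.base + am.index * am.scale + am.disp;
}

// T1 = A + B; T2 = T1 + 10; T3 = T2 + A, evaluated step by step.
long long dagT3(long long a, long long b) {
  long long t1 = a + b;
  long long t2 = t1 + 10;
  return t2 + a;
}

// Folded form from the message: Base = B, Index = A, Scale = 2, Disp = 10.
AddrModeSketch foldT3(long long a, long long b) {
  return AddrModeSketch{b, a, 2, 10};
}

// Checks that "leal 1(%rax,%rcx,2)" equals "(%rdx,%rcx)" when %rdx holds
// the result of "leal 1(%rax,%rcx,1)".
bool factoredMatches(long long rax, long long rcx) {
  long long rdx = leaValue({rax, rcx, 1, 1});       // leal 1(%rax,%rcx,1)
  long long direct = leaValue({rax, rcx, 2, 1});    // leal 1(%rax,%rcx,2)
  long long factored = leaValue({rdx, rcx, 1, 0});  // leal (%rdx,%rcx)
  return direct == factored;
}
```

With A = 3 and B = 5, both `dagT3` and the folded address mode evaluate to 21.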

PR32755 will be taken care of by this patch.

Previous patch revisions : r313343 , r314886

Reviewers: lsaba, RKSimon, craig.topper, qcolombet, jmolloy, jbhateja

Reviewed By: lsaba, RKSimon, jbhateja

Subscribers: jmolloy, spatel, igorb, llvm-commits

Differential Revision: https://reviews.llvm.org/D35014

llvm-svn: 319543
2017-12-01 14:07:38 +00:00
Hans Wennborg 2a6c9adb2f Revert r314886 "[X86] Improvement in CodeGen instruction selection for LEAs (re-applying post required revision changes.)"
It broke the Chromium / SQLite build; see PR34830.

> Summary:
>    1/  Operand folding during complex pattern matching for LEAs has been
>        extended, such that it promotes Scale to accommodate similar operand
>        appearing in the DAG.
>        e.g.
>          T1 = A + B
>          T2 = T1 + 10
>          T3 = T2 + A
>        For above DAG rooted at T3, X86AddressMode will now look like
>          Base = B , Index = A , Scale = 2 , Disp = 10
>
>    2/  During OptimizeLEAPass down the pipeline factorization is now performed over LEAs
>        so that if there is an opportunity then complex LEAs (having 3 operands)
>        could be factored out.
>        e.g.
>          leal 1(%rax,%rcx,1), %rdx
>          leal 1(%rax,%rcx,2), %rcx
>        will be factored as following
>          leal 1(%rax,%rcx,1), %rdx
>          leal (%rdx,%rcx)   , %edx
>
>    3/ Aggressive operand folding for AM based selection for LEAs is sensitive to loops,
>       thus avoiding creation of any complex LEAs within a loop.
>
> Reviewers: lsaba, RKSimon, craig.topper, qcolombet, jmolloy
>
> Reviewed By: lsaba
>
> Subscribers: jmolloy, spatel, igorb, llvm-commits
>
>     Differential Revision: https://reviews.llvm.org/D35014

llvm-svn: 314919
2017-10-04 17:54:06 +00:00
Jatin Bhateja 3c29bacd43 [X86] Improvement in CodeGen instruction selection for LEAs (re-applying post required revision changes.)
Summary:
   1/  Operand folding during complex pattern matching for LEAs has been
       extended, such that it promotes Scale to accommodate similar operand
       appearing in the DAG.
       e.g.
         T1 = A + B
         T2 = T1 + 10
         T3 = T2 + A
       For above DAG rooted at T3, X86AddressMode will now look like
         Base = B , Index = A , Scale = 2 , Disp = 10

   2/  During OptimizeLEAPass down the pipeline factorization is now performed over LEAs
       so that if there is an opportunity then complex LEAs (having 3 operands)
       could be factored out.
       e.g.
         leal 1(%rax,%rcx,1), %rdx
         leal 1(%rax,%rcx,2), %rcx
       will be factored as following
         leal 1(%rax,%rcx,1), %rdx
         leal (%rdx,%rcx)   , %edx

   3/ Aggressive operand folding for AM based selection for LEAs is sensitive to loops,
      thus avoiding creation of any complex LEAs within a loop.

Reviewers: lsaba, RKSimon, craig.topper, qcolombet, jmolloy

Reviewed By: lsaba

Subscribers: jmolloy, spatel, igorb, llvm-commits

    Differential Revision: https://reviews.llvm.org/D35014

llvm-svn: 314886
2017-10-04 09:02:10 +00:00
Hans Wennborg 534bfbd3ba Revert r313343 "[X86] PR32755 : Improvement in CodeGen instruction selection for LEAs."
This caused PR34629: asserts firing when building Chromium. It also broke some
buildbots building test-suite as reported on the commit thread.

> Summary:
>    1/  Operand folding during complex pattern matching for LEAs has been
>        extended, such that it promotes Scale to accommodate similar operand
>        appearing in the DAG.
>        e.g.
>           T1 = A + B
>           T2 = T1 + 10
>           T3 = T2 + A
>        For above DAG rooted at T3, X86AddressMode will now look like
>           Base = B , Index = A , Scale = 2 , Disp = 10
>
>    2/  During OptimizeLEAPass down the pipeline factorization is now performed over LEAs
>        so that if there is an opportunity then complex LEAs (having 3 operands)
>        could be factored out.
>        e.g.
>           leal 1(%rax,%rcx,1), %rdx
>           leal 1(%rax,%rcx,2), %rcx
>        will be factored as following
>           leal 1(%rax,%rcx,1), %rdx
>           leal (%rdx,%rcx)   , %edx
>
>    3/ Aggressive operand folding for AM based selection for LEAs is sensitive to loops,
>       thus avoiding creation of any complex LEAs within a loop.
>
> Reviewers: lsaba, RKSimon, craig.topper, qcolombet
>
> Reviewed By: lsaba
>
> Subscribers: spatel, igorb, llvm-commits
>
> Differential Revision: https://reviews.llvm.org/D35014

llvm-svn: 313376
2017-09-15 18:40:26 +00:00
Jatin Bhateja 908c8b37c2 [X86] PR32755 : Improvement in CodeGen instruction selection for LEAs.
Summary:
   1/  Operand folding during complex pattern matching for LEAs has been
       extended, such that it promotes Scale to accommodate similar operand
       appearing in the DAG.
       e.g.
          T1 = A + B
          T2 = T1 + 10
          T3 = T2 + A
       For above DAG rooted at T3, X86AddressMode will now look like
          Base = B , Index = A , Scale = 2 , Disp = 10

   2/  During OptimizeLEAPass down the pipeline factorization is now performed over LEAs
       so that if there is an opportunity then complex LEAs (having 3 operands)
       could be factored out.
       e.g.
          leal 1(%rax,%rcx,1), %rdx
          leal 1(%rax,%rcx,2), %rcx
       will be factored as following
          leal 1(%rax,%rcx,1), %rdx
          leal (%rdx,%rcx)   , %edx

   3/ Aggressive operand folding for AM based selection for LEAs is sensitive to loops,
      thus avoiding creation of any complex LEAs within a loop.

Reviewers: lsaba, RKSimon, craig.topper, qcolombet

Reviewed By: lsaba

Subscribers: spatel, igorb, llvm-commits

Differential Revision: https://reviews.llvm.org/D35014

llvm-svn: 313343
2017-09-15 05:29:51 +00:00
Max Kazantsev bb1d010872 [LSR] Fix Shadow IV in case of integer overflow
When LSR processes code like

  int accumulator = 0;
  for (int i = 0; i < N; i++) {
    accumulator += i;
    use((double) accumulator);
  }

It may decide to replace integer `accumulator` with a double Shadow IV to get rid
of casts.  The problem with that is that the `accumulator`'s value may overflow.
Starting from this moment, the behavior of integer and double accumulators
will differ.

This patch strengthens the conditions under which the Shadow IV mechanism applies.
We only allow it for IVs that are proved to be `AddRec`s with `nsw`/`nuw` flags.
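A minimal C++ sketch of the divergence described above. It assumes an IR `add` without `nsw`/`nuw` can be modeled with `std::uint32_t`, which wraps modulo 2^32 (signed overflow is undefined behavior in C++, so `int` would not model it); `shadowMatches` is a hypothetical helper, not LSR code.

```cpp
#include <cstdint>

// Hypothetical helper: sums 0..n-1 with a 32-bit integer accumulator and a
// double "shadow" accumulator, returning false as soon as they disagree.
// std::uint32_t models an IR add without nsw/nuw, which wraps modulo 2^32.
bool shadowMatches(std::uint32_t n) {
  std::uint32_t intAcc = 0;
  double shadowAcc = 0.0;
  for (std::uint32_t i = 0; i < n; ++i) {
    intAcc += i;     // wraps once the sum exceeds 2^32 - 1
    shadowAcc += i;  // exact here: these sums stay far below 2^53
    if (static_cast<double>(intAcc) != shadowAcc)
      return false;
  }
  return true;
}
```

Under this model the two accumulators agree for n = 92682, but at n = 92683 the integer sum exceeds 2^32 - 1, wraps, and no longer matches the double Shadow IV, which is the situation the `nsw`/`nuw` requirement is meant to exclude.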

Differential Revision: https://reviews.llvm.org/D37209

llvm-svn: 311986
2017-08-29 07:32:20 +00:00
Max Kazantsev f2e017b083 [NFC] Fix indents in test
llvm-svn: 311982
2017-08-29 05:30:58 +00:00
Max Kazantsev 03407da281 [NFC] Refactor ShadowIV test to use FileCheck
Also get rid of unnamed values that make the test hard to read.

llvm-svn: 311980
2017-08-29 05:20:56 +00:00
Evgeny Astigeevich 540a39adf7 [ARM, Thumb1] Prevent ARMTargetLowering::isLegalAddressingMode from accepting illegal modes
ARMTargetLowering::isLegalAddressingMode can accept illegal addressing modes
for the Thumb1 target. This causes generation of redundant code and affects
performance.

This fixes PR34106: https://bugs.llvm.org/show_bug.cgi?id=34106

Differential Revision: https://reviews.llvm.org/D36467

llvm-svn: 311649
2017-08-24 10:00:25 +00:00
Evgeny Stupachenko c675290680 Reapply fix PR23384 (part 3 of 3) r304824 (was reverted in r305720).
The root cause of reverting was fixed - PR33514.

Summary:
The patch makes instruction count the highest priority for
 LSR solution for X86 (previously registers had highest priority).

Reviewers: qcolombet

Differential Revision: http://reviews.llvm.org/D30562

From: Evgeny Stupachenko <evstupac@gmail.com>
                         <evgeny.v.stupachenko@intel.com>
llvm-svn: 310289
2017-08-07 19:56:34 +00:00
Amara Emerson 56dca4e3ca [SCEV] Preserve NSW information for sext(subtract).
Pushes the sext onto the operands of a Sub if NSW is present.
Also adds support for propagating the nowrap flags of the
llvm.ssub.with.overflow intrinsic during analysis.

Differential Revision: https://reviews.llvm.org/D35256

llvm-svn: 310117
2017-08-04 20:19:46 +00:00
Evgeny Stupachenko 38197c66a1 Fix PR33514
Summary:
The bug was uncovered after fix of  PR23384 (part 3 of 3).
The patch restricts pointer multiplication in SCEV computation for ICmpZero.

Reviewers: qcolombet

Differential Revision: http://reviews.llvm.org/D36170

From: Evgeny Stupachenko <evstupac@gmail.com>
                         <evgeny.v.stupachenko@intel.com>
llvm-svn: 310092
2017-08-04 18:46:13 +00:00
Adrian Prantl abe04759a6 Remove the obsolete offset parameter from @llvm.dbg.value
There is no situation where this rarely-used argument cannot be
substituted with a DIExpression and removing it allows us to simplify
the DWARF backend. Note that this patch does not yet remove any of
the newly dead code.

rdar://problem/33580047
Differential Revision: https://reviews.llvm.org/D35951

llvm-svn: 309426
2017-07-28 20:21:02 +00:00
Wei Mi 90707394e3 [LSR] Narrow search space by filtering non-optimal formulae with the same ScaledReg and Scale.
When the formulae search space is huge, LSR uses a series of heuristics to keep
pruning the search space until the number of possible solutions is within a
certain limit.

The big hammer of the series of heuristics is NarrowSearchSpaceByPickingWinnerRegs,
which picks the register which is used by the most LSRUses and deletes the other
formulae which don't use the register. This is an effective way to prune the search
space, but quite often not a good way to keep the best solution. We have seen cases
where the heuristic pruned the best formula candidate out of the search space.

To relieve the problem, we introduce a new heuristic called
NarrowSearchSpaceByFilterFormulaWithSameScaledReg. The basic idea is in order to
reduce the search space while keeping the best formula, we want to keep as many
formulae with different Scale and ScaledReg as possible. That is because the central
idea of LSR is to choose a group of loop induction variables and use those induction
variables to represent LSRUses. An induction variable candidate is often represented
by the Scale and ScaledReg in a formula. If we have more formulae with different
ScaledReg and Scale to choose, we have better opportunity to find the best solution.
That is why we believe pruning search space by only keeping the best formula with the
same Scale and ScaledReg should be more effective than PickingWinnerReg. And we use
two criteria to choose the best formula with the same Scale and ScaledReg. The first
criterion is to select the formula using fewer non-shared registers, and the second
criterion is to select the formula with the lower cost from RateFormula. The patch
implements the heuristic before NarrowSearchSpaceByPickingWinnerRegs, which is the
last resort.
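The heuristic described above can be sketched as grouping formulae by the (ScaledReg, Scale) pair and keeping one winner per group. This is a simplified C++ illustration under stated assumptions: `FormulaSketch` and `filterSameScaledReg` are hypothetical, registers are represented by strings, and `cost` stands in for whatever RateFormula computes.

```cpp
#include <map>
#include <string>
#include <tuple>
#include <utility>
#include <vector>

// Hypothetical, simplified formula record (LLVM's Formula has more state).
struct FormulaSketch {
  std::string scaledReg;   // textual stand-in for the ScaledReg SCEV
  long long scale;
  unsigned nonSharedRegs;  // registers not shared with other LSRUses
  unsigned cost;           // stand-in for the RateFormula cost
};

// Keeps only the best formula for each (ScaledReg, Scale) pair: fewer
// non-shared registers wins, and cost breaks ties.
std::vector<FormulaSketch> filterSameScaledReg(const std::vector<FormulaSketch>& formulae) {
  std::map<std::pair<std::string, long long>, FormulaSketch> best;
  for (const FormulaSketch& f : formulae) {
    auto key = std::make_pair(f.scaledReg, f.scale);
    auto it = best.find(key);
    if (it == best.end()) {
      best.emplace(key, f);
    } else if (std::tie(f.nonSharedRegs, f.cost) <
               std::tie(it->second.nonSharedRegs, it->second.cost)) {
      it->second = f;
    }
  }
  std::vector<FormulaSketch> kept;
  for (const auto& entry : best)
    kept.push_back(entry.second);
  return kept;
}
```

For example, of two formulae sharing ScaledReg `iv` and Scale 1, the one with fewer non-shared registers survives even if its cost is higher, while a formula with Scale 4 is kept in its own group.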

Testing shows we get 1.8% and 2% on two internal benchmarks on x86. llvm nightly
testsuite performance is neutral. We also tried lsr-exp-narrow and it didn't help
on the two improved internal cases we saw.

Differential Revision: https://reviews.llvm.org/D34583

llvm-svn: 307269
2017-07-06 15:52:14 +00:00
Hans Wennborg ca69fc1cb7 Revert r304824 "Fix PR23384 (part 3 of 3)"
This seems to be interacting badly with ASan somehow, causing false reports of
heap-buffer overflows: PR33514.

> Summary:
> The patch makes instruction count the highest priority for
> LSR solution for X86 (previously registers had highest priority).
>
> Reviewers: qcolombet
>
> Differential Revision: http://reviews.llvm.org/D30562
>
> From: Evgeny Stupachenko <evstupac@gmail.com>

llvm-svn: 305720
2017-06-19 17:57:15 +00:00
Max Kazantsev 35b2a18eb9 [SCEV] Teach SCEVExpander to expand BinPow
The current implementation of SCEVExpander behaves very naively when
it deals with power calculation. For example, a SCEV for x^8 looks like

  (x * x * x * x * x * x * x * x)

If we try to expand it, it generates a very straightforward sequence of muls, like:

  x2 = mul x, x
  x3 = mul x2, x
  x4 = mul x3, x
      ...
  x8 = mul x7, x

This is an inefficient way of doing that. A better way is to generate a sequence of
binary power calculations. In this case the expanded calculation will look like:

  x2 = mul x, x
  x4 = mul x2, x2
  x8 = mul x4, x4

In some cases the code size reduction for such SCEVs is dramatic. If we had a loop:

  x = a;
  for (int i = 0; i < 3; i++)
    x = x * x;

And this loop has been fully unrolled, we would have something like:

  x = a;
  x2 = x * x;
  x4 = x2 * x2;
  x8 = x4 * x4;

The SCEV for x8 is the same as in the example above, and if for some reason we
want to expand it, we will naively generate 7 multiplications instead of 3.
The BinPow expansion algorithm here allows us to keep code size reasonable.

This patch teaches SCEV Expander to generate a sequence of BinPow multiplications
if we have repeating arguments in SCEVMulExpressions.
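The multiplication counts above can be reproduced with a small C++ sketch of binary exponentiation. `PowResult`, `naivePow` and `binPow` are hypothetical illustrations operating on concrete integers, not SCEVExpander's implementation, which works on SCEV operands.

```cpp
// Hypothetical illustration, not SCEVExpander's code: both functions return
// x^n (n >= 1) together with the number of multiplications they performed.
struct PowResult {
  unsigned long long value;
  unsigned muls;
};

// Naive expansion x * x * ... * x: n - 1 multiplications.
PowResult naivePow(unsigned long long x, unsigned n) {
  PowResult r{x, 0};
  for (unsigned i = 1; i < n; ++i) {
    r.value *= x;
    ++r.muls;
  }
  return r;
}

// Binary-power expansion: square a running power of x and multiply it into
// the result for each set bit of n.
PowResult binPow(unsigned long long x, unsigned n) {
  unsigned long long result = 0;
  bool haveResult = false;      // avoids a useless "1 * base" multiplication
  unsigned muls = 0;
  unsigned long long base = x;  // holds x^(2^k) after k squarings
  while (true) {
    if (n & 1) {
      if (haveResult) {
        result *= base;
        ++muls;
      } else {
        result = base;
        haveResult = true;
      }
    }
    n >>= 1;
    if (n == 0)
      break;
    base *= base;
    ++muls;
  }
  return {result, muls};
}
```

For x^8, `naivePow` reports 7 multiplications and `binPow` reports 3, matching the counts in the message above.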

Differential Revision: https://reviews.llvm.org/D34025

llvm-svn: 305663
2017-06-19 06:24:53 +00:00
Evgeny Stupachenko 3b88291581 Fix PR23384 (part 3 of 3)
Summary:
The patch makes instruction count the highest priority for
 LSR solution for X86 (previously registers had highest priority).
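The change in priority amounts to a different lexicographic ordering of the cost components. A C++ sketch, assuming a reduced two-field cost record; `LSRCostSketch`, `lessRegsFirst` and `lessInsnsFirst` are hypothetical names, and the real LSR cost carries more components.

```cpp
#include <tuple>

// Hypothetical two-field cost; LLVM's real LSR cost record has more fields.
struct LSRCostSketch {
  unsigned insns;    // instructions generated by the solution
  unsigned numRegs;  // registers used by the solution
};

// Previous priority: register count first, instruction count as tie-breaker.
bool lessRegsFirst(const LSRCostSketch& a, const LSRCostSketch& b) {
  return std::tie(a.numRegs, a.insns) < std::tie(b.numRegs, b.insns);
}

// Priority described for X86 in this patch: instruction count first.
bool lessInsnsFirst(const LSRCostSketch& a, const LSRCostSketch& b) {
  return std::tie(a.insns, a.numRegs) < std::tie(b.insns, b.numRegs);
}
```

With a = {5 insns, 3 regs} and b = {4 insns, 4 regs}, the register-first ordering prefers a, while the instruction-first ordering prefers b.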

Reviewers: qcolombet

Differential Revision: http://reviews.llvm.org/D30562

From: Evgeny Stupachenko <evstupac@gmail.com>
llvm-svn: 304824
2017-06-06 20:04:16 +00:00
Craig Topper 2b54baeb96 [X86] Replace 'REQUIRES: x86' in tests with 'REQUIRES: x86-registered-target' which seems to be the correct way to make them run on an x86 build.
llvm-svn: 304682
2017-06-04 08:21:58 +00:00
Keno Fischer 090f1959c1 [SCEVExpander] Try harder to avoid introducing inttoptr
Summary:
This fixes introduction of an incorrect inttoptr/ptrtoint pair in
the included test case which makes use of non-integral pointers. I
suspect there are more cases like this left, but this takes care of
the one I was seeing at the moment.

Reviewers: sanjoy

Subscribers: mzolotukhin, llvm-commits

Differential Revision: https://reviews.llvm.org/D33129

llvm-svn: 304058
2017-05-27 03:22:55 +00:00
Max Kazantsev 41450329f7 Re-enable "[SCEV] Do not fold dominated SCEVUnknown into AddRecExpr start"
The patch rL303730 was reverted because test lsr-expand-quadratic.ll failed on
many non-X86 configs with this patch. The reason for this is that the patch
makes a correctness fix that changes the optimizer's behavior for this test.
Without the change, LSR was making an overconfident simplification based on a
wrong SCEV. Apparently it did not need the IV analysis to do this. With the
change, it chose a different way to simplify (that wasn't so confident), and
this way required the IV analysis. Now, following the right execution path,
LSR tries to make a transformation relying on IV Users analysis. This analysis
is target-dependent due to this code:

  // LSR is not APInt clean, do not touch integers bigger than 64-bits.
  // Also avoid creating IVs of non-native types. For example, we don't want a
  // 64-bit IV in 32-bit code just because the loop has one 64-bit cast.
  uint64_t Width = SE->getTypeSizeInBits(I->getType());
  if (Width > 64 || !DL.isLegalInteger(Width))
    return false;

To make a proper transformation in this test case, the type i32 needs to be
legal for the specified data layout. When the test runs on some non-X86
configuration (e.g. pure ARM 64), opt gets confused by the specified target
and does not use it, rejecting the specified data layout as well. Instead,
it uses some default layout that does not treat i32 as a legal type
(currently the layout that is used when it is not specified does not have
legal types at all). As result, the transformation we expect to happen does
not happen for this test.

This re-enabling patch does not have any source code changes compared to the
original patch rL303730. The only difference is that the failing test is
moved to X86 directory and now has requirement of running on x86 only to comply
with the specified target triple and data layout.

Differential Revision: https://reviews.llvm.org/D33543

llvm-svn: 303971
2017-05-26 06:47:04 +00:00
Diana Picus 183863fc3b Revert "[SCEV] Do not fold dominated SCEVUnknown into AddRecExpr start"
This reverts commit r303730 because it broke all the buildbots.

llvm-svn: 303747
2017-05-24 14:16:04 +00:00
Max Kazantsev 13e016bf48 [SCEV] Do not fold dominated SCEVUnknown into AddRecExpr start
When folding arguments of AddExpr or MulExpr with recurrences, we rely on the fact that
the loop of our base recurrence is the bottom-most in terms of domination. This assumption
may be broken by an expression which is treated as invariant, and which depends on a complex
Phi for which SCEVUnknown was created. If such Phi is a loop Phi, and this loop is lower than
the chosen AddRecExpr's loop, it is invalid to fold our expression with the recurrence.

Another reason why it might be invalid to fold SCEVUnknown into Phi start value is that unlike
other SCEVs, SCEVUnknown are sometimes position-bound. For example, here:

for (...) { // loop
  phi = {A,+,B}
}
X = load ...
Folding phi + X into {A+X,+,B}<loop> actually makes no sense, because X does not exist and cannot
exist while we are iterating in the loop (this memory may not even be allocated or filled at that point).
It is only valid to do such folding if X is defined before the loop. In that case the recurrence {A+X,+,B}<loop>
may exist.

This patch prohibits folding of SCEVUnknown (and expressions that use them) into the start value of an AddRecExpr,
if this instruction is dominated by the loop. Merging the dominating unknown values is still valid. Some tests that
relied on the fact that some SCEVUnknown should be folded into AddRec's are changed so that they no longer
expect such behavior.

llvm-svn: 303730
2017-05-24 08:52:18 +00:00
Wei Mi 8848c1e3c7 [LSR] Call canonicalize after we generate a new Formula in GenerateTruncates. Fix PR33077.
The testcase in PR33077 generates a LSR Use Formula with two SCEVAddRecExprs for the same
loop. Such an uncommon formula will become non-canonical after GenerateTruncates adds sign
extension to the ScaledReg of the Formula, and it will break the assertion that every
Formula to be inserted is canonical.

The fix is to call canonicalize for the raw Formula generated by GenerateTruncates
before inserting it.

llvm-svn: 303361
2017-05-18 17:21:22 +00:00
Tim Northover 8b1240b0f0 ARM: handle post-indexed NEON ops where the offset isn't the access width.
Before, we assumed that any ConstantInt offset was precisely the access width,
so we could use the "[rN]!" form. ISelLowering only ever created that kind, but
further simplification during combining could lead to unexpected constants and
incorrect codegen.

Should fix PR32658.

llvm-svn: 300878
2017-04-20 19:54:02 +00:00
Eli Friedman 5fba1e53f2 Turn on -addr-sink-using-gep by default.
The new codepath has been in the tree for years, and there isn't any
reason to use two codepaths here.

Differential Revision: https://reviews.llvm.org/D30596

llvm-svn: 299723
2017-04-06 22:42:18 +00:00
Matt Arsenault 3dbeefa978 AMDGPU: Mark all unspecified CC functions in tests as amdgpu_kernel
Currently the default C calling convention functions are treated
the same as compute kernels. Make this explicit so the default
calling convention can be changed to a non-kernel.

Converted with perl -pi -e 's/define void/define amdgpu_kernel void/'
on the relevant test directories (and undoing in one place that actually
wanted a non-kernel).

llvm-svn: 298444
2017-03-21 21:39:51 +00:00
Evgeny Stupachenko d6aa0d02c2 Set option enabling LSR alternative way to resolve complex solution to false.
Differential Revision: http://reviews.llvm.org/D29862

From: Evgeny Stupachenko <evstupac@gmail.com>
llvm-svn: 296959
2017-03-04 03:14:05 +00:00
Wei Mi 74d5a90fa6 [LSR] Canonicalize formula and put recursive Reg related with current loop in ScaledReg.
After rL294814, LSR formula can have multiple SCEVAddRecExprs inside of its BaseRegs.
Previous canonicalization will swap the first SCEVAddRecExpr in BaseRegs with ScaledReg.
But now we want to swap the SCEVAddRecExpr Reg related with current loop with ScaledReg.
Otherwise, we may generate code like this: RegA + lsr.iv + RegB, where loop invariant
parts RegA and RegB are not grouped together and cannot be promoted outside of loop.
This patch ensures that lsr.iv is generated later in the expr:
RegA + RegB + lsr.iv, so that RegA + RegB can be promoted outside of loop.

Differential Revision: https://reviews.llvm.org/D26781

llvm-svn: 295884
2017-02-22 21:47:08 +00:00
Evgeny Stupachenko 9909872e30 The patch introduces new way of narrowing complex (>UINT16 variants) solutions.
The new method is introduced under the "-lsr-exp-narrow" option (currently set to true).

Summary:

The method is based on the mathematical expectation of the number of registers and should be
 generally closer to the optimal solution.
Please see details in comments to
 "LSRInstance::NarrowSearchSpaceByDeletingCostlyFormulas()" function
 (in lib/Transforms/Scalar/LoopStrengthReduce.cpp).

Reviewers: qcolombet

Differential Revision: http://reviews.llvm.org/D29862

From: Evgeny Stupachenko <evstupac@gmail.com>
llvm-svn: 295704
2017-02-21 07:34:40 +00:00
Wei Mi 493fb266ed [LSR] Prevent formula with SCEVAddRecExpr type of Reg from Sibling loops
In rL294814, we allow formulae with SCEVAddRecExpr type of Reg from loops
other than the current loop. This is good for the case when the induction variable
of the outer loop is used in an expr in the inner loop. But it is very bad to allow
such a Reg from a sibling loop, because we may need to add lsr.iv in other sibling
loops when SCEV-expanding those SCEVAddRecExpr type exprs. For the test case
below, a bunch of lsr.iv can be inserted into one loop because of LSR on
other loops.

// The induction variable j from a loop in the middle will have initial
// value generated from previous sibling loop and exit value used by its
// next sibling loop.
void goo(long i, long j);
long cond;

void foo(long N) {
  long i = 0;
  long j = 0;
  i = 0; do { goo(i, j); i++; j++; } while (cond);
  i = 0; do { goo(i, j); i++; j++; } while (cond);
  i = 0; do { goo(i, j); i++; j++; } while (cond);
  i = 0; do { goo(i, j); i++; j++; } while (cond);
  i = 0; do { goo(i, j); i++; j++; } while (cond);
  i = 0; do { goo(i, j); i++; j++; } while (cond);
}

The fix is to only allow formula with SCEVAddRecExpr type of Reg from current
loop or its parents.

Differential Revision: https://reviews.llvm.org/D30021

llvm-svn: 295378
2017-02-16 21:27:31 +00:00
Mikael Holmen ece84cd10c [LSR] Pointers with different address spaces are considered incompatible.
Summary:
Function isCompatibleIVType is already used as a guard before the call to

 SE.getMinusSCEV(OperExpr, PrevExpr);

in LSRInstance::ChainInstruction. getMinusSCEV requires the expressions
to be of the same type, so we now consider two pointers with different
address spaces to be incompatible, since it is possible that the pointers
in fact have different sizes.

Reviewers: qcolombet, eli.friedman

Reviewed By: qcolombet

Subscribers: nhaehnle, Ka-Ka, llvm-commits, mzolotukhin

Differential Revision: https://reviews.llvm.org/D29885

llvm-svn: 295033
2017-02-14 06:37:42 +00:00
Evgeny Stupachenko 5f3d9b6c09 The patch fixes r294821
Summary:
Update register match for windows testing

From: Evgeny Stupachenko <evstupac@gmail.com>
llvm-svn: 294825
2017-02-11 05:39:00 +00:00
Evgeny Stupachenko fe6f548d2d Fix PR23384 (under "-lsr-insns-cost" option)
Summary:
The patch adds the number of instructions generated by a solution
 to the LSR cost under the "-lsr-insns-cost" option.

Reviewers: qcolombet, hfinkel

Differential Revision: http://reviews.llvm.org/D28307

From: Evgeny Stupachenko <evstupac@gmail.com>
llvm-svn: 294821
2017-02-11 02:57:43 +00:00
Wei Mi 8f20e63a20 [LSR] Recommit: Allow formula containing Reg for SCEVAddRecExpr related with outerloop.
The recommit includes some changes of testcases. No functional change to the patch.

In RateRegister of existing LSR, if a formula contains a Reg which is a SCEVAddRecExpr,
and this SCEVAddRecExpr's loop is an outerloop, the formula will be marked as Loser
and dropped.

Suppose we have IR in which %for.body is the outer loop and %for.body2 is the inner loop. LSR only
handles the inner loop now, so only %for.body2 will be handled.

Using the logic above, formula like
reg(%array) + reg({1,+, %size}<%for.body>) + 1*reg({0,+,1}<%for.body2>) will be dropped
no matter what because reg({1,+, %size}<%for.body>) is a SCEVAddRecExpr type reg related
with outerloop. Only formula like
reg(%array) + 1*reg({{1,+, %size}<%for.body>,+,1}<nuw><nsw><%for.body2>) will be kept
because the SCEVAddRecExpr related with outerloop is folded into the initial value of the
SCEVAddRecExpr related with current loop.

But in some cases, we do need to share the basic induction variable
reg{0 ,+, 1}<%for.body2> among LSR Uses to reduce the final total number of induction
variables used by LSR, so we don't want to drop the formula like
reg(%array) + reg({1,+, %size}<%for.body>) + 1*reg({0,+,1}<%for.body2>) unconditionally.

From the existing comment, it tries to avoid considering multiple level loops at the same time.
However, existing LSR only handles the innermost loop, so for any SCEVAddRecExpr with a loop other
than the current loop, it is an invariant and will be simple to handle, and the formula doesn't have
to be dropped.

Differential Revision: https://reviews.llvm.org/D26429

llvm-svn: 294814
2017-02-11 00:50:23 +00:00
Matt Arsenault cb3fa37c7e LSR: Check atomic instruction pointer operands
llvm-svn: 294410
2017-02-08 06:44:58 +00:00
Matt Arsenault 1f2ca66317 LSR: Don't drop address space when type doesn't match
For targets with different addressing modes in each address space,
if the address space is dropped, later queries to isLegalAddressingMode
will give nonsensical results, breaking the isLegalUse assertions.

This is a candidate for the 4.0 release branch.

llvm-svn: 293542
2017-01-30 19:50:17 +00:00
Chandler Carruth d501b18990 This test apparently requires an x86 target and is failing on numerous
bots ever since d0k fixed the CHECK lines so that it did something at
all.

It isn't actually testing SCEV directly but LSR, so move it into LSR and
the x86-specific tree of tests that already exists there. Target
dependence is common and unavoidable with the current design of LSR.

llvm-svn: 292774
2017-01-23 08:33:29 +00:00
Chandler Carruth 0952750fae [PM] Clean up the testing for IVUsers, especially with the new PM.
First, I've moved a test of IVUsers from the LSR tree to a dedicated
IVUsers test directory. I've also simplified its RUN line now that the
new pass manager's loop PM is providing analyses on their own.

No functionality changed, but it makes subsequent changes cleaner.

llvm-svn: 292060
2017-01-15 09:29:27 +00:00
David Majnemer bba17390c7 [LoopStrengthReduce] Don't bother rewriting PHIs in catchswitch blocks
The catchswitch instruction cannot be split; don't bother trying to
rewrite it.

This fixes PR31627.

llvm-svn: 291966
2017-01-13 22:24:27 +00:00
Wei Mi 37c4aaaf52 Revert r286999 which caused buildbot test failures. Some testcases need to be made target specific.
llvm-svn: 287014
2016-11-15 19:42:05 +00:00
Wei Mi 7ccf7651c0 [LSR] Allow formula containing Reg for SCEVAddRecExpr related with outerloop.
In RateRegister of existing LSR, if a formula contains a Reg which is a SCEVAddRecExpr,
and this SCEVAddRecExpr's loop is an outerloop, the formula will be marked as Loser
and dropped.

Suppose we have IR in which %for.body is the outer loop and %for.body2 is the inner loop. LSR only
handles the inner loop now, so only %for.body2 will be handled.

Using the logic above, formula like
reg(%array) + reg({1,+, %size}<%for.body>) + 1*reg({0,+,1}<%for.body2>) will be dropped
no matter what because reg({1,+, %size}<%for.body>) is a SCEVAddRecExpr type reg related
with outerloop. Only formula like
reg(%array) + 1*reg({{1,+, %size}<%for.body>,+,1}<nuw><nsw><%for.body2>) will be kept
because the SCEVAddRecExpr related with outerloop is folded into the initial value of the
SCEVAddRecExpr related with current loop.

But in some cases, we do need to share the basic induction variable
reg{0 ,+, 1}<%for.body2> among LSR Uses to reduce the final total number of induction
variables used by LSR, so we don't want to drop the formula like
reg(%array) + reg({1,+, %size}<%for.body>) + 1*reg({0,+,1}<%for.body2>) unconditionally.

From the existing comment, it tries to avoid considering multiple level loops at the same time.
However, existing LSR only handles the innermost loop, so for any SCEVAddRecExpr with a loop other
than the current loop, it is an invariant and will be simple to handle, and the formula doesn't have
to be dropped.

Differential Revision: https://reviews.llvm.org/D26429

llvm-svn: 286999
2016-11-15 18:35:53 +00:00
Alexandros Lamprineas 0ee3ec2fe4 [ARM] Loop Strength Reduction crashes when targeting ARM or Thumb.
Scalar Evolution asserts when not all the operands of an Add Recurrence
Expression are loop-invariant. Loop Strength Reduction should only
create affine Add Recurrences, so that both the start and the step of
the expression are loop-invariant.
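
As background, an add recurrence {op0,+,op1,+,...,+,opN}<L> takes the value
sum(op_k * C(i, k)) on iteration i of loop L, so the affine case {start,+,step} is simply
start + i*step. A minimal Python sketch with hypothetical helper names; plain integers stand in
for the operands, which makes them trivially loop-invariant (in SCEV they are arbitrary
expressions, which is where the assertion above can fire).

```python
from math import comb


def addrec_value(operands, i):
    """Value of {op0,+,op1,+,...}<L> on iteration i: the sum of op_k * C(i, k)."""
    return sum(op * comb(i, k) for k, op in enumerate(operands))


def is_affine(operands):
    """An affine recurrence has exactly two operands: {start,+,step}."""
    return len(operands) == 2


start, step = 5, 3
print(is_affine([start, step]))                             # True
print([addrec_value([start, step], i) for i in range(4)])  # [5, 8, 11, 14]
print([addrec_value([0, 1, 1], i) for i in range(4)])      # [0, 1, 3, 6]
```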

Differential Revision: https://reviews.llvm.org/D26185

llvm-svn: 286347
2016-11-09 08:53:07 +00:00
Krzysztof Parzyszek 3cb5ffeb35 Fix testcases failing after r284036
The codegen has changed slightly between my tests and the commit.

llvm-svn: 284049
2016-10-12 20:39:33 +00:00
Krzysztof Parzyszek 8271be9a1d Do not remove implicit defs in BranchFolder
Branch folder removes implicit defs if they are the only non-branching
instructions in a block, and the branches do not use the defined registers.
The problem is that in some cases these implicit defs are required for
the liveness information to be correct.

Differential Revision: https://reviews.llvm.org/D25478

llvm-svn: 284036
2016-10-12 19:50:57 +00:00
James Molloy 196ad0823e [LSR] Don't try and create post-inc expressions on non-rotated loops
If a loop is not rotated (for example when optimizing for size), the latch is not the backedge. If we promote an expression to post-inc form, we increase register pressure and add a COPY not only for that IV expression but for all IVs!

Motivating testcase:

    void f(float *a, float *b, float *c, int n) {
      while (n-- > 0)
        *c++ = *a++ + *b++;
    }

It's imperative that the pointer increments be located in the latch block and not the header block; otherwise we cannot use post-increment loads and stores, and we have to keep both the post-inc and pre-inc values around until the end of the latch, which bloats register usage.

llvm-svn: 278658
2016-08-15 07:53:03 +00:00
Geoff Berry d01828096f [SCEV] Update interface to handle SCEVExpander insert point motion.
Summary:
This is an extension of the fix in r271424.  That fix dealt with builder
insert points being moved by SCEV expansion, but only for the lifetime
of the expand call.  This change modifies the interface so that LSR can
safely call expand multiple times at the same insert point and do the
right thing if one of the expansions decides to move the original insert
point.

This is a fix for PR28719.

Reviewers: sanjoy

Subscribers: llvm-commits, mcrosier, mzolotukhin

Differential Revision: https://reviews.llvm.org/D23342

llvm-svn: 278413
2016-08-11 21:05:17 +00:00
Chandler Carruth 6cb2ab2c60 [PM] Significantly refactor the pass pipeline parsing to be easier to
reason about and less error prone.

The core idea is to fully parse the text without trying to identify
passes or structure. This is done with a single state machine. There
were various bugs in the logic around this previously that were repeated
and scattered across the code. Having a single routine makes it much
easier to fix and get correct. For example, this routine doesn't suffer
from PR28577.
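
The idea of parsing the text into a generic tree before identifying any passes can be sketched
in Python. This is a hypothetical illustration of a single-pass state machine over a
`name(nested,...)` syntax such as `module(function(instcombine),gvn)`, not the actual LLVM
routine; it returns None for malformed text such as empty names or unbalanced parentheses.

```python
def parse_pipeline(text):
    """Parse 'a(b,c),d' into [(name, children), ...]; return None if malformed."""
    stack = [[]]  # one list of (name, children) entries per nesting level
    start = 0     # index where the current pass name begins
    for pos, ch in enumerate(text):
        if ch == "(":
            name = text[start:pos]
            if not name:
                return None  # '(' must follow a name
            children = []
            stack[-1].append((name, children))
            stack.append(children)
            start = pos + 1
        elif ch in ",)":
            name = text[start:pos]
            if name:
                stack[-1].append((name, []))
            elif pos == 0 or text[pos - 1] != ")":
                return None  # empty name: ',,', '()' or a leading ','
            if ch == ")":
                if len(stack) == 1:
                    return None  # unmatched ')'
                stack.pop()
            start = pos + 1
        elif pos > 0 and text[pos - 1] == ")":
            return None  # a name may not directly follow ')'
    tail = text[start:]
    if tail:
        stack[-1].append((tail, []))
    elif not text or text[-1] != ")":
        return None  # empty text or a trailing ','
    return stack[0] if len(stack) == 1 else None  # reject an unclosed '('


print(parse_pipeline("module(function(instcombine),gvn)"))
# [('module', [('function', [('instcombine', [])]), ('gvn', [])])]
print(parse_pipeline("a,,b"))  # None
```

Only the nesting is recovered here; deciding which names are module, function or loop passes
would happen afterwards, on the tree.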

Then the actual pass construction is handled using *much* easier to read
code and simple loops, with particular pass manager construction sunk to
live with other pass construction. This is especially nice as the pass
managers *are* in fact passes.

Finally, the "implicit" pass manager synthesis is done much more simply
by forming "pre-parsed" structures rather than having to duplicate tons
of logic.

One of the bugs fixed by this was evident in the tests where we accepted
a pipeline that wasn't really well formed. Another bug is PR28577 for
which I have added a test case.

The code is less efficient than the previous code but I'm really hoping
that's not a priority. ;]

Thanks to Sean for the review!

Differential Revision: https://reviews.llvm.org/D22724

llvm-svn: 277561
2016-08-03 03:21:41 +00:00
Dehao Chen 6132ee8502 [PM] Convert Loop Strength Reduce pass to new PM
Summary: Convert Loop Strength Reduce pass to new PM

Reviewers: davidxl, silvas

Subscribers: junbuml, sanjoy, mzolotukhin, llvm-commits

Differential Revision: https://reviews.llvm.org/D22468

llvm-svn: 275919
2016-07-18 21:41:50 +00:00
Dehao Chen 1a44452b11 [PM] Convert IVUsers analysis to new pass manager.
Summary: Convert IVUsers analysis to new pass manager.

Reviewers: davidxl, silvas

Subscribers: junbuml, sanjoy, llvm-commits, mzolotukhin

Differential Revision: https://reviews.llvm.org/D22434

llvm-svn: 275698
2016-07-16 22:51:33 +00:00
Matt Arsenault f42c69206d AMDGPU: Run pointer optimization passes
llvm-svn: 272736
2016-06-15 00:11:01 +00:00
Geoff Berry 43e5160d0e Reapply [LSR] Create fewer redundant instructions.
Summary:
Fix LSRInstance::HoistInsertPosition() to check the original insert
position block first for a canonical insertion point that is dominated
by all inputs.  This leads to SCEV being able to reuse more instructions
since it currently tracks the instructions it creates for reuse by
keeping a table of <Value, insert point> pairs.
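
To see why a canonical insertion point helps, consider a toy Python model of a reuse table keyed
by <expression, insert point> pairs. All names are hypothetical and this is not SCEVExpander's
API: expanding the same expression at two different points creates two instructions, while
mapping both requests to one canonical point lets the second request reuse the first result.

```python
class ToyExpander:
    """Caches expansions keyed by (expression, insert point)."""

    def __init__(self):
        self.table = {}
        self.created = 0  # number of new instructions emitted

    def expand(self, expr, insert_point):
        key = (expr, insert_point)
        if key not in self.table:
            self.created += 1
            self.table[key] = f"%tmp{self.created}"
        return self.table[key]


def canonical_point(block, _index):
    # Hypothetical stand-in for picking one canonical insertion point per
    # block that all inputs dominate (here simply the block's first slot).
    return (block, 0)


naive = ToyExpander()
naive.expand("(4 * %i)", ("loop.body", 3))
naive.expand("(4 * %i)", ("loop.body", 7))
print(naive.created)  # 2

hoisted = ToyExpander()
for idx in (3, 7):
    hoisted.expand("(4 * %i)", canonical_point("loop.body", idx))
print(hoisted.created)  # 1
```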

Originally reviewed in http://reviews.llvm.org/D18001

Reviewers: atrick

Subscribers: llvm-commits, mzolotukhin, mcrosier

Differential Revision: http://reviews.llvm.org/D18480

llvm-svn: 271929
2016-06-06 19:10:46 +00:00
Matt Arsenault 71fa1f375e AMDGPU: Fix a few slightly broken tests
Fix minor bugs and uses of undef which break when
pointer related optimization passes are run.

llvm-svn: 269944
2016-05-18 15:48:44 +00:00
Matt Arsenault 7d1b6c81af AMDGPU: Stop reporting an addressing mode for unknown addrspace
This was being treated the same as private, which has an immediate
offset. For unknown, it probably means it's for a computation not
actually being used for accessing memory, so it should not have a
nontrivial addressing mode.

llvm-svn: 268002
2016-04-29 06:25:10 +00:00
Chuang-Yu Cheng d3fb38cae5 Don't delete empty preheaders in CodeGenPrepare if it would create a critical edge
Presently, CodeGenPrepare deletes all nearly empty (only phi and branch)
basic blocks. This pass can delete loop preheaders, and doing so frequently
creates critical edges. A preheader can be a convenient place to spill registers to
the stack. If the entrance to a loop body is a critical edge, then spills
may occur in the loop body rather than immediately before it. This patch
protects loop preheaders from deletion in CodeGenPrepare even if they are
nearly empty.
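
A critical edge is one whose source has several successors and whose destination has several
predecessors. The following Python sketch uses a toy CFG (a dict of successor lists; the block
and function names are hypothetical) to show how folding an empty preheader turns the loop
entry into such an edge.

```python
def predecessors(cfg):
    preds = {block: [] for block in cfg}
    for src, succs in cfg.items():
        for dst in succs:
            preds[dst].append(src)
    return preds


def is_critical(cfg, src, dst):
    """True if src has several successors and dst has several predecessors."""
    return len(cfg[src]) > 1 and len(predecessors(cfg)[dst]) > 1


def fold_empty_block(cfg, block):
    """Delete a block holding only an unconditional branch, retargeting its preds."""
    (succ,) = cfg[block]  # exactly one successor
    return {b: [succ if s == block else s for s in succs]
            for b, succs in cfg.items() if b != block}


cfg = {
    "entry": ["preheader", "exit"],  # guard: skip the loop or enter it
    "preheader": ["header"],         # nearly empty: just a branch
    "header": ["header", "exit"],    # single-block loop with a backedge
    "exit": [],
}
print(is_critical(cfg, "entry", "preheader"))  # False
folded = fold_empty_block(cfg, "preheader")
print(is_critical(folded, "entry", "header"))   # True
```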

Since the patch alters the CFG, it affects a large number of test cases.
In most cases, the changes are merely cosmetic (basic blocks have different
names or instruction orders change slightly). I am somewhat concerned about
the test/CodeGen/Mips/brdelayslot.ll test case. If the loop preheader is not
deleted, then the MIPS backend does not take advantage of a branch delay
slot. Consequently, I would like some close review by a MIPS expert.

The patch also partially subsumes D16893 from George Burgess IV. George
correctly notes that CodeGenPrepare does not actually preserve the dominator
tree. I think the dominator tree was usually not valid when CodeGenPrepare
ran, but I am using LoopInfo to mark preheaders, so the dominator tree is
now always valid before CodeGenPrepare.

Author: Tom Jablin (tjablin)
Reviewers: hfinkel george.burgess.iv vkalintiris dsanders kbarton cycheng

http://reviews.llvm.org/D16984

llvm-svn: 265397
2016-04-05 14:06:20 +00:00
David Majnemer e09d035dad [LoopStrengthReduce] Don't hoist into a catchswitch
We try to hoist the insertion point as high as possible to encourage
sharing.  However, we must be careful not to hoist into a catchswitch as
it is both an EHPad and a terminator.

llvm-svn: 264344
2016-03-24 21:40:22 +00:00
Geoff Berry 56fabf9b55 Revert "[LSR] Create fewer redundant instructions."
This reverts commit r263644.  Investigating bootstrap failures.

llvm-svn: 263655
2016-03-16 19:21:47 +00:00
Geoff Berry 459b750871 [LSR] Create fewer redundant instructions.
Summary:
Fix LSRInstance::HoistInsertPosition() to check the original insert
position block first for a canonical insertion point that is dominated
by all inputs.  This leads to SCEV being able to reuse more instructions
since it currently tracks the instructions it creates for reuse by
keeping a table of <Value, insert point> pairs.

Reviewers: atrick

Subscribers: mcrosier, mzolotukhin, llvm-commits

Differential Revision: http://reviews.llvm.org/D18001

llvm-svn: 263644
2016-03-16 17:29:49 +00:00
Wei Mi a49559befb [SCEV] Try to reuse existing value during SCEV expansion
Currently, SCEV expansion expands a SCEV as a sequence of operations
and doesn't reuse values that already exist. This introduces
redundant computation which may not be cleaned up thoroughly by
subsequent optimizations.

This patch introduces an ExprValueMap, which is a map from a SCEV to the
set of values that have that SCEV. When a SCEV is expanded, the
set of values is checked and reused whenever possible before generating
a sequence of operations.

The original commit triggered regressions in Polly tests. The regressions
exposed two problems which have been fixed in current version.

1. Polly will generate a new function based on the old one. To generate an
instruction for the new function, it builds a SCEV for the old instruction,
applies some transformation to the generated SCEV, then expands the transformed
SCEV and inserts the expanded value into the new function. Because SCEV expansion
may reuse a value cached in ExprValueMap, a value from the old function may be
inserted into the new function, which is wrong.
   In SCEVExpander::expand, there is logic to check that the cached value to
be used dominates the insertion point. However, for the above
case, the check always passes. That is because the insertion point is
in a new function, which is unreachable from the old function, and
DominatorTreeBase::dominates treats an unreachable node as dominated by
any other node.
   The fix is to simply add a check that the cached value to be used in
expansion is in the same function as the insertion point instruction.

2. When the SCEV is of scConstant type, expanding it directly is cheaper than
reusing a cached normal value. Although the cached value set in ExprValueMap
does contain a Constant-type value, it is not easy to find it; the cached
Value set is not sorted according to the potential cost. The existing reuse logic
in SCEVExpander::expand simply chooses the first legal element from the cached
value set.
   The fix is that when the SCEV is of scConstant type, we don't try the reuse
logic and simply expand it.
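
The two fixes can be sketched with a toy Python model of the reuse step. All names are
hypothetical, and the `dominates` callback stands in for the dominator-tree query: the
ExprValueMap maps a SCEV to candidate values, constants skip the lookup, and a cached value is
reused only if it lives in the same function as the insertion point and dominates it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scev:
    text: str
    is_constant: bool = False


@dataclass(frozen=True)
class Value:
    name: str
    function: str


@dataclass(frozen=True)
class InsertPoint:
    function: str
    label: str


def expand(scev, at, expr_value_map, dominates, emit):
    """Reuse a cached value for scev at `at` if legal, else emit new code."""
    if not scev.is_constant:  # fix 2: constants are simply re-expanded
        for value in expr_value_map.get(scev, ()):
            # fix 1: require the same function; a dominance query alone
            # is meaningless across functions.
            if value.function == at.function and dominates(value, at):
                return value
    return emit(scev, at)


def emit(scev, at):
    return Value(f"new<{scev.text}>", at.function)


def always_dominates(value, at):
    # Mimics the problematic case: an unreachable insertion point is
    # reported as dominated by everything.
    return True


s = Scev("{0,+,4}<%loop>")
cache = {s: [Value("%old.iv", "old_fn")]}
print(expand(s, InsertPoint("old_fn", "bb1"), cache, always_dominates, emit).name)  # %old.iv
print(expand(s, InsertPoint("new_fn", "bb1"), cache, always_dominates, emit).name)  # new<{0,+,4}<%loop>>
```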

Differential Revision: http://reviews.llvm.org/D12090

llvm-svn: 259736
2016-02-04 01:27:38 +00:00
David Majnemer a53b5bbb18 [LoopStrengthReduce] Don't rewrite PHIs with incoming values from CatchSwitches
Bail out if we have a PHI on an EHPad that gets a value from a
CatchSwitchInst.  Because the CatchSwitchInst cannot be split, there is
no good place to stick any instructions.

This fixes PR26373.

llvm-svn: 259702
2016-02-03 21:30:34 +00:00
Wei Mi 97de385868 Revert r259662, which caused regressions on polly tests.
llvm-svn: 259675
2016-02-03 18:05:57 +00:00
Wei Mi ed133978a0 [SCEV] Try to reuse existing value during SCEV expansion
Currently, SCEV expansion expands a SCEV as a sequence of operations
and doesn't reuse values that already exist. This introduces
redundant computation which may not be cleaned up thoroughly by
subsequent optimizations.

This patch introduces an ExprValueMap, which is a map from a SCEV to the
set of values that have that SCEV. When a SCEV is expanded, the
set of values is checked and reused whenever possible before generating
a sequence of operations.

Differential Revision: http://reviews.llvm.org/D12090

llvm-svn: 259662
2016-02-03 17:05:12 +00:00
Dan Gohman 75452734e4 Followup to 258750; update more tests to use .p2align .
llvm-svn: 258755
2016-01-26 00:35:07 +00:00
David Majnemer bbfc7219ef [IR] Remove terminatepad
It turns out that terminatepad gives little benefit over a cleanuppad
which calls the termination function.  This is not sufficient to
implement fully generic filters, but MSVC doesn't support them, which
makes terminatepad a little over-designed.

Depends on D15478.

Differential Revision: http://reviews.llvm.org/D15479

llvm-svn: 255522
2015-12-14 18:34:23 +00:00
David Majnemer 8a1c45d6e8 [IR] Reformulate LLVM's EH funclet IR
While we have successfully implemented a funclet-oriented EH scheme on
top of LLVM IR, our scheme has some notable deficiencies:
- catchendpad and cleanupendpad are necessary in the current design
  but they are difficult to explain to others, even to seasoned LLVM
  experts.
- catchendpad and cleanupendpad are optimization barriers.  They cannot
  be split and force all potentially throwing call-sites to be invokes.
  This has a noticeable effect on the quality of our code generation.
- catchpad, while similar in some aspects to invoke, is fairly awkward.
  It is unsplittable, starts a funclet, and has control flow to other
  funclets.
- The nesting relationship between funclets is currently a property of
  control flow edges.  Because of this, we are forced to carefully
  analyze the flow graph to see if there might potentially exist illegal
  nesting among funclets.  While we have logic to clone funclets when
  they are illegally nested, it would be nicer if we had a
  representation which forbade them upfront.

Let's clean this up a bit by doing the following:
- Make catchpad more like cleanuppad and landingpad: no control
  flow, just a bunch of simple operands; catchpad would be splittable.
- Introduce catchswitch, a control flow instruction designed to model
  the constraints of funclet oriented EH.
- Make funclet scoping explicit by having funclet instructions consume
  the token produced by the funclet which contains them.
- Remove catchendpad and cleanupendpad.  Their presence can be inferred
  implicitly using coloring information.

N.B.  The state numbering code for the CLR has been updated but the
veracity of its output cannot be spoken for.  An expert should take a
look to make sure the results are reasonable.

Reviewers: rnk, JosephTremoulet, andrew.w.kaylor

Differential Revision: http://reviews.llvm.org/D15139

llvm-svn: 255422
2015-12-12 05:38:55 +00:00
David Majnemer 7378e7a333 [LoopStrengthReduce] Don't increment iterator past the end of the BB
We tried to move the insertion point beyond instructions like landingpad
and cleanuppad.
However, we *also* tried to move past catchpad.  This is problematic
because catchpad is also a terminator.
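
A hypothetical Python model of the fix: treat a block as a list of opcodes, skip PHIs and the
pads that are ordinary instructions (landingpad, cleanuppad), and stop instead of stepping past
a pad that is also a terminator (catchpad, in the EH IR of this commit's era), since stepping
past the terminator would run off the end of the block.

```python
SKIPPABLE_PADS = {"landingpad", "cleanuppad"}
TERMINATOR_PADS = {"catchpad"}  # a terminator in the EH IR of this era


def insertion_index(block):
    """Index at which new code may be inserted, or None if there is none."""
    idx = 0
    while idx < len(block) and (block[idx] == "phi" or block[idx] in SKIPPABLE_PADS):
        idx += 1
    if idx == len(block) or block[idx] in TERMINATOR_PADS:
        return None  # never move past a terminator or the end of the block
    return idx


print(insertion_index(["phi", "landingpad", "add", "br"]))  # 2
print(insertion_index(["phi", "catchpad"]))                # None
```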

This fixes PR25541.

llvm-svn: 253238
2015-11-16 17:37:58 +00:00
David Majnemer b222184223 [LoopStrengthReduce] Don't bother fixing up PHIs from EH Pad preds
We cannot really insert fixup code into a PHI's predecessor.

This fixes PR25445.

llvm-svn: 252416
2015-11-08 05:04:07 +00:00
David Majnemer 235acde953 [ScalarEvolutionExpander] PHI on a catchpad can be used on both edges
A PHI on a catchpad might be used by both edges out of the catchpad,
feeding back into a loop.  In this case, just use the insertion point.
Anything more clever would require new basic blocks or PHI placement.

llvm-svn: 251442
2015-10-27 19:48:28 +00:00
David Majnemer dd9a815746 [ScalarEvolutionExpander] Properly insert no-op casts + EH Pads
We want to insert no-op casts as close as possible to the def.  This is
tricky when the cast is of a PHI node and the BasicBlocks between the
def and the use cannot hold any instructions.  Iteratively walk EH pads
until we hit a non-EH pad.

This fixes PR25326.

llvm-svn: 251393
2015-10-27 07:36:42 +00:00
Sanjoy Das 8f27415c05 [SCEV] Mark AddExprs as nsw or nuw if legal
Summary:
This uses `ScalarEvolution::getRange` rather than the potentially
control-dependent `nsw` and `nuw` bits on the arithmetic instruction.
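
The range-based reasoning can be illustrated with a small Python sketch. It is a
simplification, not ScalarEvolution's implementation: an n-bit add is nsw if the sum of the
operands' signed ranges stays within the signed limits, and nuw if the sum of their unsigned
ranges stays below 2^n.

```python
def add_is_nsw(a, b, bits):
    """a, b are inclusive signed ranges (lo, hi); True if a + b cannot overflow."""
    smin, smax = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return smin <= a[0] + b[0] and a[1] + b[1] <= smax


def add_is_nuw(a, b, bits):
    """a, b are inclusive unsigned ranges (lo, hi); True if a + b cannot wrap."""
    return a[1] + b[1] <= (1 << bits) - 1


print(add_is_nsw((0, 100), (0, 27), 8))  # True: at most 127
print(add_is_nsw((0, 100), (0, 28), 8))  # False: may reach 128
print(add_is_nuw((0, 200), (0, 55), 8))  # True: at most 255
```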

Reviewers: atrick, hfinkel, nlewycky

Subscribers: llvm-commits, sanjoy

Differential Revision: http://reviews.llvm.org/D13613

llvm-svn: 251048
2015-10-22 19:57:19 +00:00
Craig Topper 2c4068f409 [TwoAddressInstructionPass] When looking for a 3 addr conversion after commuting, make sure regB has been updated to take into account the commute.
llvm-svn: 249378
2015-10-06 05:39:59 +00:00
Jeroen Ketema ab99b59e8c [ARM][NEON] Use address space in vld([1234]|[234]lane) and vst([1234]|[234]lane) instructions
This commit changes the interface of the vld[1234], vld[234]lane, and vst[1234],
vst[234]lane ARM neon intrinsics and associates an address space with the
pointer that these intrinsics take. This changes, e.g.,

<2 x i32> @llvm.arm.neon.vld1.v2i32(i8*, i32)

to

<2 x i32> @llvm.arm.neon.vld1.v2i32.p0i8(i8*, i32)

This change ensures that address spaces are fully taken into account in the ARM
target during lowering of interleaved loads and stores.
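
The new suffix follows the overloaded-intrinsic naming pattern visible in the example above:
`v2i32` for the vector result, then `p<addrspace><pointee>` for the pointer, so `i8*` in
address space 0 becomes `p0i8`. A hypothetical Python helper that reproduces names of this
typed-pointer-era form:

```python
def mangle_vld1(vector_type, pointee, addrspace):
    """Build a vld1 name such as llvm.arm.neon.vld1.v2i32.p0i8 (typed pointers)."""
    return f"llvm.arm.neon.vld1.{vector_type}.p{addrspace}{pointee}"


print(mangle_vld1("v2i32", "i8", 0))  # llvm.arm.neon.vld1.v2i32.p0i8
print(mangle_vld1("v2i32", "i8", 1))  # llvm.arm.neon.vld1.v2i32.p1i8
```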

Differential Revision: http://reviews.llvm.org/D12985

llvm-svn: 248887
2015-09-30 10:56:37 +00:00
Duncan P. N. Exon Smith 814b8e91c7 DI: Require subprogram definitions to be distinct
As a follow-up to r246098, require `DISubprogram` definitions
(`isDefinition: true`) to be 'distinct'.  Specifically, add an assembler
check, a verifier check, and bitcode upgrading logic to combat testcase
bitrot after the `DIBuilder` change.

While working on the testcases, I realized that
test/Linker/subprogram-linkonce-weak-odr.ll isn't relevant anymore.  Its
purpose was to check for a corner case in PR22792 where two subprogram
definitions match exactly and share the same metadata node.  The new
verifier check, requiring that subprogram definitions are 'distinct',
precludes that possibility.

I updated almost all the IR with the following script:

    git grep -l -E -e '= !DISubprogram\(.* isDefinition: true' |
    grep -v test/Bitcode |
    xargs sed -i '' -e 's/= \(!DISubprogram(.*, isDefinition: true\)/= distinct \1/'

Likely some variant of this would work for out-of-tree testcases.
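
Note that `sed -i ''` is the BSD/macOS form of in-place editing. Where it is unavailable, the
same substitution could be applied with a short Python script, sketched here as one possible
variant; it rewrites the files named on the command line in place.

```python
import re
import sys

# Same pattern as the sed command: '= !DISubprogram(... isDefinition: true'
PATTERN = re.compile(r"= (!DISubprogram\(.*, isDefinition: true)")


def make_definitions_distinct(text):
    """Prefix subprogram definitions with 'distinct', one line at a time."""
    return PATTERN.sub(r"= distinct \1", text)


if __name__ == "__main__":
    for path in sys.argv[1:]:
        with open(path) as f:
            original = f.read()
        updated = make_definitions_distinct(original)
        if updated != original:
            with open(path, "w") as f:
                f.write(updated)
```

As with the sed version, `.*` does not cross newlines, and lines that are already `distinct`
do not match.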

llvm-svn: 246327
2015-08-28 20:26:49 +00:00