As discussed in
http://lists.llvm.org/pipermail/llvm-dev/2020-July/143801.html.
Currently no users outside of unit tests.
Replace all instances in tests of -constprop with -instsimplify.
Notable changes in tests:
* vscale.ll - @llvm.sadd.sat.nxv16i8 is evaluated by instsimplify, use a fake intrinsic instead
* InsertElement.ll - insertelement undef is removed by instsimplify in @insertelement_undef
llvm/test/Transforms/ConstProp moved to llvm/test/Transforms/InstSimplify/ConstProp
Reviewed By: lattner, nikic
Differential Revision: https://reviews.llvm.org/D85159
InstSimplify should do all transformations that ConstProp does, but
one thing that ConstProp does that InstSimplify wouldn't is inline
vector instructions that are constants, e.g. into a ret.
Previously vector instructions wouldn't be inlined in InstSimplify
because llvm::Simplify*Instruction() would return nullptr for specific
instructions, such as vector instructions that were actually constants,
if it couldn't simplify them.
This changes SimplifyInsertElementInst, SimplifyExtractElementInst, and
SimplifyShuffleVectorInst to return a vector constant when possible.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D85946
MVE Gather scatter codegeneration is looking a lot better than it used
to, but still has some issues. The instructions we currently model as 1
cycle per element, which is a bit low for some cases. Increasing the
cost by the MVECostFactor brings them in-line with our other instruction
costs. This will have the effect of only generating then when the extra
benefit is more likely to overcome some of the issues. Notably in
running out of registers and vectorizing loops that could otherwise be
SLP vectorized.
In the short-term whilst we look at other ways of dealing with those
more directly, we can increase the costs of gathers to make them more
likely to be beneficial when created.
Differential Revision: https://reviews.llvm.org/D86444
Summary: The LCSSA pass (required for all loop passes) sometimes adds
additional blocks containing LCSSA variables, and checkLoopsStructure
may return false even when the loops are perfectly nested in this case.
This is because the successor of the exit block of the inner loop now
points to the LCSSA block instead of the latch block of the outer loop.
Examples are shown in the test nests-with-lcssa.ll.
To fix the issue, the successor of the exit block of the inner loop can
now point to a block in which all instructions are LCSSA phi node
(except the terminator), and the sole successor of that block should
point to the latch block of the outer loop.
Reviewed By: Whitney, etiotto
Differential Revision: https://reviews.llvm.org/D86133
In getCastInstrCost when the instruction is a truncate we were relying
upon the implicit TypeSize -> uint64_t cast when asking if a given type
has the same size as a legal integer. I've changed the code to only
ask the question if the type is fixed length.
I have also changed InstCombinerImpl::SimplifyDemandedUseBits to bail
out for now if the type is a scalable vector.
I've added the following new tests:
Analysis/CostModel/AArch64/sve-trunc.ll
Transforms/InstCombine/AArch64/sve-trunc.ll
for both of these fixes.
Differential revision: https://reviews.llvm.org/D86432
test referenced a relative path to a file, but the path was not correct
relative to the project the test is in
Differential Revision: https://reviews.llvm.org/D86368
Modify the ARM getCmpSelInstrCost implementation for the code size
costs of selects. Now consider the legalization cost and increase
the cost of i1 because those values wouldn't live in a general purpose
register. We also make selects +1 more expensive to account for the IT
instruction.
Differential Revision: https://reviews.llvm.org/D82091
StackLifetime class collects lifetime marker of an `alloca` by collect
the user of `BitCast` who is the user of the `alloca`. However, either
the `alloca` itself could be used with the lifetime marker or the `BitCast`
of the `alloca` could be transformed to other instructions. (e.g.,
it may be transformed to all zero reps in `InstCombine` pass).
This patch tries to fix this process in `collectMarkers` functions.
Reviewed By: vitalybuka
Differential Revision: https://reviews.llvm.org/D85399
The current demand propagator for addition will mark all input bits at and right of the alive output bit as alive. But carry won't propagate beyond a bit for which both operands are zero (or one/zero in the case of subtraction) so a more accurate answer is possible given known bits.
I derived a propagator by working through truth tables and using a bit-reversed addition to make demand ripple to the right, but I'm not sure how to make a convincing argument for its correctness in the comments yet. Nevertheless, here's a minimal implementation and test to get feedback.
This would help in a situation where, for example, four bytes (<128) packed into an int are added with four others SIMD-style but only one of the four results is actually read.
Known A: 0_______0_______0_______0_______
Known B: 0_______0_______0_______0_______
AOut: 00000000001000000000000000000000
AB, current: 00000000001111111111111111111111
AB, patch: 00000000001111111000000000000000
Committed on behalf of: @rrika (Erika)
Differential Revision: https://reviews.llvm.org/D72423
If we can't identify alloca used in lifetime marker we
need to assume to worst case scenario.
Reviewed By: eugenis
Differential Revision: https://reviews.llvm.org/D84630
Similarly as for pointers, even for integers a == b is usually false.
GCC also uses this heuristic.
Reviewed By: ebrevnov
Differential Revision: https://reviews.llvm.org/D85781
Similarly as for pointers, even for integers a == b is usually false.
GCC also uses this heuristic.
Reviewed By: ebrevnov
Differential Revision: https://reviews.llvm.org/D85781
Similarly as for pointers, even for integers a == b is usually false.
GCC also uses this heuristic.
Reviewed By: ebrevnov
Differential Revision: https://reviews.llvm.org/D85781
This reverts commit e441b7a7a0.
This patch causes a compile error in tensorflow opensource project. The stack trace looks like:
Point of crash:
llvm/include/llvm/Analysis/LoopInfoImpl.h : line 35
(gdb) ptype *this
type = const class llvm::LoopBase<llvm::BasicBlock, llvm::Loop> [with BlockT = llvm::BasicBlock, LoopT = llvm::Loop]
(gdb) p *this
$1 = {ParentLoop = 0x0, SubLoops = std::vector of length 0, capacity 0, Blocks = std::vector of length 0, capacity 1,
DenseBlockSet = {<llvm::SmallPtrSetImpl<llvm::BasicBlock const*>> = {<llvm::SmallPtrSetImplBase> = {<llvm::DebugEpochBase> = {Epoch = 3}, SmallArray = 0x1b2bf6c8, CurArray = 0x1b2bf6c8,
CurArraySize = 8, NumNonEmpty = 0, NumTombstones = 0}, <No data fields>}, SmallStorage = {0xfffffffffffffffe, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}}, IsInvalid = true}
(gdb) p *this->DenseBlockSet->CurArray
$2 = (const void *) 0xfffffffffffffffe
I will try to get a case from tensorflow or use creduce to get a small case.
Now that SCEVExpander can preserve LCSSA form,
we do not have to worry about LCSSA form when
trying to look through PHIs. SCEVExpander will take
care of inserting LCSSA PHI nodes as required.
This increases precision of the analysis in some cases.
Reviewed By: mkazantsev, bmahjour
Differential Revision: https://reviews.llvm.org/D71539
This is the max version of D85046.
This change causes binary changes in 44 out of 237 benchmarks (out of
MultiSource/SPEC2000/SPEC2006)
Reviewed By: lebedev.ri
Differential Revision: https://reviews.llvm.org/D85189
Constant fold both the trapping and saturating versions of the
WebAssembly truncation intrinsics. The tests are adapted from the
WebAssembly spec tests for the corresponding instructions.
Requested in PR46982.
Differential Revision: https://reviews.llvm.org/D85392
If we can't identify alloca used in lifetime marker we
need to assume to worst case scenario.
Reviewed By: eugenis
Differential Revision: https://reviews.llvm.org/D84630
Add cases of fused fmul+fadd/fsub with f16 and f64 operands to cost model.
Also added operations with contract attribute.
Fixed line endings in test.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D84995
As mentioned in
http://lists.llvm.org/pipermail/llvm-dev/2020-July/143395.html,
loop-unswitch has not been ported to the NPM. Instead people are using
simple-loop-unswitch.
Pin all tests in Transforms/LoopUnswitch to legacy PM and replace all
other uses of loop-unswitch with simple-loop-unswitch.
One test that didn't fit into the above was
2014-06-21-congruent-constant.ll which seems to only pass with
loop-unswitch. That is also pinned to legacy PM.
Now all tests containing "-loop-unswitch" anywhere in the test succeed with
NPM turned on by default.
Reviewed By: ychen
Differential Revision: https://reviews.llvm.org/D85360
-print-memoryssa in legacy PM is print<memoryssa> in NPM.
Pin tests with -print-memoryssa to legacy PM.
Add corresponding tests for NPM where missing.
This fixes "unknown pass name 'print-memoryssa'".
Some tests still fail in Analysis/MemorySSA due to other passes that
haven't been ported.
pr43427.ll and pr43438.ll required adding -aa-pipeline=basic-aa,
-loop-simplify (since it doesn't run on legacy PM by default), and
decrementing some of the MemoryPhi numbers.
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D85333
As with other targets, set the throughput cost of control-flow
instructions to free so that we don't miss out of vectorization
opportunities.
Differential Revision: https://reviews.llvm.org/D85283
This option was added a while back, to help improve AA around pointer
phi loops. It looks for phi(gep(phi, const), x) loops, checking if x can
then prove more precise aliasing info.
Differential Revision: https://reviews.llvm.org/D82998