This changes the vectorizer to explicitly use the loopsimplify and lcssa utils,
instead of "requiring" the transformations as if they were analyses.
This is not NFC, since it changes the LCSSA behavior - we no longer run LCSSA
for all loops, but rather only for the loops we expect to modify.
Differential Revision: https://reviews.llvm.org/D28868
llvm-svn: 292456
We currently check whether a reduction has a single outside user. We don't
really need to require that - we just need to make sure a single value is
used externally. The number of external users of that value shouldn't actually
matter.
Differential Revision: https://reviews.llvm.org/D28830
llvm-svn: 292424
If a memory instruction will be vectorized, but it's pointer operand is
non-consecutive-like, the instruction is a gather or scatter operation. Its
pointer operand will be non-uniform. This should fix PR31671.
Reference: https://llvm.org/bugs/show_bug.cgi?id=31671
Differential Revision: https://reviews.llvm.org/D28819
llvm-svn: 292254
updated instructions:
pmulld, pmullw, pmulhw, mulsd, mulps, mulpd, divss, divps, divsd, divpd, addpd and subpd.
special optimization case which replaces pmulld with pmullw\pmulhw\pshuf seq.
In case if the real operands bitwidth <= 16.
Differential Revision: https://reviews.llvm.org/D28104
llvm-svn: 291657
Bail out instead of asserting when we encounter this situation,
which can actually happen.
The reason the test uses the new PM is that the "bad" phi, incidentally, gets
cleaned up by LoopSimplify. But LICM can create this kind of phi and preserve
loop simplify form, so the cleanup has no chance to run.
This fixes PR31190.
We may want to solve this in a less conservative manner, since this phi is
actually uniform within the inner loop (or we may want LICM to output a cleaner
promotion to begin with).
Differential Revision: https://reviews.llvm.org/D28490
llvm-svn: 291589
This patch delays the fix-up step for external induction variable users until
after the dominator tree has been properly updated. This should fix PR30742.
The SCEVExpander in InductionDescriptor::transform can generate code in the
wrong location if the dominator tree is not up-to-date. We should work towards
keeping the dominator tree up-to-date throughout the transformation.
Reference: https://llvm.org/bugs/show_bug.cgi?id=30742
Differential Revision: https://reviews.llvm.org/D28168
llvm-svn: 291462
This code seems to be target dependent which may not be the same for all targets.
Passed the decision whether the given stride is complex or not to the target by sending stride information via SCEV to getAddressComputationCost instead of 'IsComplex'.
Specifically at X86 targets we dont see any significant address computation cost in case of the strided access in general.
Differential Revision: https://reviews.llvm.org/D27518
llvm-svn: 291106
This patch renumbers the metadata nodes in debug info testcases after
https://reviews.llvm.org/D26769. This is a separate patch because it
causes so much churn. This was implemented with a python script that
pipes the testcases through llvm-as - | llvm-dis - and then goes
through the original and new output side-by side to insert all
comments at a close-enough location.
Differential Revision: https://reviews.llvm.org/D27765
llvm-svn: 290292
This patch implements PR31013 by introducing a
DIGlobalVariableExpression that holds a pair of DIGlobalVariable and
DIExpression.
Currently, DIGlobalVariables holds a DIExpression. This is not the
best way to model this:
(1) The DIGlobalVariable should describe the source level variable,
not how to get to its location.
(2) It makes it unsafe/hard to update the expressions when we call
replaceExpression on the DIGLobalVariable.
(3) It makes it impossible to represent a global variable that is in
more than one location (e.g., a variable with multiple
DW_OP_LLVM_fragment-s). We also moved away from attaching the
DIExpression to DILocalVariable for the same reasons.
This reapplies r289902 with additional testcase upgrades and a change
to the Bitcode record for DIGlobalVariable, that makes upgrading the
old format unambiguous also for variables without DIExpressions.
<rdar://problem/29250149>
https://llvm.org/bugs/show_bug.cgi?id=31013
Differential Revision: https://reviews.llvm.org/D26769
llvm-svn: 290153
This reverts commit 289920 (again).
I forgot to implement a Bitcode upgrade for the case where a DIGlobalVariable
has not DIExpression. Unfortunately it is not possible to safely upgrade
these variables without adding a flag to the bitcode record indicating which
version they are.
My plan of record is to roll the planned follow-up patch that adds a
unit: field to DIGlobalVariable into this patch before recomitting.
This way we only need one Bitcode upgrade for both changes (with a
version flag in the bitcode record to safely distinguish the record
formats).
Sorry for the churn!
llvm-svn: 289982
This patch reapplies r289863. The original patch was reverted because it
exposed a bug causing the loop vectorizer to crash in the Python runtime on
PPC. The underlying issue was fixed with r289958.
llvm-svn: 289975
After r288909, instructions feeding predicated instructions may be scalarized
if profitable. Since these instructions will remain scalar, we shouldn't
attempt to type-shrink them. We should only truncate vector types to their
minimal bit widths. This bug was exposed by enabling the vectorization of loops
containing conditional stores by default.
llvm-svn: 289958
stores by default
This uncovers a crasher in the loop vectorizer on PPC when building the
Python runtime. I'll send the testcase to the review thread for the
original commit.
llvm-svn: 289934
This patch implements PR31013 by introducing a
DIGlobalVariableExpression that holds a pair of DIGlobalVariable and
DIExpression.
Currently, DIGlobalVariables holds a DIExpression. This is not the
best way to model this:
(1) The DIGlobalVariable should describe the source level variable,
not how to get to its location.
(2) It makes it unsafe/hard to update the expressions when we call
replaceExpression on the DIGLobalVariable.
(3) It makes it impossible to represent a global variable that is in
more than one location (e.g., a variable with multiple
DW_OP_LLVM_fragment-s). We also moved away from attaching the
DIExpression to DILocalVariable for the same reasons.
This reapplies r289902 with additional testcase upgrades.
<rdar://problem/29250149>
https://llvm.org/bugs/show_bug.cgi?id=31013
Differential Revision: https://reviews.llvm.org/D26769
llvm-svn: 289920
This patch implements PR31013 by introducing a
DIGlobalVariableExpression that holds a pair of DIGlobalVariable and
DIExpression.
Currently, DIGlobalVariables holds a DIExpression. This is not the
best way to model this:
(1) The DIGlobalVariable should describe the source level variable,
not how to get to its location.
(2) It makes it unsafe/hard to update the expressions when we call
replaceExpression on the DIGLobalVariable.
(3) It makes it impossible to represent a global variable that is in
more than one location (e.g., a variable with multiple
DW_OP_LLVM_fragment-s). We also moved away from attaching the
DIExpression to DILocalVariable for the same reasons.
<rdar://problem/29250149>
https://llvm.org/bugs/show_bug.cgi?id=31013
Differential Revision: https://reviews.llvm.org/D26769
llvm-svn: 289902
This patch sets the default value of the "-enable-cond-stores-vec" command line
option to "true".
Differential Revision: https://reviews.llvm.org/D27814
llvm-svn: 289863
We currently check if the exact trip count is known and is smaller than the
"tiny loop" bound. We should be checking the maximum bound on the trip count
instead.
Differential Revision: https://reviews.llvm.org/D27690
llvm-svn: 289583
Summary:
This change adds some verification in the IR verifier around struct path
TBAA metadata.
Other than some basic sanity checks (e.g. we get constant integers where
we expect constant integers), this checks:
- That by the time an struct access tuple `(base-type, offset)` is
"reduced" to a scalar base type, the offset is `0`. For instance, in
C++ you can't start from, say `("struct-a", 16)`, and end up with
`("int", 4)` -- by the time the base type is `"int"`, the offset
better be zero. In particular, a variant of this invariant is needed
for `llvm::getMostGenericTBAA` to be correct.
- That there are no cycles in a struct path.
- That struct type nodes have their offsets listed in an ascending
order.
- That when generating the struct access path, you eventually reach the
access type listed in the tbaa tag node.
Reviewers: dexonsmith, chandlerc, reames, mehdi_amini, manmanren
Subscribers: mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D26438
llvm-svn: 289402
This patch attempts to scalarize the operand expressions of predicated
instructions if they were conditionally executed in the original loop. After
scalarization, the expressions will be sunk inside the blocks created for the
predicated instructions. The transformation essentially performs
un-if-conversion on the operands.
The cost model has been updated to determine if scalarization is profitable. It
compares the cost of a vectorized instruction, assuming it will be
if-converted, to the cost of the scalarized instruction, assuming that the
instructions corresponding to each vector lane will be sunk inside a predicated
block, possibly avoiding execution. If it's more profitable to scalarize the
entire expression tree feeding the predicated instruction, the expression will
be scalarized; otherwise, it will be vectorized. We only consider the cost of
the entire expression to accurately estimate the cost of the required
insertelement and extractelement instructions.
Differential Revision: https://reviews.llvm.org/D26083
llvm-svn: 288909
VSX has instructions lxsiwax/lxsdx that can load 32/64 bit value into VSX register cheaply. That patch makes it known to memory cost model, so the vectorization of the test case in pr30990 is beneficial.
Differential Revision: https://reviews.llvm.org/D26713
llvm-svn: 288560
The register usage algorithm incorrectly treats instructions whose value is
not used within the loop (e.g. those that do not produce a value).
The algorithm first calculates the usages within the loop. It iterates over
the instructions in order, and records at which instruction index each use
ends (in fact, they're actually recorded against the next index, as this is
when we want to delete them from the open intervals).
The algorithm then iterates over the instructions again, adding each
instruction in turn to a list of open intervals. Instructions are then
removed from the list of open intervals when they occur in the list of uses
ended at the current index.
The problem is, instructions which are not used in the loop are skipped.
However, although they aren't used, the last use of a value may have been
recorded against that instruction index. In this case, the use is not deleted
from the open intervals, which may then bump up the estimated register usage.
This patch fixes the issue by simply moving the "is used" check after the loop
which erases the uses at the current index.
Differential Revision: https://reviews.llvm.org/D26554
llvm-svn: 286969
Add explicit v16i16/v32i8 ADD/SUB costs, matching the costs of v4i64/v8i32 - they were missing for some reason.
This has side effects on the LV max bandwidth tests (AVX1 now prefers 128-bit vectors vs AVX2 which still prefers 256-bit)
llvm-svn: 286832
This is PR28376.
Unfortunately given the current structure of optimization diagnostics we
lack the capability to tell whether the user has
passed -Rpass-analysis=loop-vectorize since this is local to the
front-end (BackendConsumer::OptimizationRemarkHandler).
So rather than printing this even if the user has already
passed -Rpass-analysis, this patch just punts and stops recommending
this option. I don't think that getting this right is worth the
complexity.
Differential Revision: https://reviews.llvm.org/D26563
llvm-svn: 286662
Summary:
The change will test the change in r286159.
The idea behind the change: Make the dbg location different between loop header and preheader/exit. Originally, dbg location 21 exists in 3 BBs: preheader, header, critical edge (exit). Update the debug location of inside the loop header from !21 to !22 so that it will reflect the correct location.
Reviewers: probinson
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D26428
llvm-svn: 286403
possible pointer-wrap-around concerns, in some cases.
Before this patch, collectConstStridedAccesses (part of interleaved-accesses
analysis) called getPtrStride with [Assume=false, ShouldCheckWrap=true] when
examining all candidate pointers. This is too conservative. Instead, this
patch makes collectConstStridedAccesses use an optimistic approach, calling
getPtrStride with [Assume=true, ShouldCheckWrap=false], and then, once the
candidate interleave groups have been formed, revisits the pointer-wrapping
analysis but only where it matters: namely, in groups that have gaps, and where
the gaps are not at the very end of the group (in which case the loop is
peeled). This second time getPtrStride is called with [Assume=false,
ShouldCheckWrap=true], but this could further be improved to using Assume=true,
once we also add the logic to track that we are not going to meet the scev
runtime checks threshold.
Differential Revision: https://reviews.llvm.org/D25276
llvm-svn: 285517
When we predicate an instruction (div, rem, store) we place the instruction in
its own basic block within the vectorized loop. If a predicated instruction has
scalar operands, it's possible to recursively sink these scalar expressions
into the predicated block so that they might avoid execution. This patch sinks
as much scalar computation as possible into predicated blocks. We previously
were able to sink such operands only if they were extractelement instructions.
Differential Revision: https://reviews.llvm.org/D25632
llvm-svn: 285097
Some instructions from the original loop, when vectorized, can become trivially
dead. This happens because of the way we structure the new loop. For example,
we create new induction variables and induction variable "steps" in the new
loop. Thus, when we go to vectorize the original induction variable update, it
may no longer be needed due to the instructions we've already created. This
patch prevents us from creating these redundant instructions. This reduces code
size before simplification and allows greater flexibility in code generation
since we have fewer unnecessary instruction uses.
Differential Revision: https://reviews.llvm.org/D25631
llvm-svn: 284631
This patch modifies the cost calculation of predicated instructions (div and
rem) to avoid the accumulation of rounding errors due to multiple truncating
integer divisions. The calculation for predicated stores will be addressed in a
follow-on patch since we currently don't scale the cost of predicated stores by
block probability.
Differential Revision: https://reviews.llvm.org/D25333
llvm-svn: 284123
Previously, we marked the branch conditions of latch blocks uniform after
vectorization if they were instructions contained in the loop. However, if a
condition instruction has users other than the branch, it may not remain
uniform. This patch ensures the conditions we mark uniform are only used by the
branch. This should fix PR30627.
Reference: https://llvm.org/bugs/show_bug.cgi?id=30627
llvm-svn: 283563
Vectorizer tests in the target-independent directory should not have a target
triple. If a test really needs to query a specific backend, it belongs in the
right target subdirectory (which "REQUIRES" the right backend). Otherwise, it
should not specify a triple.
llvm-svn: 283512
When building the steps for scalar induction variables, we previously attempted
to determine if all the scalar users of the induction variable were uniform. If
they were, we would only emit the step corresponding to vector lane zero. This
optimization was too aggressive. We generally don't know the entire set of
induction variable users that will be scalar. We have
isScalarAfterVectorization, but this is only a conservative estimate of the
instructions that will be scalarized. Thus, an induction variable may have
scalar users that aren't already known to be scalar. To avoid emitting unused
steps, we can only check that the induction variable is uniform. This should
fix PR30542.
Reference: https://llvm.org/bugs/show_bug.cgi?id=30542
llvm-svn: 282863
This patch ensures that we actually scalarize instructions marked scalar after
vectorization. Previously, such instructions may have been vectorized instead.
Differential Revision: https://reviews.llvm.org/D23889
llvm-svn: 282418
If we identify an instruction as uniform after vectorization, we know that we
should only use the value corresponding to the first vector lane of each unroll
iteration. However, when scalarizing such instructions, we still produce values
for the other vector lanes. This patch prevents us from generating the unused
scalars.
Differential Revision: https://reviews.llvm.org/D24275
llvm-svn: 282087
This patch moves the processing of pointer induction variables in
collectLoopUniforms from the consecutive pointer phase of the analysis to the
phi node phase. Previously, if a pointer induction variable was used by both a
scalarized non-memory instruction as well as a vectorized memory instruction,
we would incorrectly identify the pointer as uniform. Pointer induction
variables should be treated the same as other phi nodes. That is, they are
uniform if all users of the induction variable and induction variable update
are uniform.
Differential Revision: https://reviews.llvm.org/D24511
llvm-svn: 281485
This patch reverses the edge from DIGlobalVariable to GlobalVariable.
This will allow us to more easily preserve debug info metadata when
manipulating global variables.
Fixes PR30362. A program for upgrading test cases is attached to that
bug.
Differential Revision: http://reviews.llvm.org/D20147
llvm-svn: 281284
The test case included in r280979 wasn't checking what it was supposed to be
checking for the predicated store case. Fixing the test revealed that the
multi-use case (when a pointer is used by both vectorized and scalarized memory
accesses) wasn't being handled properly. We can't skip over
non-consecutive-like pointers since they may have looked consecutive-like with
a different memory access.
llvm-svn: 280992
Previously, all consecutive pointers were marked uniform after vectorization.
However, if a consecutive pointer is used by a memory access that is eventually
scalarized, the pointer won't remain uniform after all. An example is
predicated stores. Even though a predicated store may be consecutive, it will
still be scalarized, making it's pointer operand non-uniform.
This patch updates the logic in collectLoopUniforms to consider the cases where
a memory access may be scalarized. If a memory access may be scalarized, its
pointer operand is not marked uniform. The determination of whether a given
memory instruction will be scalarized or not has been moved into a common
function that is used by the vectorizer, cost model, and legality analysis.
Differential Revision: https://reviews.llvm.org/D24271
llvm-svn: 280979
For uniform instructions, we're only required to generate a scalar value for
the first vector lane of each unroll iteration. Thus, if we have a reverse
interleaved group, computing the member index off the scalar GEP corresponding
to the last vector lane of its pointer operand technically makes the GEP
non-uniform. We should compute the member index off the first scalar GEP
instead.
I've added the updated member index computation to the existing reverse
interleaved group test.
llvm-svn: 280497
We don't need to limit predication to blocks that have a single incoming
edge, we just need to use the right mask.
This fixes PR30172.
Differential Revision: https://reviews.llvm.org/D24009
llvm-svn: 280148