This patch adds necessary test cases for load-update-store pattern
which only updates single element of vector.
Differential Revision: https://reviews.llvm.org/D71886
Summary: This patch makes `AAValueSimplify` use `changeUsesAfterManifest` in `manifest`. This will invoke simple folding after the manifest.
Reviewers: jdoerfert, sstefan1
Reviewed By: jdoerfert
Subscribers: hiraditya, arphaman, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71972
A branch is considered UB if it depends on an undefined / uninitialized value.
At this point this handles simple UB branches in the form: `br i1 undef, ...`
We query `AAValueSimplify` to get a value for the branch condition, so the branch
can be more complicated than just: `br i1 undef, ...`.
Patch By: Stefanos Baziotis (@baziotis)
Reviewers: jdoerfert, sstefan1, uenoku
Reviewed By: uenoku
Differential Revision: https://reviews.llvm.org/D71799
This patch extends the current shape propagation and shape aware
lowering to also support binary operators. Those operators are uniform
with respect to their shape (shape of the input operands is the same as
the shape of their result).
Reviewers: anemet, Gerolf, reames, hfinkel, andrew.w.kaylor
Reviewed By: anemet
Differential Revision: https://reviews.llvm.org/D70898
Summary:
Follow-up on: https://reviews.llvm.org/D71435
We basically use `checkForAllInstructions` to loop through all the instructions in a function that access memory through a pointer: load, store, atomicrmw, atomiccmpxchg
Note that we can now use the `getPointerOperand()` that gets us the pointer operand for an instruction that belongs to the aforementioned set.
Question: This function returns `nullptr` if the instruction is `volatile`. Why?
Guess: Because if it is volatile, we don't want to do any transformation to it.
Another subtle point is that I had to add AtomicRMW, AtomicCmpXchg to `initializeInformationCache()`. Following `checkAllInstructions()` path, that
seemed the most reasonable place to add it and correct the fact that these instructions were ignored (they were not in `OpcodeInstMap` etc.). Is that ok?
Reviewers: jdoerfert, sstefan1
Reviewed By: jdoerfert, sstefan1
Subscribers: hiraditya, jfb, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71787
_Eventually_, this attribute will be assigned to a function if it
contains undefined behavior. As a first small step, I tried to make it
loop through the load instructions in a function (eventually, the plan
is to check if a load instructions causes undefined behavior, because
e.g. dereferences a null pointer - Also eventually, this won't happen in
initialize() but in updateImpl()).
Patch By: Stefanos Baziotis (@baziotis)
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D71435
If the matrix.multiply calls have the contract fast math flag, we can
use fmuladd. This als adds a command line option to force fmuladd
generation. We can retire this option once there is a clang-level
option.
Reviewers: anemet, Gerolf, hfinkel, andrew.w.kaylor
Reviewed By: anemet
Differential Revision: https://reviews.llvm.org/D70951
This patch adds infrastructure for forward shape propagation to
LowerMatrixIntrinsics. It also updates the pass to make use of
the shape information to break up larger vector operations and to
eliminate unnecessary conversion operations between columnwise matrixes
and flattened vectors: if shape information is available for an
instruction, lower the operation to a set of instructions operating on
columns. For example, a store of a matrix is broken down into separate
stores for each column. For users that do not have shape
information (e.g. because they do not yet support shape information
aware lowering), we pack the result columns into a flat vector and
update those users.
It also adds shape aware lowering for the first non-intrinsic
instruction: vector stores.
Example:
For
%c = call <4 x double> @llvm.matrix.transpose(<4 x double> %a, i32 2, i32 2)
store <4 x double> %c, <4 x double>* %Ptr
We generate the code below without shape propagation. Note %9 which
combines the columns of the transposed matrix into a flat vector.
%split = shufflevector <4 x double> %a, <4 x double> undef, <2 x i32> <i32 0, i32 1>
%split1 = shufflevector <4 x double> %a, <4 x double> undef, <2 x i32> <i32 2, i32 3>
%1 = extractelement <2 x double> %split, i64 0
%2 = insertelement <2 x double> undef, double %1, i64 0
%3 = extractelement <2 x double> %split1, i64 0
%4 = insertelement <2 x double> %2, double %3, i64 1
%5 = extractelement <2 x double> %split, i64 1
%6 = insertelement <2 x double> undef, double %5, i64 0
%7 = extractelement <2 x double> %split1, i64 1
%8 = insertelement <2 x double> %6, double %7, i64 1
%9 = shufflevector <2 x double> %4, <2 x double> %8, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
store <4 x double> %9, <4 x double>* %Ptr
With this patch, we propagate the 2x2 shape information from the
transpose to the store and we generate the code below. Note that we
store the columns directly and do not need an extra shuffle.
%9 = bitcast <4 x double>* %Ptr to double*
%10 = bitcast double* %9 to <2 x double>*
store <2 x double> %4, <2 x double>* %10, align 8
%11 = getelementptr double, double* %9, i32 2
%12 = bitcast double* %11 to <2 x double>*
store <2 x double> %8, <2 x double>* %12, align 8
Reviewers: anemet, Gerolf, reames, hfinkel, andrew.w.kaylor
Reviewed By: anemet
Differential Revision: https://reviews.llvm.org/D70897
This reverts commit ee7579409b.
It causes crashes during ThinLTO. I suspect the issue is related to
races on the global TypeSize variable, which is 80 at the time of the
crash.
As discussed in PR44330:
https://bugs.llvm.org/show_bug.cgi?id=44330
...the transform from pow(X, -0.5) libcall/intrinsic to
reciprocal square root can result in small deviations from
the expected result due to differences in the pow()
implementation and/or the extra rounding step from the division.
This patch proposes to allow that difference with either the
'approximate functions' or 'reassociate' FMF:
http://llvm.org/docs/LangRef.html#fast-math-flags
In practice, this likely means that the code is compiled with
all of 'fast' (-ffast-math), but I have preserved the existing
specializations for -0.0/-INF that enable generating safe code
if those special values are allowed simultaneously with
allowing approximation/reassociation.
The question about whether a similar restriction is needed for
the non-reciprocal case -- pow(X, 0.5) -- is deferred. That
transform is allowed without FMF currently, and this patch does
not change that behavior.
Differential Revision: https://reviews.llvm.org/D71706
Summary:
This patch limits the default number of iterations performed by InstCombine. It also exposes a new option that allows to specify how many iterations is considered getting stuck in an infinite loop.
Based on experiments performed on real-world C++ programs, InstCombine seems to perform at most ~8-20 iterations, so treating 1000 iterations as an infinite loop seems like a safe choice. See D71145 for details.
The two limits can be specified via command line options.
Reviewers: spatel, lebedev.ri, nikic, xbolva00, grosser
Reviewed By: spatel
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71673
A sequence of additions or multiplications that is known not to wrap, may wrap
if it's order is changed (i.e., reassociated). Therefore when vectorizing
integer sum or product reductions, their no-wrap flags need to be removed.
Fixes PR43828
Patch by Denis Antrushin
Differential Revision: https://reviews.llvm.org/D69563
Summary:
It is pretty common to assume that something is not zero.
Even optimizer itself sometimes emits such assumptions
(e.g. `addAssumeNonNull()` in `PromoteMemoryToRegister.cpp`).
But we currently don't deal with such assumptions :)
The only way `isKnownNonZero()` handles assumptions is
by calling `computeKnownBits()` which calls `computeKnownBitsFromAssume()`.
But `x != 0` does not tell us anything about set bits,
it only says that there are *some* set bits.
So naturally, `KnownBits` does not get populated,
and we fail to make use of this assumption.
I propose to deal with this special case by special-casing it
via adding a `isKnownNonZeroFromAssume()` that returns boolean
when there is an applicable assumption.
While there, we also deal with other predicates,
mainly if the comparison is with constant.
Fixes [[ https://bugs.llvm.org/show_bug.cgi?id=43267 | PR43267 ]].
Differential Revision: https://reviews.llvm.org/D71660
This is a pretty rare case, when CxtI and assume are
in the same basic block, with assume being located later.
We were already checking that assumption was guaranteed to be
executed, but we omitted CxtI itself from consideration,
and as the test (miscompile) shows, that is incorrect.
As noted in D71660 review by @nikic.
@escape() may throw here, we don't know that assumption, which is located
afterwards in the same block, is executed, therefore %load arg of
call to @escape() can not be marked as non-null.
As noted in D71660 review by @nikic.
A function marked `noreturn` may contain unreachable terminators: these
should not be considered cold, as the function may be a trampoline.
rdar://58068594
Summary:
Ignore looking at blocks that are unreachable from entry when
collecting candidates for hosting.
Normally the consthoist pass is executed in the llc pipeline,
just after unreachableblockelim. So it is abnormal to have code
that is unreachable from the entry block. But when running the
pass as part of opt, for example as part of fuzzy testing, we
might trigger various kinds of asserts when collecting candidates
if we include unreachable blocks in that analysis.
It seems like a waste of time to hoist constants in unreachble
blocks, so the solution is to simply ignore such blocks when
collecting the hoisting candidates.
The two added test cases used to end up in two different asserts,
and the intention with the checks is just to verify that we no
longer fail.
Fixes: PR43903
Reviewers: spatel
Reviewed By: spatel
Subscribers: hiraditya, uabelho, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71678
In certain situations after inlining and simplification we end up with
code that is _almost_ a min/max pattern, but contains constants that
have been demand-bit optimised to the wrong values, ending up with code
like:
%1 = icmp slt i32 %shr, -128
%2 = select i1 %1, i32 128, i32 %shr
%.inv = icmp sgt i32 %shr, 127
%spec.select.i = select i1 %.inv, i32 127, i32 %2
%conv7 = trunc i32 %spec.select.i to i8
This should be turned into a min/max pattern, but the -128 in the first
select was instead transformed into 128, as only the bottom byte was
ever demanded.
To fix this, I've put in further canonicalisation for the immediates of
selects, preferring to use the same value as the icmp if available.
Differential Revision: https://reviews.llvm.org/D71516
Summary:
This patch adds instructions to the InstCombine worklist after they are properly inserted. This way we don't get `<badref>`s printed when logging added instructions.
It also adds a check in `Worklist::Add` that ensures that all added instructions have parents.
Simple test case that illustrates the difference when run with `--debug-only=instcombine`:
```
define i32 @test35(i32 %a, i32 %b) {
%1 = or i32 %a, 1135
%2 = or i32 %1, %b
ret i32 %2
}
```
Before this patch:
```
INSTCOMBINE ITERATION #1 on test35
IC: ADDING: 3 instrs to worklist
IC: Visiting: %1 = or i32 %a, 1135
IC: Visiting: %2 = or i32 %1, %b
IC: ADD: %2 = or i32 %a, %b
IC: Old = %3 = or i32 %1, %b
New = <badref> = or i32 %2, 1135
IC: ADD: <badref> = or i32 %2, 1135
...
```
With this patch:
```
INSTCOMBINE ITERATION #1 on test35
IC: ADDING: 3 instrs to worklist
IC: Visiting: %1 = or i32 %a, 1135
IC: Visiting: %2 = or i32 %1, %b
IC: ADD: %2 = or i32 %a, %b
IC: Old = %3 = or i32 %1, %b
New = <badref> = or i32 %2, 1135
IC: ADD: %3 = or i32 %2, 1135
...
```
Reviewers: fhahn, davide, spatel, foad, grosser, nikic
Reviewed By: nikic
Subscribers: nikic, lebedev.ri, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71093
Summary:
This patch teaches InstCombine to accept a new parameter: maximum number of iterations over functions.
InstCombine tries to simplify instructions by iterating over the whole function until the function stops changing. As a consequence, the last iteration before reaching a fixpoint visits all instructions in the worklist and never performs any rewrites.
Bounding the number of iterations can have 2 benefits:
* In case the users of the pass can make a good guess about the number of required iterations, we can save the time normally spent on the last iteration that doesn't change anything.
* When the wants to use InstCombine as a cleanup pass, it may be enough to run just a few iterations and stop even before reaching a fixpoint. This can be also useful for implementing a lightweight pass pipeline (think `-O1`).
This patch does not change the behavior of opt or Clang -- limiting the number of iterations is entirely opt-in.
Reviewers: fhahn, davide, spatel, foad, nlopes, grosser, lebedev.ri, nikic, xbolva00
Reviewed By: spatel
Subscribers: craig.topper, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71145
This reverts commit 1f3dd83cc1, reapplying
commit bb1b0bc4e5.
The original commit failed on some builds seemingly due to the use of a
bracketed constructor with an std::array, i.e. `std::array<> arr({...})`.
Previously, LLVM had no functional way of performing casts inside of a
DIExpression(), which made salvaging cast instructions other than Noop
casts impossible. This patch enables the salvaging of casts by using the
DW_OP_LLVM_convert operator for SExt and Trunc instructions.
There is another issue which is exposed by this fix, in which fragment
DIExpressions (which are preserved more readily by this patch) for
values that must be split across registers in ISel trigger an assertion,
as the 'split' fragments extend beyond the bounds of the fragment
DIExpression causing an error. This patch also fixes this issue by
checking the fragment status of DIExpressions which are to be split, and
dropping fragments that are invalid.
We somehow missed doing this when we were working on Power9 exploitation.
This just adds the missing legalization and cost for producing the vector
intrinsics.
Differential revision: https://reviews.llvm.org/D70436
Summary:This PR move instructions from FC0.Latch bottom up to the
beginning of FC1.Latch as long as they are proven safe.
To illustrate why this is beneficial, let's consider the following
example:
Before Fusion:
header1:
br header2
header2:
br header2, latch1
latch1:
br header1, preheader3
preheader3:
br header3
header3:
br header4
header4:
br header4, latch3
latch3:
br header3, exit3
After Fusion (before this PR):
header1:
br header2
header2:
br header2, latch1
latch1:
br header3
header3:
br header4
header4:
br header4, latch3
latch3:
br header1, exit3
Note that preheader3 is removed during fusion before this PR.
Notice that we cannot fuse loop2 with loop4 as there exists block latch1
in between.
This PR move instructions from latch1 to beginning of latch3, and remove
block latch1. LoopFusion is now able to fuse loop nest recursively.
After Fusion (after this PR):
header1:
br header2
header2:
br header3
header3:
br header4
header4:
br header2, latch3
latch3:
br header1, exit3
Reviewer: kbarton, jdoerfert, Meinersbur, dmgreen, fhahn, hfinkel,
bmahjour, etiotto
Reviewed By: kbarton, Meinersbur
Subscribers: hiraditya, llvm-commits
Tag: LLVM
Differential Revision: https://reviews.llvm.org/D71165
Summary:
Add trimming of unused components of s_buffer_load.
Extend trimming of *buffer_load to also include
unused components at the beginning of vectors and update offset.
Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70315
Summary:
This patch restricts loop fusion to only consider rotated loops as valid candidates.
This simplifies the analysis and transformation and aligns with other loop optimizations.
Reviewers: jdoerfert, Meinersbur, dmgreen, etiotto, Whitney, fhahn, hfinkel
Reviewed By: Meinersbur
Subscribers: ormris, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71025
Summary:
In commit d60f34c20a (llvm-svn 317128,
PR35113) MergeBlockIntoPredecessor was changed into
discarding some dbg.value intrinsics referring to
PHI values, post-splice due to loop rotation.
That elimination of dbg.value intrinsics did not
consider which dbg.value to keep depending on the
context (e.g. if the variable is changing its value
several times inside the basic block).
In the past that hasn't been such a big problem since
CodeGenPrepare::placeDbgValues has moved the dbg.value
to be next to the PHI node anyway. But after commit
00e238896c CodeGenPrepare isn't doing that
any longer, so we need to be more careful when avoiding
duplicate dbg.value intrinsics in MergeBlockIntoPredecessor.
This patch replaces the code that tried to avoid duplicate
dbg.values by using the RemoveRedundantDbgInstrs helper.
Reviewers: aprantl, jmorse, vsk
Reviewed By: aprantl, vsk
Subscribers: jholewinski, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71480