Summary:
In addMustTailToCoroResumes, we set musttail on those resume instructions that are followed by a ret instruction. This is done by simplifyTerminatorLeadingToRet which replace a sequence of branches leading to a ret with a clone of the ret.
However it forgets to remove corresponding PHI values that come from basic block of replaced branch, and may cause jumpthreading pass hangs (https://bugs.llvm.org/show_bug.cgi?id=43720)
This patch fix this issue
Test Plan:
cppcoro library with O3+flto
check-llvm
Reviewers: modocache, GorNishanov, lewissbaker
Reviewed By: modocache
Subscribers: mehdi_amini, EricWF, hiraditya, dexonsmith, jfb, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71826
Patch by junparser (JunMa)!
SCEVExpander modifies the underlying function so it is more suitable in
Transforms/Utils, rather than Analysis. This allows using other
transform utils in SCEVExpander.
Reviewers: sanjoy.google, efriedma, reames
Reviewed By: sanjoy.google
Differential Revision: https://reviews.llvm.org/D71537
I would think it's better than having two practically identical folds
next to eachother, but then generalization isn't all that pretty
due to the fact that we need to produce different `sub` each time..
This change is no-functional-changes-intended refactoring.
Summary:
For artificial cases (huge array, few usages), Global SRA optimization creates
a lot of redundant data. It creates an instance of GlobalVariable for each array
element. For huge array, that means huge compilation time and huge memory usage.
Following example compiles for 10 minutes and requires 40GB of memory.
namespace {
char LargeBuffer[64 * 1024 * 1024];
}
int main ( void ) {
LargeBuffer[0] = 0;
printf("\n ");
return LargeBuffer[0] == 0;
}
The fix is to avoid Global SRA for large arrays.
Reviewers: craig.topper, rnk, efriedma, fhahn
Reviewed By: rnk
Subscribers: xbolva00, lebedev.ri, lkail, merge_guards_bot, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71993
Name: (X & (- Y)) - X -> - (X & (Y - 1)) (PR44448)
%negy = sub i8 0, %y
%unbiasedx = and i8 %negy, %x
%r = sub i8 %unbiasedx, %x
=>
%ymask = add i8 %y, -1
%xmasked = and i8 %ymask, %x
%r = sub i8 0, %xmasked
https://rise4fun.com/Alive/OIpla
This decreases use count of %x, may allow us to
later hoist said negation even further,
and results in marginally nicer X86 codegen.
See
https://bugs.llvm.org/show_bug.cgi?id=44448https://reviews.llvm.org/D71499
If we replace a function with a new one because we rewrite the
signature, dead users may still refer to the old version. With this
patch we reuse the code that deals with dead functions, which the old
versions are, to avoid problems.
An inbounds GEP results in poison if the value is not "inbounds", not in
UB. We accidentally derived nonnull and dereferenceable from these
inbounds GEPs even in the absence of accesses that would make the poison
to UB.
The patch makes sure that the LastThrowing pointer does not point to any instruction deleted by call to DeleteDeadInstruction.
While iterating through the instructions the pass maintains a pointer to the lastThrowing Instruction. A call to deleteDeadInstruction deletes a dead store and other instructions feeding the original dead instruction which also become dead. The instruction pointed by the lastThrowing pointer could also be deleted by the call to DeleteDeadInstruction and thus it becomes a dangling pointer. Because of this, we see an error in the next iteration.
In the patch, we maintain a list of throwing instructions encountered previously and use the last non deleted throwing instruction from the container.
Reviewers: fhahn, bcahoon, efriedma
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D65326
As shown in P44383:
https://bugs.llvm.org/show_bug.cgi?id=44383
...we can't safely propagate a vector constant through this icmp fold
if that vector constant contains undefined elements.
We know that each defined element of the constant is safe though, so
find the first of those and replicate it into the formerly undef lanes.
Differential Revision: https://reviews.llvm.org/D72101
This is a less ambitious alternative to previous attempts to fix
this bug with:
rG56b2aee1875a
rGef02831f0a4e
rG56b2aee1875a
...because those all failed bot testing with use-after-free or
other problems.
The original crashing/assert problem is still showing up on
various fuzzers, so I've added a new minimal test based on
another one of those failures.
Instead of trying to manage and coordinate the logic in
isAllocSiteRemovable() with the deletion loops, just loosen
the existing code that handles casts and GEP by replacing
with undef to allow other opcodes. That means that no
instructions with uses should assert on deletion, and there
are hopefully no non-obvious sanitizer bugs induced.
A series of patches beginning with https://reviews.llvm.org/D71898
propose to add an implementation of the coroutine passes to the new pass
manager. As part of these changes, the coroutine passes that implement
the legacy pass manager interface are renamed, to `<PassName>Legacy`.
This mirrors similar changes that have been made to many other passes in
LLVM as they've been transitioned to support both old and new pass
managers.
This commit splits out the renaming portion of that patch and commits it
in advance as an NFC (no functional change intended) commit. It renames:
* `CoroEarly` => `CoroEarlyLegacy`
* `CoroSplit` => `CoroSplitLegacy`
* `CoroElide` => `CoroElideLegacy`
* `CoroCleanup` => `CoroCleanupLegacy`
This patch introduces `AAValueConstantRange`, which answers a possible range for integer value in a specific program point.
One of the motivations is propagating existing `range` metadata. (I think we need to change the situation that `range` metadata cannot be put to Argument).
The state is a tuple of `ConstantRange` and it is initialized to (known, assumed) = ([-∞, +∞], empty).
Currently, AAValueConstantRange is created when AAValueSimplify cannot
simplify the value.
Supported
- BinaryOperator(add, sub, ...)
- CmpInst(icmp eq, ...)
- !range metadata
`AAValueConstantRange` is not intended to extend to polyhedral range value analysis.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D71620
The instructions use a mask to either pack disjoint bits together(pext) or spread bits to disjoint locations(pdep). If the mask is all 0s then no bits are extracted or deposited. If the mask is all ones, then the source value is written to the result since no compression or expansion happens. Otherwise if both the source and mask are constant we can walk the bits in the source/mask and calculate the result.
There other crazier things we could do like computeKnownBits or turning pext into shift/and if only a single contiguous range of bits is extracted.
Fixes PR44389
Differential Revision: https://reviews.llvm.org/D71952
This does not solve PR17101, but it is one of the
underlying diffs noted here:
https://bugs.llvm.org/show_bug.cgi?id=17101#c8
We could ease the one-use checks for the 'clear'
(no 'not' op) half of the transform, but I do not
know if that asymmetry would make things better
or worse.
Proofs:
https://rise4fun.com/Alive/uVB
Name: masked bit set
%sh1 = shl i32 1, %y
%and = and i32 %sh1, %x
%cmp = icmp ne i32 %and, 0
%r = zext i1 %cmp to i32
=>
%s = lshr i32 %x, %y
%r = and i32 %s, 1
Name: masked bit clear
%sh1 = shl i32 1, %y
%and = and i32 %sh1, %x
%cmp = icmp eq i32 %and, 0
%r = zext i1 %cmp to i32
=>
%xn = xor i32 %x, -1
%s = lshr i32 %xn, %y
%r = and i32 %s, 1
Judging by the existing comments, this was the intention, but the
transform never actually checked if the existing phi's would be removed.
See https://bugs.llvm.org/show_bug.cgi?id=44242 for an example where
this causes much worse code generation on AMDGPU.
Differential Revision: https://reviews.llvm.org/D71209
As part of the Attributor manifest we want to change the signature of
functions. This patch introduces a fairly generic interface to do so.
As a first, very simple, use case, we remove unused arguments. A second
use case, pointer privatization, will be committed with this patch as
well.
A lot of the code and ideas are taken from argument promotion and we
run all argument promotion tests through this framework as well.
Reviewed By: uenoku
Differential Revision: https://reviews.llvm.org/D68765
Since the information is known we can simply use it at the call site.
This is especially useful for callbacks but also helps regular calls.
The test changes are mechanical.
This is the second step after D67871 to make use of abstract call sites.
In this patch the argument we associate with a abstract call site
argument can be the one in the callback callee instead of the one in the
callback broker.
Caveat: We cannot allow no-alias arguments for problematic callbacks:
As described in [1], adding no-alias (or restrict) to arguments could
break synchronization as the synchronization effect, e.g., a barrier,
does not "alias" with the pointer anymore. This disables no-alias
annotation for potentially problematic arguments until we implement the
fix described in [1].
Reviewed By: uenoku
Differential Revision: https://reviews.llvm.org/D68008
[1] Compiler Optimizations for OpenMP, J. Doerfert and H. Finkel,
International Workshop on OpenMP 2018,
http://compilers.cs.uni-saarland.de/people/doerfert/par_opt18.pdf
Especially for callbacks, annotating the call site arguments is
important. Doing so exposed a too strong dependence of AAMemoryBehavior
on AANoCapture since we handle the case of potentially captured pointers
explicitly.
The changes to the tests are all mechanical.
Summary: This patch makes `AAValueSimplify` use `changeUsesAfterManifest` in `manifest`. This will invoke simple folding after the manifest.
Reviewers: jdoerfert, sstefan1
Reviewed By: jdoerfert
Subscribers: hiraditya, arphaman, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71972
A branch is considered UB if it depends on an undefined / uninitialized value.
At this point this handles simple UB branches in the form: `br i1 undef, ...`
We query `AAValueSimplify` to get a value for the branch condition, so the branch
can be more complicated than just: `br i1 undef, ...`.
Patch By: Stefanos Baziotis (@baziotis)
Reviewers: jdoerfert, sstefan1, uenoku
Reviewed By: uenoku
Differential Revision: https://reviews.llvm.org/D71799
This patch extends the current shape propagation and shape aware
lowering to also support binary operators. Those operators are uniform
with respect to their shape (shape of the input operands is the same as
the shape of their result).
Reviewers: anemet, Gerolf, reames, hfinkel, andrew.w.kaylor
Reviewed By: anemet
Differential Revision: https://reviews.llvm.org/D70898
Summary: Calling `changeToUnreachable` in `manifest` from different places might cause really unpredictable problems. As other deleting functions are doing, we need to change these instructions after all `manifest`.
Reviewers: jdoerfert, sstefan1
Reviewed By: jdoerfert
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71910
Summary:
As discussed in D71799, we have found that it is more useful to reach an optimistic fixpoint in AAValueSimpify when the value is constant or undef.
Reviewers: jdoerfert, sstefan1
Reviewed By: jdoerfert
Subscribers: baziotis, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71852
Summary:
Follow-up on: https://reviews.llvm.org/D71435
We basically use `checkForAllInstructions` to loop through all the instructions in a function that access memory through a pointer: load, store, atomicrmw, atomiccmpxchg
Note that we can now use the `getPointerOperand()` that gets us the pointer operand for an instruction that belongs to the aforementioned set.
Question: This function returns `nullptr` if the instruction is `volatile`. Why?
Guess: Because if it is volatile, we don't want to do any transformation to it.
Another subtle point is that I had to add AtomicRMW, AtomicCmpXchg to `initializeInformationCache()`. Following `checkAllInstructions()` path, that
seemed the most reasonable place to add it and correct the fact that these instructions were ignored (they were not in `OpcodeInstMap` etc.). Is that ok?
Reviewers: jdoerfert, sstefan1
Reviewed By: jdoerfert, sstefan1
Subscribers: hiraditya, jfb, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71787
_Eventually_, this attribute will be assigned to a function if it
contains undefined behavior. As a first small step, I tried to make it
loop through the load instructions in a function (eventually, the plan
is to check if a load instructions causes undefined behavior, because
e.g. dereferences a null pointer - Also eventually, this won't happen in
initialize() but in updateImpl()).
Patch By: Stefanos Baziotis (@baziotis)
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D71435
If the matrix.multiply calls have the contract fast math flag, we can
use fmuladd. This als adds a command line option to force fmuladd
generation. We can retire this option once there is a clang-level
option.
Reviewers: anemet, Gerolf, hfinkel, andrew.w.kaylor
Reviewed By: anemet
Differential Revision: https://reviews.llvm.org/D70951
This patch adds infrastructure for forward shape propagation to
LowerMatrixIntrinsics. It also updates the pass to make use of
the shape information to break up larger vector operations and to
eliminate unnecessary conversion operations between columnwise matrixes
and flattened vectors: if shape information is available for an
instruction, lower the operation to a set of instructions operating on
columns. For example, a store of a matrix is broken down into separate
stores for each column. For users that do not have shape
information (e.g. because they do not yet support shape information
aware lowering), we pack the result columns into a flat vector and
update those users.
It also adds shape aware lowering for the first non-intrinsic
instruction: vector stores.
Example:
For
%c = call <4 x double> @llvm.matrix.transpose(<4 x double> %a, i32 2, i32 2)
store <4 x double> %c, <4 x double>* %Ptr
We generate the code below without shape propagation. Note %9 which
combines the columns of the transposed matrix into a flat vector.
%split = shufflevector <4 x double> %a, <4 x double> undef, <2 x i32> <i32 0, i32 1>
%split1 = shufflevector <4 x double> %a, <4 x double> undef, <2 x i32> <i32 2, i32 3>
%1 = extractelement <2 x double> %split, i64 0
%2 = insertelement <2 x double> undef, double %1, i64 0
%3 = extractelement <2 x double> %split1, i64 0
%4 = insertelement <2 x double> %2, double %3, i64 1
%5 = extractelement <2 x double> %split, i64 1
%6 = insertelement <2 x double> undef, double %5, i64 0
%7 = extractelement <2 x double> %split1, i64 1
%8 = insertelement <2 x double> %6, double %7, i64 1
%9 = shufflevector <2 x double> %4, <2 x double> %8, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
store <4 x double> %9, <4 x double>* %Ptr
With this patch, we propagate the 2x2 shape information from the
transpose to the store and we generate the code below. Note that we
store the columns directly and do not need an extra shuffle.
%9 = bitcast <4 x double>* %Ptr to double*
%10 = bitcast double* %9 to <2 x double>*
store <2 x double> %4, <2 x double>* %10, align 8
%11 = getelementptr double, double* %9, i32 2
%12 = bitcast double* %11 to <2 x double>*
store <2 x double> %8, <2 x double>* %12, align 8
Reviewers: anemet, Gerolf, reames, hfinkel, andrew.w.kaylor
Reviewed By: anemet
Differential Revision: https://reviews.llvm.org/D70897
As discussed in PR44330:
https://bugs.llvm.org/show_bug.cgi?id=44330
...the transform from pow(X, -0.5) libcall/intrinsic to
reciprocal square root can result in small deviations from
the expected result due to differences in the pow()
implementation and/or the extra rounding step from the division.
This patch proposes to allow that difference with either the
'approximate functions' or 'reassociate' FMF:
http://llvm.org/docs/LangRef.html#fast-math-flags
In practice, this likely means that the code is compiled with
all of 'fast' (-ffast-math), but I have preserved the existing
specializations for -0.0/-INF that enable generating safe code
if those special values are allowed simultaneously with
allowing approximation/reassociation.
The question about whether a similar restriction is needed for
the non-reciprocal case -- pow(X, 0.5) -- is deferred. That
transform is allowed without FMF currently, and this patch does
not change that behavior.
Differential Revision: https://reviews.llvm.org/D71706
Summary:
This patch limits the default number of iterations performed by InstCombine. It also exposes a new option that allows to specify how many iterations is considered getting stuck in an infinite loop.
Based on experiments performed on real-world C++ programs, InstCombine seems to perform at most ~8-20 iterations, so treating 1000 iterations as an infinite loop seems like a safe choice. See D71145 for details.
The two limits can be specified via command line options.
Reviewers: spatel, lebedev.ri, nikic, xbolva00, grosser
Reviewed By: spatel
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71673
A sequence of additions or multiplications that is known not to wrap, may wrap
if it's order is changed (i.e., reassociated). Therefore when vectorizing
integer sum or product reductions, their no-wrap flags need to be removed.
Fixes PR43828
Patch by Denis Antrushin
Differential Revision: https://reviews.llvm.org/D69563
A function marked `noreturn` may contain unreachable terminators: these
should not be considered cold, as the function may be a trampoline.
rdar://58068594
Summary:
Ignore looking at blocks that are unreachable from entry when
collecting candidates for hosting.
Normally the consthoist pass is executed in the llc pipeline,
just after unreachableblockelim. So it is abnormal to have code
that is unreachable from the entry block. But when running the
pass as part of opt, for example as part of fuzzy testing, we
might trigger various kinds of asserts when collecting candidates
if we include unreachable blocks in that analysis.
It seems like a waste of time to hoist constants in unreachble
blocks, so the solution is to simply ignore such blocks when
collecting the hoisting candidates.
The two added test cases used to end up in two different asserts,
and the intention with the checks is just to verify that we no
longer fail.
Fixes: PR43903
Reviewers: spatel
Reviewed By: spatel
Subscribers: hiraditya, uabelho, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71678
In certain situations after inlining and simplification we end up with
code that is _almost_ a min/max pattern, but contains constants that
have been demand-bit optimised to the wrong values, ending up with code
like:
%1 = icmp slt i32 %shr, -128
%2 = select i1 %1, i32 128, i32 %shr
%.inv = icmp sgt i32 %shr, 127
%spec.select.i = select i1 %.inv, i32 127, i32 %2
%conv7 = trunc i32 %spec.select.i to i8
This should be turned into a min/max pattern, but the -128 in the first
select was instead transformed into 128, as only the bottom byte was
ever demanded.
To fix this, I've put in further canonicalisation for the immediates of
selects, preferring to use the same value as the icmp if available.
Differential Revision: https://reviews.llvm.org/D71516
Loop fusion previously had a method to check whether a loop was in rotated form. This method has
been moved into the LoopInfo class. This patch removes the old isRotated method from loop fusion,
in favour of the new one in LoopInfo.
Summary:
This patch adds instructions to the InstCombine worklist after they are properly inserted. This way we don't get `<badref>`s printed when logging added instructions.
It also adds a check in `Worklist::Add` that ensures that all added instructions have parents.
Simple test case that illustrates the difference when run with `--debug-only=instcombine`:
```
define i32 @test35(i32 %a, i32 %b) {
%1 = or i32 %a, 1135
%2 = or i32 %1, %b
ret i32 %2
}
```
Before this patch:
```
INSTCOMBINE ITERATION #1 on test35
IC: ADDING: 3 instrs to worklist
IC: Visiting: %1 = or i32 %a, 1135
IC: Visiting: %2 = or i32 %1, %b
IC: ADD: %2 = or i32 %a, %b
IC: Old = %3 = or i32 %1, %b
New = <badref> = or i32 %2, 1135
IC: ADD: <badref> = or i32 %2, 1135
...
```
With this patch:
```
INSTCOMBINE ITERATION #1 on test35
IC: ADDING: 3 instrs to worklist
IC: Visiting: %1 = or i32 %a, 1135
IC: Visiting: %2 = or i32 %1, %b
IC: ADD: %2 = or i32 %a, %b
IC: Old = %3 = or i32 %1, %b
New = <badref> = or i32 %2, 1135
IC: ADD: %3 = or i32 %2, 1135
...
```
Reviewers: fhahn, davide, spatel, foad, grosser, nikic
Reviewed By: nikic
Subscribers: nikic, lebedev.ri, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71093
Summary:
This patch teaches InstCombine to accept a new parameter: maximum number of iterations over functions.
InstCombine tries to simplify instructions by iterating over the whole function until the function stops changing. As a consequence, the last iteration before reaching a fixpoint visits all instructions in the worklist and never performs any rewrites.
Bounding the number of iterations can have 2 benefits:
* In case the users of the pass can make a good guess about the number of required iterations, we can save the time normally spent on the last iteration that doesn't change anything.
* When the wants to use InstCombine as a cleanup pass, it may be enough to run just a few iterations and stop even before reaching a fixpoint. This can be also useful for implementing a lightweight pass pipeline (think `-O1`).
This patch does not change the behavior of opt or Clang -- limiting the number of iterations is entirely opt-in.
Reviewers: fhahn, davide, spatel, foad, nlopes, grosser, lebedev.ri, nikic, xbolva00
Reviewed By: spatel
Subscribers: craig.topper, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71145
This reverts commit 1f3dd83cc1, reapplying
commit bb1b0bc4e5.
The original commit failed on some builds seemingly due to the use of a
bracketed constructor with an std::array, i.e. `std::array<> arr({...})`.
Previously, LLVM had no functional way of performing casts inside of a
DIExpression(), which made salvaging cast instructions other than Noop
casts impossible. This patch enables the salvaging of casts by using the
DW_OP_LLVM_convert operator for SExt and Trunc instructions.
There is another issue which is exposed by this fix, in which fragment
DIExpressions (which are preserved more readily by this patch) for
values that must be split across registers in ISel trigger an assertion,
as the 'split' fragments extend beyond the bounds of the fragment
DIExpression causing an error. This patch also fixes this issue by
checking the fragment status of DIExpressions which are to be split, and
dropping fragments that are invalid.
Add an extra parameter so alignment can be taken under
consideration in gather/scatter legalization.
Differential Revision: https://reviews.llvm.org/D71610
Summary:This PR move instructions from FC0.Latch bottom up to the
beginning of FC1.Latch as long as they are proven safe.
To illustrate why this is beneficial, let's consider the following
example:
Before Fusion:
header1:
br header2
header2:
br header2, latch1
latch1:
br header1, preheader3
preheader3:
br header3
header3:
br header4
header4:
br header4, latch3
latch3:
br header3, exit3
After Fusion (before this PR):
header1:
br header2
header2:
br header2, latch1
latch1:
br header3
header3:
br header4
header4:
br header4, latch3
latch3:
br header1, exit3
Note that preheader3 is removed during fusion before this PR.
Notice that we cannot fuse loop2 with loop4 as there exists block latch1
in between.
This PR move instructions from latch1 to beginning of latch3, and remove
block latch1. LoopFusion is now able to fuse loop nest recursively.
After Fusion (after this PR):
header1:
br header2
header2:
br header3
header3:
br header4
header4:
br header2, latch3
latch3:
br header1, exit3
Reviewer: kbarton, jdoerfert, Meinersbur, dmgreen, fhahn, hfinkel,
bmahjour, etiotto
Reviewed By: kbarton, Meinersbur
Subscribers: hiraditya, llvm-commits
Tag: LLVM
Differential Revision: https://reviews.llvm.org/D71165
Summary:
Add trimming of unused components of s_buffer_load.
Extend trimming of *buffer_load to also include
unused components at the beginning of vectors and update offset.
Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70315
Summary:
This is a resubmit of D71473.
This patch introduces a set of functions to enable deprecation of IRBuilder functions without breaking out of tree clients.
Functions will be deprecated one by one and as in tree code is cleaned up.
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790
Reviewers: aaron.ballman, courbet
Subscribers: llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71547
The Attributor is always kept formatted so diffs are cleaner.
Sometime we get out of sync for various reasons so we need to format the
file once in a while.
Summary:
This patch restricts loop fusion to only consider rotated loops as valid candidates.
This simplifies the analysis and transformation and aligns with other loop optimizations.
Reviewers: jdoerfert, Meinersbur, dmgreen, etiotto, Whitney, fhahn, hfinkel
Reviewed By: Meinersbur
Subscribers: ormris, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71025
Summary:
This patch introduces a set of functions to enable deprecation of IRBuilder functions without breaking out of tree clients.
Functions will be deprecated one by one and as in tree code is cleaned up.
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790
Reviewers: courbet
Subscribers: arsenm, jvesely, nhaehnle, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71473
Summary:
In commit d60f34c20a (llvm-svn 317128,
PR35113) MergeBlockIntoPredecessor was changed into
discarding some dbg.value intrinsics referring to
PHI values, post-splice due to loop rotation.
That elimination of dbg.value intrinsics did not
consider which dbg.value to keep depending on the
context (e.g. if the variable is changing its value
several times inside the basic block).
In the past that hasn't been such a big problem since
CodeGenPrepare::placeDbgValues has moved the dbg.value
to be next to the PHI node anyway. But after commit
00e238896c CodeGenPrepare isn't doing that
any longer, so we need to be more careful when avoiding
duplicate dbg.value intrinsics in MergeBlockIntoPredecessor.
This patch replaces the code that tried to avoid duplicate
dbg.values by using the RemoveRedundantDbgInstrs helper.
Reviewers: aprantl, jmorse, vsk
Reviewed By: aprantl, vsk
Subscribers: jholewinski, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71480
Summary:
Add a RemoveRedundantDbgInstrs to BasicBlockUtils with the
goal to remove redundant dbg intrinsics from a basic block.
This can be useful after various transforms, as it might
be simpler to do a filtering of dbg intrinsics after the
transform than during the transform.
One primary use case would be to replace a too aggressive
removal done by MergeBlockIntoPredecessor, seen at loop
rotate (not done in this patch).
The elimination algorithm currently focuses on dbg.value
intrinsics and is doing two iterations over the BB.
First we iterate backward starting at the last instruction
in the BB. Whenever a consecutive sequence of dbg.value
instructions are found we keep the last dbg.value for
each variable found (variable fragments are identified
using the {DILocalVariable, FragmentInfo, inlinedAt}
triple as given by the DebugVariable helper class).
Next we iterate forward starting at the first instruction
in the BB. Whenever we find a dbg.value describing a
DebugVariable (identified by {DILocalVariable, inlinedAt})
we save the {DIValue, DIExpression} that describes that
variables value. But if the variable already was mapped
to the same {DIValue, DIExpression} pair we instead drop
the second dbg.value.
To ease the process of making lit tests for this utility a
new pass is introduced called RedundantDbgInstElimination.
It can be executed by opt using -redundant-dbg-inst-elim.
Reviewers: aprantl, jmorse, vsk
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71478
This was part of D70767. When we replace the value of (call/invoke)
instructions we do not want to disturb the old call graph so we will
only replace instruction uses until we get rid of the old PM.
Accepted as part of D70767.
This reverts commit 0be81968a2.
The VFDatabase needs some rework to be able to handle vectorization
and subsequent scalarization of intrinsics in out-of-tree versions of
the compiler. For more details, see the discussion in
https://reviews.llvm.org/D67572.
GEP index size can be specified in the DataLayout, introduced in D42123. However, there were still places
in which getIndexSizeInBits was used interchangeably with getPointerSizeInBits. This notably caused issues
with Instcombine's visitPtrToInt; but the unit tests was incorrect, so this remained undiscovered.
This fixes the buildbot failures.
Differential Revision: https://reviews.llvm.org/D68328
Patch by Joseph Faulls!
Summary:
Support alloca-referencing dbg.value in hwasan instrumentation.
Update AsmPrinter to emit DW_AT_LLVM_tag_offset when location is in
loclist format.
Reviewers: pcc
Subscribers: srhines, aprantl, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70753
When we reason about the pointer argument that is byval we actually
reason about a local copy of the value passed at the call site. This was
not the case before and we wrongly introduced attributes based on the
surrounding function.
AAMemoryBehaviorArgument, AAMemoryBehaviorCallSiteArgument and
AANoCaptureCallSiteArgument are made aware of byval now. The code
to skip "subsuming positions" for reasoning follows a common pattern and
we should refactor it. A TODO was added.
Discovered by @efriedma as part of D69748.
This is the first patch adding an initial set of matrix intrinsics and a
corresponding lowering pass. This has been discussed on llvm-dev:
http://lists.llvm.org/pipermail/llvm-dev/2019-October/136240.html
The first patch introduces four new intrinsics (transpose, multiply,
columnwise load and store) and a LowerMatrixIntrinsics pass, that
lowers those intrinsics to vector operations.
Matrixes are embedded in a 'flat' vector (e.g. a 4 x 4 float matrix
embedded in a <16 x float> vector) and the intrinsics take the dimension
information as parameters. Those parameters need to be ConstantInt.
For the memory layout, we initially assume column-major, but in the RFC
we also described how to extend the intrinsics to support row-major as
well.
For the initial lowering, we split the input of the intrinsics into a
set of column vectors, transform those column vectors and concatenate
the result columns to a flat result vector.
This allows us to lower the intrinsics without any shape propagation, as
mentioned in the RFC. In follow-up patches, we plan to submit the
following improvements:
* Shape propagation to eliminate the embedding/splitting for each
intrinsic.
* Fused & tiled lowering of multiply and other operations.
* Optimization remarks highlighting matrix expressions and costs.
* Generate loops for operations on large matrixes.
* More general block processing for operation on large vectors,
exploiting shape information.
We would like to add dedicated transpose, columnwise load and store
intrinsics, even though they are not strictly necessary. For example, we
could instead emit a large shufflevector instruction instead of the
transpose. But we expect that to
(1) become unwieldy for larger matrixes (even for 16x16 matrixes,
the resulting shufflevector masks would be huge),
(2) risk instcombine making small changes, causing us to fail to
detect the transpose, preventing better lowerings
For the load/store, we are additionally planning on exploiting the
intrinsics for better alias analysis.
Reviewers: anemet, Gerolf, reames, hfinkel, andrew.w.kaylor, efriedma, rengolin
Reviewed By: anemet
Differential Revision: https://reviews.llvm.org/D70456
Summary: Remove `Worklist` iteration and make use `checkForAllUses`. There is no test chage.
Reviewers: sstefan1, jdoerfert
Reviewed By: jdoerfert
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71352
GEP index size can be specified in the DataLayout, introduced in D42123. However, there were still places
in which getIndexSizeInBits was used interchangeably with getPointerSizeInBits. This notably caused issues
with Instcombine's visitPtrToInt; but the unit tests was incorrect, so this remained undiscovered.
Differential Revision: https://reviews.llvm.org/D68328
Patch by Joseph Faulls!
Summary: AutoFDO compilation has two places that do inlining - the sample profile loader that does inlining with context sensitive profile, and the regular inliner as CGSCC pass. Ideally we want most inlining to come from sample profile loader as that is driven by context sensitive profile and also retains context sensitivity after inlining. However the reality is most of the inlining actually happens during regular inliner. To track the number of inline instances from sample profile loader and help move more inlining to sample profile loader, I'm adding statistics and optimization remarks for sample profile loader's inlining.
Reviewers: wmi, davidxl
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70584
This has two main effects:
- Optimizes debug info size by saving 221.86 MB of obj file size in a
Windows optimized+debug build of 'all'. This is 3.03% of 7,332.7MB of
object file size.
- Incremental step towards decoupling target intrinsics.
The enums are still compact, so adding and removing a single
target-specific intrinsic will trigger a rebuild of all of LLVM.
Assigning distinct target id spaces is potential future work.
Part of PR34259
Reviewers: efriedma, echristo, MaskRay
Reviewed By: echristo, MaskRay
Differential Revision: https://reviews.llvm.org/D71320
Soon Intrinsic::ID will be a plain integer, so this overload will not be
possible.
Rename both overloads to ensure that downstream targets observe this as
a build failure instead of a runtime failure.
Split off from D71320
Reviewers: efriedma
Differential Revision: https://reviews.llvm.org/D71381
Fix for https://bugs.llvm.org/show_bug.cgi?id=40846.
This adds a combine for cases where a (a + b) < a style overflow
check is performed, but with a + b being the result of
uadd.with.overflow, so the overflow result is also already available
and we can just use it. Subsequently GVN/CSE will deduplicate the extracts.
We can run into this situation if you have both a uadd.with.overflow
and a manual add + overflow check in the same function (on the same
operands), in which case GVN will rewrite the add to the with.overflow
result and leave you with this pattern.
The implementation is a bit ugly because I'm handling the various
canonicalization edge cases.
This does not yet handle the negated version of this pattern.
Differential Revision: https://reviews.llvm.org/D58644
Fix for https://bugs.llvm.org/show_bug.cgi?id=44236. This code was
originally introduced in rG36512330041201e10f5429361bbd79b1afac1ea1.
However, the attribute copying was done in the wrong place (in general
call replacement, not thunk generation) and a proper fix was
implemented in D12581.
Previously this code was just unnecessary but harmless (because
FunctionComparator ensured that the attributes of the two functions
are exactly the same), but since byval was changed to accept a type
this copying is actively wrong and may result in malformed IR.
Differential Revision: https://reviews.llvm.org/D71173
Summary: Rollback of parts of D71213. After digging more into the code I think we should leave 0 when creating the instructions (CreateMemcpy, CreateMaskedStore, CreateMaskedLoad). It's probably fine for MemorySanitizer because Alignement is resolved but I'm having a hard time convincing myself it has no impact at all (although tests are passing).
Reviewers: courbet
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71332
This patch introduced the VFDatabase, the framework proposed in
http://lists.llvm.org/pipermail/llvm-dev/2019-June/133484.html. [*]
In this patch the VFDatabase is used to bridge the TargetLibraryInfo
(TLI) calls that were previously used to query for the availability of
vector counterparts of scalar functions.
The VFISAKind field `ISA` of VFShape have been moved into into VFInfo,
under the assumption that different vector ISAs may provide the same
vector signature. At the moment, the vectorizer accepts any of the
available ISAs as long as the signature provided by the VFDatabase
matches the one expected in the vectorization process. For example,
when targeting AVX or AVX2, which both have 256-bit registers, the IR
signature of the two vector functions associated to the two ISAs is
the same. The `getVectorizedFunction` method at the moment returns the
first available match. We will need to add more heuristics to the
search system to decide which of the available version (TLI, AVX,
AVX2, ...) the system should prefer, when multiple versions with the
same VFShape are present.
Some of the code in this patch is based on the work done by Sumedh
Arani in https://reviews.llvm.org/D66025.
[*] Notice that in the proposal the VFDatabase was called SVFS. The
name VFDatabase is more in line with LLVM recommendations for
naming classes and variables.
Differential Revision: https://reviews.llvm.org/D67572
This pattern is noted as a regression from:
D70246
...where we removed an over-aggressive shuffle simplification.
SimplifyDemandedVectorElts fails to catch this case when the insert has multiple uses,
so I'm proposing to pattern match the minimal sequence directly. This fold does not
conflict with any of our current shuffle undef/poison semantics.
Differential Revision: https://reviews.llvm.org/D71220
basic blocks
Originally applied in 72ce759928.
Fixed a build failure caused by incorrect use of cast instead of
dyn_cast.
This reverts commit 8b0780f795.
This attempts to teach the cost model in Arm that code such as:
%s = shl i32 %a, 3
%a = and i32 %s, %b
Can under Arm or Thumb2 become:
and r0, r1, r2, lsl #3
So the cost of the shift can essentially be free. To do this without
trying to artificially adjust the cost of the "and" instruction, it
needs to get the users of the shl and check if they are a type of
instruction that the shift can be folded into. And so it needs to have
access to the actual instruction in getArithmeticInstrCost, which if
available is added as an extra parameter much like getCastInstrCost.
We otherwise limit it to shifts with a single user, which should
hopefully handle most of the cases. The list of instruction that the
shift can be folded into include ADC, ADD, AND, BIC, CMP, EOR, MVN, ORR,
ORN, RSB, SBC and SUB. This translates to Add, Sub, And, Or, Xor and
ICmp.
Differential Revision: https://reviews.llvm.org/D70966
Currently we fail to pick the right insertion point when
PreviousLastPart of a first-order-recurrence is a PHI node not in the
LoopVectorBody. This can happen when PreviousLastPart is produce in a
predicated block. In that case, we should pick the insertion point in
the BB the PHI is in.
Fixes PR44020.
Reviewers: hsaito, fhahn, Ayal, dorit
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D71071
AssumptionCache can be null in SimplifyCFGOptions. However, FoldCondBranchOnPHI() was not properly handling that when passing a null AssumptionCache to simplifyCFG.
Patch by Rodrigo Caetano Rocha <rcor.cs@gmail.com>
Reviewers: fhahn, lebedev.ri, spatel
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D69963
The file is intended to gather various VPlan transformations, not only
CFG related transforms. Actually, the only transformation there is not
CFG related.
Reviewers: Ayal, gilr, hsaito, rengolin
Reviewed By: gilr
Differential Revision: https://reviews.llvm.org/D70732
Summary:
Sample profile loader of AutoFDO tries to replay previous inlining using context sensitive profile. The replay only repeats inlining if the call site block is hot. As a result it punts inlining of small functions, some of which can be beneficial for size, and will still be inlined by CSGCC inliner later. The oscillation between sample profile loader's inlining and regular CGSSC inlining cause unnecessary loss of context-sensitive profile. It doesn't have much impact for inline decision itself, but it negatively affects post-inline profile quality as CGSCC inliner have to scale counts which is not as accurate as the original context sensitive profile, and bad post-inline profile can misguide code layout.
This change added regular Inline Cost calculation for sample profile loader, so we can inline small functions upfront under switch -sample-profile-inline-size. In addition -sample-profile-cold-inline-threshold is added so we can tune the separate size threshold - currently the default is chosen to be the same as regular inliner's cold call-site threshold.
Reviewers: wmi, davidxl
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70750
InnerLoopVectorizer's code called during VPlan execution still relies on
original IR's def-use relations to decide which vector code to generate,
limiting VPlan transformations ability to modify def-use relations and still
have ILV generate the vector code.
This commit moves GEP operand queries controlling how GEPs are widened to a
dedicated recipe and extracts GEP widening code to its own ILV method taking
those recorded decisions as arguments. This reduces ingredient def-use usage by
ILV as a step towards full VPlan-based def-use relations.
Differential revision: https://reviews.llvm.org/D69067
In general ValueHandleBase::ValueIsRAUWd shouldn't be called when not
all uses of the value were actually replaced, though, currently
formLCSSAForInstructions calls it when it inserts LCSSA-phis.
Calls of ValueHandleBase::ValueIsRAUWd were added to LCSSA specifically
to update/invalidate SCEV. In the best case these calls duplicate some
of the work already done by SE->forgetValue, though in case when SCEV of
the value is SCEVUnknown, SCEV replaces the underlying value of
SCEVUnknown with the new value (i.e. acts like LCSSA-phi actually fully
replaces the value it is created for), which leads to SCEV being
corrupted because LCSSA-phi rarely dominates all uses of its inputs.
Fixes bug https://bugs.llvm.org/show_bug.cgi?id=44058.
Reviewers: fhahn, efriedma, reames, sanjoy.google
Reviewed By: fhahn
Subscribers: hiraditya, javed.absar, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70593
Summary:
Add an option to allow the attribute propagation on the index to be
disabled, to allow a workaround for issues (such as that fixed by
D70977).
Also move the setting of the WithAttributePropagation flag on the index
into propagateAttributes(), and remove some old stale code that predated
this flag and cleared the maybe read/write only bits when we need to
disable the propagation (previously only when importing disabled, now
also when the new option disables it).
Reviewers: evgeny777, steven_wu
Subscribers: mehdi_amini, inglorion, hiraditya, dexonsmith, arphaman, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70984
Summary:
AutoFDO's sample profile loader processes function in arbitrary source code order, so if I change the order of two functions in source code, the inline decision can change. This also prevented the use of context-sensitive profile to do specialization while inlining. This commit enforces SCC top-down order for sample profile loader. With this change, we can now do specialization, as illustrated by the added test case:
Say if we have A->B->C and D->B->C call path, we want to inline C into B when root inliner is B, but not when root inliner is A or D, this is not possible without enforcing top-down order. E.g. Once C is inlined into B, A and D can only choose to inline (B->C) as a whole or nothing, but what we want is only inline B into A and D, not its recursive callee C. If we process functions in top-down order, this is no longer a problem, which is what this commit is doing.
This change is guarded with a new switch "-sample-profile-top-down-load" for tuning, and it depends on D70653. Eventually, top-down can be the default order for sample profile loader.
Reviewers: wmi, davidxl
Subscribers: hiraditya, llvm-commits, tejohnson
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70655
Summary:
When sample profile loader decides not to inline a previously inlined call-site, we adjust the profile of outlined function simply by scaling up its profile counts by call-site count. This means the context-sensitive profile of that inlined instance will be thrown away. This commit try to keep context-sensitive profile for such cases:
- Instead of scaling outlined function's profile, we now properly merge the FunctionSamples of inlined instance into outlined function, including all recursively inlined profile.
- Instead of adjusting the profile for negative inline decision at the end of the sample profile loader pass, we do the profile merge right after processing each function. This change paired with top-down ordering of annotation/inline-replay (a separate diff) will make sure we recursively merge profile back before the profile is used for annotation and inline replay.
A new switch -sample-profile-merge-inlinee is added to enable the new profile merge for tuning. It should be the default behavior eventually.
Reviewers: wmi, davidxl
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70653
Summary:
Emit a value debug intrinsic (with OP_deref) when an alloca address is
passed to a function call after going through a bitcast.
This generates an FP or SP-relative location for the local variable in
the following case:
int x;
use((void *)&x;
Reviewers: aprantl, vsk, pcc
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70752
Summary:
This reverts commit c3b06d0c39.
Reason for revert: Caused miscompiles when inserting assume for undef.
Also adds a test to prevent similar breakage in future.
Fixes PR44154.
Reviewers: rnk, jdoerfert, efriedma, xbolva00
Reviewed By: rnk
Subscribers: thakis, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70933
Summary:
D68408 proposes to greatly improve our negation sinking abilities.
But in current canonicalization, we produce `sub A, zext(B)`,
which we will consider non-canonical and try to sink that negation,
undoing the existing canonicalization.
So unless we explicitly stop producing previous canonicalization,
we will have two conflicting folds, and will end up endlessly looping.
This inverts canonicalization, and adds back the obvious fold
that we'd miss:
* `sub [nsw] Op0, sext/zext (bool Y) -> add [nsw] Op0, zext/sext (bool Y)`
https://rise4fun.com/Alive/xx4
* `sext(bool) + C -> bool ? C - 1 : C`
https://rise4fun.com/Alive/fBl
It is obvious that `@ossfuzz_9880()` / `@lshr_out_of_range()`/`@ashr_out_of_range()`
(oss-fuzz 4871) are no longer folded as much, though those aren't really worrying.
Reviewers: spatel, efriedma, t.p.northover, hfinkel
Reviewed By: spatel
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71064
The patch makes sure that the LastThrowing pointer does not point to any instruction deleted by call to DeleteDeadInstruction.
While iterating through the instructions the pass maintains a pointer to the lastThrowing Instruction. A call to deleteDeadInstruction deletes a dead store and other instructions feeding the original dead instruction which also become dead. The instruction pointed by the lastThrowing pointer could also be deleted by the call to DeleteDeadInstruction and thus it becomes a dangling pointer. Because of this, we see an error in the next iteration.
In the patch, we maintain a list of throwing instructions encountered previously and use the last non deleted throwing instruction from the container.
Patch by Ankit <quic_aankit@quicinc.com>
Reviewers: fhahn, bcahoon, efriedma
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D65326
This makes no difference currently because we don't apply FMF
to FP casts, but that may change.
This could also be a place to add a fold for select with fptrunc,
so it will make that patch easier/smaller.
Summary:
D69561/dde5893 enabled importing of readonly variables with references,
however, it introduced a bug relating to importing/internalization of
writeonly variables with references.
A fix for this was added in D70006/7f92d66. But this didn't work in
distributed ThinLTO mode. The reason is that the fix (importing the
writeonly var with a zeroinitializer) was only applied when there were
references on the writeonly var summary. In distributed ThinLTO mode,
where we only have a small slice of the index, we will not have the
references on the importing side if we are not importing those
referenced values. Rather than changing this handshaking (which will
require a lot of other changes, since that's how we know what to import
in the distributed backend clang invocation), we can simply always give
the writeonly variable a zero initializer.
Reviewers: evgeny777, steven_wu
Subscribers: mehdi_amini, inglorion, hiraditya, dexonsmith, arphaman, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70977
Revise the coverage mapping format to reduce binary size by:
1. Naming function records and marking them `linkonce_odr`, and
2. Compressing filenames.
This shrinks the size of llc's coverage segment by 82% (334MB -> 62MB)
and speeds up end-to-end single-threaded report generation by 10%. For
reference the compressed name data in llc is 81MB (__llvm_prf_names).
Rationale for changes to the format:
- With the current format, most coverage function records are discarded.
E.g., more than 97% of the records in llc are *duplicate* placeholders
for functions visible-but-not-used in TUs. Placeholders *are* used to
show under-covered functions, but duplicate placeholders waste space.
- We reached general consensus about giving (1) a try at the 2017 code
coverage BoF [1]. The thinking was that using `linkonce_odr` to merge
duplicates is simpler than alternatives like teaching build systems
about a coverage-aware database/module/etc on the side.
- Revising the format is expensive due to the backwards compatibility
requirement, so we might as well compress filenames while we're at it.
This shrinks the encoded filenames in llc by 86% (12MB -> 1.6MB).
See CoverageMappingFormat.rst for the details on what exactly has
changed.
Fixes PR34533 [2], hopefully.
[1] http://lists.llvm.org/pipermail/llvm-dev/2017-October/118428.html
[2] https://bugs.llvm.org/show_bug.cgi?id=34533
Differential Revision: https://reviews.llvm.org/D69471
The PHI node checks for inner loop exits are too permissive currently.
As indicated by an existing comment, we should only allow LCSSA PHI
nodes that are part of reductions or are only used outside of the loop
nest. We ensure this by checking the users of the LCSSA PHIs.
Specifically, it is not safe to use an exiting value from the inner loop in the latch of the outer
loop.
It also moves the inner loop exit check before the outer loop exit
check.
Fixes PR43473.
Reviewers: efriedma, mcrosier
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D68144
Summary:
This is one more prep step necessary before the code gen pass instrumentation
code could go in.
Reviewers: davidxl
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70988
When basic blocks are killed, either due to being empty or to being an if.then
or if.else block whose complement contains identical instructions, some of the
debug intrinsics in that block are lost. This patch sinks those intrinsics
into the single successor block, setting them Undef if necessary to
prevent debug info from falling out-of-date.
Differential Revision: https://reviews.llvm.org/D70318
SCEV caches the exiting blocks when computing exit counts. In
SimpleLoopUnswitch, we split the exit block of the loop to unswitch.
Currently we only invalidate the loop containing that exit block, but if
that block is the exiting block for a parent loop, we have stale cache
entries. We have to invalidate the top-most loop that contains the exit
block as exiting block. We might also be able to skip invalidating the
loop containing the exit block, if the exit block is not an exiting
block of that loop.
There are also 2 more places in SimpleLoopUnswitch, that use a similar
problematic approach to get the loop to invalidate. If the patch makes
sense, I will also update those places to a similar approach (they deal
with multiple exit blocks, so we cannot directly re-use
getTopMostExitingLoop).
Fixes PR43972.
Reviewers: skatkov, reames, asbirlea, chandlerc
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D70786
Constructor invocations such as `APFloat(APFloat::IEEEdouble(), 0.0)`
may seem like they accept a FP (floating point) value, but the overload
they reach is actually the `integerPart` one, not a `float` or `double`
overload (which only exists when `fltSemantics` isn't passed).
This may lead to possible loss of data, by the conversion from `float`
or `double` to `integerPart`.
To prevent future mistakes, a new constructor overload, which accepts
any FP value and marked with `delete`, to prevent its usage.
Fixes PR34095.
Differential Revision: https://reviews.llvm.org/D70425
This reverts these two commits
[InstCombine] Turn (extractelement <1 x i64/double> (bitcast (x86_mmx))) into a single bitcast from x86_mmx to i64/double.
[InstCombine] Don't transform bitcasts between x86_mmx and v1i64 into insertelement/extractelement
We're seeing at least one internal test failure related to a
bitcast that was previously before an inline assembly block
containing emms being placed after it. This leads to the mmx
state ending up not empty after the emms. IR has no way to
make any specific guarantees about this. Reverting these patches
to get back to previous behavior which at least worked for this
test.
Fix PR40816: avoid considering scalar-with-predication instructions as also
uniform-after-vectorization.
Instructions identified as "scalar with predication" will be "vectorized" using
a replicating region. If such instructions are also optimized as "uniform after
vectorization", namely when only the first of VF lanes is used, such a
replicating region becomes erroneous - only the first instance of the region can
and should be formed. Fix such cases by not considering such instructions as
"uniform after vectorization".
Differential Revision: https://reviews.llvm.org/D70298
Summary:
Make SLPVectorize to recognize homogeneous aggregates like
`{<2 x float>, <2 x float>}`, `{{float, float}, {float, float}}`,
`[2 x {float, float}]` and so on.
It's a follow-up of https://reviews.llvm.org/D70068.
Merged `findBuildVector()` and `findBuildAggregate()` to
one `findBuildAggregate()` function making it recursive
to recognize multidimensional aggregates. Aggregates required
to be homogeneous.
Reviewers: RKSimon, ABataev, dtemirbulatov, spatel, vporpo
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70587
This adds a dump() function to VPlan, which uses the existing
operator<<.
This method provides a convenient way to dump a VPlan while debugging,
e.g. from lldb.
Reviewers: hsaito, Ayal, gilr, rengolin
Reviewed By: hsaito
Differential Revision: https://reviews.llvm.org/D70920
Summary:
This fixes https://llvm.org/PR26673
"Wrong debugging information with -fsanitize=address"
where asan instrumentation causes the prologue end to be computed
incorrectly: findPrologueEndLoc, looks for the first instruction
with a debug location to determine the prologue end. Since the asan
instrumentation instructions had debug locations, that prologue end was
at some instruction, where the stack frame is still being set up.
There seems to be no good reason for extra debug locations for the
asan instrumentations that set up the frame; they don't have a natural
source location. In the debugger they are simply located at the start
of the function.
For certain other instrumentations like -fsanitize-coverage=trace-pc-guard
the same problem persists - that might be more work to fix, since it
looks like they rely on locations of the tracee functions.
This partly reverts aaf4bb2394
"[asan] Set debug location in ASan function prologue"
whose motivation was to give debug location info to the coverage callback.
Its test only ensures that the call to @__sanitizer_cov_trace_pc_guard is
given the correct source location; as the debug location is still set in
ModuleSanitizerCoverage::InjectCoverageAtBlock, the test does not break.
So -fsanitize-coverage is hopefully unaffected - I don't think it should
rely on the debug locations of asan-generated allocas.
Related revision: 3c6c14d14b
"ASAN: Provide reliable debug info for local variables at -O0."
Below is how the X86 assembly version of the added test case changes.
We get rid of some .loc lines and put prologue_end where the user code starts.
```diff
--- 2.master.s 2019-12-02 12:32:38.982959053 +0100
+++ 2.patch.s 2019-12-02 12:32:41.106246674 +0100
@@ -45,8 +45,6 @@
.cfi_offset %rbx, -24
xorl %eax, %eax
movl %eax, %ecx
- .Ltmp2:
- .loc 1 3 0 prologue_end # 2.c:3:0
cmpl $0, __asan_option_detect_stack_use_after_return
movl %edi, 92(%rbx) # 4-byte Spill
movq %rsi, 80(%rbx) # 8-byte Spill
@@ -57,9 +55,7 @@
callq __asan_stack_malloc_0
movq %rax, 72(%rbx) # 8-byte Spill
.LBB1_2:
- .loc 1 0 0 is_stmt 0 # 2.c:0:0
movq 72(%rbx), %rax # 8-byte Reload
- .loc 1 3 0 # 2.c:3:0
cmpq $0, %rax
movq %rax, %rcx
movq %rax, 64(%rbx) # 8-byte Spill
@@ -72,9 +68,7 @@
movq %rax, %rsp
movq %rax, 56(%rbx) # 8-byte Spill
.LBB1_4:
- .loc 1 0 0 # 2.c:0:0
movq 56(%rbx), %rax # 8-byte Reload
- .loc 1 3 0 # 2.c:3:0
movq %rax, 120(%rbx)
movq %rax, %rcx
addq $32, %rcx
@@ -99,7 +93,6 @@
movb %r8b, 31(%rbx) # 1-byte Spill
je .LBB1_7
# %bb.5:
- .loc 1 0 0 # 2.c:0:0
movq 40(%rbx), %rax # 8-byte Reload
andq $7, %rax
addq $3, %rax
@@ -118,7 +111,8 @@
movl %ecx, (%rax)
movq 80(%rbx), %rdx # 8-byte Reload
movq %rdx, 128(%rbx)
- .loc 1 4 3 is_stmt 1 # 2.c:4:3
+.Ltmp2:
+ .loc 1 4 3 prologue_end # 2.c:4:3
movq %rax, %rdi
callq f
movq 48(%rbx), %rax # 8-byte Reload
```
Reviewers: eugenis, aprantl
Reviewed By: eugenis
Subscribers: ormris, aprantl, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70894
Summary:
This cropped up in the Linux kernel where cold code was placed in an
incompatible section.
Reviewers: compnerd, vsk, tejohnson
Reviewed By: vsk
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70925
Summary:
In case of a need to distinguish different query sites for gradual commit or
debugging of PGSO. NFC.
Reviewers: davidxl
Subscribers: hiraditya, zzheng, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70510
By defining the graph traits right after the VPBlockBase definitions, we
can make use of them earlier in the file.
Reviewers: hsaito, Ayal, gilr
Reviewed By: gilr
Differential Revision: https://reviews.llvm.org/D70733
As described here:
https://bugs.llvm.org/show_bug.cgi?id=44186
The match() code safely allows undef values, but we can't safely
propagate a vector constant that contains an undef to the new
compare instruction.
Summary:
If a user writing C code using the ACLE MVE intrinsics generates a
predicate and then complements it, then the resulting IR will use the
`pred_v2i` IR intrinsic to turn some `<n x i1>` vector into a 16-bit
integer; complement that integer; and convert back. This will generate
machine code that moves the predicate out of the `P0` register,
complements it in an integer GPR, and moves it back in again.
This InstCombine rule replaces `i2v(~v2i(x))` with a direct complement
of the original predicate vector, which we can already instruction-
select as the VPNOT instruction which complements P0 in place.
Reviewers: ostannard, MarkMurrayARM, dmgreen
Reviewed By: dmgreen
Subscribers: kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70484
rL341831 moved one-use check higher up, restricting a few folds
that produced a single instruction from two instructions to the case
where the inner instruction would go away.
Original commit message:
> InstCombine: move hasOneUse check to the top of foldICmpAddConstant
>
> There were two combines not covered by the check before now,
> neither of which actually differed from normal in the benefit analysis.
>
> The most recent seems to be because it was just added at the top of the
> function (naturally). The older is from way back in 2008 (r46687)
> when we just didn't put those checks in so routinely, and has been
> diligently maintained since.
From the commit message alone, there doesn't seem to be a
deeper motivation, deeper problem that was trying to solve,
other than 'fixing the wrong one-use check'.
As i have briefly discusses in IRC with Tim, the original motivation
can no longer be recovered, too much time has passed.
However i believe that the original fold was doing the right thing,
we should be performing such a transformation even if the inner `add`
will not go away - that will still unchain the comparison from `add`,
it will no longer need to wait for `add` to compute.
Doing so doesn't seem to break any particular idioms,
as least as far as i can see.
References https://bugs.llvm.org/show_bug.cgi?id=44100
If the sign of the sign argument is known (this could be extended to use ValueTracking),
then we can use fneg+fabs to clear/set the sign bit of the magnitude argument.
http://llvm.org/docs/LangRef.html#llvm-copysign-intrinsic
This transform is already done in DAGCombiner, but we can do it sooner in IR as
suggested in PR44153:
https://bugs.llvm.org/show_bug.cgi?id=44153
We have effectively no analysis for copysign in IR, so we are taking the unusual step
of increasing the number of IR instructions for the negative constant case.
Differential Revision: https://reviews.llvm.org/D70792
Summary:
optimizeVectorResize is rewriting patterns like:
%1 = bitcast vector %src to integer
%2 = trunc/zext %1
%dst = bitcast %2 to vector
Since bitcasting between integer an vector types gives
different integer values depending on endianness, we need
to take endianness into account. As it happens the old
implementation only produced the correct result for little
endian targets.
Fixes: https://bugs.llvm.org/show_bug.cgi?id=44178
Reviewers: spatel, lattner, lebedev.ri
Reviewed By: spatel, lebedev.ri
Subscribers: lebedev.ri, hiraditya, uabelho, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70844
The constants come through as add %x, -C, not a sub as would be
expected. They need some extra matchers to canonicalise them towards
usub_sat.
Differential Revision: https://reviews.llvm.org/D69514
This adjusts the one use checks in the the usub_sat fold code to not
increase instruction count, but otherwise do the fold. Reviewed as a
part of D69514.
Summary:
This patch introduces the deduction based on load/store instructions whose pointer operand is a non-inbounds GEP instruction.
For example if we have,
```
void f(int *u){
u[0] = 0;
u[1] = 1;
u[2] = 2;
}
```
then u must be dereferenceable(12).
This patch is inspired by D64258
Reviewers: jdoerfert, spatel, hfinkel, RKSimon, sstefan1, xbolva00, dtemirbulatov
Reviewed By: jdoerfert
Subscribers: jfb, lebedev.ri, xbolva00, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70714
This reapplies: 8ff85ed905
Original commit message:
As a follow-up to my initial mail to llvm-dev here's a first pass at the O1 described there.
This change doesn't include any change to move from selection dag to fast isel
and that will come with other numbers that should help inform that decision.
There also haven't been any real debuggability studies with this pipeline yet,
this is just the initial start done so that people could see it and we could start
tweaking after.
Test updates: Outside of the newpm tests most of the updates are coming from either
optimization passes not run anymore (and without a compelling argument at the moment)
that were largely used for canonicalization in clang.
Original post:
http://lists.llvm.org/pipermail/llvm-dev/2019-April/131494.html
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65410
This reverts commit c9ddb02659.
Summary:
This patch enables us to track GEP instruction in align deduction.
If a pointer `B` is defined as `A+Offset` and known to have alignment `C`, there exists some integer Q such that
```
A + Offset = C * Q = B
```
So we can say that the maximum power of two which is a divisor of gcd(Offset, C) is an alignment.
Reviewers: jdoerfert, sstefan1
Reviewed By: jdoerfert
Subscribers: lebedev.ri, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70392
This change doesn't include any change to move from selection dag to fast isel
and that will come with other numbers that should help inform that decision.
There also haven't been any real debuggability studies with this pipeline yet,
this is just the initial start done so that people could see it and we could start
tweaking after.
Test updates: Outside of the newpm tests most of the updates are coming from either
optimization passes not run anymore (and without a compelling argument at the moment)
that were largely used for canonicalization in clang.
Original post:
http://lists.llvm.org/pipermail/llvm-dev/2019-April/131494.html
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65410
This is NFC-intended because SimplifyDemandedVectorElts() does the same
transform later. As discussed in D70641, we may want to change that
behavior, so we need to isolate where it happens.
The pattern in question is currently not possible because we
aggressively (wrongly) transform mask elements to undef values
if they choose from an undef operand. That, however, would
change if we tighten our semantics for shuffles as discussed
in D70641. Adding this check gives us the flexibility to make
that change with minimal overhead for current definitions.
Summary:
Related bug: https://bugs.llvm.org/show_bug.cgi?id=40648
Static helper function rewriteDebugUsers in Local.cpp deletes dbg.value
intrinsics when it cannot move or rewrite them, or salvage the deleted
instruction's value. It should instead undef them in this case.
This patch fixes that and I've added a test which covers the failing test
case in bz40648. I've updated the unit test Local.ReplaceAllDbgUsesWith
to check for this behaviour (and fixed a typo in the test which would
cause the old test to always pass).
Reviewers: aprantl, vsk, djtodoro, probinson
Reviewed By: vsk
Subscribers: hiraditya, llvm-commits
Tags: #debug-info, #llvm
Differential Revision: https://reviews.llvm.org/D70604
This version contains 2 fixes for reported issues:
1. Make sure we do not try to sink terminator instructions.
2. Make sure we bail out, if we try to sink an instruction that needs to
stay in place for another recurrence.
Original message:
If the recurrence PHI node has a single user, we can sink any
instruction without side effects, given that all users are dominated by
the instruction computing the incoming value of the next iteration
('Previous'). We can sink instructions that may cause traps, because
that only causes the trap to occur later, but not on any new paths.
With the relaxed check, we also have to make sure that we do not have a
direct cycle (meaning PHI user == 'Previous), which indicates a
reduction relation, which potentially gets missed by
ReductionDescriptor.
As follow-ups, we can also sink stores, iff they do not alias with
other instructions we move them across and we could also support sinking
chains of instructions and multiple users of the PHI.
Fixes PR43398.
Reviewers: hsaito, dcaballe, Ayal, rengolin
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D69228
Currently the assertion in updateSuccessor is overly strict in some
cases and overly relaxed in other cases. For branches to the inner and
outer loop preheader it is too strict, because they can either be
unconditional branches or conditional branches with duplicate targets.
Both cases are fine and we can allow updating multiple successors.
On the other hand, we have to at least update one successor. This patch
adds such an assertion.
And simultaneously enhance SimplifyDemandedVectorElts() to rcognize that
pattern. That preserves some of the old optimizations in IR.
Given a shuffle that includes undef elements in an otherwise identity mask like:
define <4 x float> @shuffle(<4 x float> %arg) {
%shuf = shufflevector <4 x float> %arg, <4 x float> undef, <4 x i32> <i32 undef, i32 1, i32 2, i32 3>
ret <4 x float> %shuf
}
We were simplifying that to the input operand.
But as discussed in PR43958:
https://bugs.llvm.org/show_bug.cgi?id=43958
...that means that per-vector-element poison that would be stopped by the shuffle can now
leak to the result.
Also note that we still have (and there are tests for) the same transform with no undef
elements in the mask (a fully-defined identity mask). I don't think there's any
controversy about that case - it's a valid transform under any interpretation of
shufflevector/undef/poison.
Looking at a few of the diffs into codegen, I don't see any difference in final asm. So
depending on your perspective, that's good (no real loss of optimization power) or bad
(poison exists in the DAG, so we only partially fixed the bug).
Differential Revision: https://reviews.llvm.org/D70246
moved before another instruction.
Summary:Added an API to check if an instruction can be safely moved
before another instruction. In future PRs, we will like to add support
of moving instructions between blocks that are not control flow
equivalent, and add other APIs to enhance usability, e.g. moving basic
blocks, moving list of instructions...
Loop Fusion will be its first user. When there is intervening code in
between two loops, fusion is currently unable to fuse them. Loop Fusion
can use this utility to check if the intervening code can be safely
moved before or after the two loops, and move them, then it can
successfully fuse them.
Reviewer:kbarton,jdoerfert,Meinersbur,bmahjour,etiotto
Reviewed By:bmahjour
Subscribers:mgorny,hiraditya,llvm-commits
Tag:LLVM
Differential Revision:https://reviews.llvm.org/D70049
Summary:
Vector aggregate is homogeneous aggregate of vectors like `{ <2 x float>, <2 x float> }`.
This patch allows `findBuildAggregate()` to consider vector aggregates as
well as scalar ones. For instance, `{ <2 x float>, <2 x float> }` maps to `<4 x float>`.
Fixes vector part of llvm.org/PR42022
Reviewers: RKSimon
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70068
Summary:
With this patch, we no longer cache F.hasProfileData(). We simply
call the function again.
I'm doing this because:
- JumpThreadingPass also has a member variable named HasProfileData,
which is very confusing,
- the function is very lightweight, and
- this patch makes JumpThreading::runOnFunction more consistent with
JumpThreadingPass::run.
Subscribers: hiraditya, jfb, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70602
Summary:
Without this patch, the jump threading pass ignores profiling data
whenever we invoke the pass with the new pass manager.
Specifically, JumpThreadingPass::run calls runImpl with class variable
HasProfileData always set to false. In turn, runImpl sets
HasProfileData to false again:
HasProfileData = HasProfileData_;
In the end, we don't use profiling data at all with the new pass
manager.
This patch fixes the problem by passing F.hasProfileData() to runImpl.
The bug appears to have been introduced at:
https://reviews.llvm.org/D41461
which removed local variable HasProfileData in JumpThreadingPass::run
even though there was one more use left in the same function. As a
result, the remaining use ended referring to the class variable
instead.
Note that F.hasProfileData is an extremely lightweight function, so I
don't see the need to cache its result. Once this patch is approved,
I'm planning to stop caching the result of F.hasProfileData in
runOnFunction.
Reviewers: wmi, eli.friedman
Subscribers: hiraditya, jfb, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70509
Summary: Working towards Johannes's suggestion for fixme, in Attributor's Noalias attribute deduction.
(ii) Check whether the value is captured in the scope using AANoCapture.
FIXME: This is conservative though, it is better to look at CFG and
// check only uses possibly executed before this call site.
A Reachability abstract attribute answers the question "does execution at point A potentially reach point B". If this question is answered with false for all other uses of the value that might be captured, we know it is not *yet* captured and can continue with the noalias deduction. Currently, information AAReachability provides is completely pessimistic.
Reviewers: jdoerfert
Reviewed By: jdoerfert
Subscribers: uenoku, sstefan1, hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D70233
The verification inside loop passes should be done under the
VerifyMemorySSA flag (enabled by EXPESIVE_CHECKS or explicitly with
opt), in order to not add to compile time during regular builds.
We may end up with a case where we have a widenable branch above the loop, but not all widenable branches within the loop have been removed. Since a widenable branch inhibit SCEVs ability to reason about exit counts (by design), we have a tradeoff between effectiveness of this optimization and allowing future widening of the branches within the loop. LoopPred is thought to be one of the most important optimizations for range check elimination, so let's pay the cost.
Bit-Tracking Dead Code Elimination (bdce) do not mark dbg.value as undef after
deleting instruction. which shows invalid state of variable in debugger. This
patches fixes this by marking the dbg.value as undef which depends on dead
instruction.
This fixes https://bugs.llvm.org/show_bug.cgi?id=41925
Patch by kamlesh kumar!
Differential Revision: https://reviews.llvm.org/D70040
Summary:
This patch moves various checks from ThreadEdge to new function
TryThreadEdge The rational behind this is that I'd like to use
ThreadEdge without its checks in my upcoming patch.
This patch preserves lightweight checks as assertions in ThreadEdge.
ThreadEdge does not repeat the cost check, however.
Reviewers: wmi
Subscribers: hiraditya, jfb, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70338
Summary:
Most libraries are defined in the lib/ directory but there are also a
few libraries defined in tools/ e.g. libLLVM, libLTO. I'm defining
"Component Libraries" as libraries defined in lib/ that may be included in
libLLVM.so. Explicitly marking the libraries in lib/ as component
libraries allows us to remove some fragile checks that attempt to
differentiate between lib/ libraries and tools/ libraires:
1. In tools/llvm-shlib, because
llvm_map_components_to_libnames(LIB_NAMES "all") returned a list of
all libraries defined in the whole project, there was custom code
needed to filter out libraries defined in tools/, none of which should
be included in libLLVM.so. This code assumed that any library
defined as static was from lib/ and everything else should be
excluded.
With this change, llvm_map_components_to_libnames(LIB_NAMES, "all")
only returns libraries that have been added to the LLVM_COMPONENT_LIBS
global cmake property, so this custom filtering logic can be removed.
Doing this also fixes the build with BUILD_SHARED_LIBS=ON
and LLVM_BUILD_LLVM_DYLIB=ON.
2. There was some code in llvm_add_library that assumed that
libraries defined in lib/ would not have LLVM_LINK_COMPONENTS or
ARG_LINK_COMPONENTS set. This is only true because libraries
defined lib lib/ use LLVMBuild.txt and don't set these values.
This code has been fixed now to check if the library has been
explicitly marked as a component library, which should now make it
easier to remove LLVMBuild at some point in the future.
I have tested this patch on Windows, MacOS and Linux with release builds
and the following combinations of CMake options:
- "" (No options)
- -DLLVM_BUILD_LLVM_DYLIB=ON
- -DLLVM_LINK_LLVM_DYLIB=ON
- -DBUILD_SHARED_LIBS=ON
- -DBUILD_SHARED_LIBS=ON -DLLVM_BUILD_LLVM_DYLIB=ON
- -DBUILD_SHARED_LIBS=ON -DLLVM_LINK_LLVM_DYLIB=ON
Reviewers: beanz, smeenai, compnerd, phosek
Reviewed By: beanz
Subscribers: wuzish, jholewinski, arsenm, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, mgorny, mehdi_amini, sbc100, jgravelle-google, hiraditya, aheejin, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, steven_wu, rogfer01, MartinMosbeck, brucehoult, the_o, dexonsmith, PkmX, jocewei, jsji, dang, Jim, lenary, s.egerton, pzheng, sameer.abuasal, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70179
As a reminder, a "widenable branch" is the pattern "br i1 (and i1 X, WC()), label %taken, label %untaken" where "WC" is the widenable condition intrinsics. The semantics of such a branch (derived from the semantics of WC) is that a new condition can be added into the condition arbitrarily without violating legality.
Broaden the definition in two ways:
Allow swapped operands to the br (and X, WC()) form
Allow widenable branch w/trivial condition (i.e. true) which takes form of br i1 WC()
The former is just general robustness (e.g. for X = non-instruction this is what instcombine produces). The later is specifically important as partial unswitching of a widenable range check produces exactly this form above the loop.
Differential Revision: https://reviews.llvm.org/D70502
Follow-up of cb47b8783: don't query TTI->preferPredicateOverEpilogue when
option -prefer-predicate-over-epilog is set to false, i.e. when we prefer not
to predicate the loop.
Differential Revision: https://reviews.llvm.org/D70382
With updates to various LLVM tools that use SpecialCastList.
It was tempting to use RealFileSystem as the default, but that makes it
too easy to accidentally forget passing VFS in clang code.
Summary:
This is a follow-up to 590f279c45, which
moved some of the callers to use VFS.
It turned out more code in Driver calls into real filesystem APIs and
also needs an update.
Reviewers: gribozavr2, kadircet
Reviewed By: kadircet
Subscribers: ormris, mgorny, hiraditya, llvm-commits, jkorous, cfe-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D70440
Summary:
Also, replace the SmallVector with a normal C array.
Reviewers: vsk
Reviewed By: vsk
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70498
Moving accesses in MemorySSA at InsertionPlace::End, when an instruction is
moved into a block, almost always means insert at the end of the block, but
before the block terminator. This matters when the block terminator is a
MemoryAccess itself (an invoke), and the insertion must be done before
the terminator for the update to be correct.
Insert an additional position: InsertionPlace:BeforeTerminator and update
current usages where this applies.
Resolves PR44027.
After speaking with Sanjay - seeing a number of miscompiles and working
on tracking down a testcase. None of the follow on patches seem to
have helped so far.
This reverts commit 8a0aa5310b.
After speaking with Sanjay - seeing a number of miscompiles and working
on tracking down a testcase. None of the follow on patches seem to
have helped so far.
This reverts commit 7ff57705ba.
This is mostly NFC, but I removed the setting of the guard's calling convention onto the WC call. Why? Because it was untested, and was producing an ill defined output as the declaration's convention wasn't been changed leaving a mismatch which is UB.
This code has never been enabled. While it is tested, it's complicating some refactoring. If we decide to re-implement this, doing it in SimplifyCFG would probably make more sense anyways.
With the widenable condition construct, we have the ability to reason about branches which can be 'widened' (i.e. made to fail more often). We've got a couple o transforms which leverage this. This patch just cleans up the API a bit.
This is prep work for generalizing our definition of a widenable branch slightly. At the moment "br i1 (and A, wc()), ..." is considered widenable, but oddly, neither "br i1 (and wc(), B), ..." or "br i1 wc(), ..." is. That clearly needs addressed, so first, let's centralize the code in one place.
Unswitch (and other loop transforms) like to generate loop exit blocks with unconditional successors, and phi nodes (LCSSA, or simple multiple exiting blocks sharing an exit). Generalize the "likely very rare exit" check slightly to handle this form.
Pair up inlined AutoreleaseRV calls with their matching RetainRV or
ClaimRV.
- RetainRV cancels out AutoreleaseRV. Delete both instructions.
- ClaimRV is a peephole for RetainRV+Release. Delete AutoreleaseRV and
replace ClaimRV with Release.
This avoids problems where more aggressive inlining triggers memory
regressions.
This patch is happy to skip over non-callable instructions and non-ARC
intrinsics looking for the pair. It is likely sound to also skip over
opaque function calls, but that's harder to reason about, and it's not
relevant to the goal here: if there's an opaque function call splitting
up a pair, it's very unlikely that a handshake would have happened
dynamically without inlining.
Note that this patch also subsumes the previous logic that looked
backwards from ReleaseRV.
https://reviews.llvm.org/D70370
rdar://problem/46509586
The 1st attempt was reverted because it revealed an existing
bug where we could produce invalid IR (use of value before
definition). That should be fixed with:
rG39de82ecc9c2
The bug manifests as replacing a reduction operand with an undef
value.
The problem appears to be limited to cases where a min/max reduction
has extra uses of the compare operand to the select.
In the general case, we are tracking "ExternallyUsedValues" and
an "IgnoreList" of the reduction operations, but those may not apply
to the final compare+select in a min/max reduction.
For that, we use replaceAllUsesWith (RAUW) to ensure that the new
vectorized reduction values are transferred to all subsequent users.
Differential Revision: https://reviews.llvm.org/D70148
As discussed in D70148 (and caused a revert of the original commit):
if we insert at the select, then we can produce invalid IR because
the replacement for the compare may have uses before the select.
Summary:
Pass down the already accessed ValueInfo to shouldPromoteLocalToGlobal,
to avoid an unnecessary extra index lookup.
Add some assertion checking to confirm we have a non-empty VI when
expected.
Also some misc cleanup, merging the two versions of
doImportAsDefinition, since one was only called by the other, and
unnecessarily passed in a member variable.
Reviewers: steven_wu, pcc, evgeny777
Reviewed By: evgeny777
Subscribers: mehdi_amini, inglorion, hiraditya, dexonsmith, arphaman, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70337
Summary:
Clean up the code that does GV promotion in the ThinLTO backends.
Specifically, we don't need to check whether we are importing since that
is already checked and handled correctly in shouldPromoteLocalToGlobal.
Simply call shouldPromoteLocalToGlobal, and if it returns true we are
guaranteed that we are promoting, whether or not we are importing (or in
the exporting module). This also makes the handling in getName()
consistent with that in getLinkage(), which checks the DoPromote parameter
regardless of whether we are importing or exporting.
Reviewers: steven_wu, pcc, evgeny777
Subscribers: mehdi_amini, inglorion, hiraditya, dexonsmith, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70327
This implements a version of the predicateLoopExits transform from IndVarSimplify extended to exploit widenable conditions - and thus be much wider in scope of legality. The code structure ends up being almost entirely different, so I chose to duplicate this into the LoopPredication pass instead of trying to reuse the code in the IndVars.
The core notions of the transform are as follows:
If we have a widenable condition which controls entry into the loop, we're allowed to widen it arbitrarily. Given that, it's simply a *profitability* question as to what conditions to fold into the widenable branch.
To avoid pass ordering issues, we want to avoid widening cases that would otherwise be dischargeable. Or... widen in a form which can still be discharged. Thus, we phrase the transform as selecting one analyzeable exit from the set of analyzeable exits to keep. This avoids creating pass ordering complexities.
Since none of the above proves that we actually exit through our analyzeable exits - we might exit through something else entirely - we limit ourselves to cases where a) the latch is analyzeable and b) the latch is predicted taken, and c) the exit being removed is statically cold.
Differential Revision: https://reviews.llvm.org/D69830
If you're writing C code using the ACLE MVE intrinsics that passes the
result of a vcmp as input to a predicated intrinsic, e.g.
mve_pred16_t pred = vcmpeqq(v1, v2);
v_out = vaddq_m(v_inactive, v3, v4, pred);
then clang's codegen for the compare intrinsic will create calls to
`@llvm.arm.mve.pred.v2i` to convert the output of `icmp` into an
`mve_pred16_t` integer representation, and then the next intrinsic
will call `@llvm.arm.mve.pred.i2v` to convert it straight back again.
This will be visible in the generated code as a `vmrs`/`vmsr` pair
that move the predicate value pointlessly out of `p0` and back into it again.
To prevent that, I've added InstCombine rules to remove round trips of
the form `v2i(i2v(x))` and `i2v(v2i(x))`. Also I've taught InstCombine
about the known and demanded bits of those intrinsics. As a result,
you now get just the generated code you wanted:
vpt.u16 eq, q1, q2
vaddt.u16 q0, q3, q4
Reviewers: ostannard, MarkMurrayARM, dmgreen
Reviewed By: dmgreen
Subscribers: kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70313
Split out a helper function for the individual call optimizations and
skip useless calls to it (where the instruction is not an ARC
intrinsic). Besides reducing indentation (and possibly speeding up
compile time in some small way), an upcoming patch will add additional
calls and expand out the `switch`.
Similar to/extension of D70208 (rGee0882bdf866), but this one
may finally allow closing motivating bugs.
This is another step towards having FMF apply only to FP values
rather than those + fcmp. See PR38086 for one of the original
discussions/motivations:
https://bugs.llvm.org/show_bug.cgi?id=38086
And the test here is derived from PR39535:
https://bugs.llvm.org/show_bug.cgi?id=39535
Currently, we lose FMF when converting any phi to select in
SimplifyCFG. There are a small number of similar changes needed
to correct within SimplifyCFG, so it should be quick to patch
this pass up.
FMF was extended to select and phi with:
D61917
D67564
Working on top of D69252, this adds canonicalisation patterns for ssub.with.overflow to ssub.sats.
Differential Revision: https://reviews.llvm.org/D69753
This adds to D69245, adding extra signed patterns for folding from a
sadd_with_overflow to a sadd_sat. These are more complex than the
unsigned patterns, as the overflow can occur in either direction.
For the add case, the positive overflow can only occur if both of the
values are positive (same for both the values being negative). So there
is an extra select on whether to use the positive or negative overflow
limit.
Differential Revision: https://reviews.llvm.org/D69252
It was added in 2014 in 732e0aa9fb with one use in Scalarizer.cpp.
That one use was then removed when porting to the new pass manager in
2018 in b6f76002d9.
While the RFC and the desire to get off of static initializers for
cl::opt all still stand, this code is now dead, and I think we should
delete this code until someone is ready to do the migration.
There were many clients of CommandLine.h that were it transitively
through LLVMContext.h, so I cleaned that up in 4c1a1d3cf9.
Reviewers: beanz
Differential Revision: https://reviews.llvm.org/D70280
This is another step towards having FMF apply only to FP values
rather than those + fcmp. See PR38086 for one of the original
discussions/motivations:
https://bugs.llvm.org/show_bug.cgi?id=38086
And the test here is derived from PR39535:
https://bugs.llvm.org/show_bug.cgi?id=39535
Currently, we lose FMF when converting any phi to select in
SimplifyCFG. There are a small number of similar changes needed
to correct within SimplifyCFG, so it should be quick to patch
this pass up.
FMF was extended to select and phi with:
D61917
D67564
Differential Revision: https://reviews.llvm.org/D70208
This is a patch to support D66328, which was reverted until this lands.
Enable a compiler-rt test that used to fail previously with D66328.
Differential Revision: https://reviews.llvm.org/D67283
This patch introduces a function pass to inject the scalar-to-vector
mappings stored in the TargetLIbraryInfo (TLI) into the Vector
Function ABI (VFABI) variants attribute.
The test is testing the injection for three vector libraries supported
by the TLI (Accelerate, SVML, MASSV).
The pass does not change any of the analysis associated to the
function.
Differential Revision: https://reviews.llvm.org/D70107
ValueInfo has user-defined 'operator bool' which allows incorrect implicit conversion
to GlobalValue::GUID (which is unsigned long). This causes bugs which are hard to
track and should be removed in future.
Summary:
When scalarizing PHI nodes we might try to examine/rewrite
InsertElement nodes in predecessors. If those predecessors
are unreachable from entry, then the IR in those blocks could
have unexpected properties resulting in infinite loops in
Scatterer::operator[].
By simply treating values originating from instructions in
unreachable blocks as undef we do not need to analyse them
further.
This fixes PR41723.
Reviewers: bjope
Reviewed By: bjope
Subscribers: bjope, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70171
getFirstNonPHI iterates over all the instructions in a block until it
finds a non-PHI.
Then, the loop starts from the beginning of the block and goes through
all the instructions until it reaches the instruction found by
getFirstNonPHI.
Instead of doing that, just stop when a non-PHI is found.
This reduces the compile-time of a test case discussed in
https://reviews.llvm.org/D47023 by 13x.
Not entirely sure how to come up with a test case for this since it's a
compile time issue that would significantly slow down running the tests.
Differential Revision: https://reviews.llvm.org/D70016
This reverts commit e511c4b0dff1692c267addf17dce3cebe8f97faa:
Temporarily Revert:
"[SLP] Generalization of stores vectorization."
"[SLP] Fix -Wunused-variable. NFC"
"[SLP] Vectorize jumbled stores."
after fixing the problem with compile time.
The vectoriser queries TTI->preferPredicateOverEpilogue to determine if
tail-folding is preferred for a loop, but it was not respecting loop hint
'predicate' that can disable this, which has now been added. This showed that
we were incorrectly initialising loop hint 'vectorize.predicate.enable' with 0
(i.e. FK_Disabled) but this should have been FK_Undefined, which has been
fixed.
Differential Revision: https://reviews.llvm.org/D70125
This is a resubmission of bbb29738b5 that
was reverted due to clang tests failures. It includes the fix and
additional IR tests for the missed case.
Summary:
In case when all incoming values of a PHI are equal pointers, this
transformation inserts a definition of such a pointer right after
definition of the base pointer and replaces with this value both PHI and
all it's incoming pointers. Primary goal of this transformation is
canonicalization of this pattern in order to enable optimizations that
can't handle PHIs. Non-inbounds pointers aren't currently supported.
Reviewers: spatel, RKSimon, lebedev.ri, apilipenko
Reviewed By: apilipenko
Tags: #llvm
Subscribers: hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D68128
This patch adds an assertion check for exported read/write-only
variables to be also in import list for module. If they aren't
we may face linker errors, because read/write-only variables are
internalized in their source modules. The patch also changes
export lists to store ValueInfo instead of GUID for performance
considerations.
Differential revision: https://reviews.llvm.org/D70128
Summary:
This fixes PR43081, where the transformation of `strchr(p, 0) -> p +
strlen(p)` can cause a segfault, if `-fno-builtin-strlen` is used. In
that case, `emitStrLen` returns nullptr, which CreateGEP is not designed
to handle. Also add the minimized code from the PR as a test case.
Reviewers: xbolva00, spatel, jdoerfert, efriedma
Reviewed By: efriedma
Subscribers: lebedev.ri, hiraditya, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D70143
This file lists every pass in LLVM, and is included by Pass.h, which is
very popular. Every time we add, remove, or rename a pass in LLVM, it
caused lots of recompilation.
I found this fact by looking at this table, which is sorted by the
number of times a file was changed over the last 100,000 git commits
multiplied by the number of object files that depend on it in the
current checkout:
recompiles touches affected_files header
342380 95 3604 llvm/include/llvm/ADT/STLExtras.h
314730 234 1345 llvm/include/llvm/InitializePasses.h
307036 118 2602 llvm/include/llvm/ADT/APInt.h
213049 59 3611 llvm/include/llvm/Support/MathExtras.h
170422 47 3626 llvm/include/llvm/Support/Compiler.h
162225 45 3605 llvm/include/llvm/ADT/Optional.h
158319 63 2513 llvm/include/llvm/ADT/Triple.h
140322 39 3598 llvm/include/llvm/ADT/StringRef.h
137647 59 2333 llvm/include/llvm/Support/Error.h
131619 73 1803 llvm/include/llvm/Support/FileSystem.h
Before this change, touching InitializePasses.h would cause 1345 files
to recompile. After this change, touching it only causes 550 compiles in
an incremental rebuild.
Reviewers: bkramer, asbirlea, bollu, jdoerfert
Differential Revision: https://reviews.llvm.org/D70211
Summary:
This temporarily disables the large working set size behavior in profile guided
size optimization due to internal benchmark regressions.
Reviewers: davidxl
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70207
The bug manifests as replacing a reduction operand with an undef
value.
The problem appears to be limited to cases where a min/max reduction
has extra uses of the compare operand to the select.
In the general case, we are tracking "ExternallyUsedValues" and
an "IgnoreList" of the reduction operations, but those may not apply
to the final compare+select in a min/max reduction.
For that, we use replaceAllUsesWith (RAUW) to ensure that the new
vectorized reduction values are transferred to all subsequent users.
Differential Revision: https://reviews.llvm.org/D70148
As noted by the FIXME comment, this is not correct based on our current FMF semantics.
We should be propagating FMF from the final value in a sequence (in this case the
'select'). So the behavior even without this patch is wrong, but we did not allow FMF
on 'select' until recently.
But if we do the correct thing right now in this patch, we'll inevitably introduce
regressions because we have not wired up FMF propagation for 'phi' and 'select' in
other passes (like SimplifyCFG) or other places in InstCombine. I'm not seeing a
better incremental way to make progress.
That said, the potential extra damage over the existing wrong behavior from this
patch is very limited. AFAIK, the only way to have different FMF on IR in the same
function is if we have LTO inlined IR from 2 modules that were compiled using
different fast-math settings.
As seen in the tests, we may actually see some improvements with this patch because
adding the FMF to the 'select' allows matching to min/max intrinsics that were
previously missed (in the common case, the 'fcmp' and 'select' should have identical
FMF to begin with).
Next steps in the transition:
Make similar changes in instcombine as needed.
Enable phi-to-select FMF propagation in SimplifyCFG.
Remove dependencies on fcmp with FMF.
Deprecate FMF on fcmp.
Differential Revision: https://reviews.llvm.org/D69720
I think we have to be a bit more careful when it comes to moving
ops across shuffles, if the op does restrict undef. For example, without
this patch, we would move 'and %v, <0, 0, -1, -1>' over a
'shufflevector %a, undef, <undef, undef, 1, 2>'. As a result, the first
2 lanes of the result are undef after the combine, but they really
should be 0, unless I am missing something.
For ops that do fold to undef on undef operands, the current behavior
should be fine. I've add conservative check OpDoesRestrictUndef, maybe
there's a better existing utility?
Reviewers: spatel, RKSimon, lebedev.ri
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D70093
In case when all incoming values of a PHI are equal pointers, this
transformation inserts a definition of such a pointer right after
definition of the base pointer and replaces with this value both PHI and
all it's incoming pointers. Primary goal of this transformation is
canonicalization of this pattern in order to enable optimizations that
can't handle PHIs. Non-inbounds pointers aren't currently supported.
Reviewers: spatel, RKSimon, lebedev.ri, apilipenko
Reviewed By: apilipenko
Tags: #llvm
Subscribers: hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D68128
Don't try to canonicalize loads to scalable vector types to loads
of integers.
This removes one assertion when trying to use a TypeSize as a parameter
to DataLayout::isLegalInteger. It does not handle the second part of the
function (which looks at bitcasts).
This patch also contains a NFC fix for Load Analysis, where a variable
initialization that would cause the same assertion is moved closer to
its use. This allows us to run the new test for InstCombine without
having to teach LocationSize to play nicely with scalable vectors.
Differential Revision: https://reviews.llvm.org/D70075
Currently we have limited support for outer loops with multiple basic
blocks after the inner loop exit. But the current checks for creating
PHIs for loop exit values only assumes the header and latches of the
outer loop. It is better to just skip incoming values defined in the
original inner loops. Those are handled earlier.
Reviewers: efriedma, mcrosier
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D70059
Summary:
This patch introduces align attribute deduction for callsite argument, function argument, function returned and floating value based on must-be-executed-context.
Reviewers: jdoerfert, sstefan1
Reviewed By: jdoerfert
Subscribers: hiraditya, jfb, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69797
Summary: This patch introduces a new heuristic for guiding operand reordering. The new "look-ahead" heuristic can look beyond the immediate predecessors. This helps break ties when the immediate predecessors have identical opcodes (see lit test for examples).
Reviewers: RKSimon, ABataev, dtemirbulatov, Ayal, hfinkel, rnk
Reviewed By: RKSimon, dtemirbulatov
Subscribers: xbolva00, Carrot, hiraditya, phosek, rnk, rcorcs, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60897
The attribute is stored at the `FunctionIndex` attribute set, with the
name "vector-function-abi-variant".
The get/set methods of the attribute have assertion to verify that:
1. Each name in the attribute is a valid VFABI mangled name.
2. Each name in the attribute correspond to a function declared in the
module.
Differential Revision: https://reviews.llvm.org/D69976
Re-try because earlier attempts were reverted due to use-after-free.
Hopefully, diagnosed correctly this time - we replace/remove the
invariant.start first rather than the invariant.end to avoid angering
worklist-based iteration.
We gather a set of white-listed instructions in isAllocSiteRemovable() and then
replace/erase them. But we don't know in general if the instructions in the set
have uses amongst themselves, so order of deletion makes a difference.
There's already a special-case for the llvm.objectsize intrinsic, so add another
for llvm.invariant.start.
Should fix:
https://bugs.llvm.org/show_bug.cgi?id=43723
Differential Revision: https://reviews.llvm.org/D69977
Summary:
SimplifySelectsFeedingBinaryOp simplified binary ops when both operands
were selects with the same condition. This patch extends it to handle
these cases where only one operand is a select:
X op (C ? P : Q) -> C ? (X op P) : (X op Q)
// if X op P and X op Q both simplify
(C ? P : Q) op Y -> C ? (P op Y) : (Q op Y)
// if P op Y and Q op Y both simplify
For example: X *fast (C ? 1.0 : 0.0) -> C ? X : 0.0
Reviewers: mcberg2017, majnemer, craig.topper, qcolombet, mcrosier
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D64713
The _m64 type is represented in IR as <1 x i64>. The x86-64 ABI
on Linux passes <1 x i64> as a double. MMX intrinsics use x86_mmx
type in IR.These things result in a lot of bitcasts in mmx code.
There's another instcombine that tries to turn bitcast <1 x i64>
to double into extractelement and a bitcast.
The combine here tries to reverse this extractelement conversion
if we see an mmx type.
Re-try rGef02831f0a4e (reverted due to use-after-free), but bail out completely
if we encounter an unexpected llvm.invariant.start.
We gather a set of white-listed instructions in isAllocSiteRemovable() and then
replace/erase them. But we don't know in general if the instructions in the set
have uses amongst themselves, so order of deletion makes a difference.
There's already a special-case for the llvm.objectsize intrinsic, so add another
for llvm.invariant.end.
Should fix:
https://bugs.llvm.org/show_bug.cgi?id=43723
Differential Revision: https://reviews.llvm.org/D69977
Summary: A helper function to get argument number of a arg operand Use.
Reviewers: jdoerfert, uenoku
Subscribers: hiraditya, lebedev.ri, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66844
We gather a set of white-listed instructions in isAllocSiteRemovable() and then
replace/erase them. But we don't know in general if the instructions in the set
have uses amongst themselves, so order of deletion makes a difference.
There's already a special-case for the llvm.objectsize intrinsic, so add another
for llvm.invariant.end.
Should fix:
https://bugs.llvm.org/show_bug.cgi?id=43723
Differential Revision: https://reviews.llvm.org/D69977
This recommits 11ed1c0239 (reverted in
9f08ce0d21 for failing an assert) with a fix:
tryToWidenMemory() now first checks if the widening decision is to interleave,
thus maintaining previous behavior where tryToInterleaveMemory() was called
first, giving priority to interleave decisions over widening/scalarization. This
commit adds the test case that exposed this bug as a LIT.
Summary: A user can force a function to be inlined by specifying the always_inline attribute. Currently, thinlto implementation is not aware of always_inline functions and does not guarantee import of such functions, which in turn can prevent inlining of such functions.
Patch by Bharathi Seshadri <bseshadr@cisco.com>
Reviewers: tejohnson
Reviewed By: tejohnson
Subscribers: mehdi_amini, inglorion, hiraditya, steven_wu, dexonsmith, arphaman, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70014
Patch enables import of write-only variables with non-trivial initializers
to fix linker errors. Initializers of imported variables are converted to
'zeroinitializer' to avoid promotion of referenced objects.
Differential revision: https://reviews.llvm.org/D70006
This patch implements a correct, but not terribly useful, transform. In particular, if we have a dynamic alloca in a loop which is guaranteed to execute, and provably not captured, we hoist the alloca out of the loop. The capture tracking is needed so that we can prove that each previous stack region dies before the next one is allocated. The transform decreases the amount of stack allocation needed by a linear factor (e.g. the iteration count of the loop).
Now, I really hope no one is actually using dynamic allocas. As such, why this patch?
Well, the actual problem I'm hoping to make progress on is allocation hoisting. There's a large draft patch out for review (https://reviews.llvm.org/D60056), and this patch was the smallest chunk of testable functionality I could come up with which takes a step vaguely in that direction.
Once this is in, it makes motivating the changes to capture tracking mentioned in TODOs testable. After that, I hope to extend this to trivial malloc free regions (i.e. free dominating all loop exits) and allocation functions for GCed languages.
Differential Revision: https://reviews.llvm.org/D69227
The change itself is straight forward and obvious, but ... there's an existing test checking for exactly the opposite. Both I and Artur think this is simply conservatism in the initial implementation. If anyone bisects a problem to this, a counter example will be very interesting.
Differential Revision: https://reviews.llvm.org/D69907
This recommits 100e797adb (reverted in
009e032634 for failing an assert). While the
root cause was independently reverted in eaff300401,
this commit includes a LIT to make sure IVDescriptor's SinkAfter logic does not
try to sink branch instructions.
x86_mmx is conceptually a vector already. Don't introduce an extra conversion between it and scalar i64.
I'm using VectorType::isValidElementType which checks for floating point, integer, and pointers to hopefully make this more readable than just blacklisting x86_mmx.
Differential Revision: https://reviews.llvm.org/D69964
Summary:
I need to make use of this pass from a driver program that isn't opt.
Therefore this patch moves this pass into the LLVM library so that it is
available for use elsewhere.
There was one function I kept in tools/opt which is exportDebugifyStats()
this is because it's serializing the statistics into a human readable
format and this seemed more in keeping with opt than a library function
Reviewers: vsk, aprantl
Subscribers: mgorny, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69926
Instcombiner pass was erasing trivially dead instruction without updating dependent llvm.dbg.value.
which was not showing programmer current state of variables while debugging.
As a part of this fix I did following,
Iterate throught all the users (llvm.dbg) of a instruction which is trivially dead and set each if them undef, Before deleting the instruction.
Now user will see optimized out, when try to print those variables.
This fixes
https://bugs.llvm.org/show_bug.cgi?id=43893
This is my first fix to llvm.
Patch by kamlesh kumar!
Differential Revision: https://reviews.llvm.org/D69809
shift (logic (shift X, C0), Y), C1 --> logic (shift X, C0+C1), (shift Y, C1)
This is an IR translation of an existing SDAG transform added here:
rL370617
So we again have 9 possible patterns with a commuted IR variant of each pattern:
https://rise4fun.com/Alive/VlIhttps://rise4fun.com/Alive/n1mhttps://rise4fun.com/Alive/1Vn
Part of the motivation is to allow easier recognition and subsequent
canonicalization of bswap patterns as discussed in PR43146:
https://bugs.llvm.org/show_bug.cgi?id=43146
We had to delay this transform because it used to allow the SLP vectorizer
to create awful reductions out of simple load-combines.
That problem was fixed with:
rL375025
(we'll bring back load combining in IR someday...)
The backend is also better equipped to deal with these patterns now
using hooks like TLI.getShiftAmountThreshold().
The only remaining potential controversy is that the -reassociate pass
tends to reverse this kind of pattern (to help GVN?). But since -reassociate
doesn't do anything with these specific patterns, there is no conflict currently.
Finally, there's a new pass proposal at D67383 for general tree-height-reduction
reassociation, and it could use a cost model to decide how to optimally rearrange
these kinds of ops for a target. That patch appears to be stalled.
Differential Revision: https://reviews.llvm.org/D69842
Patch allows importing declarations of functions and variables, referenced
by the initializer of some other readonly variable.
Differential revision: https://reviews.llvm.org/D69561
We have a vector compare reduction problem seen in PR39665 comment 2:
https://bugs.llvm.org/show_bug.cgi?id=39665#c2
Or slightly reduced here:
define i1 @cmp2(<2 x double> %a0) {
%a = fcmp ogt <2 x double> %a0, <double 1.0, double 1.0>
%b = extractelement <2 x i1> %a, i32 0
%c = extractelement <2 x i1> %a, i32 1
%d = and i1 %b, %c
ret i1 %d
}
SLP would not attempt to turn this into a vector reduction because there is an
artificial lower limit on that transform. We can not completely remove that limit
without inducing regressions though, so this patch just hacks an extra attempt at
creating a 2-way reduction to the end of the analysis.
As shown in the test file, we are still not getting some of the motivating cases,
so follow-on patches will be needed to solve those cases.
Differential Revision: https://reviews.llvm.org/D59710
Summary:
When adjusting function entry counts after inlining, Funciton::setEntryCount is called without providing an import function list. The side effect of that is the previously set import function list will be dropped. The import function list is used by ThinLTO to help import hot cross module callee for LTO inlining, so dropping that during ThinLTO pre-link may adversely affect LTO inlining. The fix is to keep the list while updating entry counts for inlining.
Reviewers: wmi, davidxl, tejohnson
Subscribers: mehdi_amini, hiraditya, dexonsmith, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69736
"[SLP] Generalization of stores vectorization."
"[SLP] Fix -Wunused-variable. NFC"
"[SLP] Vectorize jumbled stores."
As they're causing significant (10-30x) compile time regressions on
vectorizable code.
The primary cause of the compile-time regression is f228b53716.
This reverts commits:
f228b537165503455ccb21d498c9c0
The basic idea of the transform is to convert variant loop exit conditions into invariant exit conditions by changing the iteration on which the exit is taken when we know that the trip count is unobservable. See the original patch which introduced the code for a more complete explanation.
The individual parts of this have been reviewed, the result has been fuzzed, and then further analyzed by hand, but despite all of that, I will not be suprised to see breakage here. If you see problems, please don't hesitate to revert - though please do provide a test case. The most likely class of issues are latent SCEV bugs and without a reduced test case, I'll be essentially stuck on reducing them.
(Note: A bunch of tests were opted out of the new transform to preserve coverage. That landed in a previous commit to simplify revert cycles if they turn out to be needed.)
Summary:
This patch factors out code to clone instructions -- partly for
readability and partly to facilitate an upcoming patch of my own.
Reviewers: wmi
Subscribers: hiraditya, jfb, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69861
We had a subtle, but nasty bug in our definition of a widenable branch, and thus in the transforms which used that utility. Specifically, we returned true for any branch which included a widenable condition within it's condition, regardless of whether that widenable condition also had other uses.
The problem is that the result of the WC() call is defined to be one particular value. As such, all users must agree as to what that value is. If we widen a branch without also updating *all other users* of the WC in the same way, we have broken the required semantics.
Most of the textual diff is updating existing transforms not to leave dead uses hanging around. They're largely NFC as the dead instructions would be immediately deleted by other passes. The reason to make these changes is so that the transforms preserve the widenable branch form.
In practice, we don't get bitten by this only because it isn't profitable to CSE WC() calls and the lowering pass from guards uses distinct WC calls per branch.
Differential Revision: https://reviews.llvm.org/D69916
This patch fixes two issues noticed by inspection when going to enable the loop predication code in IndVarSimplify.
Issue 1 - Both the LoopPredication transform, and the already on by default optimizeLoopExits transform, modify the exit count of the exits they modify. (either to 0 or Infinity) Looking at the code more closely, this was not reflected into SCEV and we were instead running later transforms with incorrect SCEVs. Fixing this requires forgetting the loop, weakening a too strong assert, and updating SCEV to not pessimize results when a loop is provable untaken. I haven't been able to find a test case to demonstrate the miscompile.
Issue 2 - For modules without a data layout, we can end up with unsized pointer typed exit counts. Just bail out of this case.
I think these are the last two issues which need addressed before we enable this by default. The code has already survived a decent amount of fuzzing without revealing either of the above.
Differential Revision: https://reviews.llvm.org/D69695
Summary:
I believe this bisects to https://reviews.llvm.org/D44983
(`[LoopUnroll] Only peel if a predicate becomes known in the loop body.`)
While that revision did contain tests that showed arguably-subpar peeling
for [in]equality predicates that [not] happen in the middle of the loop,
it also disabled peeling for the *first* loop iteration,
because latch would be canonicalized to [in]equality comparison..
That was intentional as per https://reviews.llvm.org/D44983#1059583.
I'm not 100% sure that i'm using correct checks here,
but this fix appears to be going in the right direction..
Let me know if i'm missing some checks here..
Fixes [[ https://bugs.llvm.org/show_bug.cgi?id=43840 | PR43840 ]].
Reviewers: fhahn, mkazantsev, efriedma
Reviewed By: fhahn
Subscribers: xbolva00, hiraditya, zzheng, llvm-commits, fhahn
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69617
We have two ways to steer creating a predicated vector body over creating a
scalar epilogue. To force this, we have 1) a command line option and 2) a
pragma available. This adds a third: a target hook to TargetTransformInfo that
can be queried whether predication is preferred or not, which allows the
vectoriser to make the decision without forcing it.
While this change behaves as a non-functional change for now, it shows the
required TTI plumbing, usage of this new hook in the vectoriser, and the
beginning of an ARM MVE implementation. I will follow up on this with:
- a complete MVE implementation, see D69845.
- a patch to disable this, i.e. we should respect "vector_predicate(disable)"
and its corresponding loophint.
Differential Revision: https://reviews.llvm.org/D69040
Summary:
This patch factors out code to merge a basic block with its sole
successor -- partly for readability and partly to facilitate an
upcoming patch of my own.
Reviewers: wmi
Subscribers: hiraditya, jfb, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69852
This recommits 2be17087f8 (reverted in
d3ec06d219 for heap-use-after-free) with a fix
in IAI's reset() which was not clearing the set of interleave groups after
deleting them.
When eliminating a pair of
`llvm.objc.autoreleaseReturnValue`
followed by
`llvm.objc.retainAutoreleasedReturnValue`
we need to make sure that the instructions in between are safe to
ignore.
Other than bitcasts and useless GEPs, it's also safe to ignore lifetime
markers for both static allocas (lifetime.start/lifetime.end) and dynamic
allocas (stacksave/stackrestore).
These get added by the inliner as part of the return sequence and can
prevent the transformation from happening in practice.
Differential Revision: https://reviews.llvm.org/D69833
Summary:
This patch factors out common code to update the SSA form in
JumpThreading.cpp -- partly for readability and partly to facilitate
an coming patch of my own.
Reviewers: wmi
Subscribers: hiraditya, jfb, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69811
Summary:
That fold keeps growing and growing :(
I think this may be one of the last pieces for it.
Since D67677/D67725, the fold knowns the general form
of the pattern - where some masking is needed:
https://rise4fun.com/Alive/F5Rhttps://rise4fun.com/Alive/gslRa
But there is one more huge piece missing - if you are extracting some bits,
it is not impossible that the origin is wider than the extraction,
i.e. there may be a truncation. And we don't deal with that yet.
But we can, and the generalization remains fully identical:
https://rise4fun.com/Alive/Uarhttps://rise4fun.com/Alive/5SW
After a preparatory cleanup i think the diff looks rather clean.
One missing piece is that in some patterns (especially pat. b),
`-1` only needs to be `-1` in final type, but that is for later..
https://bugs.llvm.org/show_bug.cgi?id=42563
Reviewers: spatel, nikic
Reviewed By: spatel
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69125
This transformation is a variation on the GuardWidening transformation we have checked in as it's own pass. Instead of focusing on merge (i.e. hoisting and simplifying) two widenable branches, this transform makes the observation that simply removing a second slowpath block (by reusing an existing one) is often a very useful canonicalization. This may lead to later merging, or may not. This is a useful generalization when the intermediate block has loads whose dereferenceability is hard to establish.
As noted in the patch, this can be generalized further, and will be.
Differential Revision: https://reviews.llvm.org/D69689
This reverts commit 004ed2b0d1.
Original commit hash 6d03890384
Summary:
This adds a clang option to disable inline line tables. When it is used,
the inliner uses the call site as the location of the inlined function instead of
marking it as an inline location with the function location.
https://reviews.llvm.org/D67723
Summary:
If the GEP instructions are going to be vectorized, the indices in those
GEP instructions must be of the same type. Otherwise, the compiler may
crash when trying to build the vector constant.
Reviewers: RKSimon, spatel
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69627
This was an experiment made possible by a non-standard feature of the
Android dynamic loader.
It required introducing a flag to tell the compiler which ABI was being
targeted.
This flag is no longer needed, since the generated code now works for
both ABI's.
We leave that flag untouched for backwards compatibility. This also
means that if we need to distinguish between targeted ABI's again
we can do that without disturbing any existing workflows.
We leave a comment in the source code and mention in the help text to
explain this for any confused person reading the code in the future.
Patch by Matthew Malcomson
Differential Revision: https://reviews.llvm.org/D69574
The sink-after and interleave-group vectorization decisions were so far applied to
VPlan during initial VPlan construction, which complicates VPlan construction – also because of
their inter-dependence. This patch refactors buildVPlanWithRecipes() to construct a simpler
initial VPlan and later apply both these vectorization decisions, in order, as VPlan-to-VPlan
transformations.
Differential Revision: https://reviews.llvm.org/D68577
Dependences between two abstract attributes SRC and TRG come naturally in
two flavors:
Either (1) "some" information of SRC is *required* for TRG to derive
information, or (2) SRC is just an *optional* way for TRG to derive
information.
While it is not strictly necessary to distinguish these types
explicitly, it can help us to converge faster, in terms of iterations,
and also cut down the number of `AbstractAttribute::update` calls.
As far as I can tell, we only use optional dependences for liveness so
far but that might change in the future. With this change the Attributor
can be informed about the "dependence class" and it will perform
appropriate actions when an Attribute is set to an invalid state, thus
one that cannot be used by others to derive information from.
As discussed in https://bugs.llvm.org/show_bug.cgi?id=43870,
this transform is missing a crucial legality check:
the old (non-countable) loop would early-return upon first mismatch,
but there is no such guarantee for bcmp/memcmp.
We'd need to ensure that [PtrA, PtrA+NBytes) and [PtrB, PtrB+NBytes)
are fully dereferenceable memory regions. But that would limit
the transform to constant loop trip counts and would further
cripple it because dereferenceability analysis is *very* partial.
Furthermore, even if all that is done, every single test
would need to be rewritten from scratch.
So let's just give up.
BlockAddress users will not "call" the function so they do not qualify
as call sites in the first place. When we delete a function with
BlockAddress users we need to first remove the body so they are properly
discarded.
When we replace constant returns at the call site we did issue a cast in
the hopes it would be a no-op if the types are equal. Turns out that is
not the case and we have to check it ourselves first.
Reused an IPConstantProp test for coverage. No functional change to the
test wrt. IPConstantProp.
Even if the invoked function may-return, we can replace it with a call
and branch if it is nounwind. We had almost everything in place to do
this but did not which actually caused a crash when we removed the
landingpad from the actually dead unwind block.
Exposed by the IPConstantProp tests.
We did merge "known" and "assumed" liveness information into a single
set which caused various kinds of problems, especially because we did
not properly record when something was actually known. With this patch
we properly track the "known" bit and distinguish dead ends we know from
the ones we still need to explore in future updates.
We gave up on `noreturn` if `willreturn` was known for a while but we
now again try to always derive `noreturn`. This is useful because a
function that is `noreturn` + `willreturn` is basically dead as
executing it would lead to undefined behavior (UB).
This came up in the IPConstantProp cases where a function only contained
a unreachable but was not marked `noreturn` which caused missed
opportunities down the line.
In D69605 only the "cases" of a switch were handled but if none matched
we did not make the default case life. This is fixed now and properly
tested (with code from IPConstantProp/user-with-multiple-uses.ll).
We cannot simply replace arguments that carry attributes like `nest`,
`inalloca`, `sret`, and `byval`. Except for the last one, which we can
replace if it is not written, we bail for now.