If a vector body has live-out values, it is probably a reduction, which needs a
final reduction step after the loop. MVE has a VADDV instruction to reduce
integer vectors, but doesn't have an equivalent one for float vectors. A
live-out value that is not recognised as reduction later in the optimisation
pipeline will result in the tail-predicated loop to be reverted to a
non-predicated loop and this is very expensive, i.e. it has a significant
performance impact, which is what we hope to avoid with fine tuning the ARM TTI
hook preferPredicateOverEpilogue implementation.
Differential Revision: https://reviews.llvm.org/D82953
Here we teach the ConstantFolding analysis pass that it is not legal to
replace a load of a bitcast constant (having a non-integral addrspace)
with a bitcast of the value of that constant (with a different
non-integral addrspace).
But also teach it that certain bit patterns are always known and
convertable (a fact it already uses elsewhere). This required us to also
fix a globalopt test, since, after this change, LLVM is able to realize
that the test actually is a valid transform (NULL is always a known
bit-pattern) and so it doesn't need to emit the failure remarks for it.
Also simplify some of the negative tests for transforms by avoiding a
type change in their bitcast, and add positive versions of the same
tests, to show that they otherwise should work.
Differential Revision: https://reviews.llvm.org/D59730
Summary:
NOTE: There is a mailing list discussion on this: http://lists.llvm.org/pipermail/llvm-dev/2019-December/137632.html
Complemantary to the assumption outliner prototype in D71692, this patch
shows how we could simplify the code emitted for an alignemnt
assumption. The generated code is smaller, less fragile, and it makes it
easier to recognize the additional use as a "assumption use".
As mentioned in D71692 and on the mailing list, we could adopt this
scheme, and similar schemes for other patterns, without adopting the
assumption outlining.
Reviewers: hfinkel, xbolva00, lebedev.ri, nikic, rjmccall, spatel, jdoerfert, sstefan1
Reviewed By: jdoerfert
Subscribers: thopre, yamauchi, kuter, fhahn, merge_guards_bot, hiraditya, bollu, rkruppe, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D71739
Currently, a transformation like pow(2.0, x) -> exp2(x) copies the pow
attribute list verbatim and applies it to exp2. This works out fine
when the attribute list is empty, but when it isn't clang may error due
due to the mismatch.
The source function and destination don't necessarily have anything
to do with one another, attribute-wise. So it makes sense to remove
the attribute lists (this is similar to what IPO does in this
situation).
This was discovered after implementing the `noundef` param attribute.
Differential Revision: https://reviews.llvm.org/D82820
This reverts commit 9649c2095f. See
discussion on the llvm-commits thread: if it's OK to preserve the
location when sinking a call, it's probably OK to always preserve the
location.
Place the ssa.copy instructions for assumes after the assume,
instead of before it. Both options are valid, but placing them
afterwards prevents assumes from being replaced with assume(true).
This fixes https://bugs.llvm.org/show_bug.cgi?id=37541 in NewGVN
and will avoid a similar issue in SCCP when we handle more
predicate infos.
Differential Revision: https://reviews.llvm.org/D83631
Teaches the SLPVectorizer to use vectorized library functions for
non-intrinsic calls.
This already worked for intrinsics that have vectorized library
functions, thanks to D75878, but schedules with library functions with a
vector variant were being rejected early.
- assume that there are no load/store dependencies between lib
functions with a vector variant; this would otherwise prevent the
bundle from becoming "ready"
- check during legalization that the vector variant can be used
- fix-up where we previously assumed that a call would be an intrinsic
Differential Revision: https://reviews.llvm.org/D82550
This refactors option -disable-mve-tail-predication to take different arguments
so that we have 1 option to control tail-predication rather than several
different ones.
This is also a prep step for D82953, in which we want to reject reductions
unless that is requested with this option.
Differential Revision: https://reviews.llvm.org/D83133
We can try to replace select with a Phi not in its parent block alone,
but also in blocks of its arguments. We benefit from it when select's
argument is a Phi.
Differential Revision: https://reviews.llvm.org/D83284
Reviewed By: nikic
Similar to rG40fcc42:
The base case only worked because we were relying on a
poison-unsafe select transform; if that is fixed, we
would regress on patterns like this.
The extra use tests show that the select transform can't
be applied consistently. So it may be a regression to have
an extra instruction on 1 test, but that result was not
created safely and does not happen reliably.
This patch fixes D81345 and PR46652.
If a loop with a small trip count is compiled w/o -Os/-Oz, Loop Access Analysis
still generates runtime checks for unit strides that will version the loop.
In such cases, the loop vectorizer should either re-run the analysis or bail-out
from vectorizing the loop, as done prior to D81345. The latter is applied for
now as the former requires refactoring.
Differential Revision: https://reviews.llvm.org/D83470
Summary:
- Skip unreachable predecessors during header detection in SCC. Those
unreachable blocks would be generated in the switch lowering pass in
the corner cases or other frontends. Even though they could be removed
through the CFG simplification, we should skip them during header
detection.
Reviewers: sameerds
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D83562
The current implementation of Tail Recursion Elimination has a very restricted
pre-requisite: AllCallsAreTailCalls. i.e. it requires that no function
call receives a pointer to local stack. Generally, function calls that
receive a pointer to local stack but do not capture it - should not
break TRE. This fix allows us to do TRE if it is proved that no pointer
to the local stack is escaped.
Reviewed by: efriedma
Differential Revision: https://reviews.llvm.org/D82085
In non-SPMD mode we create a state machine like code to identify the
parallel region the GPU worker threads should execute next. The
identification uses the parallel region function pointer as that allows
it to work even if the kernel (=target region) and the parallel region
are in separate TUs. However, taking the address of a function comes
with various downsides. With this patch we will identify the most common
situation and replace the function pointer use with a dummy global
symbol (for identification purposes only). That means, if the parallel
region is only called from a single target region (or kernel), we do not
use the function pointer of the parallel region to identify it but a new
global symbol.
Fixes PR46450.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D83271
We now identify GPU kernels, that is entry points into the GPU code.
These kernels (can) correspond to OpenMP target regions. With this patch
we identify and on request print them via remarks.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D83269
This reverts commit 1d542f0ca8.
`recollectUses()` is added to prevent looking at dead uses after
Attributor run.
This is the first and most basic ICV Tracking implementation. For this
first version, we only support deduplication within the same BB.
Reviewers: jdoerfert, JonChesterfield, hamax97, jhuber6, uenoku,
baziotis, lebedev.ri
Differential Revision: https://reviews.llvm.org/D81788
There appears to be some kind of memory corruption/use-after-free/etc
going on here. In particular, in `OpenMPOpt::deleteParallelRegions()`,
in `DeleteCallCB()`, `CI` is garbage.
WIll post reproducer in the original review.
This reverts commit 6c4a5e9257.
Attributor tests are mostly updated using the auto upgrade scripts but
sometimes we forget. If we do it manually or continue using old check
lines that still match we see unrelated changes down the line. This is
just a cleanup.
We can happen to have a situation with many stores eligible for transform,
but due to our visitation order (top to bottom), when we have processed
the first eligible instruction, we would not try to reprocess the previous
instructions that are now also eligible.
So after we've successfully merged a store that was second-to-last instruction
into successor, if the now-second-to-last instruction is also a such store
that is eligible, add it to worklist to be revisited.
Fixes https://bugs.llvm.org/show_bug.cgi?id=46661
Currently the DomTree is not kept up to date for additional blocks
generated in the vector loop, for example when vectorizing with
predication. SCEVExpander relies on dominance checks when looking for
existing instructions to re-use and in some cases that can lead to the
expander picking instructions that do not actually dominate their insert
point (e.g. as in PR46525).
Unfortunately keeping the DT up-to-date is a bit tricky, because the CFG
is only patched up after generating code for a block. For now, we can
just use the vector loop header, as this ensures the inserted
instructions dominate all uses in the vector loop. There should be no
noticeable impact on the generated code, as other passes should sink
those instructions, if profitable.
Fixes PR46525.
Reviewers: Ayal, gilr, mkazantsev, dmgreen
Reviewed By: dmgreen
Differential Revision: https://reviews.llvm.org/D83288
Summary: This allows to convert any SExt to a ZExt when we know none of the extended bits are used, specially in cases where there are multiple uses of the value.
Reviewers: dmgreen, eli.friedman, spatel, lebedev.ri, nikic
Reviewed By: lebedev.ri, nikic
Subscribers: hiraditya, dmgreen, craig.topper, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60413
The block front may be a PHI node, inserting a cast instructions like
BitCast, PtrToInt, IntToPtr among PHIs is not right.
Reviewed By: lebedev.ri
Differential Revision: https://reviews.llvm.org/D80975
Follow up from the transform being removed in D83360. If X is probably not poison, then the transform is safe.
Still plan to remove or adjust the code from ConstantFolding after this.
Differential Revision: https://reviews.llvm.org/D83440
We can't fold to the non-undef value unless we know it isn't poison. So check each element with isGuaranteedNotToBeUndefOrPoison. This currently rules out all constant expressions.
Differential Revision: https://reviews.llvm.org/D83442
Some of the tests in the llvm/test/Transforms/IPConstantProp directory
actually only use -ipsccp. Those tests belong to the other (IP)SCCP
tests in llvm/test/Transforms/SCCP/ and this commits moves them there to
avoid confusion with IPConstantProp.
Summary:
The test case started to hoist bitcasts to upper BB after D81730.
Reverted unintentional logic change. Some instructions may have zero cost but
will not be hoisted by different limitation so should be counted for threshold.
Reviewers: aprantl, arsenm, nhaehnle
Reviewed By: aprantl
Subscribers: wdng, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D82761
Currently SCCP does not combine the information of conditions joined by
AND in the true branch or OR in the false branch.
For branches on AND, 2 copies will be inserted for the true branch, with
one being the operand of the other as in the code below. We can combine
the information using intersection. Note that for the OR case, the
copies are inserted in the false branch, where using intersection is
safe as well.
define void @foo(i32 %a) {
entry:
%lt = icmp ult i32 %a, 100
%gt = icmp ugt i32 %a, 20
%and = and i1 %lt, %gt
; Has predicate info
; branch predicate info { TrueEdge: 1 Comparison: %lt = icmp ult i32 %a, 100 Edge: [label %entry,label %true] }
%a.0 = call i32 @llvm.ssa.copy.140247425954880(i32 %a)
; Has predicate info
; branch predicate info { TrueEdge: 1 Comparison: %gt = icmp ugt i32 %a, 20 Edge: [label %entry,label %false] }
%a.1 = call i32 @llvm.ssa.copy.140247425954880(i32 %a.0)
br i1 %and, label %true, label %false
true: ; preds = %entry
call void @use(i32 %a.1)
%true.1 = icmp ne i32 %a.1, 20
call void @use.i1(i1 %true.1)
ret void
false: ; preds = %entry
call void @use(i32 %a.1)
ret void
}
Reviewers: efriedma, davide, mssimpso, nikic
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D77808
When adding support for scalable vector masked loads and stores we
accidently opened up likewise for fixed length vectors. This patch
restricts support to scalable vectors only, thus ensuring fixed
length vectors are treated the same regardless of SVE support.
Differential Revision: https://reviews.llvm.org/D83341
Summary:
New line duplication logic introduced in https://reviews.llvm.org/D63482
has two issues: (1) there is no logic that removes duplicate newlines
when clang-apply-replacment reads YAML and (2) in general such logic
should be applied to all strings and should happen on string
serialization level instead in YAML parser.
This diff changes multiline strings quotation from single quote `'` to
double `"`. It solves problems with internal newlines because now they are
escaped. Also double quotation solves the problem with leading whitespace after
newline. In case of single quotation YAML parsers should remove leading
whitespace according to specification. In case of double quotation these
leading are internal space and they are preserved. There is no way to
instruct YAML parsers to preserve leading whitespaces after newline so
double quotation is the only viable option that solves all problems at
once.
Test Plan: check-all
Reviewers: gribozavr, mgehre, yvvan
Subscribers: xazax.hun, hiraditya, cfe-commits, llvm-commits
Tags: #clang-tools-extra, #clang, #llvm
Differential Revision: https://reviews.llvm.org/D80301