The legacy PM doesn't run EP_ModuleOptimizerEarly on -O0, so skip
running it here when given O0.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D93886
With the i32 these patterns will only fire on RV32, but they
don't look RV32 specific.
Reviewed By: lenary
Differential Revision: https://reviews.llvm.org/D93843
This reverts commit dd6bb367d1.
Multi-stage builders are showing an assertion failure w/LCSSA not being preserved on entry to IndVars. Reason isn't clear, reverting while investigating.
The basic idea is that if SCEV can prove the backedge isn't taken, we can go ahead and get rid of the backedge (and thus the loop) while leaving the rest of the control in place. This nicely handles cases with dispatch between multiple exits and internal side effects.
Differential Revision: https://reviews.llvm.org/D93906
Silence clang static analyzer warning that 'fn' could still be in an undefined state - this shouldn't happen depending on the likely tag order, but the analyzer can't know that.
bb7d3af113 disabled hoisting in SimplifyCFG by default, but enabled it
late in the pipeline. But it appears as if the LTO pipelines got missed.
This patch adjusts the LTO pipelines to also enable hoisting in the
later stages.
Unfortunately there's no easy way to add a test for the change I think.
Reviewed By: lebedev.ri
Differential Revision: https://reviews.llvm.org/D93684
Both tryReplaceExtracts and replaceBinOpShuffles may modify the IR, even
if no interleaved loads are generated, but currently the pass pretends
no changes were made.
This patch updates the pass to return true if either of the functions
made any changes. In case of tryReplaceExtracts, changes are made if
there are any Extracts and true is returned.
`replaceBinOpShuffles` always makes changes if BinOpShuffles is not empty.
It also always returned true, so I went ahead and change it to just
`replaceBinOpShuffles`.
Fixes PR48208.
Reviewed By: SjoerdMeijer
Differential Revision: https://reviews.llvm.org/D93997
A new TTI interface has been added 'Optional <unsigned>getMaxVScale' that
returns the maximum vscale for a given target.
When known getMaxVScale is used to compute the cost of masked gather scatter
for scalable vector.
Depends on D92094
Differential Revision: https://reviews.llvm.org/D93030
This patch adds patterns for the indexed variants of FCMLA. Mostly based
on a patch by Tim Northover.
Reviewed By: SjoerdMeijer
Differential Revision: https://reviews.llvm.org/D92947
Check if all possible values for a pair of knownbits give the same icmp result - these are based off the checks performed in InstCombineCompares.cpp and D86578.
Add exhaustive unit test coverage - a followup will update InstCombineCompares.cpp to use this.
The lowering of a <4 x i16> or <4 x i8> vecreduce.add into an i64 would
previously be expanded, due to the i64 not being legal. This patch
adjusts our reduction matchers, making it produce a VADDLV(sext A to
v4i32) instead.
Differential Revision: https://reviews.llvm.org/D93622
* Prevent the generation of invalid shift instructions by constraining
the immediate field. I've limited the shift field to constant values
only, adding the `R_SPARC_5`/`R_SPARC_6` relocations is trivial if
needed (but I can't really think of a use case for those).
* Fix the generation of PC-relative `call`
* Fix the transformation of `jmp sym` into `jmpl`
* Emit fixups for simm13 operands
I moved the choice of the correct relocation into the code emitter as I've
seen the other backends do, it can be definitely cleaner but the aim was
to reduce the scope of the patch as much as possible.
Fixes the problems raised by joerg in L254199
Reviewed By: dcederman
Differential Revision: https://reviews.llvm.org/D78193
Change default CPU name of SX-Aurora VE from "ve" to "generic" similar
to other architectures.
Reviewed By: simoll
Differential Revision: https://reviews.llvm.org/D93836
The old CPU model only had MLA->MLA forwarding. I added some missing
MUL->MLA read advances and a missing absolute diff accumulator read
advance according to the Cortex A57 Software Optimization Guide.
The patch improves performance in EEMBC rgbyiqv2 by about 6%-7% and
spec2006/milc by 8% (repeated runs on multiple devices), causes no
significant regressions (none in SPEC).
Differential Revision: https://reviews.llvm.org/D92296
Currently ArgPromotion removes dead GEPs as part of the legality check
in isSafeToPromoteArgument. If no promotion happens, this means the pass
claims no modifications happened, even though GEPs were removed.
This patch fixes the issue by delaying removal of dead GEPs until
doPromotion: isSafeToPromoteArgument can simply skips dead GEPs and
the code in doPromotion dealing with GEPs is updated to account for
dead GEPs. Once we committed to promotion, it should be safe to
remove dead GEPs.
Alternatively isSafeToPromoteArgument could return an additional boolean
to indicate whether it made changes, but this is quite cumbersome and
there should be no real benefit of weeding out some dead GEPs here if we
do not perform promotion.
I added a test for the case where dead GEPs need to be removed when
promotion happens in 578c5a0c6e.
Fixes PR47477.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D93991
Remove VA.needsCustom checks which are copied from Sparc implementation
at the very beginning of VE implementation. Add assert to sanity-check
VA.needsCustom flag, also.
Reviewed By: simoll
Differential Revision: https://reviews.llvm.org/D93847
This patch fixes a crash encountered when compiling this code:
...
float16_t a;
__asm__("fminv %h[a], %[b], %[c].h"
: [a] "=r" (a)
: [b] "Upl" (b), [c] "w" (c))
The issue here is when using the 'h' modifier for a register
constraint 'r'.
Differential Revision: https://reviews.llvm.org/D93537
In `PPCInstrInfo::optimizeCompareInstr` we seek opportunities to fold `cmp(d|w)` and `subf` as an `subf.`. However, if `subf.` gets overflow, `cr0` can't reflect the correct order, violating the semantics of `cmp(d|w)`.
Fixed https://bugs.llvm.org/show_bug.cgi?id=47830.
Reviewed By: #powerpc, nemanjai
Differential Revision: https://reviews.llvm.org/D90156
This is NFC since SimplifyCFG still currently defaults to not preserving DomTree.
SimplifyCFGOpt::simplifyOnce() is only be called from SimplifyCFGOpt::run(),
and can not be called externally, since SimplifyCFGOpt is defined in .cpp
This avoids some needless verifications, and is thus a bit faster
without sacrificing precision.
We only need to remove non-TrueBB/non-FalseBB successors,
and we only need to do that once. We don't need to insert
any new edges, because no new successors will be added.
As the comment already indicates, performing an operation with
nnan/ninf flags on a nan/inf or undef results in poison. Now that
we have a proper poison value, we no longer need to relax it to
undef.
Div/rem by zero is immediate undefined behavior and anything goes.
Currently we fold it to undef, this patch changes it to fold to
poison instead, which is slightly stronger.
Differential Revision: https://reviews.llvm.org/D93995
This is the same change as D93990, but for extractelement rather
than insertelement.
> If idx exceeds the length of val for a fixed-length vector, the
> result is a poison value. For a scalable vector, if the value of
> idx exceeds the runtime length of the vector, the result is a
> poison value.
This patch makes X86InterleavedAccessGroup::deinterleave8bitStride3 use the unary CreateShuffleVector.
This is a continuation of D93923. There were a few missing replacements.
IIUC, this patch does not cause change in the generated programs' semantics because the
function inserts shufflevectors that only choose elements from the first vector.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D93993
This is a simple patch that updates InstSimplify to return poison if the index is/can be out-of-bounds
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D93990
This patch makes Scalarizer to use poison as insertelement's placeholder.
It contains two changes in Scalarizer.cpp, and the both changes does not change the semantics of the optimized program.
It is because the placeholder value (poison) is already completely hidden by following insertelement instructions.
The first change at visitBitCastInst() creates poison vector of MidTy and consecutively inserts FanIn times,
which is # of elems of MidTy.
The second change at ScalarizerVisitor::finish() creates poison with Op->getType(), and it is filled with
Count insertelements.
The test diffs show that the poison value is never exposed after insertelements.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D93989
Let getTruncateExpr() short-circuit to zero when the value being truncated is
known to have at least as many trailing zeros as the target type.
Differential Revision: https://reviews.llvm.org/D93973
That refactoring is helpful since it reduces data inter-dependencies.
Which is good for current implementation and even more good for
fully multi-thread implementation. The idea of the refactoring
is to delete UniquingStringPool from the global DWARFLinker level.
It is used to unique type names while ODR deduplication is done.
Thus we move UniquingStringPool into the DeclContextTree which
matched to UniquingStringPool usage scope.
golden-dsymutil/dsymutil 23787992
clang MD5: 7d9873ff94f0246b6ab1ec3e8d0f3f06
build-Release/bin/dsymutil 23921272
clang MD5: 7d9873ff94f0246b6ab1ec3e8d0f3f06
Differential Revision: https://reviews.llvm.org/D93460
There is a number of transforms in SimplifyCFG that take DomTree out of
DomTreeUpdater, and do updates manually. Until they are fixed,
user passes are unable to claim that PDT is preserved.
Note that the default for SimplifyCFG is still not to preserve DomTree,
so this is still effectively NFC.
`UniqueInternalLinkageNamesPass` is useful to CSSPGO, especially when pseudo probe is used. It solves naming conflict for static functions which otherwise will share a merged profile and likely have a profile quality issue with mismatched CFG checksums. Since the pseudo probe instrumentation happens very early in the pipeline, I'm moving `UniqueInternalLinkageNamesPass` right before it. This is being done only to the new pass manager.
Reviewed By: dblaikie, aeubanks
Differential Revision: https://reviews.llvm.org/D93656