The legacy PM doesn't run EP_ModuleOptimizerEarly on -O0, so skip
running it here when given O0.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D93886
With the i32 these patterns will only fire on RV32, but they
don't look RV32 specific.
Reviewed By: lenary
Differential Revision: https://reviews.llvm.org/D93843
This reverts commit dd6bb367d1.
Multi-stage builders are showing an assertion failure w/LCSSA not being preserved on entry to IndVars. Reason isn't clear, reverting while investigating.
The basic idea is that if SCEV can prove the backedge isn't taken, we can go ahead and get rid of the backedge (and thus the loop) while leaving the rest of the control in place. This nicely handles cases with dispatch between multiple exits and internal side effects.
Differential Revision: https://reviews.llvm.org/D93906
Summary:
This patch adds more fine-grained support over which information is output from the libomptarget runtime when run with the environment variable LIBOMPTARGET_INFO set. An extensible set of flags can be used to pick and choose which information the user is interested in.
Reviewers: jdoerfert JonChesterfield grokos
Differential Revision: https://reviews.llvm.org/D93727
Silence clang static analyzer warning that 'fn' could still be in an undefined state - this shouldn't happen depending on the likely tag order, but the analyzer can't know that.
[libomptarget][amdgpu] Call into deviceRTL instead of ockl
Amdgpu codegen presently emits a call into ockl. The same functionality
is already present in the deviceRTL. Adds an amdgpu specific entry point
to avoid the dependency. This lets simple openmp code (specifically, that
which doesn't use libm) run without rocm device libraries installed.
Reviewed By: ronlieb
Differential Revision: https://reviews.llvm.org/D93356
bb7d3af113 disabled hoisting in SimplifyCFG by default, but enabled it
late in the pipeline. But it appears as if the LTO pipelines got missed.
This patch adjusts the LTO pipelines to also enable hoisting in the
later stages.
Unfortunately there's no easy way to add a test for the change I think.
Reviewed By: lebedev.ri
Differential Revision: https://reviews.llvm.org/D93684
One or more cmov instructions could be generated for these functions
when the Zbt extension is present.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D93768
Both tryReplaceExtracts and replaceBinOpShuffles may modify the IR, even
if no interleaved loads are generated, but currently the pass pretends
no changes were made.
This patch updates the pass to return true if either of the functions
made any changes. In case of tryReplaceExtracts, changes are made if
there are any Extracts and true is returned.
`replaceBinOpShuffles` always makes changes if BinOpShuffles is not empty.
It also always returned true, so I went ahead and change it to just
`replaceBinOpShuffles`.
Fixes PR48208.
Reviewed By: SjoerdMeijer
Differential Revision: https://reviews.llvm.org/D93997
getAs<> can return null if the cast is invalid, which can lead to null pointer deferences. Use castAs<> instead which will assert that the cast is valid.
A new TTI interface has been added 'Optional <unsigned>getMaxVScale' that
returns the maximum vscale for a given target.
When known getMaxVScale is used to compute the cost of masked gather scatter
for scalable vector.
Depends on D92094
Differential Revision: https://reviews.llvm.org/D93030
This patch adds patterns for the indexed variants of FCMLA. Mostly based
on a patch by Tim Northover.
Reviewed By: SjoerdMeijer
Differential Revision: https://reviews.llvm.org/D92947
Check if all possible values for a pair of knownbits give the same icmp result - these are based off the checks performed in InstCombineCompares.cpp and D86578.
Add exhaustive unit test coverage - a followup will update InstCombineCompares.cpp to use this.
The lowering of a <4 x i16> or <4 x i8> vecreduce.add into an i64 would
previously be expanded, due to the i64 not being legal. This patch
adjusts our reduction matchers, making it produce a VADDLV(sext A to
v4i32) instead.
Differential Revision: https://reviews.llvm.org/D93622
* Prevent the generation of invalid shift instructions by constraining
the immediate field. I've limited the shift field to constant values
only, adding the `R_SPARC_5`/`R_SPARC_6` relocations is trivial if
needed (but I can't really think of a use case for those).
* Fix the generation of PC-relative `call`
* Fix the transformation of `jmp sym` into `jmpl`
* Emit fixups for simm13 operands
I moved the choice of the correct relocation into the code emitter as I've
seen the other backends do, it can be definitely cleaner but the aim was
to reduce the scope of the patch as much as possible.
Fixes the problems raised by joerg in L254199
Reviewed By: dcederman
Differential Revision: https://reviews.llvm.org/D78193
Change default CPU name of SX-Aurora VE from "ve" to "generic" similar
to other architectures.
Reviewed By: simoll
Differential Revision: https://reviews.llvm.org/D93836
The old CPU model only had MLA->MLA forwarding. I added some missing
MUL->MLA read advances and a missing absolute diff accumulator read
advance according to the Cortex A57 Software Optimization Guide.
The patch improves performance in EEMBC rgbyiqv2 by about 6%-7% and
spec2006/milc by 8% (repeated runs on multiple devices), causes no
significant regressions (none in SPEC).
Differential Revision: https://reviews.llvm.org/D92296
Currently ArgPromotion removes dead GEPs as part of the legality check
in isSafeToPromoteArgument. If no promotion happens, this means the pass
claims no modifications happened, even though GEPs were removed.
This patch fixes the issue by delaying removal of dead GEPs until
doPromotion: isSafeToPromoteArgument can simply skips dead GEPs and
the code in doPromotion dealing with GEPs is updated to account for
dead GEPs. Once we committed to promotion, it should be safe to
remove dead GEPs.
Alternatively isSafeToPromoteArgument could return an additional boolean
to indicate whether it made changes, but this is quite cumbersome and
there should be no real benefit of weeding out some dead GEPs here if we
do not perform promotion.
I added a test for the case where dead GEPs need to be removed when
promotion happens in 578c5a0c6e.
Fixes PR47477.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D93991
Remove VA.needsCustom checks which are copied from Sparc implementation
at the very beginning of VE implementation. Add assert to sanity-check
VA.needsCustom flag, also.
Reviewed By: simoll
Differential Revision: https://reviews.llvm.org/D93847
This patch fixes a crash encountered when compiling this code:
...
float16_t a;
__asm__("fminv %h[a], %[b], %[c].h"
: [a] "=r" (a)
: [b] "Upl" (b), [c] "w" (c))
The issue here is when using the 'h' modifier for a register
constraint 'r'.
Differential Revision: https://reviews.llvm.org/D93537
std::decay_t used by llvm/utils/benchmark/include/benchmark/benchmark.h is a c++14 feature, but the CMakelist uses c++11, it's the root-cause of build error.
There are two options to fix the error.
1) change the CMakelist to support c++14.
2) change std::decay_t to std::decay, it's what the patch done.
This bug can only be reproduced by CMake 3.15, we didn't observer the bug with CMake 3.16. But based on the code's logic, it's an obvious bug of LLVM.
Reviewed By: lebedev.ri
Differential Revision: https://reviews.llvm.org/D93794
In `PPCInstrInfo::optimizeCompareInstr` we seek opportunities to fold `cmp(d|w)` and `subf` as an `subf.`. However, if `subf.` gets overflow, `cr0` can't reflect the correct order, violating the semantics of `cmp(d|w)`.
Fixed https://bugs.llvm.org/show_bug.cgi?id=47830.
Reviewed By: #powerpc, nemanjai
Differential Revision: https://reviews.llvm.org/D90156
See OMP-5.0 2.19.5.5 task_reduction Clause.
To add a positive test case we need `taskgroup` directive which is not added hence skipping the test.
This is a dependency for `taskgroup` construct.
Reviewed By: clementval
Differential Revision: https://reviews.llvm.org/D93105
Co-authored-by: Valentin Clement <clementval@gmail.com>
before:
$ echo 'int main(){}'|clang -g -fsanitize=leak -x c++ -;./a.out
Tracer caught signal 11: addr=0x7f4f73da5f40 pc=0x4222c8 sp=0x7f4f72cffd40
==1164171==LeakSanitizer has encountered a fatal error.
==1164171==HINT: For debugging, try setting environment variable LSAN_OPTIONS=verbosity=1:log_threads=1
==1164171==HINT: LeakSanitizer does not work under ptrace (strace, gdb, etc)
$ _
after:
$ echo 'int main(){}'|clang -g -fsanitize=leak -x c++ -;./a.out)
$ _
I haven't verified the size cannot be affected by Fedora patches of
upstream glibc-2.32 - but I do not expect upstream glibc-2.32 would have
the last sizes `(1216, 2304)` from 2013 around glibc-2.12.
Differential Revision: https://reviews.llvm.org/D93386
Attempt to fix the 2 failing tests identifier in 48646.
Appears that python3 doesn't like nested double quotes in single quoted strings, hopefully nested single quotes in double quoted strings is a-ok.
Reviewed By: thakis
Differential Revision: https://reviews.llvm.org/D93979
This is NFC since SimplifyCFG still currently defaults to not preserving DomTree.
SimplifyCFGOpt::simplifyOnce() is only be called from SimplifyCFGOpt::run(),
and can not be called externally, since SimplifyCFGOpt is defined in .cpp
This avoids some needless verifications, and is thus a bit faster
without sacrificing precision.
We only need to remove non-TrueBB/non-FalseBB successors,
and we only need to do that once. We don't need to insert
any new edges, because no new successors will be added.
As the comment already indicates, performing an operation with
nnan/ninf flags on a nan/inf or undef results in poison. Now that
we have a proper poison value, we no longer need to relax it to
undef.
Div/rem by zero is immediate undefined behavior and anything goes.
Currently we fold it to undef, this patch changes it to fold to
poison instead, which is slightly stronger.
Differential Revision: https://reviews.llvm.org/D93995
This is the same change as D93990, but for extractelement rather
than insertelement.
> If idx exceeds the length of val for a fixed-length vector, the
> result is a poison value. For a scalable vector, if the value of
> idx exceeds the runtime length of the vector, the result is a
> poison value.