This is a retry of 324a53205. I cautiously reverted that at 6aa3fc4
because the rules about gep math were not clear. Since then, we
have added this line to LangRef for gep inbounds:
"The successive addition of offsets (without adding the base address)
does not wrap the pointer index type in a signed sense (nsw)."
See D90708 and post-commit comments on the revert patch for more details.
This reapplies 36c64af9d7 in updated
form.
Emit the xdata for each function at .seh_endproc. This keeps the
exact same output header order for most code generated by the LLVM
CodeGen layer. (Sections still change order for code built from
assembly where functions lack an explicit .seh_handlerdata
directive, and functions with chained unwind info.)
The practical effect should be that assembly output lacks
superfluous ".seh_handlerdata; .text" pairs at the end of functions
that don't handle exceptions, which allows such functions to use
the AArch64 packed unwind format again.
Differential Revision: https://reviews.llvm.org/D87448
Current check compiles the regex on every attempt at matching. The check also populates and enables a regex value by default so the default behaviour results in regex re-compilation for every macro - if the check is enabled. If people used this check there's a reasonable chance they would have relatively complex regexes in use.
This is a quick and simple fix to store and use the compiled regex.
Reviewed By: njames93
Differential Revision: https://reviews.llvm.org/D91908
The devirtualization wrapper misses cases where if it wraps a pass
manager, an individual pass may devirtualize an indirect call created by
a previous pass. For example, inlining may create a new indirect call
which is devirtualized by instcombine. Currently the devirtualization
wrapper will not see that because it only checks cgscc edges at the very
beginning and end of the pass (manager) it wraps.
This fixes some tests testing this exact behavior in the legacy PM.
Instead of checking WeakTrackingVHs for CallBases at the very beginning
and end of the pass it wraps, check every time
updateCGAndAnalysisManagerForPass() is called.
check-llvm and check-clang with -abort-on-max-devirt-iterations-reached
on by default doesn't show any failures outside of tests specifically
testing it so it doesn't needlessly rerun passes more than necessary.
(The NPM -O2/3 pipeline run the inliner/function simplification pipeline
under a devirtualization repeater pass up to 4 times by default).
http://llvm-compile-time-tracker.com/?config=O3&stat=instructions&remote=aeubanks
shows that 7zip has ~1% compile time regression. I looked at it and saw
that there indeed was devirtualization happening that was not previously
caught, so now it reruns the CGSCC pipeline on some SCCs, which is WAI.
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D89587
Support reserved [0-100] and non-reserved[101-65535] Clang/GNU init
priority values on AIX.
This patch maps Clang/GNU values into priority values used in sinit/sterm
functions. User can play with values and be able to get init to occur
before or after XL init and vice versa.
Differential Revision: https://reviews.llvm.org/D91272
Accept macro function definitions, and apply them when invoked in operand position.
Reviewed By: thakis
Differential Revision: https://reviews.llvm.org/D89734
These tests invoke opt and llc even though they are in the frontend.
We now do a better job of generating commuted patterns for fma so
these tests now form fmls instead of fmla+fneg.
Exposing some utility functions from Linalg to allow for promotion of
fused views outside of the core tile+fuse logic.
This is an alternative to patch D91322 which adds the promotion logic
to the tileAndFuse method. Downside with that approach is that it is
not easily customizable based on needs.
Differential Revision: https://reviews.llvm.org/D91503
The test needs an object file, which it currenty gets with
`-target x86_64-apple-darwin10`. Rather than adding `REQUIRES: X86`, create
the object file via yaml2obj. This way, the test runs and passes even if the
host arch isn't x86 and only the host arch is built.
Part of PR46644.
Enhance the tile+fuse logic to allow fusing a sequence of operations.
Make sure the value used to obtain tile shape is a
SubViewOp/SubTensorOp. Current logic used to get the bounds of loop
depends on the use of `getOrCreateRange` method on `SubViewOp` and
`SubTensorOp`. Make sure that the value/dim used to compute the range
is from such ops. This fix is a reasonable WAR, but a btter fix would
be to make `getOrCreateRange` method be a method of `ViewInterface`.
Differential Revision: https://reviews.llvm.org/D90991
X86 was already specially marking fma as commutable which allowed
tablegen to autogenerate commuted patterns. This moves it to the target
independent definition and fix up the targets to remove now
unneeded patterns.
Unfortunately, the tests change because the commuted version of
the patterns are generating operands in a different than the
explicit patterns.
Differential Revision: https://reviews.llvm.org/D91842
Previously, there was no way to add context to the diagnostic engine via the C API. Adding this ability makes it much easier to reason about memory ownership, particularly in reference-counted languages such as Swift. There are more details in the review comments.
Reviewed By: ftynse, mehdi_amini
Differential Revision: https://reviews.llvm.org/D91738
VE Vector Predicated (VVP) SDNodes form an intermediate layer between VE
vector instructions and the initial SDNodes.
We introduce 'vvp_add' with isel and tests as the first of these VVP
nodes. VVP nodes have a mask and explicit vector length operand, which
we will make proper use of later.
Reviewed By: kaz7
Differential Revision: https://reviews.llvm.org/D91802
traverse() predates the IgnoreUnlessSpelledInSource mode. Update example
and test code to use the newer mode.
Differential Revision: https://reviews.llvm.org/D91917
The testcase was added in faf848ac32 to test the fix of PR 47969, but
it was named pr48980 (which happens to be the TR number in my downstream
issue system).
An SCF 'for' loop does not iterate if its lower bound is equal to its upper
bound. Remove loops where both bounds are the same SSA value as such bounds are
guaranteed to be equal. Similarly, remove 'parallel' loops where at least one
pair of respective lower/upper bounds is specified by the same SSA value.
Reviewed By: gysit
Differential Revision: https://reviews.llvm.org/D91880
ConstantOffsetPtrs contains mappings from a Value to a base pointer and
an offset. The offset is typed and has a size, and at least when dealing
with ptrtoint, it could happen that we had a mapping from a ptrtoint
with type i32 to an offset with type i16. This could later cause
problems, showing up in PR 47969 and PR 38500.
In PR 47969 we ended up in an assert complaining that trunc i16 to i16
is invalid and in Pr 38500 that a cmp on an i32 and i16 value isn't
valid.
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D90610
This test contains YAMLs that can be merged with use of macros.
This opens road for adding more test cases.
Differential revision: https://reviews.llvm.org/D91953
The existing implementation of the conversion from SCF Parallel operation to
SCF "for" loops in order to further convert those loops to branch-based CFG has
been cloning the loop and reduction body operations into the new loop because
ConversionPatternRewriter was missing support for moving blocks while replacing
their arguments. This functionality now available, use it to implement the
conversion and avoid cloning operations, which may lead to doubling of the IR
size during the conversion.
In addition, this fixes an issue with converting nested SCF "if" conditionals
present in "parallel" operations that would cause the conversion infrastructure
to stop because of the repeated application of the pattern converting "newly"
created "if"s (which were in fact just moved). Arguably, this should be fixed
at the infrastructure level and this fix is a workaround.
Reviewed By: herhut
Differential Revision: https://reviews.llvm.org/D91955