These can be lowered to code sequences using CMPFP and CMPFPE which then get
selected to VCMP and VCMPE. The implementation isn't fully correct, as the chain
operand isn't handled correctly, but resolving that looks like it would involve
changes around FPSCR-handling instructions and how the FPSCR is modelled.
The fp-intrinsics test was already testing some of this but as the entire test
was being XFAILed it wasn't noticed. Un-XFAIL the test and instead leave the
cases where we aren't generating the right instruction sequences as FIXME.
Differential Revision: https://reviews.llvm.org/D73194
Summary:
In big-endian MVE, the simple vector load/store instructions (i.e.
both contiguous and non-widening) don't all store the bytes of a
register to memory in the same order: it matters whether you did a
VSTRB.8, VSTRH.16 or VSTRW.32. Put another way, the in-register
formats of different vector types relate to each other in a different
way from the in-memory formats.
So, if you want to 'bitcast' or 'reinterpret' one vector type as
another, you have to carefully specify which you mean: did you want to
reinterpret the //register// format of one type as that of the other,
or the //memory// format?
The ACLE `vreinterpretq` intrinsics are specified to reinterpret the
register format. But I had implemented them as LLVM IR bitcast, which
is specified for all types as a reinterpretation of the memory format.
So a `vreinterpretq` intrinsic, applied to values already in registers,
would code-generate incorrectly if compiled big-endian: instead of
emitting no code, it would emit a `vrev`.
To fix this, I've introduced a new IR intrinsic to perform a
register-format reinterpretation: `@llvm.arm.mve.vreinterpretq`. It's
implemented by a trivial isel pattern that expects the input in an
MQPR register, and just returns it unchanged.
In the clang codegen, I only emit this new intrinsic where it's
actually needed: I prefer a bitcast wherever it will have the right
effect, because LLVM understands bitcasts better. So we still generate
bitcasts in little-endian mode, and even in big-endian when you're
casting between two vector types with the same lane size.
For testing, I've moved all the codegen tests of vreinterpretq out
into their own file, so that they can have a different set of RUN
lines to check both big- and little-endian.
Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard
Reviewed By: dmgreen
Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D73786
Summary:
These instructions generate a vector of consecutive elements starting
from a given base value and incrementing by 1, 2, 4 or 8. The `wdup`
versions also wrap the values back to zero when they reach a given
limit value. The instruction updates the scalar base register so that
another use of the same instruction will continue the sequence from
where the previous one left off.
At the IR level, I've represented these instructions as a family of
target-specific intrinsics with two return values (the constructed
vector and the updated base). The user-facing ACLE API provides a set
of intrinsics that throw away the written-back base and another set
that receive it as a pointer so they can update it, plus the usual
predicated versions.
Because the intrinsics return two values (as do the underlying
instructions), the isel has to be done in C++.
This is the first family of MVE intrinsics that use the `imm_1248`
immediate type in the clang Tablegen framework, so naturally, I found
I'd given it the wrong C integer type. Also added some tests of the
check that the immediate has a legal value, because this is the first
time those particular checks have been exercised.
Finally, I also had to fix a bug in MveEmitter which failed an
assertion when I nested two `seq` nodes (the inner one used to extract
the two values from the pair returned by the IR intrinsic, and the
outer one put on by the predication multiclass).
Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard
Reviewed By: dmgreen
Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D73357
Summary:
The unpredicated case of this is trivial: the clang codegen just makes
a vector splat of the input, and LLVM isel is already prepared to
handle that. For the predicated version, I've generated a `select`
between the same vector splat and the `inactive` input parameter, and
added new Tablegen isel rules to match that pattern into a predicated
`MVE_VDUP` instruction.
Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard
Reviewed By: dmgreen
Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D73356
Summary:
A Copy with a source that is zeros is the same as a Set of zeros.
This fixes the invariant that SrcAlign should always be non-null.
Reviewers: courbet
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73791
Summary: There are no counters for individual ports, but this is already
enough to find a lot of issues in the current model (upcoming patch).
Reviewers: dblaikie, gchatelet
Subscribers: hiraditya, tschuett, RKSimon, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72032
Summary:
D68092 introduced a new SIRemoveShortExecBranches optimization pass and
broke some graphics shaders. The problem is that it was removing
branches over KILL pseudo instructions, and the fix is to explicitly
check for that in mustRetainExeczBranch.
Reviewers: critson, arsenm, nhaehnle, cdevadas, hakzsam
Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73771
Duplicating instructions can lead to code size increases but using
a threshold of 3 is good for reducing code size.
Differential Revision: https://reviews.llvm.org/D72916
If all call sites are in `norecurse` functions we can derive `norecurse`
as the ReversePostOrderFunctionAttrsPass does. This should make
ReversePostOrderFunctionAttrsLegacyPass obsolete once the Attributor is
enabled.
Reviewed By: uenoku
Differential Revision: https://reviews.llvm.org/D72017
If we know that all call sites have been processed we can derive an
early fixpoint. The use in this patch is likely not to trigger right now
but a follow up patch will make use of it.
Reviewed By: uenoku, baziotis
Differential Revision: https://reviews.llvm.org/D72016
We only need to call this on floating point comparisons. In this
case these are known to be integer compares. One of them even
has a SUB opcode instead of CMP.
With this patch new trivial edges can be added to an SCC in a CGSCC
pass via the updateCGAndAnalysisManagerForCGSCCPass method. It shares
almost all the code with the existing
updateCGAndAnalysisManagerForFunctionPass method but it implements the
first step towards the TODOs.
This was initially part of D70927.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D72025
If we had `noalias` on an argument the inliner created alias scope
metadata already. However, the call site `noalias` annotation was not
considered. Since the Attributor can derive such call site `noalias`
annotation we should treat them the same as argument annotations.
Reviewed By: hfinkel
Differential Revision: https://reviews.llvm.org/D73528
This is the first of multiple parts to make OpenMP context/trait
handling reusable and generic. This patch was originally part of D71830
but with the unit tests it can be tested independently.
This patch implements an almost complete handling of OpenMP
contexts/traits such that we can reuse most of the logic in Flang
through the OMPContext.{h,cpp} in llvm/Frontend/OpenMP.
All but construct SIMD specifiers, e.g., inbranch, and the device ISA
selector are define in llvm/lib/Frontend/OpenMP/OMPKinds.def. From
these definitions we generate the enum classes TraitSet,
TraitSelector, and TraitProperty as well as conversion and helper
functions in llvm/lib/Frontend/OpenMP/OMPContext.{h,cpp}.
The OpenMP context is now an explicit object (see `struct OMPContext`).
This is in anticipation of construct traits that need to be tracked. The
OpenMP context, as well as the VariantMatchInfo, are basically made up
of a set of active or respectively required traits, e.g., 'host', and an
ordered container of constructs which allows duplication. Matching and
scoring is kept as generic as possible to allow easy extension in the
future.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D71847
Fix attempt
this is part of the implementation of http://lists.llvm.org/pipermail/llvm-dev/2019-December/137632.html
this patch gives the basis of building an assume to preserve all information from an instruction and add support for building an assume that preserve the information from a call.
The code paths in the absence of TargetMachine, TargetLowering or
TargetRegisterInfo are poorly tested. As rL285987 said, requiring
TargetPassConfig allows us to delete many (untested) checks littered
everywhere.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D73754
Summary:
this is part of the implementation of http://lists.llvm.org/pipermail/llvm-dev/2019-December/137632.html
this patch gives the basis of building an assume to preserve all information from an instruction and add support for building an assume that preserve the information from a call.
Reviewers: jdoerfert
Reviewed By: jdoerfert
Subscribers: mgrang, fhahn, mgorny, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72475
Summary:
this is part of the implementation of http://lists.llvm.org/pipermail/llvm-dev/2019-December/137632.html
this patch gives the basis of building an assume to preserve all information from an instruction and add support for building an assume that preserve the information from a call.
Reviewers: jdoerfert
Reviewed By: jdoerfert
Subscribers: mgrang, fhahn, mgorny, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72475
Summary:
this is part of the implementation of http://lists.llvm.org/pipermail/llvm-dev/2019-December/137632.html
this patch gives the basis of building an assume to preserve all information from an instruction and add support for building an assume that preserve the information from a call.
Reviewers: jdoerfert
Reviewed By: jdoerfert
Subscribers: mgrang, fhahn, mgorny, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72475
We want to allow splat value transforms to improve PR44588 and related bugs:
https://bugs.llvm.org/show_bug.cgi?id=44588
...but to do that, we need to know if values are splatted from the same,
specific index (lane) rather than splatted from an arbitrary index.
We can improve the undef handling with 1-liner follow-ups because the
Constant API optionally allow undefs now.
Differential Revision: https://reviews.llvm.org/D73549
Summary:
this patch makes tablegen generate llvm attributes in a more generic and simpler (at least to me).
changes: make tablegen generate
...
ATTRIBUTE_ENUM(Alignment,align)
ATTRIBUTE_ENUM(AllocSize,allocsize)
...
which can be used to generate most of what was previously used and more.
Tablegen was also generating attributes from 2 identical files leading to identical output. so I removed one of them and made user use the other.
Reviewers: jdoerfert, thakis, aaron.ballman
Reviewed By: aaron.ballman
Subscribers: mgorny, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72455
Summary:
this is part of the implementation of http://lists.llvm.org/pipermail/llvm-dev/2019-December/137632.html
this patch gives the basis of building an assume to preserve all information from an instruction and add support for building an assume that preserve the information from a call.
Reviewers: jdoerfert
Reviewed By: jdoerfert
Subscribers: mgrang, fhahn, mgorny, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72475
Summary:
this patch makes tablegen generate llvm attributes in a more generic and simpler (at least to me).
changes: make tablegen generate
...
ATTRIBUTE_ENUM(Alignment,align)
ATTRIBUTE_ENUM(AllocSize,allocsize)
...
which can be used to generate most of what was previously used and more.
Tablegen was also generating attributes from 2 identical files leading to identical output. so I removed one of them and made user use the other.
Reviewers: jdoerfert, thakis, aaron.ballman
Reviewed By: aaron.ballman
Subscribers: mgorny, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72455
This reverts commit e34801c8e6 and the followup due to multiple
problems.
I've tried to keep the tests and RDA parts where possible, as those
still seem useful.
The current FirstMI.getDebugLoc() is actually null in almost all cases.
If it isn't, the generated .loc will be considered initial. The .loc
will have the prologue_end flag and terminate the prologue prematurely.
Also use an overload of BuildMI that will not prepend
PATCHABLE_FUNCTION_ENTRY to a MachineInstr bundle.
Summary:
Virtual registers that are undef have an empty LiveInterval at this
point, which means beginIndex() and endIndex() cannot be used. We
only need those indices to determine the range in which to scan for
affected other NSA instructions, and undef operands cannot contribute
to that range.
Reviewers: arsenm, rampitec, mareko
Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73831
We were checking that the original Value * for the compare operands
were null. But that can never happen.
I believe we intended to check for 0 registers here instead.
Fixes PR44749.
This is an alternate fix for the issue D73606 was trying to
solve.
The main issue here is that we bailed out of
foldOffsetIntoAddress if Offset is 0. But if we just found a
symbolic displacement and AM.Disp became non-zero
earlier, we still need to validate that AM.Disp with the symbolic
displacement.
This is my second attempt at committing this after failing
build bots previously. One thing I realized about the previous
attempt is that its possible that AM.Disp is already non-zero
and the new Offset changes it back to zero. In that case my
previous attempt failed to update AM.Disp to zero. So this patch
removes the early out for 0 and appropriately handle the 0 case
in each check so we still update AM.Disp at the end.
This is based on this llvm-dev thread http://lists.llvm.org/pipermail/llvm-dev/2019-December/137521.html
The current strategy for f16 is to promote type to float every except where the specific width is required like loads, stores, and bitcasts. This results in rounding occurring in odd places instead of immediately after arithmetic operations. This interacts in weird ways with the __fp16 type in clang which is a storage only type where arithmetic is always promoted to float. InstCombine can remove some fpext/fptruncs around such arithmetic and turn it into arithmetic on half. This wouldn't be so bad if SelectionDAG was able to put those fpext/fpround back in when it promotes.
It is also not obvious how to handle to make the existing strategy work with STRICT fp. We need to use STRICT versions of the conversions which require chain operands. But if the conversions are created for a bitcast, there is no place to get an appropriate chain from.
This patch implements a different strategy where conversions are emitted directly around arithmetic operations. And otherwise its passed around as an i16 including in arguments and return values. This can result in more conversions between arithmetic operations, but is closer to matching the IR the frontend generates for __fp16. And it will allow us to use the chain from constrained arithmetic nodes to link the STRICT_FP_TO_FP16/STRICT_FP16_TO_FP that will need to be added. I've set it up so that each target can opt into the new behavior. Converting all the targets myself was more than I was able to handle.
Differential Revision: https://reviews.llvm.org/D73749
This fixes legalizations of global stores > 128-bits. It seems work is
needed on how this split actually occurs. For example, we get the
right code for s160, with an s128 and s32 load, but get 5 s32 loads
for <5 x s32>.
This patch adds initial support for a DemandedElts mask to the internal computeKnownBits/ComputeNumSignBits methods, matching the SelectionDAG and GlobalISel equivalents.
So far only a couple of instructions have been setup to handle the DemandedElts, the remainder still using the existing 'all elements' default. The plan is to extend support as we have test coverage.
Differential Revision: https://reviews.llvm.org/D73435
Copy it instead. Otherwise, key registers (such as RBP) may get zeroed
out by the stack unwinder.
Fixes CrashRecoveryTest.DumpStackCleanup with MSVC in release builds.
Reviewed By: stella.stamenova
Differential Revision: https://reviews.llvm.org/D73809
Significant missing hashing - as per the comment this was only meant to
skip member functions (unspecified, but I think it's legible as member
function declarations, not definitions) but was skipping all named
subprograms (so only hashed child DIEs for member function definitions -
because they didn't have a direct name, but only a name given indirectly
in the DW_AT_specification-referenced DIE)
Summary:
Implements the jump pseudo-instruction, which is used in e.g. the Linux kernel.
Reviewers: asb, lenary
Reviewed By: lenary
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73178
If dumping an Split DWARF file that hasn't been split into separate
files (such as from llc - that includes the plain and .dwo sections in
the same file) allow both macinfo and macinfo.dwo sections to be dumped.
Summary: With the new pass manager, it is not possible to obtain a pointer to the pass.
Reviewers: jfb, rinon, yln
Subscribers: hiraditya, dexonsmith, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73390
Summary:
Applying this cleanup:
- MIRBuilder.buildInstr(TargetOpcode::G_ASHR)
- .addDef(Shifted)
- .addUse(Res)
- .addUse(ShiftAmt);
+ MIRBuilder.buildAShr(Shifted, Res, ShiftAmt);
caused an assertion failure here:
llc: /home/jayfoad2/git/llvm-project/llvm/lib/CodeGen/MachineRegisterInfo.cpp:404: llvm::MachineInstr *llvm::MachineRegisterInfo::getVRegDef(unsigned int) const: Assertion `(I.atEnd() || std::next(I) == def_instr_end()) && "getVRegDef assumes a single definition or no definition"' failed.
#4 0x00000000050a6d96 in llvm::MachineRegisterInfo::getVRegDef (this=0x74606a0, Reg=2147483650) at /home/jayfoad2/git/llvm-project/llvm/lib/CodeGen/MachineRegisterInfo.cpp:403
#5 0x00000000066148f6 in llvm::getConstantVRegValWithLookThrough (VReg=2147483650, MRI=..., LookThroughInstrs=false, HandleFConstant=true) at /home/jayfoad2/git/llvm-project/llvm/lib/CodeGen/GlobalISel/Utils.cpp:244
#6 0x00000000066147da in llvm::getConstantVRegVal (VReg=2147483650, MRI=...) at /home/jayfoad2/git/llvm-project/llvm/lib/CodeGen/GlobalISel/Utils.cpp:210
#7 0x0000000006615367 in llvm::ConstantFoldBinOp (Opcode=101, Op1=2147483650, Op2=2147483656, MRI=...) at /home/jayfoad2/git/llvm-project/llvm/lib/CodeGen/GlobalISel/Utils.cpp:341
#8 0x000000000657eee0 in llvm::CSEMIRBuilder::buildInstr (this=0x7465010, Opc=101, DstOps=..., SrcOps=..., Flag=...) at /home/jayfoad2/git/llvm-project/llvm/lib/CodeGen/GlobalISel/CSEMIRBuilder.cpp:160
#9 0x0000000003645958 in llvm::MachineIRBuilder::buildAShr (this=0x7465010, Dst=..., Src0=..., Src1=..., Flags=...) at /home/jayfoad2/git/llvm-project/llvm/include/llvm/CodeGen/GlobalISel/MachineIRBuilder.h:1298
#10 0x00000000065c35b1 in llvm::LegalizerHelper::lower (this=0x7fffffffb5f8, MI=..., TypeIdx=0, Ty=...) at /home/jayfoad2/git/llvm-project/llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp:2020
because at this point there are two instructions defining Res: the
original G_SMULO/G_UMULO and the new G_MUL that we built. The fix is
to modify the original mul in place, so that there is only ever one
definition of Res.
Reviewers: arsenm, aditya_nandakumar
Subscribers: wdng, rovka, hiraditya, volkan, Petar.Avramovic, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72842
When you encounter a G_TRUNC, you are moving from a larger type to a smaller
type.
Asking for the i-th bit on a larger value is the same as asking for the i-th
bit on a smaller value.
So, we should always be able to walk through G_TRUNC when computing the bit
for a TB(N)Z.
Differential Revision: https://reviews.llvm.org/D73748
This allows SimplifyDemandedBits to call SimplifyMultipleUseDemandedBits to create a simpler ISD::INSERT_SUBVECTOR, which is particularly useful for cases where we're splitting into subvectors anyhow.
Some code gen passes use MBFIWrapper to keep track of the frequency of new
blocks. This was not taken into account and could lead to incorrect frequencies
as MBFI silently returns zero frequency for unknown/new blocks.
Add a variant for MBFIWrapper in the PGSO query interface.
Depends on D73494.
Summary: This is a first step before changing the types to llvm::Align and introduce functions to ease client code.
Reviewers: courbet
Subscribers: arsenm, sdardis, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jrtc27, atanasyan, jsji, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73785
Try out using combine definition rules.
This really should be a post-legalizer combine, but the combiner pass
is currently pre-legalize. Most of the target combines are really
post-legalize, so we should probably move the pass.
We may calculate reassociable math ops in arbitrary order when creating a shuffle reduction,
so there's no guarantee that things like 'nsw' hold on those intermediate values. Drop all
poison-generating flags for safety.
This change is limited to shuffle reductions because I don't think we have a problem in the
general case (where we intersect flags of each scalar op that goes into a vector op), but if
there's evidence of other cases being wrong, we can extend this fix to cover those cases.
https://bugs.llvm.org/show_bug.cgi?id=44536
Differential Revision: https://reviews.llvm.org/D73727
First attempt at implementing -fsemantic-interposition.
Rely on GlobalValue::isInterposable that already captures most of the expected
behavior.
Rely on a ModuleFlag to state whether we should respect SemanticInterposition or
not. The default remains no.
So this should be a no-op if -fsemantic-interposition isn't used, and if it is,
isInterposable being already used in most optimisation, they should honor it
properly.
Note that it only impacts architecture compiled with -fPIC and no pie.
Differential Revision: https://reviews.llvm.org/D72829
This patch wraps an external thread local storage variable inside of a
getter function and makes it have internal linkage. This allows LLVM to
be built with BUILD_SHARED_LIBS on windows with MinGW. Additionally it
allows Clang versions prior to 10 to compile current trunk for MinGW.
Differential Revision: https://reviews.llvm.org/D73639
One of the exit criteria of computeKnownBits is whether we reach the max
recursive call depth. Before this patch we would check that the
depth is exactly equal to max depth to exit.
Depth may get bigger than max depth if it gets passed to a different
GISelKnownBits object.
This may happen when say a generic part uses a GISelKnownBits object
with some max depth, but then we hit TL.computeKnownBitsForTargetInstr
which creates a new GISelKnownBits object with a different and smaller
depth. In that situation, when we hit the max depth check for the first
time in the target specific GISelKnownBits object, depth may already
be bigger than the current max depth. Hence we would continue to compute
the known bits, until we ran through the full depth of the chain of
computation or ran out of stack space.
For instance, let say we have
GISelKnownBits Info(/*MaxDepth*/ = 10);
Info.getKnownBits(Foo)
// 9 recursive calls to computeKnownBitsImpl.
// Then we hit a target specific instruction.
// The target specific GISelKnownBits does this:
GISelKnownBits TargetSpecificInfo(/*MaxDepth*/ = 6)
TargetSpecificInfo.computeKnownBitsImpl() // <-- next max depth checks would
// always return false.
This commit does not have any test case, none of the in-tree targets
use computeKnownBitsForTargetInstr.
For a MC_GlobalAddress reference to a dso_local external GlobalValue with a definition, emit .Lfoo$local to avoid a relocation.
-fno-pic and -fpie can infer dso_local but -fpic cannot. In the future,
we can explore the possibility of inferring dso_local with -fpic. As the
description of D73228 says, LLVM's existing IPO optimization behaviors
(like -fno-semantic-interposition) and a previous assembly behavior give
us enough license to be aggressive here.
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D73230
This patch addresses the issue found in https://bugs.llvm.org/show_bug.cgi?id=44585
where a DW_OP_deref was placed at the end of a dwarf expression, resulting in corrupt
symbols when debugging.
This is an attempt to reland with a few fixes for buildbot since I
haven't merged from master in a bit.
Differential Revision: https://reviews.llvm.org/D73526
We can have geps that have a scalar base pointer, and a vector index value, which
means that the base pointer must be splatted into a vector of pointers.
This fixes crashes on arm64 GlobalISel with optimizations enabled.
I believe this also fixes bugs with CI 32-bit handling, which was
incorrectly skipping offsets that look like signed 32-bit values. Also
validate the offsets are dword aligned before folding.
This is similar to the code in getTestBitOperand in AArch64ISelLowering. Instead
of implementing all of the TB(N)Z optimizations at once, this patch implements
the simplest case first. The way that this is set up should make it fairly easy
to add the rest as we go along.
The idea here is that after determining that we can use a TB(N)Z, we can
continue looking through instructions and perform further folding.
In this case, when we have a G_ZEXT or G_ANYEXT where the extended bits are not
used, we can fold it into the TB(N)Z.
Differential Revision: https://reviews.llvm.org/D73673
Found by inspection, but there's no test for this yet because G_PTR_ADD is
currently illegal for vectors. I'll add the test at a later time when the
legalizer support has landed.
There's not much value to this separate node from the intrinsic. Make
the operand structure the same as the intrinsic, so we can reuse the
same pattern for GlobalISel.
In line with current conventions, create new instructions rather
than modify two operands in place and performing manual worklist
management.
This should be NFC apart from possible worklist order changes.
Summary:
Added file headers for files which implement iterative lightweight scheduling
strategies. Which is basically an exercise which I undertook in order to get
used to LLVM development process.
Reviewers: arsenm, vpykhtin, cdevadas
Reviewed By: vpykhtin
Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, javed.absar, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73417
- Extends the comments related to function descriptors, noting how they
are only used on AIX.
- Changes the condition used to gate the creation of the current function
symbol in AsmPrinter::SetupMachineFunction to reflect being AIX
specific. The creation of the symbol is different because of AIXs
linkage conventions, not because AIX uses function descriptors.
Differential Revision: https://reviews.llvm.org/D73115
Summary:
For -fpatchable-function-entry=N,0 -mbranch-protection=bti, after
9a24488cb6, we place the NOP sled after
the initial BTI.
```
.Lfunc_begin0:
bti c
nop
nop
.section __patchable_function_entries,"awo",@progbits,f,unique,0
.p2align 3
.xword .Lfunc_begin0
```
This patch adds a label after the initial BTI and changes the __patchable_function_entries entry to reference the label:
```
.Lfunc_begin0:
bti c
.Lpatch0:
nop
nop
.section __patchable_function_entries,"awo",@progbits,f,unique,0
.p2align 3
.xword .Lpatch0
```
This placement is compatible with the resolution in
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92424 .
A local linkage function whose address is not taken does not need a BTI.
Placing the patch label after BTI has the advantage that code does not
need to differentiate whether the function has an initial BTI.
Reviewers: mrutland, nickdesaulniers, nsz, ostannard
Subscribers: kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73680
Summary:
Similar to issue D71445. Scalable vector should not be evaluated element by element.
Add support to handle scalable vector UndefValue.
Reviewers: sdesmalen, efriedma, apazos, huntergr, willlovett
Reviewed By: efriedma
Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73678
Summary:
Disable the always importing of constants introduced in D70404 by
default under a new internal option, since it is causing order of
magnitude compile time regressions during the thin link. Will continue
investigating why the regressions occur.
Reviewers: evgeny777, wmi
Subscribers: mehdi_amini, inglorion, hiraditya, steven_wu, dexonsmith, arphaman, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73724
from FC0.ExitBlock to FC1.ExitBlock when proven safe.
Summary:
Currently LoopFusion give up when the second loop nest guard
block or the first loop nest exit block is not empty. For example:
if (0 < N) {
for (int i = 0; i < N; ++i) {}
x+=1;
}
y+=1;
if (0 < N) {
for (int i = 0; i < N; ++i) {}
}
The above example should be safe to fuse.
This PR moves instructions in FC1 guard block (e.g. y+=1;) to
FC0 guard block, or instructions in FC0 exit block (e.g. x+=1;) to
FC1 exit block, which then LoopFusion is able to fuse them.
Reviewer: kbarton, jdoerfert, Meinersbur, dmgreen, fhahn, hfinkel,
bmahjour, etiotto
Reviewed By: jdoerfert
Subscribers: hiraditya, llvm-commits
Tag: LLVM
Differential Revision: https://reviews.llvm.org/D73641
fadd/fmul reductions without reassoc are lowered to
VECREDUCE_STRICT_FADD/FMUL nodes, which don't have legalization
support. Until that is in place, expand these intrinsics on
ARM and AArch64. Other targets always expand the vector reduction
intrinsics.
Additionally expand fmax/fmin reductions without nonan flag on
AArch64, as the backend asserts that the flag is present when
lowering VECREDUCE_FMIN/FMAX.
This fixes https://bugs.llvm.org/show_bug.cgi?id=44600.
Differential Revision: https://reviews.llvm.org/D73135
The recommended optimization level for BPF programs
is O2 since (1). BPF is running inside the kernel and
linux kernel won't work at -O0 level, and (2). Verifier
is not able to handle O0 code properly, e.g., potential
large stack size and a lot of spills.
But we should keep -O0 at least compiling.
This patch fixed a bug in BPFMISimplifyPatchable phase
where with -O0, a segmentation fault will happen for a
simple program like:
int test(int a, int b) { return a + b; }
A test case is added to capture such a case.
Differential Revision: https://reviews.llvm.org/D73681
Summary:
This patch intends to support three most common relocation type
on AIX: R_POS, R_TOC, R_RBR.
These three relocation type will be needed for object file generation
on AIX for small code model.
We will have follow up patches to bring relocation support for
large code model on AIX.
Reviewers: hubert.reinterpretcast, daltenty, DiggerLin
Differential Revision: https://reviews.llvm.org/D72027
By adding the prefixed instructions the branch distances are no longer
computed correctly. Since prefixed instructions cannot cross a 64 byte
boundary we have to assume that a prefixed instruction may have a nop
prepended to it. This patch tries to take that nop into consideration
when computing the size of basic blocks.
Differential Revision: https://reviews.llvm.org/D72572
Summary:
When constant folding, constants that are wrapped in metadata were not
folded. This could lead to dbg.values being the only user of a constant
expression, due to the non-dbg uses having been rewritten, resulting in
the constant later on being removed by some other pass. This occurred
with the attached test case, in which the non-rewritten GEP in the
dbg.value intrinsic was later on removed by globalopt.
This patch makes the code look through metadata and fold such constants.
I guess that we in the future may want to allow dbg.values using GEPs and
other constant expressions to be emittable even if there are no non-dbg
uses, but for example SelectionDAG does not support that.
Reviewers: jmorse, aprantl, vsk, davide
Reviewed By: aprantl, vsk, davide
Subscribers: hiraditya, llvm-commits
Tags: #debug-info, #llvm
Differential Revision: https://reviews.llvm.org/D73630
The commit https://reviews.llvm.org/rG14fc20ca6 added some options to the X86
back end that cause the help text for opt/llc to become much harder to read.
The issue is that the cl::value_desc is part of the option name and is used to
compute the indentation of the description text (i.e. the maximum length option
name is what everything aligns to). Since the commit puts a large number of
characters into that text, everything is aligned to that width.
This patch just reformats the option so that the description is contained in the
description and the list of possible values is within the angle brackets.
Note: the readability issue of the helptext was fixed in commit
70cbf8c71c, but the re-formatting wasn't
added on that commit so I am still committing this.
Differential revision: https://reviews.llvm.org/D73267
Strict fp-to-int and int-to-fp conversions can be handled in the same way that
the non-strict versions are (by using the appropriate instruction or converting
to a function call when we have no instruction).
Differential Revision: https://reviews.llvm.org/D73625
On targets that don't have the normal packed f16 layout, handle these
during legalization. Directly modify the register types. We can infer
this was a d16 load based on the mem operand size during selection.
A16 operands should possibly be handled here as well, but don't worry
about that yet.
This trivially avoids violating the constant bus restriction.
Previously this was allowing one SGPR in the first source
operand, which technically also avoided violating this for most
operations (but not for special cases reading vcc).
We do need to write some new, smarter operand folds to pick the
optimal SGPR to use in some kind of post-isel fold, but that's purely
an optimization.
I was originally thinking we would pick which operands should be SGPRs
in RegBankSelect, but I think this isn't really manageable. There
would be additional complexity to handle every G_* instruction, and
then any nontrivial instruction patterns would need to know when to
avoid violating it, which is likely to be very error prone.
I think having all inputs being canonically copies to VGPRs will
simplify the operand folding logic. The current folding we do is
backwards, and only considers one operand at a time, relative to
operands it already has. It therefore poorly handles the case where
there is already a constant bus operand user. If all operands are
copies, it's somewhat simpler to consider all input operands at once
to choose the optimal constant bus user.
Since the failure mode for constant bus violations is now a verifier
error and not an selection failure, this moves towards a place where
we can turn on the fallback mode. The SGPR copy folding optimizations
can be left for later.
Commit 9965b12fd1 was supposed to change the offset constant when
lowering load/stores, but only introduced this change for loads. This
patch adds the same fix for stores.
A known limitation for Future CPU is that the new prefixed instructions may
not cross 64 Byte boundaries.
All instructions are already 4 byte aligned so the only situation where this
can occur is when the prefix is in one 64 byte block and the instruction that
is prefixed is at the top of the next 64 byte block. To fix this case
PPCELFStreamer was added to intercept EmitInstruction. When a prefixed
instruction is emitted we try to align it to 64 Bytes by adding a maximum of
4 bytes. If the prefixed instruction crosses the 64 Byte boundary then the
alignment would trigger and a 4 byte nop would be added to push the
instruction into the next 64 byte block.
Differential Revision: https://reviews.llvm.org/D72570
This gets selected to the appropriate fcvt instruction. Handling from there on
isn't fully correct yet, as we need to model fcvt reading and writing to fpsr
and fpcr.
Differential Revision: https://reviews.llvm.org/D73201
We are missing ability to override the sh_entsize field for
SHT_REL[A] sections. It would be useful for writing test cases.
Differential revision: https://reviews.llvm.org/D73621
These become STRICT_FCMP and STRICT_FCMPE, which then get selected to the
corresponding FCMP and FCMPE instructions, though the handling from there on
isn't fully correct as we don't model reads and writes to FPCR and FPSR.
Differential Revision: https://reviews.llvm.org/D73368
Summary:
The code was assuming in a few places that if there was only one exit
from the function that it was a normal return, which is invalid. It
could be an infinite loop, in which case we still need to insert the
usual fake edge so that the null export happens. This fixes shaders that
end with an infinite loop that discards.
Reviewers: arsenm, nhaehnle, critson
Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71192
Summary:
Add trimming of unused components of s_buffer_load.
For s_buffer_load and unformatted buffer_load also trim unused
components at the beginning of vector and update offset accordingly.
Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71785
The function a) returned 32-bits when in DWARF64, the PrologueLength
field is 64-bits in size, and b) didn't work for DWARF version 5.
Also deleted some related dead code. With this deletion, getLength is
itself dead, but another change is about to make use of it.
Reviewed by: probinson
Differential Revision: https://reviews.llvm.org/D73626
Summary:
In a release build this variable becomes unused and may break the build
with `-Werror,-Wunused-variable`.
Reviewers: gribozavr2, jdoerfert, sstefan1
Reviewed By: gribozavr2
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73683
Summary:
As determined with `llvm-exegesis`.
Some of these look like typos/misunderstandings of the sched model td
spec:
- latency defaults to `1` when not set => Maybe we can avoid
having a default ?
- problems with regexps not being anchored by default (XCHG matching
CMPXHG)
Note that this is not complete, it fixes only the most obvious mistakes,
and only for latency (not uops).
Reviewers: RKSimon, GGanesh
Subscribers: hiraditya, jfb, mstojanovic, hfinkel, craig.topper, andreadb, lebedev.ri, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73172
InstCombine operates on the basic premise that the operands of the
currently processed instruction have already been simplified. It
achieves this by pushing instructions to the worklist in reverse
program order, so that instructions are popped off in program order.
The worklist management in the main combining loop also makes sure
to uphold this invariant.
However, the same is not true for all the code that is performing
manual worklist management. The largest problem (addressed in this
patch) are instructions inserted by InstCombine's IRBuilder. These
will be pushed onto the worklist in order of insertion (generally
matching program order), which means that a) the users of the
original instruction will be visited first, as they are pushed later
in the main loop and b) the newly inserted instructions will be
visited in reverse program order.
This causes a number of problems: First, folds operate on instructions
that have not had their operands simplified, which may result in
optimizations being missed (ran into this in
https://reviews.llvm.org/D72048#1800424, which was the original
motivation for this patch). Additionally, this increases the amount
of folds InstCombine has to perform, both within one iteration, and
by increasing the number of total iterations.
This patch addresses the issue by adding a Worklist.AddDeferred()
method, which is used for instructions inserted by IRBuilder. These
will only be added to the real worklist after the combine finished,
and in reverse order, so they will end up processed in program order.
I should note that the same should also be done to nearly all other
uses of Worklist.Add(), but I'm starting with just this occurrence,
which has by far the largest test fallout.
Most of the test changes are due to
https://bugs.llvm.org/show_bug.cgi?id=44521 or other cases where
we don't canonicalize something. These are neutral. One regression
has been addressed in D73575 and D73647. The remaining regression
in an shl+sdiv fold can't really be fixed without dropping another
transform, but does not seem particularly problematic in the first
place.
Differential Revision: https://reviews.llvm.org/D73411
This lowering tries to look for G_PTR_ADD instructions and then converts
them to a standard G_ADD with a COPY on the source, and G_INTTOPTR on the
result. This is ok for address space 0 on AArch64 as p0 can be treated as
s64.
The motivation behind this is to expose the add semantics to the imported
tablegen patterns. We shouldn't need to check for uses being loads/stores,
because the selector works bottom up, uses before defs. By the time we
end up trying to select a G_PTR_ADD, we should have already attempted to
fold this into addressing modes and were therefore unsuccessful.
This gives some performance and code size improvements across the board.
Differential Revision: https://reviews.llvm.org/D73673
Currently some prefixes are emitted as instructions, to distinguish them
from real instruction, fuction isPrefix() is added. The kinds of prefix
are consistent with X86GenInstrInfo.inc.
Differential Revision: https://reviews.llvm.org/D73013
Summary:
This patch makes sure that the field VFShape.VF is greater than zero
when demangling the vector function name of scalable vector functions
encoded in the "vector-function-abi-variant" attribute.
This change is required to be able to provide instances of VFShape
that can be used to query the VFDatabase for the vectorization passes,
as such passes always require a positive value for the Vectorization Factor (VF)
needed by the vectorization process.
It is not possible to extract the value of VFShape.VF from the mangled
name of scalable vector functions, because it is encoded as
`x`. Therefore, the VFABI demangling function has been modified to
extract such information from the IR declaration of the vector
function, under the assumption that _all_ vectors in the signature of
the vector function have the same number of lanes. Such assumption is
valid because it is also assumed by the Vector Function ABI
specifications supported by the demangling function (x86, AArch64, and
LLVM internal one).
The unit tests that demangle scalable names have been modified by
adding the IR module that carries the declaration of the vector
function name being demangled.
In particular, the demangling function fails in the following cases:
1. When the declaration of the scalable vector function is not
present in the module.
2. When the value of VFSHape.VF is not greater than 0.
Reviewers: jdoerfert, sdesmalen, andwar
Reviewed By: jdoerfert
Subscribers: mgorny, kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73286
This is an alternate fix for the issue D73606 was trying to
solve.
The main issue here is that we bailed out of
foldOffsetIntoAddress if Offset is 0. But if we just found a
symbolic displacement and AM.Disp became non-zero
earlier, we still need to validate that AM.Disp with the symbolic
displacement.
This passes fold-add-pcrel.ll.
Differential Revision: https://reviews.llvm.org/D73608
Summary:
It is called when instructions aren't simplified, and the implementation
is expected to account for a penalty. Renamed to
onCommonInstructionMissedSimplification.
Reviewers: davidxl, eraman
Reviewed By: davidxl
Subscribers: hiraditya, baloghadamsoftware, haicheng, a.sidorin, Szelethus, donat.nagy, dkrupp, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73662
A pointer is privatizeable if it can be replaced by a new, private one.
Privatizing pointer reduces the use count, interaction between unrelated
code parts. This is a first step towards replacing argument promotion.
While we can already handle recursion (unlike argument promotion!) we
are restricted to stack allocations for now because we do not analyze
the uses in the callee.
Reviewed By: uenoku
Differential Revision: https://reviews.llvm.org/D68852
The helpers AAReturnedFromReturnedValues and
AACallSiteReturnedFromReturned are useful not only to avoid code
duplication but also to avoid recomputation of results. If we have N
call sites we should not recompute the function return information N
times but once. These are mostly straightforward usages with some minor
improvements on the helpers and addition of a new one
(IRPosition::getAssociatedType) that knows about function return types.
We seem to be inheriting the cost from sse4.1. But if we have 256-bit registers we should be able to do this with just one extract to split the 16i16 and two v8i16->v8i32 operations so our cost should be 3 not 4.
Differential Revision: https://reviews.llvm.org/D73646
This is passed to legalizeCustom, but not intrinsic. Also remove the
MRI argument, since you can get that from the MachineIRBuilder.
I'm not sure why MachineIRBuilder has a private observer member, and
this is passed separately.
Rename Destructive enumerator in preparation for a larger set of patches to
support prefixing destructive oeprations with MOVPRFX.
Differential Revision: https://reviews.llvm.org/D73212
For pow2 constants we should use G_SHL for pattern matching (and perf)
purposes later.
Vector support not yet implemented.
Differential Revision: https://reviews.llvm.org/D73659
When the bit is <= 32, we have to use the W register variant for TB(N)Z.
This is because of the way the instruction is encoded.
Differential Revision: https://reviews.llvm.org/D73660
Summary:
This will help with devirtualization (store forwarding with vtable pointers in
the presence of other stores into members in the constructor.) During inlining,
we don't have AA.
Reviewers: davidxl
Subscribers: mgorny, Prazek, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71307
Some housekeeping for the DestructiveInstType enum before a larger set of patches to support prefixing destructive oeprations with MOVPRFX.
Differential Revision: https://reviews.llvm.org/D73141
A previous patch should have added pld and pstd and any support code in
the backend that is required for prefixed load and store type operations.
This patch adds a number of additional prefixed load and store type
instructions for the future CPU.
Differential Revision: https://reviews.llvm.org/D72577
Summary:
gnu addr2line prints DWARF line table discriminators like so:
<file>:<line> (discriminator <Number>)
This matches that behavior.
Document how and when --output-style=GNU prints discriminators
Add test for new GNU-style discriminator printing.
Reviewers: rupprecht, labath, jhenderson
Subscribers: aprantl, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73318
For the
icmp eq (add X, C1), C2 => icmp eq X, C2-C1
icmp eq (sub C1, X), C2 => icmp eq X, C1-C2
folds, this allows C1 to be non-splat and contain undefs.
C2 is still splat, due to the structure of the code.
This is to address the remaining part of the regression in D73411,
where demanded element analysis replaces some elements with undef.
Differential Revision: https://reviews.llvm.org/D73647
For `MC_GlobalAddress` operands referencing **certain** GlobalObjects,
we can lower them to STB_LOCAL aliases to avoid costs brought by
assembler/linker's conservative decisions about symbol interposition:
* An assembler conservatively assumes a global default visibility symbol interposable (ELF
semantics). So relocations in object files are needed even if the code generator assumed
the definition exact and non-interposable.
* The relocations can cause the creation of PLT entries on some targets for -shared links.
A linker conservatively assumes a global default visibility symbol interposable (if not
otherwise constrained by -Bsymbolic/--dynamic-list/VER_NDX_LOCAL/etc).
"certain" refers to GlobalObjects in the intersection of
`hasExactDefinition() and !isInterposable()`: `external`, `appending`, `internal`, `private`.
Local linkages (`internal` and `private`) cannot be interposed. `appending` is for very
few objects LLVM interpret specially. So the set just includes `external`.
This patch emits STB_LOCAL aliases (.Lfoo$local) for such GlobalObjects, so that targets can lower
MC_GlobalAddress operands to STB_LOCAL aliases if applicable.
We may extend the scope and include GlobalAlias in the future.
LLVM's existing -fno-semantic-interposition behaviors give us license to do such optimizations:
* Various optimizations (ipconstprop, inliner, sccp, sroa, etc) treat normal ExternalLinkage
GlobalObjects as non-interposable.
* Before D72197, MC resolved a PC-relative VK_None fixup to a non-local symbol at assembly time (no
outstanding relocation), if the target is defined in the same section. Put it simply, even if IR
optimizations failed to optimize and allowed interposition for the function call in
`void foo() {} void bar() { foo(); }`, the assembler would disallow it.
This patch sets up AsmPrinter infrastructure to make -fno-semantic-interposition more so.
With and without the patch, the object file output should be identical:
`.Lfoo$local` does not take a symbol table entry.
Reviewed By: sfertile
Differential Revision: https://reviews.llvm.org/D73228
Summary:
Add test case for the same. This test case will also serve as a
starting point for later symbolizer tests.
Reviewers: dblaikie, jdoerfert
Subscribers: hiraditya, llvm-commits, jhenderson
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73583
Summary:
The initialization of RegisterBank needs to be done only once. The
logic of AlreadyInit has data race, use llvm::call_once instead.
This is continuing work of D73587.
Reviewers: arsenm, rovka, dsanders, t.p.northover, efriedma, apazos
Reviewed By: arsenm
Subscribers: wdng, kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73605
Summary:
The initialization of RegisterBank needs to be done only once. The
logic of AlreadyInit has data race, use llvm::call_once instead.
This is continuing work of D73587.
Reviewers: arsenm, tstellar, ronlieb, efriedma, apazos, nhaehnle
Reviewed By: nhaehnle
Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73604
Summary:
The initialization of RegisterBank needs to be done only once. The
logic of AlreadyInit has a data race, use llvm::call_once instead.
This issue was identified through thread sanitizer.
Reviewers: efriedma, apazos, qcolombet, dsanders
Reviewed By: efriedma
Subscribers: arsenm, kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73587
ISD::FROUND is defined to round to nearest with ties rounding
away from 0. This mode isn't supported in hardware on X86.
But as long as we aren't compiling with trapping math, we can
emulate this with floor(X + copysign(nextafter(0.5, 0.0), X)).
We have to use nextafter to avoid some corner cases that adding
0.5 would have. For example, if X is nextafter(0.5, 0.0) it should
round to 0.0, but adding 0.5 would need one extra bit of mantissa
than can be stored so it rounds to 1.0. Adding nextafter(0.5, 0.0)
instead will just increase the exponent by 1 and leave the mantissa
as all 1s. This would be nextafter(1.0, 0.0) which will floor to 0.0.
Techically this requires -fno-trapping-math which isn't our default.
But if we care about exceptions we should be using constrained
intrinsics. Constrained intrinsics would use STRICT_FROUND which
won't go through this code.
Fixes PR42195.
Differential Revision: https://reviews.llvm.org/D73607
This code needs to map from the FPCW 2-bit encoding for rounding mode to the 2-bit encoding defined for FLT_ROUNDS. The previous implementation did some clever swapping of bits and adding 1 modulo 4 to do the mapping.
This patch instead uses an 8-bit immediate as a lookup table of four 2-bit values. Then we use the 2-bit FPCW encoding to index the lookup table by using a right shift and an AND. This requires extracting the 2-bit value from FPCW and multipying it by 2 to make it usable as a shift amount. But still results in less code.
Differential Revision: https://reviews.llvm.org/D73599
Fixes selection for scalar G_SMULH/G_UMULH. Also switches to using
tablegen selected add/sub, which switch to the signed version of the
opcode. This matches the current DAG behavior. We can't drop the
manual selection for add/sub yet, because it's still both for VALU
add/sub and for G_PTR_ADD.
Manually select this is as a tablegen workraound. Both SelectionDAG
and GlobalISel end up misplacing the copy to m0 when both instructions
in the output need it. Neither considers that both output instructions
depend on m0. I don't know of any other pattern we need to handle this
case, so it's less effort to just workaround this for now.
Summary:
BaseMemOpClusterMutation::apply forms store chains by looking for
control (i.e. non-data) dependencies from one mem op to another.
In the test case, clusterNeighboringMemOps successfully clusters the
loads, and then adds artificial edges to the loads' successors as
described in the comment:
// Copy successor edges from SUa to SUb. Interleaving computation
// dependent on SUa can prevent load combining due to register reuse.
The effect of this is that *data* dependencies from one load to a store
are copied as *artificial* dependencies from a different load to the
same store.
Then when BaseMemOpClusterMutation::apply looks at the stores, it finds
that some of them have a control dependency on a previous load, which
breaks the chains and means that the stores are not all considered part
of the same chain and won't all be clustered.
The fix is to only consider non-artificial control dependencies when
forming chains.
Subscribers: MatzeB, jvesely, nhaehnle, hiraditya, javed.absar, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71717
This commit fixes PR39321.
GlobalExtensions is not guaranteed to be destroyed when optimizer plugins are unloaded. If it is indeed destroyed after a plugin is dlclose-d, the destructor of the corresponding ExtensionFn is not mapped anymore, causing a call to unmapped memory during destruction.
This commit guarantees that extensions coming from external plugins are removed from GlobalExtensions when the plugin is unloaded if GlobalExtensions has not been destroyed yet.
Differential Revision: https://reviews.llvm.org/D71959
Summary:
Due to the fact that kill is just a normal intrinsic, even though it's
supposed to terminate the thread, we can end up with provably infinite
loops that are actually supposed to end successfully. The
AMDGPUUnifyDivergentExitNodes pass breaks up these loops, but because
there's no obvious place to make the loop branch to, it just makes it
return immediately, which skips the exports that are supposed to happen
at the end and hangs the GPU if all the threads end up being killed.
While it would be nice if the fact that kill terminates the thread were
modeled in the IR, I think that the structurizer as-is would make a mess if we
did that when the kill is inside control flow. For now, we just add a null
export at the end to make sure that it always exports something, which fixes
the immediate problem without penalizing the more common case. This means that
we sometimes do two "done" exports when only some of the threads enter the
discard loop, but from tests the hardware seems ok with that.
This fixes dEQP-VK.graphicsfuzz.while-inside-switch with radv.
Reviewers: arsenm, nhaehnle
Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70781
SIMachineScheduler uses isHighLatencyInstruction with the same
sematincs, but TargetInstrInfo has virtual isHighLatencyDef
method, so override it instead.
Added FLAT to the list of high latency opcodes and a check for
mayLoad since stores are not technically high latency in terms
of data dependency.
This change did not produce any visible impact on our tests.
Differential Revision: https://reviews.llvm.org/D73582
proven safe.
Summary:
Currently LoopFusion give up when the second loop nest preheader is
not empty. For example:
for (int i = 0; i < 100; ++i) {}
x+=1;
for (int i = 0; i < 100; ++i) {}
The above example should be safe to fuse.
This PR moves instructions in FC1 preheader (e.g. x+=1; ) to
FC0 preheader, which then LoopFusion is able to fuse them.
Reviewer: kbarton, Meinersbur, jdoerfert, dmgreen, fhahn, hfinkel,
bmahjour, etiotto
Reviewed By: jdoerfert
Subscribers: hiraditya, llvm-commits
Tag: LLVM
Differential Revision: https://reviews.llvm.org/D71821
Summary:
udiv/sdiv/urem/srem/mul integer isel patterns and tests.
Pretend for now that integer division were always cheap in HW.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D73623
This should be no problem to support with a pattern, but it turns out
there are just too many yaks to shave. The main problem is in the DAG
emitter, which I have no desire to sink effort into fixing.
If we had a bit to disable patterns in the DAG importer, fixing the
GlobalISelEmitter is more manageable.
Summary:
The code was assuming in a few places that if there was only one exit
from the function that it was a normal return, which is invalid. It
could be an infinite loop, in which case we still need to insert the
usual fake edge so that the null export happens. This fixes shaders that
end with an infinite loop that discards.
Reviewers: arsenm, nhaehnle, critson
Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71192
Summary:
Due to the fact that kill is just a normal intrinsic, even though it's
supposed to terminate the thread, we can end up with provably infinite
loops that are actually supposed to end successfully. The
AMDGPUUnifyDivergentExitNodes pass breaks up these loops, but because
there's no obvious place to make the loop branch to, it just makes it
return immediately, which skips the exports that are supposed to happen
at the end and hangs the GPU if all the threads end up being killed.
While it would be nice if the fact that kill terminates the thread were
modeled in the IR, I think that the structurizer as-is would make a mess if we
did that when the kill is inside control flow. For now, we just add a null
export at the end to make sure that it always exports something, which fixes
the immediate problem without penalizing the more common case. This means that
we sometimes do two "done" exports when only some of the threads enter the
discard loop, but from tests the hardware seems ok with that.
This fixes dEQP-VK.graphicsfuzz.while-inside-switch with radv.
Reviewers: arsenm, nhaehnle
Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70781
cmp (splat V1, M), SplatC --> splat (cmp V1, SplatC'), M
As discussed in PR44588:
https://bugs.llvm.org/show_bug.cgi?id=44588
...we try harder to push shuffles after binops than after compares.
This patch handles the special (but presumably most common case) of
splat shuffles. If both operands are splats, then we can do the
comparison on the non-splat inputs followed by splat of the compare.
That should take care of the regression noted in D73411.
There's another potential fold requested in PR37463 to scalarize the
compare, but that's another patch (and it's not clear if we can do
that without the ability to undo it later):
https://bugs.llvm.org/show_bug.cgi?id=37463
Differential Revision: https://reviews.llvm.org/D73575
Summary:
Currently, sqdmulh_lane and friends from the ACLE (implemented in arm_neon.h),
are represented in LLVM IR as a (by vector) sqdmulh and a vector of (repeated)
indices, like so:
%shuffle = shufflevector <4 x i16> %v, <4 x i16> undef, <4 x i32> <i32 3, i32 3, i32 3, i32 3>
%vqdmulh2.i = tail call <4 x i16> @llvm.aarch64.neon.sqdmulh.v4i16(<4 x i16> %a, <4 x i16> %shuffle)
When %v's values are known, the shufflevector is optimized away and we are no
longer able to select the lane variant of sqdmulh in the backend.
This defeats a (hand-coded) optimization that packs several constants into a
single vector and uses the lane intrinsics to reduce register pressure and
trade-off materialising several constants for a single vector load from the
constant pool, like so:
int16x8_t v = {2,3,4,5,6,7,8,9};
a = vqdmulh_laneq_s16(a, v, 0);
b = vqdmulh_laneq_s16(b, v, 1);
c = vqdmulh_laneq_s16(c, v, 2);
d = vqdmulh_laneq_s16(d, v, 3);
[...]
In one microbenchmark from libjpeg-turbo this accounts for a 2.5% to 4%
performance difference.
We could teach the compiler to recover the lane variants, but this would likely
require its own pass. (Alternatively, "volatile" could be used on the constants
vector, but this is a bit ugly.)
This patch instead implements the following LLVM IR intrinsics for AArch64 to
maintain the original structure through IR optmization and into instruction
selection:
- sqdmulh_lane
- sqdmulh_laneq
- sqrdmulh_lane
- sqrdmulh_laneq.
These 'lane' variants need an additional register class. The second argument
must be in the lower half of the 64-bit NEON register file, but only when
operating on i16 elements.
Note that the existing patterns for shufflevector and sqdmulh into sqdmulh_lane
(etc.) remain, so code that does not rely on NEON intrinsics to generate these
instructions is not affected.
This patch also changes clang to emit these IR intrinsics for the corresponding
NEON intrinsics (AArch64 only).
Reviewers: SjoerdMeijer, dmgreen, t.p.northover, rovka, rengolin, efriedma
Reviewed By: efriedma
Subscribers: kristof.beyls, hiraditya, jdoerfert, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D71469
This adds some missing MVE opcodes to evaluateBranch, which results in
llvm-objdump being able to print the PC relative branch target as an
annotation.
Differential Revision: https://reviews.llvm.org/D73553
Many of the debug line prologue errors are not inherently fatal. In most
cases, we can make reasonable assumptions and carry on. This patch does
exactly that. In the case of length problems, the approach of "assume
stated length is correct" is taken which means the offset might need
adjusting.
This is a relanding of b94191fe, fixing an LLD test and the LLDB build.
Reviewed by: dblaikie, labath
Differential Revision: https://reviews.llvm.org/D72158
Summary:
This removes a couple of unnecessary isReg checks, now that
memOpsHaveSameBasePtr can handle FI operands, but is otherwise NFC.
Reviewers: arsenm, rampitec
Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73485
Add several new helpers to RDA:
- hasLocalDefBefore
- isRegDefinedAfter
- isSafeToDefRegAt
And move two bits of logic from ARMLowOverheadLoops into RDA:
- isSafeToMove
- isSafeToRemove
Both of these have some wrappers too to make them more convienent to
use.
Differential Revision: https://reviews.llvm.org/D73460
For `ret i64 add (i64 ptrtoint (i32* @foo to i64), i64 1701208431)`,
```
X86DAGToDAGISel::matchAdd
...
// AM.setBaseReg(CurDAG->getRegister(X86::RIP, MVT::i64));
if (!matchAddressRecursively(N.getOperand(0), AM, Depth+1) &&
// Try folding offset but fail; there is a symbolic displacement, so offset cannot be too large
!matchAddressRecursively(Handle.getValue().getOperand(1), AM, Depth+1))
return false;
...
// Try again after commuting the operands.
// AM.Disp = Val; foldOffsetIntoAddress() does not know there will be a symbolic displacement
if (!matchAddressRecursively(Handle.getValue().getOperand(1), AM, Depth+1) &&
// AM.setBaseReg(CurDAG->getRegister(X86::RIP, MVT::i64));
!matchAddressRecursively(Handle.getValue().getOperand(0), AM, Depth+1))
// Succeeded! Produced leaq sym+disp(%rip),...
return false;
```
`foldOffsetIntoAddress()` currently does not know there is a symbolic
displacement and can fold a large offset.
The produced `leaq sym+disp(%rip), %rax` instruction is relocated by
an R_X86_64_PC32. If disp is large and sym+disp-rip>=2**31, there
will be a relocation overflow.
This approach is still not elegant. Unfortunately the isRIPRelative
interface is a bit clumsy. I tried several solutions and eventually
picked this one.
Differential Revision: https://reviews.llvm.org/D73606
There was a TODO in AAValueConstantRangeArgument to reuse
AAArgumentFromCallSiteArguments. We now do this by allowing new States
to be build from the bestState.
If we invalidate an attribute we need to inform all dependent ones even
if the fixpoint state is not invalid. Before we only continued
invalidation if the fixpoint state was invalid, now we signal a change
in case the fixpoint state is valid.
The test case was already included in D71620 but the problem was hiding
because it only manifested with the old PM (for that input).
This patch modularizes the way we check for no-alias call site arguments
by putting the existing logic into helper functions. The reasoning was
not changed but special cases for readonly/readnone were added.
If `null` is not defined we cannot access it, hence the pointer is
`noalias`. While this is not helpful on it's own it simplifies later
deductions that can skip over already known `noalias` pointers in
certain situations.
During extraction, stale llvm.assume handles may be retained in the
original function. The setup is:
1) CodeExtractor unregisters assumptions in the blocks that are to be
extracted.
2) Extraction happens. There are now two functions: f1 and f1.extracted.
3) Leftover assumptions in f1 (/not/ removed as they were not in the set of
blocks to be extracted) now have affected-value llvm.assume handles in
f1.extracted.
When assumptions for a value used in f1 are looked up, ValueTracking can assert
as some of the handles are in the wrong function. To fix this, simply erase the
llvm.assume calls in the extracted function.
Alternatives include flushing the assumption cache in the original function, or
walking all values used in the original function to prune stale affected-value
handles. Both seem more expensive.
Testing: check-llvm, LNT run with -mllvm -hot-cold-split enabled
rdar://58460728
2 fixes:
Register coloring can re-assign virtual registers. When the frame base register
is colored, update the DwarfFrameBase accordingly When the frame base register
is stackified, do not attempt to encode DW_AT_frame_base as a local In the
future we will presumably want to handle this case better but for now we can
emit worse debug info rather than crashing.
Differential Revision: https://reviews.llvm.org/D73581
Previously, the enums didn't account for all the possible cases, which
could cause misleading results (particularly for a "switch" on
FunctionModRefBehavior).
Fixes regression in polly from recent patch to add writeonly to memset.
While I'm here, also fix a few dubious uses of the FMRB_* enum values.
Differential Revision: https://reviews.llvm.org/D73154
This is how it should've been and brings it more in line with
std::string_view. There should be no functional change here.
This is mostly mechanical from a custom clang-tidy check, with a lot of
manual fixups. It uncovers a lot of minor inefficiencies.
This doesn't actually modify StringRef yet, I'll do that in a follow-up.
When the G_BRCOND is fed by a eq or ne G_ICMP, it may be possible to fold a
G_AND into the branch by producing a tbnz/tbz instead.
This happens when
1. We have a ne/eq G_ICMP feeding into the G_BRCOND
2. The G_ICMP is a comparison against 0
3. One of the operands of the G_AND is a power of 2 constant
This is very similar to the code in AArch64TargetLowering::LowerBR_CC.
Add opt-and-tbnz-tbz to test this.
Differential Revision: https://reviews.llvm.org/D73573
Symbols created for merged external global variables have default
visibility. This can break programs when compiling with -Oz
-fvisibility=hidden as symbols that should be hidden will be exported at
link time.
Differential Revision: https://reviews.llvm.org/D73235
We have to avoid using a GOT relocation to access the bias variable,
setting the hidden visibility achieves that.
Differential Revision: https://reviews.llvm.org/D73529
Under --target=aarch64-fuchsia, -mcmodel=kernel has the effect of
(the default) -mcmodel=small plus -mtp=el1 (which did not exist when
this behavior was added). Fuchsia's kernel now uses -mtp=el1
directly instead of -mcmodel=kernel, so remove this special support.
Patch By: mcgrathr
Differential Revision: https://reviews.llvm.org/D73409
Summary:
To avoid header file circular dependency issues in passing updated MBFI (in
MBFIWrapper) to the interface of profile guided size optimizations.
A prep step for (and split off of) D73381.
Reviewers: davidxl
Subscribers: mgorny, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73494
It can still be beneficial to do the optimization if the result of the compare
is used by *another* select.
Differential Revision: https://reviews.llvm.org/D73511
The only thing missing for basic llvm-symbolizer support is the ability on
lib/Object to get a wasm symbol's section ID, which allows sorting and
computation of the symbols' sizes.
Also, when the WasmAsmParser switches sections on new functions, also add the
section to the list of Dwarf sections if Dwarf is being generated for assembly;
this allows writing of simple tests.
Reviewers: sbc100, jhenderson, aardappel
Differential Revision: https://reviews.llvm.org/D73246
This patch adds support for explicitly highlighting sub-expressions
shared by multiple leaf nodes. For example consider the following
code
%shared.load = tail call <8 x double> @llvm.matrix.columnwise.load.v8f64.p0f64(double* %arg1, i32 %stride, i32 2, i32 4), !dbg !10, !noalias !10
%trans = tail call <8 x double> @llvm.matrix.transpose.v8f64(<8 x double> %shared.load, i32 2, i32 4), !dbg !10
tail call void @llvm.matrix.columnwise.store.v8f64.p0f64(<8 x double> %trans, double* %arg3, i32 10, i32 4, i32 2), !dbg !10
%load.2 = tail call <30 x double> @llvm.matrix.columnwise.load.v30f64.p0f64(double* %arg3, i32 %stride, i32 2, i32 15), !dbg !10, !noalias !10
%mult = tail call <60 x double> @llvm.matrix.multiply.v60f64.v8f64.v30f64(<8 x double> %trans, <30 x double> %load.2, i32 4, i32 2, i32 15), !dbg !11
tail call void @llvm.matrix.columnwise.store.v60f64.p0f64(<60 x double> %mult, double* %arg2, i32 10, i32 4, i32 15), !dbg !11
We have two leaf nodes (the 2 stores) and the first store stores %trans
which is also used by the matrix multiply %mult. We generate separate
remarks for each leaf (stores). To denote that parts are shared, the
shared expressions are marked as shared (), with a reference to the
other remark that shares it. The operation summary also denotes the
shared operations separately.
Reviewers: anemet, Gerolf, thegameg, hfinkel, andrew.w.kaylor, LuoYuanke
Reviewed By: anemet
Differential Revision: https://reviews.llvm.org/D72526
Dead instructions do not need to be sunk. Currently we try and record
the recipies for them, but there are no recipes emitted for them and
there's nothing to sink. They can be removed from SinkAfter while
marking them for recording.
Fixes PR44634.
Reviewers: rengolin, hsaito, fhahn, Ayal, gilr
Reviewed By: gilr
Differential Revision: https://reviews.llvm.org/D73423
Add the prefixed instructions pld and pstd to future CPU. These are load and
store instructions that require new operand types that are 34 bits. This patch
adds the two instructions as well as the operand types required.
Note that this patch also makes a minor change to tablegen to account for the
fact that some instructions are going to require shifts greater than 31 bits
for the new 34 bit instructions.
Differential Revision: https://reviews.llvm.org/D72574
Summary:
Currently IsControlFlowEquivalent determine if two blocks are control
flow equivalent by checking if A dominates B and B post dominates A.
There exists blocks that are control flow equivalent even if they don't
satisfy the A dominates B and B post dominates A condition.
For example,
if (cond)
A
if (cond)
B
In the PR, we determine if two blocks are control flow equivalent by
also checking if the two sets of conditions A and B depends on are
equivalent.
Reviewer: jdoerfert, Meinersbur, dmgreen, etiotto, bmahjour, fhahn,
hfinkel, kbarton
Reviewed By: fhahn
Subscribers: hiraditya, llvm-commits
Tag: LLVM
Differential Revision: https://reviews.llvm.org/D71578
Summary: X86 has instructions to calculate fma and fneg at the same time. But we combine the fneg and fma only when fneg is the source operand under strict FP.
Reviewers: craig.topper, andrew.w.kaylor, uweigand, RKSimon, LiuChen3
Subscribers: LuoYuanke, llvm-commits, cfe-commits, jdoerfert, hiraditya
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72824
Many of the debug line prologue errors are not inherently fatal. In most
cases, we can make reasonable assumptions and carry on. This patch does
exactly that. In the case of length problems, the approach of "the
claimed length is correct" is taken to be consistent with other
instances such as the SectionParser, which ignores the read length.
Reviewed by: dblaikie
Differential Revision: https://reviews.llvm.org/D72158
Summary:
Up to gfx9, writes to vcc_lo and vcc_hi by instructions like
v_readlane and v_readfirstlane do not update vccz to reflect the new
value of vcc. Fix it by reusing part of the existing vccz bug handling
code, which inserts an "s_mov_b64 vcc, vcc" instruction to restore vccz
just before an instruction that needs the correct value.
Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69661
Summary:
Function calls and stack-passing of function arguments.
Custom lowering, isel patterns and tests.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D73461
Summary:
This is a follow up on D61634. It adds an LLVM IR intrinsic to allow better implementation of memcpy from C++.
A follow up CL will add the intrinsics in Clang.
Reviewers: courbet, theraven, t.p.northover, jdoerfert, tejohnson
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71710
This patch updates the remark to also include a summary of the number of
vector operations generated for each matrix expression.
Reviewers: anemet, Gerolf, thegameg, hfinkel, andrew.w.kaylor, LuoYuanke
Reviewed By: anemet
Differential Revision: https://reviews.llvm.org/D72480
Generate remarks for matrix operations in a function. To generate remarks
for matrix expressions, the following approach is used:
1. Collect leafs of matrix expressions (done in
RemarkGenerator::getExpressionLeafs). Leafs are lowered matrix
instructions without other matrix users (like stores).
2. For each leaf, create a remark containing a linearizied version of the
matrix expression.
The following improvements will be submitted as follow-ups:
* Summarize number of vector instructions generated for each expression.
* Account for shared sub-expressions.
* Propagate matrix remarks up the inlining chain.
The information provided by the matrix remarks helps users to spot cases
where matrix expression got split up, e.g. due to inlining not
happening. The remarks allow users to address those issues, ensuring
best performance.
Reviewers: anemet, Gerolf, thegameg, hfinkel, andrew.w.kaylor, LuoYuanke
Reviewed By: anemet
Differential Revision: https://reviews.llvm.org/D72453
This behavior appears to have changed unintentionally in
b0e979724f.
Instead of printing the leading newline in printFunction, print it when
printing a module. This ensures that `OS << *Func` starts printing
immediately on the current line, but whole modules are printed nicely.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D73505
Summary:
Treat scalable allocas as if they have storage size of 0, and
scalable-typed memory accesses as if their range is unlimited.
This is not a proper support of scalable vector types in the analysis -
we can do better, but not today.
Reviewers: vitalybuka
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73394
This patch adds a new option to enable/disable register renaming in the
load-store optimizer. Defaults to disabled, as there is a potential
mis-compile caused by this.
Trivial type predicates should be moved into the tablegen pattern
itself, and not checked inside complex patterns. This eliminates a
redundant complex pattern, and fixes select source modifiers for
GlobalISel.
I have further patches which fully handle select in tablegen and
remove all of the C++ selection, although it requires the ugliness to
support the entire range of legal register types.
Summary:
This is mostly NFC. computeForAddSub may give more precise results in
some cases, but that doesn't seem to affect any existing GlobalISel
tests.
Subscribers: rovka, hiraditya, volkan, Petar.Avramovic, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73431
D47163 created a rule that we should not change the casted
type of a select when we have matching types in its compare condition.
That was intended to help vector codegen, but it also could create
situations where we miss subsequent folds as shown in PR44545:
https://bugs.llvm.org/show_bug.cgi?id=44545
By using shouldChangeType(), we can continue to get the vector folds
(because we always return false for vector types). But we also solve
the motivating bug because it's ok to narrow the scalar select in that
example.
Our canonicalization rules around select are a mess, but AFAICT, this
will not induce any infinite looping from the reverse transform (but
we'll need to watch for that possibility if committed).
Side note: there's a similar use of shouldChangeType() for phi ops
just below this diff, and the source and destination types appear to
be reversed.
Differential Revision: https://reviews.llvm.org/D72733
This allows SimplifyDemandedBits to call SimplifyMultipleUseDemandedBits to create a simpler ISD::EXTRACT_SUBVECTOR, which is particularly useful for cases where we're splitting into subvectors anyhow.
Differential Revision: This allows SimplifyDemandedBits to call SimplifyMultipleUseDemandedBits to create a simpler ISD::EXTRACT_SUBVECTOR, which is particularly useful for cases where we're splitting into subvectors anyhow.
This patch fixes an assertion failure in DwarfExpression that is
triggered when a complex fragment has exactly the size of a
subregister of the register the DBG_VALUE points to *and* there is no
DWARF encoding for the super-register.
I took the opportunity to replace/document some magic values with
static constructor functions to make this code less confusing to read.
rdar://problem/58489125
Differential Revision: https://reviews.llvm.org/D72938
Followup to D72978. This moves existing negation handling in
InstCombine into freelyNegateValue(), which make it composable.
In particular, root negations of div/zext/sext/ashr/lshr/sub can
now always be performed through a shl/trunc as well.
Differential Revision: https://reviews.llvm.org/D73288
This is a revert-of-revert (i.e. this reverts commit 802bec89, which
itself reverted fa4701e1 and 79daafc9) with a fix folded in. The problem
was that call site tags weren't emitted properly when LTO was enabled
along with split-dwarf. This required a minor fix. I've added a reduced
test case in test/DebugInfo/X86/fission-call-site.ll.
Original commit message:
This allows a call site tag in CU A to reference a callee DIE in CU B
without resorting to creating an incomplete duplicate DIE for the callee
inside of CU A.
We already allow cross-CU references of subprogram declarations, so it
doesn't seem like definitions ought to be special.
This improves entry value evaluation and tail call frame synthesis in
the LTO setting. During LTO, it's common for cross-module inlining to
produce a call in some CU A where the callee resides in a different CU,
and there is no declaration subprogram for the callee anywhere. In this
case llvm would (unnecessarily, I think) emit an empty DW_TAG_subprogram
in order to fill in the call site tag. That empty 'definition' defeats
entry value evaluation etc., because the debugger can't figure out what
it means.
As a follow-up, maybe we could add a DWARF verifier check that a
DW_TAG_subprogram at least has a DW_AT_name attribute.
Update #1:
Reland with a fix to create a declaration DIE when the declaration is
missing from the CU's retainedTypes list. The declaration is left out
of the retainedTypes list in two cases:
1) Re-compiling pre-r266445 bitcode (in which declarations weren't added
to the retainedTypes list), and
2) Doing LTO function importing (which doesn't update the retainedTypes
list).
It's possible to handle (1) and (2) by modifying the retainedTypes list
(in AutoUpgrade, or in the LTO importing logic resp.), but I don't see
an advantage to doing it this way, as it would cause more DWARF to be
emitted compared to creating the declaration DIEs lazily.
Update #2:
Fold in a fix for call site tag emission in the split-dwarf + LTO case.
Tested with a stage2 ThinLTO+RelWithDebInfo build of clang, and with a
ReleaseLTO-g build of the test suite.
rdar://46577651, rdar://57855316, rdar://57840415, rdar://58888440
Differential Revision: https://reviews.llvm.org/D70350
We want to have more load/store clustering but we also want
to maintain low register pressure which are oposit targets.
Allow scheduler to reschedule regions without mutations
applied if we hit a register limit.
Differential Revision: https://reviews.llvm.org/D73386
TAPI currently lacks a way to emit the macCatalyst platform. For TBD_V3
is does support zippered frameworks given that both macOS and
macCatalyst are part of the PlatformSet.
Differential revision: https://reviews.llvm.org/D73325
Use intermediate instructions, unlike with buffer stores. This is
necessary because of the need to have an internal way to distinguish
between signed and unsigned extloads. This introduces some duplication
and near duplication with the buffer store selection path. The store
handling should maybe be moved into legalization to match and
eliminate the duplication.
Try to keep simple v2s16 cases as-is. This will more naturally map to
how the VOP3P op_sel modifiers work compared to the expansion
involving bitcasts and bitshifts.
This could maybe try harder with wider source vector types, although
that could be handled with a pre-legalize combine.
This reverts commit fe23ed2c68.
It was never really clear this was responsible for the performance
regressions that caused this to be reverted. It's been a long time,
and we need to have scalar patterns for this to get GlobalISel
working.