This is another enhancement to D77895/D78362
to avoid a round-trip from XMM->GPR->XMM.
This time we handle the case of starting/ending with different FP types
but always with signed i32 as the intermediate value.
I think this covers all of the faux vector optimization possibilities
for pre-AVX512.
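For illustration, a hypothetical C++ source pattern of the kind this transform targets (not taken from the patch's tests):
```
// f32 -> i32 -> f64: scalar codegen round-trips XMM -> GPR -> XMM
// via cvttss2si and cvtsi2sd; the combine can instead stay in XMM
// using the packed cvttps2dq / cvtdq2pd forms.
double f32_to_f64_via_i32(float x) {
  return (double)(int)x;
}
```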
There is at least 1 other transform mentioned in PR36617:
https://bugs.llvm.org/show_bug.cgi?id=36617#c19
...where we fold an 'fpext' into a preceding 'sitofp'. I think we will
want to handle that earlier (DAGCombiner or instcombine) because that's
a target-independent optimization.
Differential Revision: https://reviews.llvm.org/D78758
If either the caller or the callee has disabled ZMM registers due to
prefer-vector-width=256, we previously disabled argument promotion,
since the ABI might be incompatible: one side will split 512-bit
vectors in this case.
But if we can see that the types are all scalar, this shouldn't be
a problem.
This patch assumes that pointer element type reflects the type that
the argument will be promoted to.
Differential Revision: https://reviews.llvm.org/D78770
A previous bug fix for varargs introduced a regression where we would
incorrectly widen some stores to memory when passing i8/i16 parameters on the
stack. This seemingly didn't show up because it only happens when there is
no signext/zeroext parameter attribute, which I think clang adds for Darwin.
Swift however seems to be a different story, and a plain anyext on the parameter
triggered the bug.
To fix this, I've added a new ValueHandler::assignValueToAddress type override
which lets us distinguish between varargs and fixed args (we still need this
widening behaviour for varargs to fix the original bug in 2018).
rdar://61353552
Summary:
Add a check to make sure that MachineInstr::mayAlias returns early if at least one of its instruction parameters does not access memory. This prevents calls to TargetInstrInfo::areMemAccessesTriviallyDisjoint with incompatible instructions.
A side effect of this change is to render the mayAlias helper in the AArch64 load/store optimizer obsolete. We can now directly call the MachineInstr::mayAlias member function.
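A rough sketch of the added guard (simplified shape, not the exact in-tree code):
```
#include "llvm/CodeGen/MachineInstr.h"

// If either instruction does not access memory, there is nothing to
// alias, and it is not safe to hand the pair to
// TargetInstrInfo::areMemAccessesTriviallyDisjoint.
static bool mayAliasGuard(const llvm::MachineInstr &MIa,
                          const llvm::MachineInstr &MIb) {
  if (!MIa.mayLoadOrStore() || !MIb.mayLoadOrStore())
    return false; // return early: one side has no memory access
  return true;    // ... continue with the usual alias checks
}
```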
Reviewers: hfinkel, t.p.northover, mcrosier, eli.friedman, efriedma
Reviewed By: efriedma
Subscribers: efriedma, kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78823
This is to fix performance regressions introduced by
86c944d790.
The old search would collect all potentially mergeable instructions in
the entire block. In this case, the same address is written in
multiple places in the block on the other side of a fence. When sorted
by offset, the two unmergeable, identical addresses would be next to
each other and the merge would give up.
Break the search space when we encounter an instruction we won't be
able to merge across. This will keep the identical addresses in
different merge attempts.
This may also improve compile time by reducing the merge list size.
When using reversedInstructionsWithoutDebug to construct a range from a
pair of MachineInstrBundleIterators, the range unexpectedly leaves out an
element. This results in mis-optimization as @mstorsjo points out in
https://reviews.llvm.org/D78157.
The problem is that when we convert a MachineInstrBundleIterator to a
reverse iterator, the result gets incremented:
MachineInstrBundleIterator(++I.getReverse())
The comment there explains that the "resulting iterator will dereference
... to the previous node, which is somewhat unexpected; but converting
the two endpoints in a range will give the same range in reverse". This
makes it hard to understand what reversedInstructionsWithoutDebug will
do: I've removed the helper to prevent similar mistakes in the future.
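A standard-library analogue of the convention (not LLVM code, but the same off-by-one behaviour):
```
#include <iterator>
#include <vector>

// std::reverse_iterator(it) dereferences to *(it - 1), the element
// *before* 'it'. Converting only one endpoint of a range therefore
// shifts the range by one element, which is the same surprise
// MachineInstrBundleIterator's getReverse() conversion encodes.
int lastElement() {
  std::vector<int> v{1, 2, 3};
  auto rit = std::make_reverse_iterator(v.end());
  return *rit; // 3: the element before end()
}
```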
Summary:
It is important to emit HINT instructions instead of PAC ones when
PAC is disabled. This allows compatibility with other assemblers
(e.g. GAS). This was implemented in commit da33762de8.
Still, developers of assembly code will want to write code that is
compatible with both pre- and post-PAC CPUs. They could use HINT
mnemonics, but the new mnemonics are a lot more readable (e.g.
paciaz instead of hint #24), and they will result in the same
encodings. So, while LLVM should not *emit* the new mnemonics when
PAC is disabled, this patch will at least make LLVM *accept*
assembly code that uses them.
Reviewers: danielkiss, chill, olista01, LukeCheeseman, simon_tatham
Subscribers: kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78372
Follow-up of D78082 (x86-64).
This change avoids dynamic relocations in `xray_instr_map` for ARM/AArch64/powerpc64le.
MIPS64 cannot use 64-bit PC-relative addresses because R_MIPS_PC64 is not defined.
Because MIPS32 shares the same code, for simplicity, we don't use PC-relative addresses for MIPS32 as well.
Tested on AArch64 Linux and ppc64le Linux.
Reviewed By: ianlevesque
Differential Revision: https://reviews.llvm.org/D78590
This patch upstreams support for the Armv8.6-a Matrix Multiplication
Extension. A summary of the features can be found here:
https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/arm-architecture-developments-armv8-6-a
This patch includes:
- Assembly support for AArch32 and Assembly Parsing
D77872 has already added the MC representations of the instructions so that
they can be used in code gen; this patch fills in the details needed to
make assembly parsing work, and adds tests for asm and disasm
This is part of a patch series, starting with BFloat16 support and
the other components in the armv8.6a extension (in previous patches
linked in phabricator)
Based on work by:
- Luke Geeson
- Oliver Stannard
- Luke Cheeseman
Reviewers: t.p.northover, simon_tatham
Reviewed By: simon_tatham
Subscribers: simon_tatham, ostannard, kristof.beyls, hiraditya,
danielkiss, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77874
This patch upstreams support for the Armv8.6-a Matrix Multiplication
Extension. A summary of the features can be found here:
https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/arm-architecture-developments-armv8-6-a
This patch includes:
- Assembly support for AArch64 Scalable Vector Instructions (in line
with the Scalable Vector Extension - SVE)
This is part of a patch series, starting with BFloat16 support and
the other components in the armv8.6a extension (in previous patches
linked in phabricator)
Based on work by:
- Luke Geeson
- Oliver Stannard
- Luke Cheeseman
Reviewers: t.p.northover, rengolin, c-rhodes
Reviewed By: c-rhodes
Subscribers: c-rhodes, ostannard, tschuett, kristof.beyls, hiraditya,
danielkiss, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77873
This patch upstreams support for the Armv8.6-a Matrix Multiplication
Extension. A summary of the features can be found here:
https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/arm-architecture-developments-armv8-6-a
This patch includes:
- Assembly support for AArch32
- Intrinsics Support for AArch32 Neon Intrinsics for Matrix
Multiplication
Note: these extensions are optional in the 8.6a architecture and so are
not enabled by default; they have to be enabled explicitly
No additional IR types or C Types are needed for this extension.
This is part of a patch series, starting with BFloat16 support and
the other components in the armv8.6a extension (in previous patches
linked in phabricator)
Based on work by:
- Luke Geeson
- Oliver Stannard
- Luke Cheeseman
Reviewers: t.p.northover, miyuki
Reviewed By: miyuki
Subscribers: miyuki, ostannard, kristof.beyls, hiraditya, danielkiss,
cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D77872
This patch upstreams support for the Armv8.6-a Matrix Multiplication
Extension. A summary of the features can be found here:
https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/arm-architecture-developments-armv8-6-a
This patch includes:
- Assembly support for AArch64 only (no SVE or Neon)
- Intrinsics Support for AArch64 Armv8.6a Matrix Multiplication Instructions (No bfloat16 matrix multiplication)
No IR types or C Types are needed for this extension.
This is part of a patch series, starting with BFloat16 support and
the other components in the armv8.6a extension (in previous patches
linked in phabricator)
Based on work by:
- Luke Geeson
- Oliver Stannard
- Luke Cheeseman
Reviewers: ostannard, t.p.northover, rengolin, kmclaughlin
Reviewed By: kmclaughlin
Subscribers: kmclaughlin, kristof.beyls, hiraditya, danielkiss,
cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D77871
Summary:
The frontend guarantees that coherent accesses have the
corresponding cache policy bits set (glc, dlc).
Therefore there is no need for extra instructions
that invalidate the cache.
Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78800
Summary:
This patch maps IR operations for sdiv & udiv to the
@llvm.aarch64.sve.[s|u]div intrinsics.
A ptrue must be created during lowering as the div instructions
have only a predicated form.
Patch contains changes by Andrzej Warzynski.
Reviewers: sdesmalen, c-rhodes, efriedma, cameron.mcinally, rengolin
Reviewed By: efriedma
Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, andwar, cfe-commits, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78569
The MCELFObjectWriter::setRType## methods are always used together to
build a complete MIPS N64 ABI "chain" of relocations. Using a single
function for this task makes the code less verbose.
Summary:
Changing all mnemonics to match assembly instructions to simplify mnemonic
naming rules. This time update all floating-point arithmetic instructions.
Reviewed By: simoll
Differential Revision: https://reviews.llvm.org/D78768
This was backwards from what was intended, and was missing a test. We should
perhaps just ignore the FP mode here, since it shouldn't be legal to mix code
with different default modes in the absence of strictfp.
If f32 denormals were enabled pre-gfx9, we would still try to
implement this with v_max_f32. Pre-gfx9, these instructions ignored
the denormal mode and did not flush. Switch to the multiply form for
f32 as a workaround which should always work in any case.
This fixes conformance failures when the library implementation of
fmin/fmax was accidentally not inlined, forcing the assumption of no
flushing on targets where denormals are not enabled by default. This
is a workaround, since really we should not be mixing code with
different FP mode expectations, but prefer the lowering that will work
in any mode.
Now this will always use max to implement canonicalize on gfx9+. This
is only really beneficial for f64. For f32/f16 it's a neutral choice
(and worse in terms of code size in 1 case), but possibly worse for
the compiler since it does add an extra register use operand. Leave
this change for later.
1. Use Subtarget.isUsingPCRelativeCalls() in LowerConstantPool to
check if using PCRelative addressing.
2. Change MO_GOT_FLAG = 32 to MO_GOT_FLAG = 8 in PPC.h to use
consecutive bits.
Differential Revision: https://reviews.llvm.org/D78406
This avoids more long lists of register classes that have to be updated
every time we add a new one. NFC.
Differential Revision: https://reviews.llvm.org/D78570
12994a70cf did this for 128-bit classes:
SGPR_128 only includes the real allocatable SGPRs, and SReg_128 adds
the additional non-allocatable TTMP registers. There's no point in
allocating SReg_128 vregs. This shrinks the size of the classes
regalloc needs to consider, which is usually good.
This patch extends it to all classes > 64 bits, for consistency.
Differential Revision: https://reviews.llvm.org/D78622
This patch changes the FP conversion intrinsics to take a predicate
that matches the number of lanes for the vector with the widest element
type as opposed to using <vscale x 16 x i1>.
For example:
```<vscale x 4 x float> @llvm.aarch64.sve.fcvt.f32f16(<vscale x 4 x float>, <vscale x 4 x i1>, <vscale x 8 x half>)```
now uses <vscale x 4 x i1> instead of <vscale x 16 x i1>
And similar for:
```<vscale x 4 x float> @llvm.aarch64.sve.fcvt.f32f64(<vscale x 4 x float>, <vscale x 2 x i1>, <vscale x 2 x double>)```
where the predicate now matches the wider type, so <vscale x 2 x i1>.
Reviewers: efriedma, SjoerdMeijer, paulwalker-arm, rengolin
Reviewed By: efriedma
Tags: #clang
Differential Revision: https://reviews.llvm.org/D78402
Do not count the presence of debug insts against the limit set by
LdStLimit, and allow the optimizer to find matching insts by skipping
over debug insts.
Differential Revision: https://reviews.llvm.org/D78411
Summary:
The findSuitableCompare method can fail if debug instructions are
present in the MBB -- fix this by using helpers to skip over debug
insts.
Reviewers: aemerson, paquette
Subscribers: kristof.beyls, hiraditya, danielkiss, aprantl, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78265
Summary:
This fixes several instances in which condbr optimization was missed
due to a debug instruction appearing as a bogus NZCV clobber.
Reviewers: aemerson, paquette
Subscribers: kristof.beyls, hiraditya, jfb, danielkiss, aprantl, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78264
Summary:
Use the variants of these APIs which skip over debug instructions. This
is mostly a cleanup, but it does fix a debug-variance issue which causes
addsub-shifted.ll and addsub_ext.ll to fail when debug info is inserted
by -mir-debugify.
Reviewers: aemerson, paquette
Subscribers: kristof.beyls, hiraditya, jfb, danielkiss, llvm-commits, aprantl
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78262
Summary:
Fix an issue where the presence of debug info could disable the ccmp
optimization due to findConvertibleCompare failing too early (the error
is "Can't create ccmp with multiple uses", where the "use" is a
DBG_VALUE inst).
Depends on D78151.
Reviewers: t.p.northover, paquette, aemerson
Subscribers: kristof.beyls, hiraditya, danielkiss, aprantl, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78156
Summary:
Fix an issue where the presence of debug info could disable a peephole
optimization due to areCFlagsAccessedBetweenInstrs returning the wrong
result.
In test/CodeGen/AArch64/arm64-csel.ll, the issue was found in the
function @foo5, in which the first compare could successfully be
optimized but not the second.
Reviewers: t.p.northover, eastig, paquette
Subscribers: kristof.beyls, hiraditya, danielkiss, aprantl, dsanders, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78157
Summary:
Fix an issue where the presence of debug info could disable a peephole
optimization in optimizeCompareInstr due to canInstrSubstituteCmpInstr
returning the wrong result.
Depends on D78137.
Reviewers: t.p.northover, eastig, paquette
Subscribers: kristof.beyls, hiraditya, danielkiss, aprantl, llvm-commits, dsanders
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78151
Summary:
These helpers are exercised by follow-up commits in this patch series,
which is all about removing CodeGen differences with vs. without debug
info in the AArch64 backend.
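For a flavour of what such a helper might look like (an illustrative shape, not the actual in-tree signatures):
```
#include "llvm/CodeGen/MachineBasicBlock.h"
#include <algorithm>

// Find the first non-debug instruction in [I, E), so that inserting
// DBG_VALUEs cannot change what an optimization sees.
llvm::MachineBasicBlock::iterator
firstNonDebug(llvm::MachineBasicBlock::iterator I,
              llvm::MachineBasicBlock::iterator E) {
  return std::find_if(I, E, [](const llvm::MachineInstr &MI) {
    return !MI.isDebugInstr();
  });
}
```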
Reviewers: fhahn, aprantl, jpaquette, paquette
Subscribers: kristof.beyls, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78260
Preserving liveness can be useful even late in the pipeline, if we're
doing substantial optimization work afterwards. (See, for example,
D76065.) Teach MachineOutliner how to correctly set live-ins on the
basic block in outlined functions.
Differential Revision: https://reviews.llvm.org/D78605
Currently an indirect call produces the following sequence in PCRelative mode:
extern void function( );
extern void (*ptrfunc) ( );
void g() {
ptrfunc=function;
}
void f() {
(*ptrfunc) ( );
}
Producing
paddi 3, 0, .LC0@PCREL, 1
ld 3, 0(3)
std 2, 24(1)
ld 12, 0(3)
mtctr 12
bctrl
ld 2, 24(1)
Though the caller does not use or preserve r2, it is still saved and restored
across a function call. This patch removes these redundant saves and
restores for indirect calls.
Differential Revision: https://reviews.llvm.org/D77749
llvm/lib/Target/Hexagon/HexagonTargetObjectFile.cpp:296:11: warning: enumeration value 'ScalableVectorTyID' not handled in switch [-Wswitch]
switch (Ty->getTypeID()) {
^
Summary:
This recommits the reversion of https://reviews.llvm.org/D75039.
Consensus appears to be in favour of assembly-time resolution of
these ADR and LDR relocations, in line with GNU. The previous
backout broke many lld tests, now fixed by Peter Smith in
61bccda9d9.
Reviewers: psmith
Subscribers: kristof.beyls, hiraditya, danielkiss, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78301
Add initial support for PC Relative addressing to get jump table base
address instead of using TOC.
Differential Revision: https://reviews.llvm.org/D75931
If a 16-bit thumb STM with writeback stores the base register but it isn't the
first register in the list, then an unknown value is stored. The load/store
optimizer knows this and generates a 32-bit STM without writeback instead, but
thumb2 size reduction converts it into a 16-bit STM. Fix this by having thumb2
size reduction notice such STMs and leave them as they are.
Differential Revision: https://reviews.llvm.org/D78493
This adds some extra processing into the pre-RA ARM load/store optimizer
to detect and merge MVE loads/stores with adds of the same base. We
don't always turn these into a post-inc during ISel: because the DAG is
a graph, we don't always have an order to use for the nodes, so we
don't know which nodes to make post-inc and which to use the new
post-inc of. After ISel, we have an order that we can use to post-inc
the following instructions.
So this looks for a load/store with a starting offset of 0, and an
add/sub from the same base, plus a number of other loads/stores. We then
do some checks and convert the zero offset load/store into a postinc
variant. Any loads/stores after it have the offset subtracted from their
immediates. For example:
  Before:     After:
  LDR #4      LDR #4
  LDR #0      LDR_POSTINC #16
  LDR #8      LDR #-8
  LDR #12     LDR #-4
  ADD #16
It only handles MVE loads/stores at the moment. Normal loads/store will
be added in a followup patch, they just have some extra details to
ensure that we keep generating LDRD/LDM successfully.
Differential Revision: https://reviews.llvm.org/D77813
Add 96-bit, 160-bit and 256-bit AReg classes to match VReg and SReg.
NFC as far as I know, but it may avoid weird legalization problems.
Differential Revision: https://reviews.llvm.org/D78348
Finding the loop tripcount is the first crucial step in preparing a loop for
tail-predication, and this adds a debug message if a tripcount cannot be found.
And while I was at it, I added some more comments here and there.
Differential Revision: https://reviews.llvm.org/D78485
Summary:
Changing all mnemonics to match assembly instructions to simplify mnemonic
naming rules. This time update all shift operation instructions. This also
corrects the instructions' operation kinds.
Reviewed By: simoll
Differential Revision: https://reviews.llvm.org/D78468
Summary:
VE uses the identical names "%s0-63" for all generic registers. Change to use
the alternative-name mechanism for all generic registers instead of hard-
coding them.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D78174
This patch exploits the rldimi instruction for patterns like
`or %a, 0b000011110000`, which saves instructions when the
operand has only one use, compared with `li-ori-sldi-or`.
Reviewed By: nemanjai
Differential Revision: https://reviews.llvm.org/D77850
FMLA/FMLS f16 indexed patterns added.
Fixes https://bugs.llvm.org/show_bug.cgi?id=45467
Removed redundant v2f32 vector_extract indexed pattern since
Instruction Selection is able to match v4f32 instead.
xray_instr_map contains absolute addresses of sleds, which are relocated
by `R_*_RELATIVE` when linked in -pie or -shared mode.
By making these addresses relative to PC, we can avoid the dynamic
relocations and remove the SHF_WRITE flag from xray_instr_map. We can
thus save VM pages containing xray_instr_map (because they are not
modified).
This patch changes x86-64 and bumps the sled version to 2. Subsequent
changes will change powerpc64le and AArch64.
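Roughly, a version-2 sled entry stores self-relative deltas instead of absolute addresses; the field names and layout below are illustrative assumptions, not the exact runtime struct:
```
#include <cstdint>

// Assumed shape of a v2 entry: addresses are encoded relative to the
// entry itself, so no R_*_RELATIVE dynamic relocation is needed and
// the section can stay read-only.
struct XRaySledEntryV2 {
  int64_t SledDelta;     // sled address minus the address of this field
  int64_t FunctionDelta; // function address, also self-relative
  uint8_t Kind;
  uint8_t AlwaysInstrument;
  uint8_t Version;       // bumped to 2 by this change
  uint8_t Padding[13];   // pad the entry to 32 bytes (assumed)
};
```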
Reviewed By: dberris, ianlevesque
Differential Revision: https://reviews.llvm.org/D78082
This is an optimization that applies to global addresses and
allows for the following transformation:
Convert this:
paddi r3, 0, symbol@PCREL, 1
ld r4, 8(r3)
To this:
pld r4, symbol@PCREL+8(0), 1
An instruction is saved and the linker can do the addition when
the symbol is resolved.
Differential Revision: https://reviews.llvm.org/D76160
This was yet another function that had to be updated whenever you added
a new register class. Remove it by refactoring its only caller to use
standard helper functions from SIRegisterInfo.
Differential Revision: https://reviews.llvm.org/D78557
Summary:
Move the declarations of getThe<Name>Target() functions into a new header in
TargetInfo and make users of these functions include this new header in
order to follow other architectures.
Differential Revision: https://reviews.llvm.org/D78543
The logic in ARMParallelDSP is set up to merge two 16-bit loads into
a 32-bit load and feed them into the smlads. This requires that four
loads are combined for the four inputs, but there wasn't actually a
check for this.
Differential Revision: https://reviews.llvm.org/D78492
Summary:
Before this patch, `relaxInstruction` takes three arguments, the first
argument refers to the instruction before relaxation and the third
argument is the output instruction after relaxation. There are two quite
strange things:
1) The first argument's type is `const MCInst &`, the third
argument's type is `MCInst &`, but they may be aliased to the same
variable
2) The backends of ARM, AMDGPU, RISC-V, Hexagon assume that the third
argument is a fresh uninitialized `MCInst` even if `relaxInstruction`
may be called like `relaxInstruction(Relaxed, STI, Relaxed)` in a
loop.
In this patch, we drop the third argument and let `relaxInstruction`
directly modify the given instruction. This patch also fixes the bug
https://bugs.llvm.org/show_bug.cgi?id=45580, which was introduced by D77851 and
broke the assumption of ARM, AMDGPU, RISC-V and Hexagon.
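The shape of the interface change, simplified to free-function declarations here:
```
#include "llvm/MC/MCInst.h"
#include "llvm/MC/MCSubtargetInfo.h"

// Before: a separate output parameter that callers could alias with
// the input, e.g. relaxInstruction(Relaxed, STI, Relaxed) in a loop.
void relaxInstructionOld(const llvm::MCInst &Inst,
                         const llvm::MCSubtargetInfo &STI,
                         llvm::MCInst &Res);

// After: relax the given instruction in place, so no backend can
// mistake the output for a fresh, uninitialized MCInst.
void relaxInstructionNew(llvm::MCInst &Inst,
                         const llvm::MCSubtargetInfo &STI);
```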
Reviewers: Razer6, MaskRay, jyknight, asb, luismarques, enderby, rtaylor, colinl, bcain
Reviewed By: Razer6, MaskRay, bcain
Subscribers: bcain, nickdesaulniers, nathanchance, wuzish, annita.zhang, arsenm, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, tpr, sbc100, jgravelle-google, kristof.beyls, hiraditya, aheejin, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, Jim, lenary, s.egerton, pzheng, sameer.abuasal, apazos, luismarques, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78364
For a test case in this patch like the one below:
struct t { int a; } __attribute__((preserve_access_index));
int foo(void *);
int test(struct t *arg) {
long param[1];
param[0] = (long)&arg->a;
return foo(param);
}
The IR right before BPF SimplifyPatchable phase:
%1:gpr = LD_imm64 @"llvm.t:0:0$0:0"
%2:gpr = LDD killed %1:gpr, 0
%3:gpr = ADD_rr %0:gpr(tied-def 0), killed %2:gpr
STD killed %3:gpr, %stack.0.param, 0
After SimplifyPatchable phase, the incorrect IR is generated:
%1:gpr = LD_imm64 @"llvm.t:0:0$0:0"
%3:gpr = ADD_rr %0:gpr(tied-def 0), killed %1:gpr
CORE_MEM killed %3:gpr, 306, %0:gpr, @"llvm.t:0:0$0:0"
Note that the CORE_MEM pseudo op is introduced to encode
memory operations related to CORE. In the above, we intend
to check whether we have a store like
*(%3:gpr + 0) = ...
and if this is the case, we could replace it with
*(%0:gpr + @"llvm.t:0:0$0:0") = ...
Unfortunately, in the above, the IR for the store is
*(%stack.0.param + 0) = %3:gpr
so the transformation should not happen.
Note that we won't have a problem if the actual CORE
dereference (arg->a) happens.
This patch fixes the problem by skipping the CORE optimization if
the use of the ADD_rr result is not the base address of the store
operation.
Differential Revision: https://reviews.llvm.org/D78466
- Adds support for comments on outlined functions, recording the conditions under which they were outlined (e.g. thunks, tail calls)
- Adapts emitFunctionHeader to print a comment next to the header if the target specifies one, based on information in MachineFunctionInfo
- Adds a MIR test for function annotation
Differential Revision: https://reviews.llvm.org/D78062
Adds support for passing a ByVal formal argument completely on the stack
(ie after all argument registers are exhausted).
Differential Revision: https://reviews.llvm.org/D78263
Summary:
Similar to the CallLowering class used for lowering LLVM IR calls to MIR calls,
we introduce a separate class for lowering LLVM IR inline asm to MIR INLINEASM.
There is no functional change yet, all existing tests should pass.
Reviewers: arsenm, dsanders, aemerson, volkan, t.p.northover, paquette
Reviewed By: aemerson
Subscribers: gargaroff, wdng, mgorny, rovka, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78316
Previously, the AVR backend would put functions in .progmem.data. This
is probably a regression from when functions still lived in address
space 0. With this change, only global constants are placed in
.progmem.data.
This is not complete: avr-gcc additionally respects -fdata-sections for
progmem global constants, which LLVM doesn't yet do. But fixing that is
a bit more complicated (and I believe other backends such as RISC-V
might also have similar issues).
Differential Revision: https://reviews.llvm.org/D78212
We don't need all of MCStreamer.h, just FormattedStream.h. The rest can be replaced with forward declarations.
X86WinAllocaExpander.cpp had an implicit dependency on MapVector.h which I've added locally.
Summary:
The SVE masked load and store intrinsics introduced in D76688 rely on
common llvm.masked.load/store nodes. This patch creates new ISD nodes
for LD1(S) & ST1 to remove this dependency.
Additionally, this adds support for sign & zero extending
loads and truncating stores.
Reviewers: sdesmalen, efriedma, cameron.mcinally, c-rhodes, rengolin
Reviewed By: efriedma
Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, danielkiss, andwar, cfe-commits, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78204
Summary:
This commit reverts https://reviews.llvm.org/D75039. Consensus appears to
be in favour of assembly-time resolution of these ADR and LDR relocations,
in line with GNU.
Reviewers: psmith
Subscribers: kristof.beyls, hiraditya, danielkiss, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78301
The API for shuffles and reductions uses generic Type parameters,
instead of VectorType, and so assertions and casts are used a lot.
This patch makes those types explicit, which means that the clients
can't be lazy, but results in less ambiguity, and that can only be a
good thing.
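A small sketch of the difference (simplified signatures from that era of the API, not the actual declarations):
```
#include "llvm/IR/DerivedTypes.h"
#include "llvm/Support/Casting.h"

// Before: a generic Type* forces an assert-and-cast in every
// implementation.
unsigned lanesBefore(llvm::Type *Ty) {
  auto *VecTy = llvm::cast<llvm::VectorType>(Ty); // asserts at runtime
  return VecTy->getNumElements();
}

// After: the signature itself guarantees a vector type.
unsigned lanesAfter(llvm::VectorType *VecTy) {
  return VecTy->getNumElements();
}
```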
Bugzilla: https://bugs.llvm.org/show_bug.cgi?id=45562
Differential Revision: https://reviews.llvm.org/D78357
Summary:
In SjLj exception mode, lowering creates a new landingpad BB for the old landingpad BB and uses an indirect branch to jump to the old landingpad BB.
So we should add two endbr instructions for this exception model.
Reviewers: hjl.tools, craig.topper, annita.zhang, LuoYuanke, pengfei, efriedma
Reviewed By: LuoYuanke
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77124
Summary:
When we encode an instruction, we need to know the number of bytes being
emitted to determine the fixups in `X86MCCodeEmitter::emitImmediate`.
There are only two callers for `emitImmediate`: `emitMemModRMByte` and
`encodeInstruction`.
Before this patch, we kept track of the current byte being emitted
by passing a reference parameter `CurByte` across all the `emit*`
functions, which is ugly and unnecessary. For example, we don't have any
fixups when emitting prefixes, so we don't need to track this value.
In this patch, we use `StartByte` to record the initial status of the
streamer, and use `OS.tell()` to get the current status of the streamer
when we need to know the number of bytes being emitted. On the one hand,
this eliminates the parameter `CurByte` for most `emit*` functions; on
the other hand, it makes things clear: only pass the parameter when we
really need it.
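A sketch of the new scheme, with simplified names:
```
#include "llvm/Support/raw_ostream.h"

// Record the streamer position once on entry; derive "bytes emitted
// so far" on demand instead of threading a CurByte reference through
// every emit* helper.
void encodeInstructionSketch(llvm::raw_ostream &OS) {
  uint64_t StartByte = OS.tell(); // initial status of the streamer
  // ... emit prefixes, opcode, ModRM and immediates into OS ...
  uint64_t BytesEmitted = OS.tell() - StartByte;
  (void)BytesEmitted; // what emitImmediate needs for fixup offsets
}
```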
Reviewers: craig.topper, pengfei, MaskRay
Reviewed By: craig.topper, MaskRay
Subscribers: hiraditya, llvm-commits, annita.zhang
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78419
gcc may warn here because X86ISD::NodeType is specified as "unsigned",
but ISD::NodeType is a naked C enum (although passed as an "unsigned"
throughout SDAG).
getFauxShuffle attempts to combine INSERT_VECTOR_ELT(TRUNCATE/EXTEND(EXTRACT_VECTOR_ELT(x))) patterns into a target shuffle chain.
PR45604 identified an issue where the scalar was truncated to a size smaller than the destination vector element and then zero extended back, which requires the upper bits to be zeroed, which we don't currently do.
To avoid the bug I've added an early out in these truncation cases, a future commit should allow us to handle this by inserting the necessary SM_SentinelZero padding.
This is an enhancement to D77895 to avoid another
round-trip from XMM->GPR->XMM. This time we handle
the case of starting/ending with an f64 and casting
to signed i32 as the intermediate value.
It's a bit more involved than I initially assumed
because we need to use target-specific opcodes to
represent the non-standard cast ops.
Differential Revision: https://reviews.llvm.org/D78362
Reduce X86Subtarget.h/MCCodeEmitter.h/TargetMachine.h includes to forward declarations
Add explicit X86Subtarget.h/TargetMachine.h includes to X86AsmPrinter.cpp/X86MCInstLower.cpp
Remove unused MCSymbol forward declaration
We were only including MachineInstr.h for DebugLoc.h. This exposes an implicit include dependency in BTFDebug.h where I've had to add the MachineInstr.h include.
Summary:
The instruction in non-text section can not be executed, so they will not affect performance.
In addition, their encoding values are treated as data, so we should not touch them.
Reviewers: MaskRay, reames, LuoYuanke, jyknight
Reviewed By: MaskRay
Subscribers: annita.zhang, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77971
[MachineOutliner] fix test for excluding CFI and add test to include CFI in outlining
New test to check that we only outline CFI instructions if all CFI
instructions in the function would be captured by the outlining.
Adds x86 tests analogous to the AArch64 CFI tests.
Revision: https://reviews.llvm.org/D77852
Adds target MachineFunctionInfo to the YAML for MIR files, starting with hasRedZone.
Split out of: D78062
Based on implementation for MachineFunctionInfo for WebAssembly
Differential Revision: https://reviews.llvm.org/D78173
Patch by Andrew Litteken! (AndrewLitteken)
This reverts commit 17b1869b72.
It is an attempt to fix the build failure linked below.
The patch differs from the original one reviewed at
https://reviews.llvm.org/D77435 only in the use of std::make_tuple
in building the return value of `findAddrModeSVELoadStore`:
- return {IsRegReg ? Opc_rr : Opc_ri, NewBase, NewOffset};
+ return std::make_tuple(IsRegReg ? Opc_rr : Opc_ri, NewBase,
+                        NewOffset);
the original patch submitted at
fc4e954ed5
was failing the following build:
http://lab.llvm.org:8011/builders/clang-armv7-linux-build-cache/builds/29420/
with error:
/home/buildslave/buildslave/clang-armv7-linux-build-cache/llvm/llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
/home/buildslave/buildslave/clang-armv7-linux-build-cache/llvm/llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp:1439:10:
error: chosen constructor is explicit in copy-initialization
return {IsRegReg ? Opc_rr : Opc_ri, NewBase, NewOffset};
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/bin/../lib/gcc/arm-linux-gnueabihf/5.4.0/../../../../include/c++/5.4.0/tuple:479:19:
note: explicit constructor declared here
constexpr tuple(_UElements&&... __elements)
^
1 error generated.
Summary:
This change fixes an issue where the dagcombine incorrectly used an addressing mode with scaled offsets (indices), instead of unscaled offsets.
Those addressing modes do not exist for `prfh`, `prfw` and `prfd`, hence we can reuse `prfb` because that has unscaled offsets, and because the pseudo-code in the XML spec suggests that the element size is not used for the amount of data that is prefetched by the instruction.
FWIW, GCC also emits a `prfb` for these cases.
Reviewers: sdesmalen, andwar, rengolin
Reviewed By: sdesmalen
Subscribers: tschuett, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78069
Summary:
Remove usages of asserting vector getters in Type in preparation for the
VectorType refactor. The existence of these functions complicates the
refactor while adding little value.
Reviewers: craig.topper, sdesmalen, efriedma, RKSimon
Reviewed By: efriedma
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77264
Summary:
Fixed wrong conditions for generating (S[LR]I X, Y, C2) from
(or (and X, BvecC1), (lsl Y, C2)) and added ISel nodes to lower to S[LR]I. The
optimisation is also enabled by default now.
Subscribers: kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77387
Add initial support for PC Relative addressing for global values that
require GOT indirect addressing. This patch adds PCRelative support for
global addresses that may not be known at link time and may require
access through the GOT.
Differential Revision: https://reviews.llvm.org/D76064
Summary:
Introduce new helper functions getVGPRClassForBitWidth,
getAGPRClassForBitWidth, getSGPRClassForBitWidth and use them to
refactor various other functions that all contained their own lists of
valid register class widths. NFC.
Reviewers: arsenm, rampitec
Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78311
Summary:
AIX symbols have a qualified name (qualname) and an unqualified name. The stock
getSymbol could only return the unqualified name, which led us to patch many
call sites (lowerConstant, getMCSymbolForTOCPseudoMO).
So we should try to address this problem on the callee
side (getSymbol) and clean up the caller side instead.
Note: this is a "mostly" NFC patch, with a fix for the original
lowerConstant behavior.
Differential Revision: https://reviews.llvm.org/D78045
Summary:
Use more logic and fewer tables. This reduces the line count and
reduces the effort required to introduce more register classes of
different sizes in future.
Reviewers: arsenm, rampitec, nhaehnle
Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78351
The previous patch didn't handle the early return in `emitREXPrefix` correctly,
which caused the REX prefix not to be emitted for instructions without
operands. This patch includes the fix for that.
This does for G_EXTRACT_VECTOR_ELT what 588bd7be36 did for G_TRUNC.
Ideally types without a corresponding register class wouldn't reach
here, but we're currently missing some (in particular a 192-bit class
is missing).
This allows targets to know exactly which operands are contributing to
the dependency, which is required for targets with per-operand
scheduling models.
Differential Revision: https://reviews.llvm.org/D77135
Add patterns that use a normal, non-wrapping, add and sub nodes along
with an arm vshr imm node.
Differential Revision: https://reviews.llvm.org/D77065
Summary:
We determine the REX prefix used by an instruction in `determineREXPrefix`,
and this value is used in `emitMemModRMByte` and used as the return
value of `emitOpcodePrefix`.
Before this patch, REX was passed by reference to `emitPrefixImpl`, which
is strange and unnecessary; e.g., we have to write
```
bool Rex = false;
emitPrefixImpl(CurOp, CurByte, Rex, MI, STI, OS);
```
in `emitPrefix` even if `Rex` will not be used.
So we let HasREX be the return value of `emitPrefixImpl`. The HasREX is passed
from `emitREXPrefix` to `emitOpcodePrefix` and then to
`emitPrefixImpl`. This makes sense since REX is a kind of opcode prefix
and of course is a prefix.
Reviewers: craig.topper, pengfei
Reviewed By: craig.topper
Subscribers: annita.zhang, craig.topper, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78276
If we AND with a constant like 0xFFFFFFC00000, for now we use several
instructions to generate this 48-bit constant and finally an "and". However, we
could do it with two rotate instructions instead.
MB ME MB+63-ME
+----------------------+ +----------------------+
|0000001111111111111000| -> |0000000001111111111111|
+----------------------+ +----------------------+
0 63 0 63
Rotate left by ME + 1 bits first, then mask it with (MB + 63 - ME, 63),
and finally rotate back. Notice that we need to wrap around modulo 64 bits
for the wrapping case.
Reviewed by: ChenZheng, Nemanjai
Differential Revision: https://reviews.llvm.org/D71831
Add DestructiveBinaryImm SQSHLU patterns and tests. These patterns allow the SQSHLU instruction to match with a MOVPRFX.
Differential Revision: https://reviews.llvm.org/D76728
This patch adds PC Relative support for global values that are known at link
time. If a global value requires access through the global offset table (GOT)
it is not covered in this patch.
Differential Revision: https://reviews.llvm.org/D75280
These are needed as a counterpart for VGPR subregs even though
there are no scalar instructions which can operate on 16-bit values.
When we materialize a constant, that is done into an SGPR, and that
SGPR may/will be copied into a 16-bit VGPR subreg. Such a
copy is illegal. There are also similar problems if a source
operand of a 16-bit VALU instruction is an SGPR. In addition,
we need to get a register with a lo16 subregister of an SGPR
RC during selection, and this fails as well.
All of that makes me believe we need these subregisters as
syntactic glue.
Differential Revision: https://reviews.llvm.org/D78250
Fix for the address optimization for gathers and scatters, which would in
some complex cases push instructions not to the vector loop preheader,
but to other locations as well, which led to a scrambled order and the
compilation failing.
This patch ensures that said instructions are always pushed to the end
of the vector loop preheader.
Differential Revision: https://reviews.llvm.org/D78293
Summary:
When doing the conversion MachineInstr -> MCInst, we should ignore the
implicit operands; this will expose more opportunities for InstAlias.
Reviewed By: steven.zhang
Differential Revision: https://reviews.llvm.org/D77118
Summary:
Changing all mnemonics to match assembly instructions to simplify mnemonic
naming rules. This time update all fixed-point arithmetic instructions.
This also corrects the bswp operand type.
Reviewed By: simoll
Differential Revision: https://reviews.llvm.org/D78177
Summary:
The INLINEASM MIR instructions use immediate operands to encode the values of some operands.
The MachineInstr pretty printer function already handles those operands and prints human readable annotations instead of the immediates. This patch adds similar annotations to the output of the MIRPrinter, however uses the new MIROperandComment feature.
Reviewers: SjoerdMeijer, arsenm, efriedma
Reviewed By: arsenm
Subscribers: qcolombet, sdardis, jvesely, wdng, nhaehnle, hiraditya, jrtc27, atanasyan, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78088
Summary:
The functions in X86MCCodeEmitter have so many parameters that they look
messy, and some parameters are unnecessary. This is the first patch to
reduce their parameters.
The following operations are cheap:
```
unsigned Opcode = MI.getOpcode();
const MCInstrDesc &Desc = MCII.get(Opcode);
uint64_t TSFlags = Desc.TSFlags;
```
So if we pass a `MCInst`, we don't need to pass `MCInstrDesc`;
if we pass a `MCInstrDesc`, we don't need to pass `TSFlags`.
Reviewers: craig.topper, MaskRay, pengfei
Reviewed By: craig.topper
Subscribers: annita.zhang, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78180
Summary:
The renaming is necessary to make the naming scheme uniform with the other
gather/scatter load/store SVE intrinsics.
The naming of variables and functions has been adapted to make it
explicit whether we are dealing with a scalar offset (which is
unscaled) or an index (which is scaled according to the data type of
the lanes of the vector).
Reviewers: andwar, sdesmalen, rengolin
Reviewed By: andwar
Subscribers: tschuett, hiraditya, arphaman, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77839
We have added code to correct the .localentry values on assignments. However, we
never clear the set so presumably it will still contain the (now dangling)
MCSymbol pointers across a call to finish() and reset() in the streamer.
This is based on my speculation that it is the reason we are getting
segmentation faults mentioned in https://bugs.llvm.org/show_bug.cgi?id=45366
Fixes: https://bugs.llvm.org/show_bug.cgi?id=45366
Differential revision: https://reviews.llvm.org/D78196
This moves v32i16/v64i8 to a model consistent with how we
treat integer types with avx1.
This does change the ABI for types vXi16/vXi8 vectors larger than
512 bits to pass in multiple zmms instead of multiple ymms. We'd
already hacked some code to make v64i8/v32i16 pass in zmm.
The cost model is still a bit of a mess. In some places I tried to
match existing behavior. But really we need to account for
splitting and concatenating costs. The cost model for shuffles is
especially pessimistic.
Differential Revision: https://reviews.llvm.org/D76212
-Consistently name the functions as split*
-Add a helper for doing the two extractSubvector calls and determining the size of the split
-Use getSplitDestVTs to get the result type for the split node.
-Move the binary and unary helper to one place in the file near the extractSubvector functions. Left the VSETCC one near LowerVSETCC since that's its only caller.
-Remove the 256/512 wrappers that just had asserts. I don't think they provided a lot of value, and now that the routines are called split*, what the call sites do is more obvious.
-Make the unary routine support different source and dest types to support D76212.
-Add some weaker asserts into the helpers to make up for losing the very specific asserts from the 256/512 wrappers.
Differential Revision: https://reviews.llvm.org/D78176
Summary:
No error or warning is emitted when specific reserved registers are
written to in inline assembly. Therefore, writes to the program counter
or to the frame pointer, for instance, were permitted, which could have
led to undesirable behaviour.
Example:
int foo() {
register int a __asm__("r7"); // r7 = frame-pointer in M-class ARM
__asm__ __volatile__("mov %0, r1" : "=r"(a) : : );
return a;
}
In contrast, GCC issues an error in the same scenario.
This patch detects writes to specific reserved registers in inline
assembly for ARM and emits an error in such case. The detection works
for output and input operands. Clobber operands are not handled here:
they are already covered at a later point in
AsmPrinter::emitInlineAsm(const MachineInstr *MI). The registers
covered are: program counter, frame pointer and base pointer.
This is ARM only. Therefore the implementation of other targets'
counterparts remains to be done.
Reviewers: efriedma
Reviewed By: efriedma
Subscribers: kristof.beyls, hiraditya, danielkiss, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76848
An irreducible SCC is one which has multiple "header" blocks, i.e., blocks
with control-flow edges incident from outside the SCC. This pass converts an
irreducible SCC into a natural loop by introducing a single new header
block and redirecting all the edges on the original headers to this
new block.
This is a useful workaround for a limitation in the structurizer,
which produces incorrect control flow in the presence of
irreducible regions. The AMDGPU backend provides an option to
enable this pass before the structurizer, which may eventually be
enabled by default.
Reviewed By: nhaehnle
Differential Revision: https://reviews.llvm.org/D77198
This restores commit 2ada8e2525.
Originally reverted with commit 44e09b59b8.
Summary:
Changing all mnemonics to match assembly instructions to simplify mnemonic
naming rules. This time update all fixed-point arithmetic instructions.
This also corrects smax/smin code generation.
Reviewed By: simoll
Differential Revision: https://reviews.llvm.org/D77856
This reverts commit 2ada8e2525.
Buildbots produced compilation errors which I was not able to quickly
reproduce locally. Need more time to investigate.
An irreducible SCC is one which has multiple "header" blocks, i.e., blocks
with control-flow edges incident from outside the SCC. This pass converts an
irreducible SCC into a natural loop by introducing a single new header
block and redirecting all the edges on the original headers to this
new block.
This is a useful workaround for a limitation in the structurizer,
which produces incorrect control flow in the presence of
irreducible regions. The AMDGPU backend provides an option to
enable this pass before the structurizer, which may eventually be
enabled by default.
Reviewed By: nhaehnle
Differential Revision: https://reviews.llvm.org/D77198
Summary:
Only first-faulting and non-faulting loads read/update the FFR register
and hence only the corresponding SDNodes should be decorated with
`SDNPOptInGlue` and `SDNPOutGlue`. This patch:
* removes SDNPOptInGlue from regular loads and stores (FFR is not read)
* adds SDNPOutGlue to first-faulting and non-faulting loads (FFR is
both read and updated)
Differential Revision: https://reviews.llvm.org/D77724
The pass was incorrectly reverting back to a "T" when something wrote
to VPR inside an "E" block. This is not the correct behaviour; the
predicate should stay the same.
Differential Revision: https://reviews.llvm.org/D77798
It can be used to avoid passing the begin and end of a range.
This makes the code shorter and is consistent with other
wrappers we already have.
Differential revision: https://reviews.llvm.org/D78016
Summary:
Creates the SVEIntrinsicOpts pass. In this patch, the pass tries
to remove unnecessary reinterpret intrinsics which convert to
and from svbool_t (llvm.aarch64.sve.convert.[to|from].svbool)
For example, the reinterprets below are redundant:
%1 = call <vscale x 16 x i1> @llvm.aarch64.sve.convert.to.svbool.nxv4i1(<vscale x 4 x i1> %a)
%2 = call <vscale x 4 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv4i1(<vscale x 16 x i1> %1)
The pass also looks for ptest intrinsics and phi instructions where
the operands are being needlessly converted to and from svbool_t.
Reviewers: sdesmalen, andwar, efriedma, cameron.mcinally, c-rhodes, rengolin
Reviewed By: efriedma
Subscribers: mgorny, tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, danielkiss, cfe-commits, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76078
The _GLOBAL_OFFSET_TABLE_ in SysVr4 ELF is conventionally the base of the
.got or .got.prel sections. Expressions such as _GLOBAL_OFFSET_TABLE_
- (.L1 +8) are used in assembler code to calculate offsets into the .got.
At present MC outputs a R_ARM_REL32 with respect to the
_GLOBAL_OFFSET_TABLE_ symbol, whereas gas outputs a R_ARM_BASE_PREL
relocation with respect to the _GLOBAL_OFFSET_TABLE_ symbol. While both are
correct, the R_ARM_REL32 depends on the value of the _GLOBAL_OFFSET_TABLE_
symbol, whereas the R_ARM_BASE_PREL relocation is independent of the symbol.
The R_ARM_BASE_PREL is therefore slightly more robust to linkers that may
not follow the conventional placement of _GLOBAL_OFFSET_TABLE_; for example,
LLD for some time defined _GLOBAL_OFFSET_TABLE_ to 0.
Differential Revision: https://reviews.llvm.org/D46319
Summary:
The iz pattern is a special case of the im pattern. The im pattern
has been supported since https://reviews.llvm.org/D77769, so this
follow-up patch removes the iz pattern.
Reviewed By: simoll
Differential Revision: https://reviews.llvm.org/D77770
Ports the existing DAG combines, minus the simplify demanded bits
which seems to have no equivalent now. Without these, this isn't
particularly helpful in most of the IR sample cases.
Summary:
In CFGStackify, `fixUnwindMismatches` function fixes unwind destination
mismatches created by `try` marker placement. For example,
```
try
...
call @qux ;; This should throw to the caller!
catch
...
end
```
When `call @qux` is supposed to throw to the caller, it is possible that
it is wrapped inside a `catch` so in case it throws it ends up unwinding
there incorrectly. (Also it is possible `call @qux` is supposed to
unwind to another `catch` within the same function.)
To fix this, we wrap this inner `call @qux` with a nested
`try`-`catch`-`end` sequence, and within the nested `catch` body, branch
to the right destination:
```
block $l0
try
...
try ;; new nested try
call @qux
catch ;; new nested catch
local.set n ;; store exnref to a local
br $l0
end
catch
...
end
end
local.get n ;; retrieve exnref back
rethrow ;; rethrow to the caller
```
The previous algorithm placed the nested `try` right before the `call`.
But it is possible that there are stackified instructions before the
call from which the call takes arguments.
```
try
...
i32.const 5
call @qux ;; This should throw to the caller!
catch
...
end
```
In this case we have to place `try` before those stackified
instructions.
```
block $l0
try
...
try ;; this should go *before* 'i32.const 5'
i32.const 5
call @qux
catch
local.set n
br $l0
end
catch
...
end
end
local.get n
rethrow
```
We correctly handle this in the first normal `try` placement phase
(`placeTryMarker` function), but failed to handle this in this
`fixUnwindMismatches`.
Reviewers: dschuff
Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77950
Allow grouping multiple basic blocks in the same section.
This allows specifying BasicBlock clusters like the following example:
!foo
!!0 1 2
!!4
This places basic blocks 0, 1, and 2 in one section in this order, and
places basic block #4 in a single section of its own.
Since 1725f28841, this should check
isFMADLegalForFAddFSub rather than the plain isOperationLegal.
This would assert in a subset of cases due to an oddity in how FMAD is
selected. We will allow FMA formation pre-legalize, but not FMAD even
in cases where it would be valid.
The current hook requires passing in the root fadd/fsub. However, in
this distributed case, this would be far more complicated to pass in
the relevant operand. AMDGPU doesn't get any value from the node, and
only needs the type and is the only implementor, so I'm not sure why
we have this complexity. Just rename and expand the assert to avoid
the more complicated checks spread through the distribution logic.
The shuffle decoding is used by X86ISelLowering and
MCTargetDesc/X86InstComments. The latter used to be in a
separate InstPrinter library. The Utils library existed to allow
InstPrinter and CodeGen to share the shuffle decoding. Since
X86InstComments now lives in the MCTargetDesc, which CodeGen
already depends on, we can sink the shuffle decoding there as well.
Differential Revision: https://reviews.llvm.org/D77980
Summary:
Fold (shift (shift X, C2), C1) -> (shift X, (C1 + C2)) for logical as
well as arithmetic shifts. This is needed to prevent regressions from
an upcoming funnel shift expansion change.
While we're here, fold (VSRAI -1, C) -> -1 too.
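As a sanity check of the arithmetic behind the fold (it holds while the combined amount stays below the bit width; out-of-range amounts are handled separately):
```
#include <cassert>
#include <cstdint>

// (x >> c2) >> c1 == x >> (c1 + c2) for logical shifts when
// c1 + c2 < 32; the analogous identity holds for shl and sra.
void checkShiftFold(uint32_t x, unsigned c1, unsigned c2) {
  assert(c1 + c2 < 32 && "combined amount must stay in range");
  assert(((x >> c2) >> c1) == (x >> (c1 + c2)));
  // VSRAI on all-ones stays all-ones: arithmetic shift of -1 is -1
  // (arithmetic right shift on all mainstream targets).
  assert((int32_t(-1) >> c1) == -1);
}
```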
Reviewers: RKSimon, craig.topper
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77300
Improve the chances of folding the writemask into the combined shuffle by scaling a wider shuffle mask to match the root's original type.
This creates a few minor issues with variable shuffles, preventing combines of shuffles because of the more limited support for binary shuffle types. In most cases we're probably better off combining the shuffles and losing the writemask fold, but this isn't always going to be true.
Summary:
Updated CallPromotionUtils and impacted sites. Parameters that are
expected to be non-null, and return values that are guaranteed non-null,
were replaced with CallBase references rather than pointers.
Left FIXME in places where more changes are facilitated by CallBase, but
aren't CallSites: Instruction* parameters or return values, for example,
where the contract is that they are actually CallBase values.
Reviewers: davidxl, dblaikie, wmi
Reviewed By: dblaikie
Subscribers: arsenm, jvesely, nhaehnle, eraman, hiraditya, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77930
As discussed in PR36617:
https://bugs.llvm.org/show_bug.cgi?id=36617#c13
...we can avoid the likely slow round-trip from XMM to GPR to XMM
by using the vector versions of the convert instructions.
Based on experimental results from recent Intel/AMD chips, we don't
need to worry about triggering denorm stalls while operating on
garbage data in the high lanes with convert instructions, so this is
expected to always be as good or better perf than the scalar
instruction equivalent. FP exceptions are also not a concern because
strict code should not be using the regular SDAG opcodes.
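For illustration, a hypothetical C++ example of the pattern (not taken from the patch's tests):
```
// f32 -> i32 -> f32: scalar codegen uses cvttss2si (XMM->GPR) and
// then cvtsi2ss (GPR->XMM); the vector forms cvttps2dq + cvtdq2ps
// keep the value in XMM the whole time.
float roundTripThroughInt(float x) {
  return (float)(int)x;
}
```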
Differential Revision: https://reviews.llvm.org/D77895
A lot of vectorized code doesn't use masks so we shouldn't penalize them by not doing shuffle combining on avx512 targets.
I've added support for VALIGNQ/VALIGND and 512-bit SHUF128 to prevent some regressions. I also prevented recombining 256-bit SHUF128 to PERM2X128. We may not need to add 256-bit SHUF128 support, but I don't think I found any cases requiring that in my testing.
Differential Revision: https://reviews.llvm.org/D77928
-Drop llvm:: on MVT::i32
-Use getValueType instead of getSimpleValueType for an equality
check just because it's shorter and doesn't matter.
-Don't create a const SDValue & since it's cheap to copy.
-Remove explicit case from MVT enum to EVT.
-Add message to assert.
Selection would fail after the post legalize combiner put an illegal
zextload back together.
The base combiner has a parameter to allow only legal operations, but
it appears to be unused. I also don't see a nice way to remove a
single entry from all_combines, so just hack around this.
As proposed in D77881, we'll have the related widening operation,
so this name becomes too vague.
While here, change the function signature to take an 'int' rather
than 'size_t' for the scaling factor, add an assert for overflow of
32-bits, and improve the documentation comments.
This is the same as what was done to the CallLoweringInfo in
TargetLowering.h in r309159.
This is just a step on the way to replacing this with CallBase.
There was another issue introduced by this commit that the OP
initially missed. Namely, for functions that are free to use
R2 as a callee-saved register, we emit a TOC expression based
on the address of the GEP label without emitting the GEP label.
Since we only emit such expressions for the large code model, this
issue only surfaced there.
I have confirmed that with this fix, the kernel build is successful
with target "all".
Summary:
Remove usages of asserting vector getters in Type in preparation for the
VectorType refactor. The existence of these functions complicates the
refactor while adding little value.
Reviewers: sdesmalen, efriedma, jonpa
Reviewed By: sdesmalen
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77265
Remove a number of includes that aren't necessary (nor are we relying on the remaining includes to provide the declarations), we just needed a llvm::Instruction forward declaration.
This exposed a couple of source files that were implicitly relying on the includes for their use of llvm::SmallSet or std::set, requiring local includes to be added there instead.
These will be widened in the DAG. In the meantime, early
widening prevents otherwise possible vectorization of
such loads.
Differential Revision: https://reviews.llvm.org/D77835
If we're inserting into v2i8/v4i8/v8i8/v2i16/v4i16 style sub-128-bit vectors, ensure we don't use the SK_PermuteTwoSrc cost of the legalized value type. This is a follow-up to rG12c629ec6c59, which added equivalent sub-128-bit shuffle costs.
In LowerFP_TO_INTForReuse, when emitting `stfiwx`, alignment of 4 is
set for the `MachineMemOperand`, but RLI (ReuseLoadInfo)'s alignment is
not updated for the following loads.
It's related to failed alignment check reported in
https://bugs.llvm.org/show_bug.cgi?id=45297
Differential Revision: https://reviews.llvm.org/D77624
The transformation currently does not differentiate between explicit
and implicit kills. However, it is not valid to later simply clear
an implicit kill flag since the kill could be due to a call or return.
Fixes: https://bugs.llvm.org/show_bug.cgi?id=45374
Summary:
D74269 added debug info to newly created instructions, including calls
to `malloc` and `free`, by taking debug info from existing instructions
around, whose debug info may or may not be empty.
But there are cases where debug info is required by the IR verifier: when both
the caller and the callee functions have DISubprograms, meaning we
already have declarations to `malloc` or `free` with a DISubprogram
attached, newly created calls to `malloc` and `free` should have
non-empty debug info. This patch creates a non-empty dummy debug info in
this case to those calls to make the IR verifier pass.
Fixes https://bugs.llvm.org/show_bug.cgi?id=45461.
Reviewers: dschuff
Subscribers: aprantl, sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77784
This is a fix for the previous patch 6c4b40def7.
In some cases it may be possible to have the compiler produce st_other=1 without
the compiler using mcpu=future, which should not be the case. This patch adds a
guard to make sure that if we are using st_other=1 then we are also compiling
for future CPU.
This is similar to what I recently did for getArithmeticReductionCost.
I'm trying to account for the narrowing from 512->256->128 as we go.
I've also added a new helper method getMinMaxCost that tries to
handle the cases where we have native min/max instructions and
fall back to cmp+select when we don't.
Differential Revision: https://reviews.llvm.org/D76634
When we try to select a SELECT_CC on Power9, we check if it can be matched to a
SETB instruction. In that function, we assert that the output type is i32/i64.
This is unnecessary as it is perfectly reasonable to have an i1 SELECT_CC.
Change that from an assert to an early exit condition.
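The change is roughly of this shape (illustrative sketch, not the exact in-tree code):
```
#include "llvm/CodeGen/SelectionDAG.h"
using namespace llvm;

// Bail out instead of asserting when the SELECT_CC result type is
// not i32/i64, e.g. a perfectly legal i1 SELECT_CC.
static SDValue trySETBSketch(EVT VT) {
  if (VT != MVT::i32 && VT != MVT::i64)
    return SDValue(); // not an error: fall back to generic lowering
  // ... attempt to match a SETB instruction here ...
  return SDValue();
}
```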
Fixes: https://bugs.llvm.org/show_bug.cgi?id=45448
When storing to a stack location for outgoing args during call arg
lowering, we shouldn't be modifying SP, so cache the SP copy
vreg for subsequent uses.
Gives a 0.2% geomean code size improvement on CTMark.
Differential Revision: https://reviews.llvm.org/D77838
Summary:
Remove usages of asserting vector getters in Type in preparation for the
VectorType refactor. The existence of these functions complicates the
refactor while adding little value.
Reviewers: mcrosier, efriedma, sdesmalen
Reviewed By: efriedma
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77269
Summary:
Remove usages of asserting vector getters in Type in preparation for the
VectorType refactor. The existence of these functions complicates the
refactor while adding little value.
Reviewers: arsenm, efriedma, sdesmalen
Reviewed By: arsenm
Subscribers: wdng, arsenm, jvesely, nhaehnle, hiraditya, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77268
Summary:
Remove usages of asserting vector getters in Type in preparation for the
VectorType refactor. The existence of these functions complicates the
refactor while adding little value.
Reviewers: grosbach, efriedma, sdesmalen
Reviewed By: efriedma
Subscribers: hiraditya, dmgreen, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77271
v2i8/v4i8/v8i8 + v2i16/v4i16 all show up in vectorizer code, and by just using the legalized types (v16i8/v8i16) we're highly exaggerating the actual cost of the shuffle.
This adds the instruction encoding and mnemonics for the proposed
RISC-V Bit Manipulation extension (version 0.92). It is implemented with
each category of instruction as its own target feature, with the 'b'
extension feature enabling all options. Since this extension is not yet
ratified, all target features are prefixed with 'experimental-' to note
their status.
Differential Revision: https://reviews.llvm.org/D65649
Summary:
This patch adds support for handling of variadic functions for AIX.
This includes ensuring that we use and consume the correct type of
va_list (char *va_list) for AIX.
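For illustration, the kind of code this must handle; on AIX `va_list` is a plain `char *` walking the argument area:
```
#include <cstdarg>

// A variadic function whose lowering depends on the AIX 'char *'
// va_list convention (simple example, ignores alignment subtleties).
int sumInts(int n, ...) {
  va_list ap;
  va_start(ap, n);
  int total = 0;
  for (int i = 0; i < n; ++i)
    total += va_arg(ap, int);
  va_end(ap);
  return total;
}
```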
Authored by: ZarkoCA
Reviewers: cebowleratibm, sfertile, jasonliu
Reviewed by: jasonliu
Differential Revision: https://reviews.llvm.org/D76130
Add initial support for PC Relative addressing for constant pool loads.
This includes adding a new relocation for @pcrel and adding a new PowerPC flag
to identify PC relative addressing.
Differential Revision: https://reviews.llvm.org/D74486