This reverts the AMDGPU DAG mutation implemented in D72487 and gives
a more general way of adjusting BUNDLE operand latency.
It also replaces FixBundleLatencyMutation with adjustSchedDependency
callback in the AMDGPU, fixing not only successor latencies but
predecessors' as well.
Differential Revision: https://reviews.llvm.org/D72535
Add a predicate to MCInstDesc that allows tools to determine whether an
instruction authenticates a pointer. This can be used by diagnostic
tools to hint at pointer authentication failures.
Differential Revision: https://reviews.llvm.org/D70329
rdar://55089604
ONE is currently softened to OGT | OLT. But the libcalls for OGT and OLT libcalls will trigger an exception for QNAN. At least for X86 with libgcc. UEQ on the other hand uses UO | OEQ. The UO and OEQ libcalls will not trigger an exception for QNAN.
This patch changes ONE to use the inverse of the UEQ lowering. So we now produce O & UNE. Technically the existing behavior was correct for a signalling ONE, but since I don't know how to generate one of those from clang that seemed like something we can deal with later as we would need to fix other predicates as well. Also removing spurious exceptions seemed better than missing an exception.
There are also problems with quiet OGT/OLT/OLE/OGE, but those are harder to fix.
Differential Revision: https://reviews.llvm.org/D72477
This system wasn't very well designed for multi-result nodes. As
a consequence they weren't consistently registered in the
LegalizedNodes map leading to nodes being revisited for different
results.
I've removed the "Result" variable from the main LegalizeOp method
and used a SDNode* instead. The result number from the incoming
Op SDValue is only used for deciding which result to return to the
caller. When LegalizeOp is called it should always register a
legalized result for all of its results. Future calls for any other
result should be pulled for the LegalizedNodes map.
Legal nodes will now register all of their results in the map
instead of just the one we were called for.
The Expand and Promote handling to use a vector of results similar
to LegalizeDAG. Each of the new results is then re-legalized and
logged in the LegalizedNodes map for all of the Results for the
node being legalized. None of the handles register their own
results now. And none call ReplaceAllUsesOfValueWith now.
Custom handling now always passes result number 0 to LowerOperation.
This matches what LegalizeDAG does. Since the introduction of
STRICT nodes, I've encountered several issues with X86's custom
handling being called with an SDValue pointing at the chain and
our custom handlers using that to get a VT instead of result 0.
This should prevent us from having any more of those issues. On
return we will update the LegalizedNodes map for all results so
we shouldn't call the custom handler again for each result number.
I want to push SDNode* further into the Expand and Promote
handlers, but I've left that for a follow to keep this patch size
down. I've created a dummy SDValue(Node, 0) to keep the handlers
working.
Differential Revision: https://reviews.llvm.org/D72224
The Linux kernel uses -fpatchable-function-entry to implement DYNAMIC_FTRACE_WITH_REGS
for arm64 and parisc. GCC 8 implemented
-fpatchable-function-entry, which can be seen as a generalized form of
-mnop-mcount. The N,M form (function entry points before the Mth NOP) is
currently only used by parisc.
This patch adds N,0 support to AArch64 codegen. N is represented as the
function attribute "patchable-function-entry". We will use a different
function attribute for M, if we decide to implement it.
The patch reuses the existing patchable-function pass, and
TargetOpcode::PATCHABLE_FUNCTION_ENTER which is currently used by XRay.
When the integrated assembler is used, __patchable_function_entries will
be created for each text section with the SHF_LINK_ORDER flag to prevent
--gc-sections (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93197) and
COMDAT (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93195) issues.
Retrospectively, __patchable_function_entries should use a PC-relative
relocation type to avoid the SHF_WRITE flag and dynamic relocations.
"patchable-function-entry"'s interaction with Branch Target
Identification is still unclear (see
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92424 for GCC discussions).
Reviewed By: peter.smith
Differential Revision: https://reviews.llvm.org/D72215
Summary:
This patch pushes the AIX vararg unimplemented error diagnostic later
and allows vararg calls so long as all the arguments can be passed in register.
This patch extends the AIX calling convention implementation to initialize
GPR(s) for vararg float arguments. On AIX, both GPR(s) and FPR are allocated
for floating point arguments. The GPR(s) are only initialized for vararg calls,
otherwise the callee is expected to retrieve the float argument in the FPR.
f64 in AIX PPC32 requires special handling in order to allocated and
initialize 2 GPRs. This is performed with bitcast, SRL, truncation to
initialize one GPR for the MSW and bitcast, truncations to initialize
the other GPR for the LSW.
A future patch will follow to add support for arguments passed on the stack.
Patch provided by: cebowleratibm
Reviewers: sfertile, ZarkoCA, hubert.reinterpretcast
Differential Revision: https://reviews.llvm.org/D71013
We only use lowerShuffleAsLanePermuteAndShuffle for unary shuffles at the moment, but we should consistently handle lane index calculations for multiple inputs in both the AVX1 and AVX2 paths.
Minor (almost NFC) tidyup as I'm hoping to use lowerShuffleAsLanePermuteAndShuffle for binary shuffles soon.
Previously extern function is added as BTF_KIND_VAR. This does not work
well with existing BTF infrastructure as function expected to use
BTF_KIND_FUNC and BTF_KIND_FUNC_PROTO.
This patch added extern function to BTF_KIND_FUNC. The two bits 0:1
of btf_type.info are used to indicate what kind of function it is:
0: static
1: global
2: extern
Differential Revision: https://reviews.llvm.org/D71638
Summary:
Avoid using the `nocf_check` attribute with Control Flow Guard. Instead, use a
new `"guard_nocf"` function attribute to indicate that checks should not be
added on indirect calls within that function. Add support for
`__declspec(guard(nocf))` following the same syntax as MSVC.
Reviewers: rnk, dmajor, pcc, hans, aaron.ballman
Reviewed By: aaron.ballman
Subscribers: aaron.ballman, tomrittervg, hiraditya, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D72167
Unlike most of our errors in the debug line parser, the "no end of
sequence" message was missing any reference to which line table it
refererred to. This change adds the offset to this message.
Reviewed by: dblaikie
Differential Revision: https://reviews.llvm.org/D72443
In D71841 we inverted the sense of the SDNode-level flag to ensure all nodes
default to potentially raising FP exceptions unless otherwise specified --
i.e. if we forget to propagate the flag somewhere, the effect is now only
lost performance, not incorrect code.
However, the related flag at the MI level still defaults to nodes not raising
FP exceptions unless otherwise specified. To be fully on the (conservatively)
safe side, we should invert that flag as well.
This patch does so by replacing MIFlag::FPExcept with MIFlag::NoFPExcept.
(Note that this does also introduce an incompatible change in the MIR format.)
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D72466
Changed ThreadPoolExecutor to no longer use detached threads and instead
to join threads on destruction. This is to prevent intermittent crashing
on Windows when doing a normal full exit, e.g. via exit().
Changed ThreadPoolExecutor to be a ManagedStatic so that it can be
stopped on llvm_shutdown(). Without this, it would only be stopped in
the destructor when doing a full exit. This is required to avoid
intermittent crashing on Windows due to a race condition between the
ThreadPoolExecutor starting up threads and the process doing a fast
exit, e.g. via _exit().
The Windows crashes appear to only occur with the MSVC static runtimes
and are more frequent with the debug static runtime.
These changes also prevent intermittent deadlocks on exit with the MinGW
runtime.
Differential Revision: https://reviews.llvm.org/D70447
Summary:
This patch fixes pr23772 [ARM] r226200 can emit illegal thumb2 instruction: "sub sp, r12, #80".
The violation was that SUB and ADD (reg, immediate) instructions can only write to SP if the source register is also SP. So the above instructions was unpredictable.
To enforce that the instruction t2(ADD|SUB)ri does not write to SP we now enforce the destination register to be rGPR (That exclude PC and SP).
Different than the ARM specification, that defines one instruction that can read from SP, and one that can't, here we inserted one that can't write to SP, and other that can only write to SP as to reuse most of the hard-coded size optimizations.
When performing this change, it uncovered that emitting Thumb2 Reg plus Immediate could not emit all variants of ADD SP, SP #imm instructions before so it was refactored to be able to. (see test/CodeGen/Thumb2/mve-stacksplot.mir where we use a subw sp, sp, Imm12 variant )
It also uncovered a disassembly issue of adr.w instructions, that were only written as SUBW instructions (see llvm/test/MC/Disassembler/ARM/thumb2.txt).
Reviewers: eli.friedman, dmgreen, carwil, olista01, efriedma
Reviewed By: efriedma
Subscribers: john.brawn, efriedma, ostannard, kristof.beyls, hiraditya, dmgreen, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70680
Teach SCEV about the @loop.decrement.reg intrinsic, which has exactly the same
semantics as a sub expression. This allows us to query hardware-loops, which
contain this @loop.decrement.reg intrinsic, so that we can calculate iteration
counts, exit values, etc. of hardwareloops.
This "int_loop_decrement_reg" intrinsic is defined as "IntrNoDuplicate". Thus,
while hardware-loops and tripcounts now become analysable by SCEV, this
prevents the usual loop transformations from applying transformations on
hardware-loops, which is what we want at this point, for which I have added
test cases for loopunrolling and IndVarSimplify and LFTR.
Differential Revision: https://reviews.llvm.org/D71563
PowerPC uses a dedicated method to check if the machine instr is
predicable by opcode. However, there's a bit `isPredicable` in instr
definition. This patch removes the method and set the bit only to
opcodes referenced in it.
Differential Revision: https://reviews.llvm.org/D71921
Memory instruction widening recipes use the pointer operand of their load/store
ingredient for generating the needed GEPs, making it difficult to feed these
recipes with pointers based on other ingredients or none at all.
This patch modifies these recipes to use a VPValue for the pointer instead, in
order to reduce ingredient def-use usage by ILV as a step towards full
VPlan-based def-use relations. The recipes are constructed with VPValues bound
to these ingredients, maintaining current behavior.
Differential revision: https://reviews.llvm.org/D70865
down to pass builder in ltobackend.
Currently CodeGenOpts like UnrollLoops/VectorizeLoop/VectorizeSLP in clang
are not passed down to pass builder in ltobackend when new pass manager is
used. This is inconsistent with the behavior when new pass manager is used
and thinlto is not used. Such inconsistency causes slp vectorization pass
not being enabled in ltobackend for O3 + thinlto right now. This patch
fixes that.
Differential Revision: https://reviews.llvm.org/D72386
If an SGPR vector is indexed with a VGPR, the actual indexing will be
done on the SGPR and produce an SGPR. A copy needs to be inserted
inside the waterwall loop to the VGPR result.
The current implementation assumes there is an instruction associated
with the transform, but this is not the case for
timm/TargetConstant/immarg values. These transforms should directly
operate on a specific MachineOperand in the source
instruction. TableGen would assert if you attempted to define an
equivalent GISDNodeXFormEquiv using timm when it failed to find the
instruction matcher.
Specially recognize SDNodeXForms on timm, and pass the operand index
to the render function.
Ideally this would be a separate render function type that looks like
void renderFoo(MachineInstrBuilder, const MachineOperand&), but this
proved to be somewhat mechanically painful. Add an optional operand
index which will only be passed if the transform should only look at
the one source operand.
Theoretically it would also be possible to only ever pass the
MachineOperand, and the existing renderers would check the parent. I
think that would be somewhat ugly for the standard usage which may
want to inspect other operands, and I also think MachineOperand should
eventually not carry a pointer to the parent instruction.
Use it in one sample pattern. This isn't a great example, since the
transform exists to satisfy DAG type constraints. This could also be
avoided by just changing the MachineInstr's arbitrary choice of
operand type from i16 to i32. Other patterns have nontrivial uses, but
this serves as the simplest example.
One flaw this still has is if you try to use an SDNodeXForm defined
for imm, but the source pattern uses timm, you still see the "Failed
to lookup instruction" assert. However, there is now a way to avoid
it.
Only PPC seems to be using it, and only checks some simple cases and
doesn't distinguish between FP. Just switch to using LLT to simplify
use from GlobalISel.
As an intermediate step, some TLI functions can be converted to using
LLT instead of MVT. Move this somewhere out of GlobalISel so DAG
functions can use these.
When these arguments are broken down by the EVT based callbacks, the
pointer information is lost. Hack around this by coercing the register
types to be the expected pointer element type when building the
remerge operations.
This should be legal, but will require future selection work. 16-bit
shift amounts were already removed from being legal, but this didn't
adjust the transformation rules.
Summary:
In commit b91f239485 I updated the
MipsDelaySlotFiller to skip BUNDLE instructions.
However, in addition to not considering BUNDLE instructions for the delay
slot, we also need to ensure that the register def-use information is
updated. Not updating this information caused run-time crashes (when using
the out-of-tree CHERI backend) since later definitions could be overwritten
with earlier register values.
Reviewers: atanasyan
Reviewed By: atanasyan
Differential Revision: https://reviews.llvm.org/D72254
This adds support for selecting a large chunk of the load/store *roW patterns.
This is pretty much a straight port of AArch64DAGToDAGISel::SelectAddrModeWRO
into GISel. The code is very similar to the XRO code. The main difference is
that in the *roW patterns, we want to try and fold in an extend, and *possibly*
a shift along with it. A good portion of this patch is refactoring the existing
XRO code.
- Add selectAddrModeWRO
- Factor out the code from selectAddrModeShiftedExtendXReg which is used by both
selectAddrModeXRO and selectAddrModeWRO into selectExtendedSHL.
This is similar to the function of the same name in AArch64DAGToDAGISel.
- Add support for extends to the factored out code in selectExtendedSHL.
- Teach getExtendTypeForInst how to handle AND masks that are intended to be
used in loads/stores (necessary for this addressing mode.)
- Make getExtendTypeForInst not static because moving it made an annoying diff
and I wanted to have the WRO/XRO functions close to each other while I was
writing the code.
Differential Revision: https://reviews.llvm.org/D72426
Summary:
Extend D71677 to apply to all branch-target operands, rather than special-casing call instructions.
Also add a regression test for llvm.org/PR44272, since this finishes fixing it.
Reviewers: thakis, rnk
Reviewed By: thakis
Subscribers: merge_guards_bot, hiraditya, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D72417
The patch gives out the details of the znver2 scheduler model.
There are few improvements with respect to execution units, latencies and
throughput when compared with znver1.
The tests that were present for znver1 for llvm-mca tool were replicated.
The latencies, execution units, timeline and throughput information are updated for znver2.
Reviewers: craig.topper, Simon Pilgrim
Differential Revision: https://reviews.llvm.org/D66088