Commit Graph

4742 Commits

Florian Hahn 77fd12a66e
[AArch64] Add aarch64_neon_vcmla{_rot{90,180,270}} intrinsics.
Add the builtins required to implement vcmla and its rotated variants from
the ACLE.

Reviewed By: t.p.northover

Differential Revision: https://reviews.llvm.org/D92929
2020-12-09 19:46:49 +00:00
Kerry McLaughlin 05edfc5475 [SVE][CodeGen] Add DAG combines for s/zext_masked_gather
This patch adds the following DAGCombines, which apply if isVectorLoadExtDesirable() returns true:
 - fold (and (masked_gather x)) -> (zext_masked_gather x)
 - fold (sext_inreg (masked_gather x)) -> (sext_masked_gather x)

LowerMGATHER has also been updated to fetch the LoadExtType associated with the
gather and to use this value to determine the correct masked gather opcode.
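
For illustration, a minimal IR-level sketch of the kind of extending gather these
combines target (element types, alignment, and names here are assumptions, not
taken from the patch; modern opaque-pointer syntax):

```
declare <vscale x 2 x i32> @llvm.masked.gather.nxv2i32.nxv2p0(<vscale x 2 x ptr>, i32, <vscale x 2 x i1>, <vscale x 2 x i32>)

define <vscale x 2 x i64> @gather_zext(<vscale x 2 x ptr> %ptrs, <vscale x 2 x i1> %mask) {
  ; The gather loads i32 elements; the extend of its result is what surfaces
  ; as the and/sext_inreg + masked_gather patterns the new combines fold away.
  %vals = call <vscale x 2 x i32> @llvm.masked.gather.nxv2i32.nxv2p0(<vscale x 2 x ptr> %ptrs, i32 4, <vscale x 2 x i1> %mask, <vscale x 2 x i32> zeroinitializer)
  %ext = zext <vscale x 2 x i32> %vals to <vscale x 2 x i64>
  ret <vscale x 2 x i64> %ext
}
```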

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D92230
2020-12-09 11:53:19 +00:00
Tim Northover 45de42116e AArch64: use correct operand for ubsantrap immediate.
I accidentally pushed the wrong patch originally.
2020-12-09 10:17:16 +00:00
Jessica Paquette 40d1fb2229 [AArch64][GlobalISel] Swap select operands when inverting condition code
This was not obvious when reading the imported tablegen patterns in
AArch64GenDAGISel.

Update select-select.mir.
2020-12-08 14:17:26 -08:00
Jessica Paquette 21308c2b4c [AArch64][GlobalISel] Check if G_SELECT has been optimized when folding binops
`TryFoldBinOpIntoSelect` didn't have a check for `Optimized`, meaning you could
end up folding twice. (e.g. a select with a G_ADD on the true side, and a G_SUB
on the false side)

Add in the missing `if` and a test.
2020-12-08 13:47:08 -08:00
Florian Hahn 4c69b1b98a
[AArch64] Fix rottype use in complex instr defs.
It seems like the order here is wrong. Types like i32 do not take any
arguments.

Currently this is not a problem, because the patterns are not actually
used with any nodes, but will fail once it is used with real ISD nodes.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D91345
2020-12-08 21:11:33 +00:00
Jessica Paquette 5b5d3fa9d9 [AArch64][GlobalISel] Fold G_SELECT cc, %t, (G_ADD %x, 1) -> CSINC %t, %x, cc
This implements

```
G_SELECT cc, %true, (G_ADD %x, 1) -> CSINC %true, %x, cc
G_SELECT cc, (G_ADD %x, 1), %false -> CSINC %x, %false, inv_cc
```

Godbolt example: https://godbolt.org/z/eoPqKq
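
In case the link goes stale, a minimal IR sketch of the first fold (function and
value names are illustrative):

```
define i32 @select_csinc(i1 %cc, i32 %t, i32 %x) {
  %add = add i32 %x, 1                    ; the +1 becomes CSINC's increment
  %sel = select i1 %cc, i32 %t, i32 %add
  ret i32 %sel
}
```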

Differential Revision: https://reviews.llvm.org/D92868
2020-12-08 10:53:37 -08:00
Jessica Paquette cd9a52b99e [AArch64][GlobalISel] Fold binops on the true side of G_SELECT
This implements the following folds:

```
G_SELECT cc, (G_SUB 0, %x), %false -> CSNEG %x, %false, inv_cc
G_SELECT cc, (G_XOR x, -1), %false -> CSINV %x, %false, inv_cc
```

This is similar to the folds introduced in
5bc0bd05e6.

In 5bc0bd05e6 I mentioned that we may prefer to do
this in AArch64PostLegalizerLowering.

I think that it's probably better to do this in the selector. The way we select
G_SELECT depends on what register banks end up being assigned to it. If we did
this in AArch64PostLegalizerLowering, then we'd end up checking *every* G_SELECT
to see if it's worth swapping operands. Doing it in the selector allows us to
restrict the optimization to only relevant G_SELECTs.

Also fix up some comments in `TryFoldBinOpIntoSelect` which are kind of
confusing IMO.

Example IR: https://godbolt.org/z/3qPGca
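
A minimal inline sketch of the CSNEG case in case the link rots (names are
illustrative):

```
define i32 @select_csneg(i1 %cc, i32 %x, i32 %f) {
  %neg = sub i32 0, %x                    ; negation on the true side
  %sel = select i1 %cc, i32 %neg, i32 %f
  ret i32 %sel
}
```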

Differential Revision: https://reviews.llvm.org/D92860
2020-12-08 10:42:59 -08:00
Jessica Paquette ce199667f6 [AArch64][GlobalISel] Don't explicitly write to the zero register in emitCMN
This case was missed in 78ccb0359d.

Differential Revision: https://reviews.llvm.org/D92438
2020-12-08 10:42:05 -08:00
Huihui Zhang 8e6fc1f97e [AArch64][SVE] Add lowering for llvm.maxnum|minnum for scalable type.
The LLVM intrinsics llvm.maxnum and llvm.minnum are overloaded and can be used on
any floating-point type or vector of floating-point type.
This patch extends the current infrastructure to support scalable vector types.

This patch also fixes a warning about incorrect use of EVT::getVectorNumElements()
on scalable types when the DAGCombiner tries to split a scalable vector.
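
A hedged IR sketch of the now-supported scalable form (the element count and the
exact instruction it lowers to, e.g. FMAXNM, are assumptions here):

```
declare <vscale x 4 x float> @llvm.maxnum.nxv4f32(<vscale x 4 x float>, <vscale x 4 x float>)

define <vscale x 4 x float> @max_nxv4f32(<vscale x 4 x float> %a, <vscale x 4 x float> %b) {
  %r = call <vscale x 4 x float> @llvm.maxnum.nxv4f32(<vscale x 4 x float> %a, <vscale x 4 x float> %b)
  ret <vscale x 4 x float> %r
}
```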

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D92607
2020-12-08 09:35:53 -08:00
Jessica Paquette b15491eb33 [AArch64][GlobalISel] Select G_SADDO and G_SSUBO
We didn't have selector support for these.

Selection code is similar to `getAArch64XALUOOp` in AArch64ISelLowering. As in
that code, this returns the AArch64CC and the instruction produced. In SDAG,
this is used to optimize select + overflow and condition branch + overflow
pairs. (See `AArch64TargetLowering::LowerBR_CC` and
`AArch64TargetLowering::LowerSelect`)

(G_USUBO should be easy to add here, but it isn't legalized right now.)

This also factors out the existing G_UADDO selection code, and removes an
unnecessary check for s32/s64. AFAIK, we shouldn't ever get anything other than
s32/s64. It makes more sense for this to be handled by the type assertion in
`emitAddSub`.
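
A sketch of where these operations come from (illustrative IR, not from the
patch): the overflow intrinsics produce G_SADDO/G_SSUBO, and the select or
branch consuming the overflow bit is the pairing mentioned above.

```
declare { i32, i1 } @llvm.sadd.with.overflow.i32(i32, i32)

define i32 @saddo(i32 %a, i32 %b) {
  %res = call { i32, i1 } @llvm.sadd.with.overflow.i32(i32 %a, i32 %b)
  %val = extractvalue { i32, i1 } %res, 0
  %ov  = extractvalue { i32, i1 } %res, 1
  %sel = select i1 %ov, i32 -1, i32 %val  ; select + overflow pair
  ret i32 %sel
}
```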

Differential Revision: https://reviews.llvm.org/D92610
2020-12-08 09:18:28 -08:00
David Sherwood 59f17b57d9 [SVE] Fix crashes with inline assembly
All the crashes found compiling inline assembly are fixed in this
patch by changing AArch64TargetLowering::getRegForInlineAsmConstraint
to be more resilient to mismatched value and register types. For
example, it makes no sense to request a predicate register for
an nxv2i64 type, and so on.

Tests have been added here:

  test/CodeGen/AArch64/inline-asm-constraints-bad-sve.ll

Differential Revision: https://reviews.llvm.org/D92554
2020-12-08 13:48:43 +00:00
Tim Northover c5978f42ec UBSAN: emit distinctive traps
Sometimes people get minimal crash reports after a UBSAN incident. This change
tags each trap with an integer representing the kind of failure encountered,
which can aid in tracking down the root cause of the problem.
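
A hedged sketch of the intrinsic this lowers from (the kind value and the exact
BRK immediate encoding on AArch64 are assumptions here):

```
declare void @llvm.ubsantrap(i8 immarg)

define void @trap_kind_42() {
  ; The i8 operand is the failure kind, which ends up in the trap's immediate
  ; so crash reports can distinguish failure modes.
  call void @llvm.ubsantrap(i8 42)
  unreachable
}
```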
2020-12-08 10:28:26 +00:00
Jessica Paquette d49f6491b6 [AArch64][GlobalISel] Refactor G_BRCOND selection
`selectCompareBranch` was hard to understand.

Also, it was being needlessly pessimistic with the `ProduceNonFlagSettingCondBr`
case. It assumed that everything in `selectCompareBranch` would emit a TB(N)Z
or C(B)NZ. That's not true; the G_FCMP + G_BRCOND case would never emit those
instructions, and the G_ICMP + G_BRCOND case was capable of emitting an integer
compare + Bcc.

- Refactor `selectCompareBranch` into separate functions based off of what is
feeding the G_BRCOND's condition.

- Move G_BRCOND selection code from `select` to `selectCompareBranch`.

- Remove duplicated constraint code from the code originally in `select`;
  `emitTestBit` already handles that, so no need to constrain twice.

- Factor out the G_FCMP + G_BRCOND case into `selectCompareBranchFedByFCmp`.

- Split the G_ICMP + G_BRCOND case into an optimization function,
`tryOptCompareBranchFedByICmp` and a general selection function,
`selectCompareBranchFedByICmp`.

- Reduce the number of things passed to `tryOptAndIntoCompareBranch`.

- Improve documentation.

- Give some variables more descriptive names.

Other than improving the code generation for functions with
speculative_load_hardening by getting the logic correct, this is NFC.

Differential Revision: https://reviews.llvm.org/D92582
2020-12-07 17:24:23 -08:00
Jessica Paquette 195a7af0ab [AArch64][GlobalISel] Narrow 128-bit regs to 64-bit regs in emitTestBit
When we have a 128-bit register, emitTestBit would incorrectly narrow to 32
bits always. If the bit number was > 32, then we would need a TB(N)ZX. This
would cause a crash, as we'd have the wrong register class. (PR48379)

This generalizes `narrowExtReg` into `moveScalarRegClass`.

This also allows us to remove `widenGPRBankRegIfNeeded` entirely, since
`selectCopy` correctly handles SUBREG_TO_REG etc.

This does create some codegen changes (since `selectCopy` uses the `all`
regclass variants). However, I think that these will likely be optimized away,
and we can always improve the `selectCopy` code. It looks like we should
revisit `selectCopy` at this point, and possibly refactor it into at least one
`emit` function.

Differential Revision: https://reviews.llvm.org/D92707
2020-12-07 15:04:33 -08:00
Amara Emerson 2ac4d0f45a [AArch64] Fix some minor coding style issues in AArch64CompressJumpTables 2020-12-07 12:48:09 -08:00
Kerry McLaughlin 111f559bbd [SVE][CodeGen] Call refineIndexType & refineUniformBase from visitMGATHER
The refineIndexType & refineUniformBase functions added by D90942 can also be used to
improve CodeGen of masked gathers.

These changes were split out from D91092

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D92319
2020-12-07 13:20:19 +00:00
Kerry McLaughlin f6dd32fd35 [SVE][CodeGen] Lower scalable masked gathers
Lowers the llvm.masked.gather intrinsics (scalar plus vector addressing mode only)

Changes in this patch:
- Add custom lowering for MGATHER, using getGatherVecOpcode() to choose the appropriate
  gather load opcode.
- Improve codegen with refineIndexType/refineUniformBase, added in D90942
- Tests added for gather loads with 32 & 64-bit scaled & unscaled offsets.

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D91092
2020-12-07 12:20:41 +00:00
Craig Topper c55d9af8c0 [AArch64] Add custom lowering for ISD::ABS
Instead of trying to pattern match the code produced by ISD::ABS expansion, just custom legalize ISD::ABS to the desired sequence.

The one test change is because a DAG combine for (neg (abs)) is no longer firing because ISD::ABS is now Custom instead of Expand.

Differential Revision: https://reviews.llvm.org/D92154
2020-12-04 10:45:31 -08:00
Ahmed Bougacha f77c948d56 [Triple][MachO] Define "arm64e", an AArch64 subarch for Pointer Auth.
This also teaches MachO writers/readers about the MachO cpu subtype,
beyond the minimal subtype reader support present at the moment.

This also defines a preprocessor macro to allow users to distinguish
__arm64__ from __arm64e__.

arm64e defaults to an "apple-a12" CPU, which supports v8.3a, allowing
pointer-authentication codegen.
It also currently defaults to ios14 and macos11.

Differential Revision: https://reviews.llvm.org/D87095
2020-12-03 07:53:59 -08:00
Jessica Paquette c82f002cea [AArch64][GlobalISel] Don't write to WZR in non-flag-setting G_BRCOND case
We are avoiding writing to WZR just about everywhere else.

Also update the code to use MachineIRBuilder for the sake of consistency.

We also didn't have a GlobalISel testcase for this path, so add a simple one
now.

Differential Revision: https://reviews.llvm.org/D90626
2020-12-01 16:45:37 -08:00
Jessica Paquette 6c3fa97d8a [AArch64][GlobalISel] Select Bcc when it's better than TB(N)Z
Instead of falling back to selecting TB(N)Z when we fail to select an
optimized compare against 0, select Bcc instead.

Also simplify selectCompareBranch a little while we're here, because the logic
was kind of hard to follow.

At -O0, this is a 0.1% geomean code size improvement for CTMark.

A simple example of where this can kick in is here:
https://godbolt.org/z/4rra6P

In the example above, GlobalISel currently produces a subs, cset, and tbnz.
SelectionDAG, on the other hand, just emits a compare and b.le.
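
A guess at the shape of the input behind that link (illustrative IR only): a
signed compare feeding a branch, where a single compare + b.le beats
subs/cset/tbnz.

```
define i32 @branch_on_cmp(i32 %a) {
  %cmp = icmp sgt i32 %a, 0
  br i1 %cmp, label %pos, label %nonpos
pos:
  ret i32 1
nonpos:
  ret i32 2
}
```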

Differential Revision: https://reviews.llvm.org/D92358
2020-12-01 15:45:14 -08:00
Amara Emerson 87ff156414 [AArch64][GlobalISel] Fix crash during legalization of a vector G_SELECT with scalar mask.
The lowering of vector selects needs to splat the scalar mask into a vector
first.

This was causing a crash when building oggenc in the test suite.
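
A minimal sketch of the problematic shape (illustrative IR): the condition is a
single scalar bit selecting between whole vectors, so it must be splatted to a
vector mask before the vector select is lowered.

```
define <4 x i32> @vector_select(i1 %c, <4 x i32> %a, <4 x i32> %b) {
  %s = select i1 %c, <4 x i32> %a, <4 x i32> %b
  ret <4 x i32> %s
}
```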

Differential Revision: https://reviews.llvm.org/D91655
2020-11-30 16:37:49 -08:00
Sjoerd Meijer 630d37dc1b [AArch64] Enable Cortex-A55 schedmodel
The model was committed in 4b8ade837e
but not yet enabled to allow for a few fix-ups. This adds a few
of these fixes, and also an LLVM MCA test to check most instructions.
While I do have plans to look into some more tuning, it's time to
enable this as it is better than using the A53 schedule.

Differential Revision: https://reviews.llvm.org/D88017
2020-11-30 19:28:34 +00:00
Sjoerd Meijer 5110ff0817 [AArch64][CostModel] Fix cost for mul <2 x i64>
This was modeled to have a cost of 1, but since we do not have a MUL.2d this is
scalarized into vector inserts/extracts and scalar muls.
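
For reference, the offending operation (illustrative IR): with no MUL.2d, this
becomes extracts, two scalar multiplies, and inserts rather than one vector
instruction, hence the higher cost.

```
define <2 x i64> @mul_v2i64(<2 x i64> %a, <2 x i64> %b) {
  %m = mul <2 x i64> %a, %b
  ret <2 x i64> %m
}
```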

Motivating precommitted test is test/Transforms/SLPVectorizer/AArch64/mul.ll,
which we don't want to SLP vectorize.

Test Transforms/LoopVectorize/AArch64/extractvalue-no-scalarization-required.ll
unfortunately needed changing, but the reason is documented in
LoopVectorize.cpp:6855:

  // The cost of executing VF copies of the scalar instruction. This opcode
  // is unknown. Assume that it is the same as 'mul'.

which I will address next as a follow-up to this.

Differential Revision: https://reviews.llvm.org/D92208
2020-11-30 11:36:55 +00:00
Simon Pilgrim 969918e177 [DAG] Legalize umin(x,y) -> sub(x,usubsat(x,y)) and umax(x,y) -> add(x,usubsat(y,x)) iff usubsat is legal
If usubsat() is legal, this is likely to result in a smaller expansion than the default cmp+select codegen.

Allows us to move the x86-specific lowering to the generic expansion code.
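
The identity behind the umin expansion, as a hedged IR sketch (vector width
chosen arbitrarily): if x <= y the saturating subtraction yields 0 and x
survives; otherwise x - (x - y) leaves y.

```
declare <8 x i16> @llvm.usub.sat.v8i16(<8 x i16>, <8 x i16>)

define <8 x i16> @umin_via_usubsat(<8 x i16> %x, <8 x i16> %y) {
  %sat = call <8 x i16> @llvm.usub.sat.v8i16(<8 x i16> %x, <8 x i16> %y)
  %min = sub <8 x i16> %x, %sat           ; == umin(x, y)
  ret <8 x i16> %min
}
```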

Differential Revision: https://reviews.llvm.org/D92183
2020-11-27 11:18:58 +00:00
Mark Murray 2b6691894a [ARM][AArch64] Adding Neoverse N2 CPU support
Add support for the Neoverse N2 CPU to the ARM and AArch64 backends.

Differential Revision: https://reviews.llvm.org/D91695
2020-11-25 11:42:54 +00:00
Kerry McLaughlin 603d40da9d [SVE][CodeGen] Add a DAG combine to extend mscatter indices
This patch adds a target-specific DAG combine for mscatter to promote indices
with element types i8 or i16 before legalisation, plus various tests with illegal types.
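
A hedged sketch of the pattern (types and names are assumptions; modern
opaque-pointer syntax): the i8 offsets are an illegal gather/scatter index type,
so the combine extends them before legalisation.

```
declare void @llvm.masked.scatter.nxv4i32.nxv4p0(<vscale x 4 x i32>, <vscale x 4 x ptr>, i32, <vscale x 4 x i1>)

define void @scatter_i8_offsets(<vscale x 4 x i32> %data, ptr %base, <vscale x 4 x i8> %offs, <vscale x 4 x i1> %mask) {
  %ptrs = getelementptr i32, ptr %base, <vscale x 4 x i8> %offs
  call void @llvm.masked.scatter.nxv4i32.nxv4p0(<vscale x 4 x i32> %data, <vscale x 4 x ptr> %ptrs, i32 4, <vscale x 4 x i1> %mask)
  ret void
}
```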

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D90945
2020-11-25 11:18:22 +00:00
Amara Emerson ca7fdf7ce0 [AArch64][GlobalISel] Add pre-isel lowering to convert p0 G_DUPs to use s64.
This uses the same reasoning as other similar conversions just before selection;
without it we miss out on selection because the importer considers s64 and p0
distinct types.
2020-11-23 22:59:35 -08:00
Amara Emerson 0fb76b9035 [AArch64][GlobalISel] Make <2 x p0> of G_SHUFFLE_VECTOR legal. 2020-11-23 22:59:35 -08:00
Martin Storsjö 6f792041a5 Reapply "[CodeGen] [WinException] Only produce handler data at the end of the function if needed"
This reapplies 36c64af9d7 in updated
form.

Emit the xdata for each function at .seh_endproc. This keeps the
exact same output header order for most code generated by the LLVM
CodeGen layer. (Sections still change order for code built from
assembly where functions lack an explicit .seh_handlerdata
directive, and functions with chained unwind info.)

The practical effect should be that assembly output lacks
superfluous ".seh_handlerdata; .text" pairs at the end of functions
that don't handle exceptions, which allows such functions to use
the AArch64 packed unwind format again.

Differential Revision: https://reviews.llvm.org/D87448
2020-11-23 23:17:03 +02:00
Craig Topper 4252f7773a [SelectionDAG][ARM][AArch64][Hexagon][RISCV][X86] Add SDNPCommutative to fma and fmad nodes in tablegen. Remove explicit commuted patterns from targets.
X86 was already specially marking fma as commutable, which allowed
tablegen to autogenerate commuted patterns. This moves the flag to the
target-independent definition and fixes up the targets to remove the
now-unneeded patterns.

Unfortunately, the tests change because the commuted versions of
the patterns generate operands in a different order than the
explicit patterns.

Differential Revision: https://reviews.llvm.org/D91842
2020-11-23 10:09:20 -08:00
Jay Foad 000400ca0a Fix spelling in comments. NFC. 2020-11-23 14:43:24 +00:00
Ella Ma 1756d67934 [llvm][clang][mlir] Add checks for the return values from Target::createXXX to prevent potential null deref
All these potential null pointer dereferences are reported by my static analyzer for null smart pointer dereferences, which has a different implementation from `alpha.cplusplus.SmartPtr`.

The checked pointers in this patch are initialized by Target::createXXX functions. When the creator function pointer is not correctly set, a null pointer is returned, or the creator function may itself return a null pointer.

Some of them may not make sense as they may be checked before entering the function, but I fixed them all in this patch. I submit this fix because 1) similar checks are found in some other places in the LLVM codebase for the same return value of the function; and 2) some of the pointers are dereferenced before they are checked, which will trigger a null pointer dereference if the return value is nullptr.

Reviewed By: tejohnson, MaskRay, jpienaar

Differential Revision: https://reviews.llvm.org/D91410
2020-11-21 21:04:12 -08:00
Amara Emerson c58df88886 [AArch64][GlobalISel] Make G_EXTRACT_VECTOR_ELT of <2 x p0> legal.
Also fix a selection issue that was using LLT::isScalar() when it should have
been using !isVector(); add a test for that too.
2020-11-20 14:07:45 -08:00
Sjoerd Meijer 412237dcd0 [AArch64] Enable post RA scheduler for Cortex-R82
Just something I forgot when I added the R82. Need to have a look
at crypto and fusing, but will do that as a follow-up.

Differential Revision: https://reviews.llvm.org/D91848
2020-11-20 14:04:26 +00:00
Pavel Iliin 4d7df43ffd [AArch64] Out-of-line atomics (-moutline-atomics) implementation.
This patch implements the out-of-line atomics LSE deployment
mechanism. Details on how it works can be found in llvm/docs/Atomics.rst.
The options -moutline-atomics and -mno-outline-atomics to enable and disable it
were added to the clang driver. This is the clang and LLVM part of the
out-of-line atomics interface; the library part is already supported by libgcc.
Compiler-rt support is provided in a separate patch.

Differential Revision: https://reviews.llvm.org/D91157
2020-11-20 13:30:12 +00:00
Adhemerval Zanella 807320119f [AArch64] Lower fptrunc/fpext from/to FP128 to/from FP16
The compiler-rt part which adds the emitted symbols is handled in
a subsequent patch.
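
A minimal sketch of one direction (the runtime symbol it lowers to, e.g.
__trunctfhf2, is an assumption and lives in that compiler-rt patch):

```
define half @trunc_f128(fp128 %x) {
  %r = fptrunc fp128 %x to half           ; now lowered via a libcall
  ret half %r
}
```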

Differential Revision: https://reviews.llvm.org/D91731
2020-11-19 15:14:50 -03:00
Florian Hahn b2f4c5fddc
[AsmWriter] Factor out mnemonic generation to accessible getMnemonic.
This patch factors out the part of printInstruction that gets the
mnemonic string for a given MCInst. This is intended to be used
subsequently for the instruction-mix remarks to display the final
mnemonic (D90040).

Unfortunately making `getMnemonic` available to the AsmPrinter
seems to require making it virtual. Not sure if there's a way around
that with the current layering of the AsmPrinters.

Reviewed By: Paul-C-Anagnostopoulos

Differential Revision: https://reviews.llvm.org/D90039
2020-11-17 09:47:38 +00:00
Jessica Paquette 5bc0bd05e6 [AArch64][GlobalISel] Fold G_XOR x, -1 into G_SELECT and select CSINV
When we see

```
xor = G_XOR xor_lhs, -1
select = G_SELECT cc, tval, xor
```

Fold this into

```
select = CSINV tval, xor_lhs, cc
```

Update select-select.mir to reflect the changes.

For now, only handle the case where the G_XOR is the false-value for the
G_SELECT. It may make more sense to handle the true-value case in post-legalizer
lowering.
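
A minimal IR sketch of the handled (false-value) case, with illustrative names:

```
define i32 @select_csinv(i1 %cc, i32 %t, i32 %x) {
  %not = xor i32 %x, -1                   ; bitwise NOT on the false side
  %sel = select i1 %cc, i32 %t, i32 %not
  ret i32 %sel
}
```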

Differential Revision: https://reviews.llvm.org/D90774
2020-11-16 14:14:14 -08:00
Amara Emerson 0b6090699a [AArch64][GlobalISel] Look through a G_ZEXT when trying to match shift-extended register offsets.
The G_ZEXT in these cases seems to actually come from a combine that we do but
SelectionDAG doesn't. Looking through it allows us to match "uxtw #2" addressing
modes.
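
A sketch of the addressing pattern in IR terms (illustrative, not from the
patch): the 32-bit index is zero-extended and scaled by 4, which should fold
into a [x, w, uxtw #2] operand instead of a separate extend.

```
define i32 @load_uxtw(ptr %base, i32 %idx) {
  %idx.ext = zext i32 %idx to i64
  %addr = getelementptr i32, ptr %base, i64 %idx.ext
  %val = load i32, ptr %addr
  ret i32 %val
}
```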

Differential Revision: https://reviews.llvm.org/D91475
2020-11-16 10:50:46 -08:00
Caroline Concatto 6c4d8f4651 [AArch64] Add check for widening instruction for SVE.
This patch fixes the function isWideningInstruction for scalable vectors.
Now the cost model can check the widening pattern for SVE.

Differential Revision: https://reviews.llvm.org/D91260
2020-11-16 12:30:08 +00:00
Jessica Paquette 9a8bfe3835 [AArch64][GlobalISel] Select G_SELECT cc, t, (G_SUB 0, x) -> CSNEG t, x, cc
When we see

```
%sub = G_SUB 0, %x
%select = G_SELECT %cc, %t, %sub
```

Fold away the G_SUB by producing

```
%select = CSNEG %t, %x, cc
```

Simple IR example: https://godbolt.org/z/K8TEnh

This is valid on both sides of the select, but for now, just handle one side.
It may make more sense to handle swapping sides during post-legalizer lowering.

Differential Revision: https://reviews.llvm.org/D90723
2020-11-13 10:12:51 -08:00
Jessica Paquette 6c20c1da1e [AArch64][GlobalISel] NFC: Use CmpInst::isUnsigned instead of static helper
Reducing some code duplication.

We had a helper for checking if a predicate is unsigned. Remove that and use
the existing function in Instructions.cpp.

Differential Revision: https://reviews.llvm.org/D91288
2020-11-13 09:35:42 -08:00
Jessica Paquette b184a2eccf [GlobalISel] Add matchers for specific constants and a matcher for negations
It's fairly common to need matchers for a specific constant value, or for
common idioms like finding a negated register.

Add

- `m_SpecificICst`, which returns true when matching a specific value.
- `m_ZeroInt`, which returns true when an integer 0 is matched.
- `m_Neg`, which returns true when a register is negated.

Also update a few places which use idioms related to the new matchers.

Differential Revision: https://reviews.llvm.org/D91397
2020-11-13 09:24:54 -08:00
Kerry McLaughlin 306c8ab208 [SVE][CodeGen] Improve codegen of scalable masked scatters
If the scatter store is able to perform the sign/zero extend of
its index, this is folded into the instruction with refineIndexType().
Additionally, refineUniformBase() will return the base pointer and index
from an add + splat_vector.

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D90942
2020-11-13 11:19:36 +00:00
serge-sans-paille 9218ff50f9 llvmbuildectomy - replace llvm-build by plain cmake
No longer rely on an external tool to build the llvm component layout.

Instead, leverage the existing `add_llvm_componentlibrary` cmake function and
introduce `add_llvm_component_group` to accurately describe component behavior.

These functions store extra properties in the created targets. These properties
are processed once all components are defined to resolve library dependencies
and produce the header expected by llvm-config.

Differential Revision: https://reviews.llvm.org/D90848
2020-11-13 10:35:24 +01:00
Jessica Paquette d0ba6c4002 [AArch64][GlobalISel] Select CSINC and CSINV for G_SELECT with constants
Select the following:

- G_SELECT cc, 0, 1 -> CSINC zreg, zreg, cc
- G_SELECT cc, 0, -1 -> CSINV zreg, zreg, cc
- G_SELECT cc, 1, f -> CSINC f, zreg, inv_cc
- G_SELECT cc, -1, f -> CSINV f, zreg, inv_cc
- G_SELECT cc, t, 1 -> CSINC t, zreg, cc
- G_SELECT cc, t, -1 -> CSINV t, zreg, cc

(IR example: https://godbolt.org/z/YfPna9)

These correspond to a bunch of the AArch64csel patterns in AArch64InstrInfo.td.

Unfortunately, it doesn't seem like we can import patterns that use NZCV like
those ones do. E.g.

```
def : Pat<(AArch64csel GPR32:$tval, (i32 1), (i32 imm:$cc), NZCV),
          (CSINCWr GPR32:$tval, WZR, (i32 imm:$cc))>;
```

So we have to manually select these for now.

This replaces `selectSelectOpc` with an `emitSelect` function, which performs
these optimizations.

Differential Revision: https://reviews.llvm.org/D90701
2020-11-12 14:44:01 -08:00
David Sherwood 3225fcf11e [SVE] Deal with SVE tuple call arguments correctly when running out of registers
When passing SVE types as arguments to function calls we can run
out of hardware SVE registers. This is normally fine, since we
switch to an indirect mode where we pass a pointer to an SVE stack
object in a GPR. However, if we switch over part-way through
processing an SVE tuple then part of it will be in registers and
the other part will be on the stack (a concrete sketch of this
situation follows the list below).

I've fixed this by ensuring that:

1. When we don't have enough registers to allocate the whole block
   we mark any remaining SVE registers temporarily as allocated.
2. We temporarily remove the InConsecutiveRegs flags from the last
   tuple part argument and reinvoke the autogenerated calling
   convention handler. Doing this prevents the code from entering
   an infinite recursion and, in combination with 1), ensures we
   switch over to the Indirect mode.
3. After allocating a GPR register for the pointer to the tuple we
   then deallocate any SVE registers we marked as allocated in 1).
   We also set the InConsecutiveRegs flags back how they were before.
4. I've changed the AArch64ISelLowering LowerCALL and
   LowerFormalArguments functions to detect the start of a tuple,
   which involves allocating a single stack object and doing the
   correct numbers of legal loads and stores.
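
To make the scenario concrete, a hedged sketch (the IR type used for the tuple
is an assumption): the first seven arguments occupy z0-z6, leaving only z7, so
a two-register tuple cannot be split and is instead passed indirectly via a
pointer in a GPR.

```
declare void @callee(<vscale x 16 x i8>, <vscale x 16 x i8>, <vscale x 16 x i8>,
                     <vscale x 16 x i8>, <vscale x 16 x i8>, <vscale x 16 x i8>,
                     <vscale x 16 x i8>, <vscale x 32 x i8>)
```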

Differential Revision: https://reviews.llvm.org/D90219
2020-11-12 08:41:50 +00:00
Amara Emerson ad376657c1 [AArch64][GlobalISel] Optimize G_PTR_ADD with a negated offset to be a G_SUB. 2020-11-11 22:46:53 -08:00