Commit Graph

868 Commits

Author SHA1 Message Date
Igor Kudrin ddbe812bcc [ARM][llvm-objdump] Annotate PC-relative memory operands
This implements `MCInstrAnalysis::evaluateMemoryOperandAddress()` for
Arm so that the disassembler can print the target address of memory
operands that use PC+immediate addressing.

Differential Revision: https://reviews.llvm.org/D105979
2021-08-05 14:11:11 +07:00
Daniel Egger 98c2e4115d [ARM] Add lowering of uadd_sat to uq{add|sub}8 and uq{add|sub}16
This follow the lead of https://reviews.llvm.org/D68974 to add lowering
of unsigned saturated addition/subtraction.

Differential Revision: https://reviews.llvm.org/D105413
2021-07-11 15:58:11 +01:00
David Green bee2f618d5 [ARM] Introduce t2WhileLoopStartTP
This adds t2WhileLoopStartTP, similar to the t2DoLoopStartTP added in
D90591. It keeps a reference to both the tripcount register and the
element count register, so that the ARMLowOverheadLoops pass in the
backend can pick the correct one without having to search for it from
the operand of a VCTP.

Differential Revision: https://reviews.llvm.org/D103236
2021-06-13 13:55:34 +01:00
Ryan Prichard 65d0264ba2 [MC][ARM] Reject Thumb "ror rX, #0"
The ROR instruction can only handle immediates between 1 and 31. The
would-be encoding for ROR #0 is actually the RRX instruction.

Reviewed By: nickdesaulniers

Differential Revision: https://reviews.llvm.org/D102455
2021-05-19 15:05:39 -07:00
David Green 1011d4ed60 [ARM] Constrain CMPZ shift combine to a single use
We currently prefer t2CMPrs over t2CMPri when the node contains a shift.
This can introduce more nodes if the shift has multiple uses though, as
value from the shift will be needed anyway, and in the case of a t2CMPri
compared with zero will more readily be removed entirely.

Differential Revision: https://reviews.llvm.org/D101688
2021-05-13 18:31:01 +01:00
Nick Desaulniers 52338af569 [MC][ARM] add .w suffixes for RSB/RSBS T1
See also:
F5.1.167 RSB, RSBS (register) T1 shift or rotate by value variant
of the Arm ARM.

Link: https://github.com/ClangBuiltLinux/linux/issues/1309

Reviewed By: DavidSpickett

Differential Revision: https://reviews.llvm.org/D99542
2021-04-01 10:45:37 -07:00
Nick Desaulniers 1addc231cd [MC][ARM] add .w suffixes for ORN/ORNS T1
See also:
F5.1.128 ORN, ORNS (register) T1 shift or rotate by value variant
of the Arm ARM.

Link: https://github.com/ClangBuiltLinux/linux/issues/1309

Reviewed By: DavidSpickett

Differential Revision: https://reviews.llvm.org/D99538
2021-04-01 10:27:09 -07:00
David Green bd516d24c1 [ARM] Move t2DoLoopStart reg alloc hint
This adjusts the place that the t2DoLoopStart reg allocation hint is
inserted, adding it in the ARMTPAndVPTOptimizaionPass in a similar place
as other tail predicated loop optimizations. This removes the need for
doing so in a custom inserter, and should make the hint more accurate,
only adding it where we expect to create a DLS (not DLSTP or WLS).
2021-03-11 17:56:19 +00:00
David Green fad70c3068 [ARM] Improve WLS lowering
Recently we improved the lowering of low overhead loops and tail
predicated loops, but concentrated first on the DLS do style loops. This
extends those improvements over to the WLS while loops, improving the
chance of lowering them successfully. To do this the lowering has to
change a little as the instructions are terminators that produce a value
- something that needs to be treated carefully.

Lowering starts at the Hardware Loop pass, inserting a new
llvm.test.start.loop.iterations that produces both an i1 to control the
loop entry and an i32 similar to the llvm.start.loop.iterations
intrinsic added for do loops. This feeds into the loop phi, properly
gluing the values together:

  %wls = call { i32, i1 } @llvm.test.start.loop.iterations.i32(i32 %div)
  %wls0 = extractvalue { i32, i1 } %wls, 0
  %wls1 = extractvalue { i32, i1 } %wls, 1
  br i1 %wls1, label %loop.ph, label %loop.exit
...
loop:
  %lsr.iv = phi i32 [ %wls0, %loop.ph ], [ %iv.next, %loop ]
  ..
  %iv.next = call i32 @llvm.loop.decrement.reg.i32(i32 %lsr.iv, i32 1)
  %cmp = icmp ne i32 %iv.next, 0
  br i1 %cmp, label %loop, label %loop.exit

The llvm.test.start.loop.iterations need to be lowered through ISel
lowering as a pair of WLS and WLSSETUP nodes, which each get converted
to t2WhileLoopSetup and t2WhileLoopStart Pseudos. This helps prevent
t2WhileLoopStart from being a terminator that produces a value,
something difficult to control at that stage in the pipeline. Instead
the t2WhileLoopSetup produces the value of LR (essentially acting as a
lr = subs rn, 0), t2WhileLoopStart consumes that lr value (the Bcc).

These are then converted into a single t2WhileLoopStartLR at the same
point as t2DoLoopStartTP and t2LoopEndDec. Otherwise we revert the loop
to prevent them from progressing further in the pipeline. The
t2WhileLoopStartLR is a single instruction that takes a GPR and produces
LR, similar to the WLS instruction.

  %1:gprlr = t2WhileLoopStartLR %0:rgpr, %bb.3
  t2B %bb.1
...
bb.2.loop:
  %2:gprlr = PHI %1:gprlr, %bb.1, %3:gprlr, %bb.2
  ...
  %3:gprlr = t2LoopEndDec %2:gprlr, %bb.2
  t2B %bb.3

The t2WhileLoopStartLR can then be treated similar to the other low
overhead loop pseudos, eventually being lowered to a WLS providing the
branches are within range.

Differential Revision: https://reviews.llvm.org/D97729
2021-03-11 17:56:19 +00:00
David Green 438c98515c [ARM] Use 0, not ZR during ISel for CSINC/INV/NEG
Instead of converting the 0 into a ZR reg during lowering, do that with
tablegen by matching the zero immediate. This when combined with other
optimizations is more likely to use ZR and helps keep the DAG more
easily optimizable. It should not otherwise effect code generation.
2021-03-02 19:01:14 +00:00
Nick Desaulniers 404843a94d [MC][ARM] add .w suffixes for BL (T1) and DBG
F1.2 Standard assembler syntax fields
describes .w and .n suffixes for wide and narrow encodings.

arch/arm/probes/kprobes/test-thumb.c tests installing kprobes for
certain instructions using inline asm.  There's a few instructions we
fail to assemble due to missing .w t2InstAliases.

Adds .w suffixes for:
* bl  (F5.1.25 BL, BLX (immediate) T1)
* dbg (F5.1.42 DBG T1)

Reviewed By: DavidSpickett

Differential Revision: https://reviews.llvm.org/D97236
2021-02-24 09:58:08 -08:00
Nick Desaulniers 1e204ac789 [THUMB2] add .w suffixes for ldr/str (immediate) T4
The Linux kernel when built with CONFIG_THUMB2_KERNEL makes use of these
instructions with immediate operands and wide encodings.

These are the T4 variants of the follow sections from the Arm ARM.
F5.1.72 LDR (immediate)
F5.1.229 STR (immediate)

I wasn't able to represent these simple aliases using t2InstAlias due to
the Constraints on the non-suffixed existing instructions, which results
in some manual parsing logic needing to be added.

F1.2 Standard assembler syntax fields
describes the use of the .w (wide) vs .n (narrow) encoding suffix.

Link: https://bugs.llvm.org/show_bug.cgi?id=49118
Link: https://github.com/ClangBuiltLinux/linux/issues/1296
Reported-by: Stefan Agner <stefan@agner.ch>
Reported-by: Arnd Bergmann <arnd@kernel.org>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>

Reviewed By: DavidSpickett

Differential Revision: https://reviews.llvm.org/D96632
2021-02-23 09:25:40 -08:00
Nick Desaulniers 68945a8686 [Thumb2] support `movs pc, lr` alias for `subs pc, lr, #0`/`eret`
This is used by the Linux kernel built with CONFIG_THUMB2_KERNEL.

Because different operands are not permitted to `movs`, the diagnostics now provide multiple suggestions along the lines of using a non-pc destination operand or lr source operand.

Forked from D95586.

Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>

Reviewed By: DavidSpickett

Differential Revision: https://reviews.llvm.org/D96304
2021-02-10 11:00:42 -08:00
Zhuojia Shen 8cef45517e [ARM] Fix STRT/STRHT/STRBT input/output operands.
STRT, STRHT, and STRBT are store instructions and their source register
$Rt should be treated as an input operand instead of an output operand.
This should fix things (e.g., liveness tracking in LivePhysRegs) if
these instructions were used in CodeGen.

Differential Revision: https://reviews.llvm.org/D95074
2021-01-26 14:00:58 -08:00
Kristof Beyls 320fd3314e [ARM] Implement harden-sls-retbr for Thumb mode
The only non-trivial consideration in this patch is that the formation
of TBB/TBH instructions, which is done in the constant island pass, does
not understand the speculation barriers inserted by the SLSHardening
pass. As such, when harden-sls-retbr is enabled for a function, the
formation of TBB/TBH instructions in the constant island pass is
disabled.

Differential Revision: https://reviews.llvm.org/D92396
2020-12-19 12:32:47 +00:00
David Green 3f571be1c0 [ARM] Make t2DoLoopStartTP a terminator
Although this was something that I was hoping we would not have to do,
this patch makes t2DoLoopStartTP a terminator in order to keep it at the
end of it's block, so not allowing extra MVE instruction between it and
the end. With t2DoLoopStartTP's also starting tail predication regions,
it also marks them as having side effects. The t2DoLoopStart is still
not a terminator, giving it the extra scheduling freedom that can be
helpful, but now that we have a TP version they can be treated
differently.

Differential Revision: https://reviews.llvm.org/D91887
2020-12-11 09:23:57 +00:00
David Green 0447f3508f [ARM][RegAlloc] Add t2LoopEndDec
We currently have problems with the way that low overhead loops are
specified, with LR being spilled between the t2LoopDec and the t2LoopEnd
forcing the entire loop to be reverted late in the backend. As they will
eventually become a single instruction, this patch introduces a
t2LoopEndDec which is the combination of the two, combined before
registry allocation to make sure this does not fail.

Unfortunately this instruction is a terminator that produces a value
(and also branches - it only produces the value around the branching
edge). So this needs some adjustment to phi elimination and the register
allocator to make sure that we do not spill this LR def around the loop
(needing to put a spill after the terminator). We treat the loop very
carefully, making sure that there is nothing else like calls that would
break it's ability to use LR. For that, this adds a
isUnspillableTerminator to opt in the new behaviour.

There is a chance that this could cause problems, and so I have added an
escape option incase. But I have not seen any problems in the testing
that I've tried, and not reverting Low overhead loops is important for
our performance. If this does work then we can hopefully do the same for
t2WhileLoopStart and t2DoLoopStart instructions.

This patch also contains the code needed to convert or revert the
t2LoopEndDec in the backend (which just needs a subs; bne) and the code
pre-ra to create them.

Differential Revision: https://reviews.llvm.org/D91358
2020-12-10 12:14:23 +00:00
David Green 08d1c2d470 [ARM] Introduce t2DoLoopStartTP
This introduces a new pseudo instruction, almost identical to a
t2DoLoopStart but taking 2 parameters - the original loop iteration
count needed for a low overhead loop, plus the VCTP element count needed
for a DLSTP instruction setting up a tail predicated loop. The idea is
that the instruction holds both values and the backend
ARMLowOverheadLoops pass can pick between the two, depending on whether
it creates a tail predicated loop or falls back to a low overhead loop.

To do that there needs to be something that converts a t2DoLoopStart to
a t2DoLoopStartTP, for which this patch repurposes the
MVEVPTOptimisationsPass as a "tail predication and vpt optimisation"
pass. The extra operand for the t2DoLoopStartTP is chosen based on the
operands of VCTP's in the loop, and the instruction is moved as late in
the block as possible to attempt to increase the likelihood of making
tail predicated loops.

Differential Revision: https://reviews.llvm.org/D90591
2020-11-10 18:08:12 +00:00
David Green 73a6cd4b6b [ARM] Add a RegAllocHint for hinting t2DoLoopStart towards LR
This hints the operand of a t2DoLoopStart towards using LR, which can
help make it more likely to become t2DLS lr, lr. This makes it easier to
move if needed (as the input is the same as the output), or potentially
remove entirely.

The hint is added after others (from COPY's etc) which still take
precedence. It needed to find a place to add the hint, which currently
uses the post isel custom inserter.

Differential Revision: https://reviews.llvm.org/D89883
2020-11-10 16:28:57 +00:00
David Green b2ac9681a7 [ARM] Alter t2DoLoopStart to define lr
This changes the definition of t2DoLoopStart from
t2DoLoopStart rGPR
to
GPRlr = t2DoLoopStart rGPR

This will hopefully mean that low overhead loops are more tied together,
and we can more reliably generate loops without reverting or being at
the whims of the register allocator.

This is a fairly simple change in itself, but leads to a number of other
required alterations.

 - The hardware loop pass, if UsePhi is set, now generates loops of the
   form:
       %start = llvm.start.loop.iterations(%N)
     loop:
       %p = phi [%start], [%dec]
       %dec = llvm.loop.decrement.reg(%p, 1)
       %c = icmp ne %dec, 0
       br %c, loop, exit
 - For this a new llvm.start.loop.iterations intrinsic was added, identical
   to llvm.set.loop.iterations but produces a value as seen above, gluing
   the loop together more through def-use chains.
 - This new instrinsic conceptually produces the same output as input,
   which is taught to SCEV so that the checks in MVETailPredication are not
   affected.
 - Some minor changes are needed to the ARMLowOverheadLoop pass, but it has
   been left mostly as before. We should now more reliably be able to tell
   that the t2DoLoopStart is correct without having to prove it, but
   t2WhileLoopStart and tail-predicated loops will remain the same.
 - And all the tests have been updated. There are a lot of them!

This patch on it's own might cause more trouble that it helps, with more
tail-predicated loops being reverted, but some additional patches can
hopefully improve upon that to get to something that is better overall.

Differential Revision: https://reviews.llvm.org/D89881
2020-11-10 15:57:58 +00:00
Meera Nakrani 48c9e8244b [ARM] Removed hasSideEffects from signed/unsigned saturates
Removed hasSideEffects from SSAT and USAT so that they are no longer
marked as unpredictable.

Differential Revision: https://reviews.llvm.org/D88545
2020-10-01 14:55:01 +00:00
Meera Nakrani 675431b987 [ARM] Added more patterns to generate SSAT/USAT with shift
Added patterns to generate an SSAT or USAT with shift for
SSAT/USAT instructions that are matched from IR patterns.

Differential Revision: https://reviews.llvm.org/D88145
2020-09-28 14:50:19 +00:00
Meera Nakrani 20283ff491 [ARM] Generated SSAT and USAT instructions with shift
Added patterns so that both SSAT and USAT instructions are generated with shifts. Added corresponding regression tests.

Differential Review: https://reviews.llvm.org/D85120
2020-08-04 09:38:17 +00:00
David Green 146d35b6ee [ARM] CSEL generation
This adds a peephole optimisation to turn a t2MOVccr that could not be
folded into any other instruction into a CSEL on 8.1-m. The t2MOVccr
would usually be expanded into a conditional mov, that becomes an IT;
MOV pair. We can instead generate a CSEL instruction, which can
potentially be smaller and allows better register allocation freedom,
which can help reduce codesize. Performance is more variable and may
depend on the micrarchitecture details, but initial results look good.
If we need to control this per-cpu, we can add a subtarget feature as we
need it.

Original patch by David Penry.

Differential Revision: https://reviews.llvm.org/D83566
2020-07-16 11:10:53 +01:00
David Green d604cc6e9a [ARM] Mark more integer instructions as not having side effects.
LDRD and STRD along with UBFX and SBFX are selected from DAGToDAG
transforms, so do not have tblgen patterns. They don't get marked as
having side effects so cannot be scheduled as efficiently as you would
like.

This specifically marks then as not having side effects.

Differential Revision: https://reviews.llvm.org/D82358
2020-06-23 22:45:51 +01:00
Victor Campos c010d4d195 [ARM] Improve codegen of volatile load/store of i64
Summary:
Instead of generating two i32 instructions for each load or store of a volatile
i64 value (two LDRs or STRs), now emit LDRD/STRD.

These improvements cover architectures implementing ARMv5TE or Thumb-2.

The code generation explicitly deviates from using the register-offset
variant of LDRD/STRD. In this variant, the register allocated to the
register-offset cannot be reused in any of the remaining operands. Such
restriction seems to be non-trivial to implement in LLVM, thus it is
left as a to-do.

Differential Revision: https://reviews.llvm.org/D70072
2020-05-28 10:52:43 +01:00
Victor Campos 872ee78f65 Revert "[ARM] Improve codegen of volatile load/store of i64"
This reverts commit 8a12553223.

A bug has been found when generating code for Thumb2. In some very
specific cases, the prologue/epilogue emitter generates erroneous stack
offsets for the new LDRD instructions that access the stack.

This bug does not seem to be caused by the reverted patch though. Likely
the latter has made an undiscovered issue emerge in the
prologue/epilogue emission pass. Nevertheless, this reversion is
necessary since it is blocking users of the ARM backend.
2020-05-22 11:01:57 +01:00
Kazuaki Ishizaki 0312b9f550 [llvm] NFC: Fix trivial typo in rst and td files
Differential Revision: https://reviews.llvm.org/D77469
2020-04-23 14:26:32 +09:00
David Green 2c5f43f9dd [ARM] Fix qdadd operand order
qdadd is defined as sat(Rm + sat(2*Rn)). We had the Rm and Rn switched
the wrong way around.

Differential Revision: https://reviews.llvm.org/D77049
2020-03-31 10:11:36 +01:00
Victor Campos 8a12553223 [ARM] Improve codegen of volatile load/store of i64
Summary:
Instead of generating two i32 instructions for each load or store of a volatile
i64 value (two LDRs or STRs), now emit LDRD/STRD.

These improvements cover architectures implementing ARMv5TE or Thumb-2.

The code generation explicitly deviates from using the register-offset
variant of LDRD/STRD. In this variant, the register allocated to the
register-offset cannot be reused in any of the remaining operands. Such
restriction seems to be non-trivial to implement in LLVM, thus it is
left as a to-do.

Reviewers: dmgreen, efriedma, john.brawn, nickdesaulniers

Reviewed By: efriedma, nickdesaulniers

Subscribers: danielkiss, alanphipps, hans, nathanchance, nickdesaulniers, vvereschaka, kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70072
2020-03-11 10:19:27 +00:00
Stefan Agner 2f95d5f103 [ARM][Thumb2] support .w assembler qualifier for dmb/dsb/isb
Support the explicit wide assembler qualifier for the dmb/dsb/isb synchronization barrier instructions.

Differential revision: https://reviews.llvm.org/D75143
2020-02-28 11:08:24 +00:00
Stefan Agner b4207e705b [ARM][Thumb2] Support .w assembler qualifier for pld/pldw/pli
Accept explicit wide assembler qualifier for the pld/pldw/pli.

Differential revision: https://reviews.llvm.org/D75144
2020-02-28 11:08:24 +00:00
Sam Parker 965ba4291a Revert "[ARM] Add CPSR as an implicit use of t2IT"
This reverts commit e58229fded.

Differential Revision: https://reviews.llvm.org/D75186
2020-02-27 15:43:44 +00:00
Sam Parker e58229fded [ARM] Add CPSR as an implicit use of t2IT
This use is already attached to the BUNDLE instruction but is lost
after finalisation.

Differential Revision: https://reviews.llvm.org/D75186
2020-02-27 10:10:40 +00:00
Victor Campos af2a384581 Revert "[ARM] Improve codegen of volatile load/store of i64"
This reverts commit 60e0120c91.
2020-02-08 13:18:45 +00:00
Momchil Velikov a328536c6d [ARM] Correct syntax of the CLRM insn
The predicate should be adjacent to the opcode.

Differential Revision: https://reviews.llvm.org/D74040
2020-02-05 13:54:34 +00:00
Sam Parker 42350cd893 [ARM][MVE] Tail Predicate IsSafeToRemove
Introduce a method to walk through use-def chains to decide whether
it's possible to remove a given instruction and its users. These
instructions are then stored in a set until the end of the transform
when they're erased. This is now used to perform checks on the
iteration count (LoopDec chain), element count (VCTP chain) and the
possibly redundant iteration count.

As well as being able to remove chains of instructions, we know also
check that the sub feeding the vctp is producing the expected value.

Differential Revision: https://reviews.llvm.org/D71837
2020-01-17 13:19:14 +00:00
Diogo Sampaio d94d079a6a [ARM][Thumb2] Fix ADD/SUB invalid writes to SP
Summary:
This patch fixes pr23772  [ARM] r226200 can emit illegal thumb2 instruction: "sub sp, r12, #80".
The violation was that SUB and ADD (reg, immediate) instructions can only write to SP if the source register is also SP. So the above instructions was unpredictable.
To enforce that the instruction t2(ADD|SUB)ri does not write to SP we now enforce the destination register to be rGPR (That exclude PC and SP).
Different than the ARM specification, that defines one instruction that can read from SP, and one that can't, here we inserted one that can't write to SP, and other that can only write to SP as to reuse most of the hard-coded size optimizations.
When performing this change, it uncovered that emitting Thumb2 Reg plus Immediate could not emit all variants of ADD SP, SP #imm instructions before so it was refactored to be able to. (see test/CodeGen/Thumb2/mve-stacksplot.mir where we use a subw sp, sp, Imm12 variant )
It also uncovered a disassembly issue of adr.w instructions, that were only written as SUBW instructions (see llvm/test/MC/Disassembler/ARM/thumb2.txt).

Reviewers: eli.friedman, dmgreen, carwil, olista01, efriedma, andreadb

Reviewed By: efriedma

Subscribers: gbedwell, john.brawn, efriedma, ostannard, kristof.beyls, hiraditya, dmgreen, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70680
2020-01-14 11:47:19 +00:00
Diogo Sampaio b1bb5ce96d Reverting, broke some bots. Need further investigation.
Summary: This reverts commit 8c12769f30.

Reviewers:

Subscribers:
2020-01-10 13:40:41 +00:00
Diogo Sampaio 8c12769f30 [ARM][Thumb2] Fix ADD/SUB invalid writes to SP
Summary:
This patch fixes pr23772  [ARM] r226200 can emit illegal thumb2 instruction: "sub sp, r12, #80".
The violation was that SUB and ADD (reg, immediate) instructions can only write to SP if the source register is also SP. So the above instructions was unpredictable.
To enforce that the instruction t2(ADD|SUB)ri does not write to SP we now enforce the destination register to be rGPR (That exclude PC and SP).
Different than the ARM specification, that defines one instruction that can read from SP, and one that can't, here we inserted one that can't write to SP, and other that can only write to SP as to reuse most of the hard-coded size optimizations.
When performing this change, it uncovered that emitting Thumb2 Reg plus Immediate could not emit all variants of ADD SP, SP #imm instructions before so it was refactored to be able to. (see test/CodeGen/Thumb2/mve-stacksplot.mir where we use a subw sp, sp, Imm12 variant )
It also uncovered a disassembly issue of adr.w instructions, that were only written as SUBW instructions (see llvm/test/MC/Disassembler/ARM/thumb2.txt).

Reviewers: eli.friedman, dmgreen, carwil, olista01, efriedma

Reviewed By: efriedma

Subscribers: john.brawn, efriedma, ostannard, kristof.beyls, hiraditya, dmgreen, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70680
2020-01-10 11:25:44 +00:00
Victor Campos 60e0120c91 [ARM] Improve codegen of volatile load/store of i64
Summary:
Instead of generating two i32 instructions for each load or store of a volatile
i64 value (two LDRs or STRs), now emit LDRD/STRD.

These improvements cover architectures implementing ARMv5TE or Thumb-2.

Reviewers: dmgreen, efriedma, john.brawn, nickdesaulniers

Reviewed By: efriedma, nickdesaulniers

Subscribers: nickdesaulniers, vvereschaka, kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70072
2020-01-07 13:16:18 +00:00
Victor Campos 2ff5a596cb Revert "[ARM] Improve codegen of volatile load/store of i64"
This reverts commit bbcf1c3496.
2019-12-20 18:08:24 +00:00
Sam Parker acbc9aed72 [ARM][MVE] Fixes for tail predication.
1) Fix an issue with the incorrect value being used for the number of
   elements being passed to [d|w]lstp. We were trying to check that
   the value was available at LoopStart, but this doesn't consider
   that the last instruction in the block could also define the
   register. Two helpers have been added to RDA for this.
2) Insert some code to now try to move the element count def or the
   insertion point so that we can perform more tail predication.
3) Related to (1), the same off-by-one could prevent us from
   generating a low-overhead loop when a mov lr could have been
   the last instruction in the block.
4) Fix up some instruction attributes so that not all the
   low-overhead loop instructions are labelled as branches and
   terminators - as this is not true for dls/dlstp.

Differential Revision: https://reviews.llvm.org/D71609
2019-12-20 09:34:18 +00:00
Victor Campos bbcf1c3496 [ARM] Improve codegen of volatile load/store of i64
Summary:
Instead of generating two i32 instructions for each load or store of a volatile
i64 value (two LDRs or STRs), now emit LDRD/STRD.

These improvements cover architectures implementing ARMv5TE or Thumb-2.

Reviewers: dmgreen, efriedma, john.brawn

Reviewed By: efriedma

Subscribers: kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70072
2019-12-19 11:23:01 +00:00
Momchil Velikov 09555ce071 [ARM] Generate CMSE instructions from CMSE intrinsics
This patch adds instruction selection patterns for the TT, TTT, TTA, and TTAT
instructions and tests for llvm.arm.cmse.tt, llvm.arm.cmse.ttt,
llvm.arm.cmse.tta, and llvm.arm.cmse.ttat intrinsics (added in a previous
patch).

Patch by Javed Absar.

Differential Revision: https://reviews.llvm.org/D70407
2019-11-25 18:26:12 +00:00
David Green 0765a4c288 [ARM] Extra qdadd patterns
This adds some new qdadd patterns to go along with the other recently added
qadd's.

Differential Revision: https://reviews.llvm.org/D68999

llvm-svn: 375414
2019-10-21 14:06:49 +00:00
David Green d7b77f2203 [ARM] Add qadd lowering from a sadd_sat
This lowers a sadd_sat to a qadd by treating it as legal. Also adds qsub at the
same time.

The qadd instruction sets the q flag, but we already have many cases where we
do not model this in llvm.

Differential Revision: https://reviews.llvm.org/D68976

llvm-svn: 375411
2019-10-21 12:33:46 +00:00
David Green fba831e791 [ARM] Lower sadd_sat to qadd8 and qadd16
Lower the target independent signed saturating intrinsics to qadd8 and qadd16.
This custom lowers them from a sadd_sat, catching the node early before it is
promoted. It also adds a QADD8b and QADD16b node to mean the bottom "lane" of a
qadd8/qadd16, so that we can call demand bits on it to show that it does not
use the upper bits.

Also handles QSUB8 and QSUB16.

Differential Revision: https://reviews.llvm.org/D68974

llvm-svn: 375402
2019-10-21 09:53:38 +00:00
David Green 120a5e9a74 [ARM] Cortex-M4 schedule additions
This is an attempt to fill in some of the missing instructions from the
Cortex-M4 schedule, and make it easier to do the same for other ARM cpus.

- Some instructions are marked as hasNoSchedulingInfo as they are pseudos or
  otherwise do not require scheduling info
- A lot of features have been marked not supported
- Some WriteRes's have been added for cvt instructions.
- Some extra instruction latencies have been added, notably by relaxing the
  regex for dsp instruction to catch more cases, and some fp instructions.

This goes a long way to get the CompleteModel working for this CPU. It does not
go far enough as to get all scheduling info for all output operands correct.

Differential Revision: https://reviews.llvm.org/D67957

llvm-svn: 373163
2019-09-29 08:38:48 +00:00
David Green 10d10102a4 [ARM] Ensure we do not attempt to create lsll #0
During legalisation we can end up with some pretty strange nodes, like shifts
of 0. We need to make sure we don't try to make long shifts of these, ending up
with invalid assembly instructions. A long shift with a zero immediate actually
encodes a shift by 32.

Differential Revision: https://reviews.llvm.org/D67664

llvm-svn: 372839
2019-09-25 10:16:48 +00:00