Commit Graph

Alex Richardson b5f69e234e Handle BUNDLE instructions in MipsAsmPrinter
Summary:
In our CHERI fork we use BUNDLE instructions to ensure that a
three-instruction sequence to generate a program-counter-relative value is
emitted without reordering or insertions (since that would break the 32-bit
offset computation).

Currently MipsAsmPrinter asserts when it encounters a pseudo instruction.
To handle BUNDLE, we can simply skip the instruction, which then makes
EmitInstruction() process the contents of the bundle in order.
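The shape of that change, as I read the description (an illustrative fragment based on the per-bundle loop that already exists in MipsAsmPrinter::EmitInstruction for filled delay slots; not the verbatim patch):

  do {
    // Skip the BUNDLE marker itself; the loop then lowers the bundled
    // instructions one by one, in order.
    if (I->isBundle())
      continue;
    // Do any auto-generated pseudo lowerings.
    if (emitPseudoExpansionLowering(*OutStreamer, &*I))
      continue;
    MCInst TmpInst;
    MCInstLowering.Lower(&*I, TmpInst);
    EmitToStreamer(*OutStreamer, TmpInst);
  } while ((++I != E) && I->isInsideBundle());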

Reviewers: atanasyan

Reviewed By: atanasyan

Subscribers: merge_guards_bot, sdardis, hiraditya, jrtc27, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70945
2019-12-04 11:30:00 +00:00
Alex Richardson b91f239485 MipsDelaySlotFiller: Don't move BUNDLE instructions into the delay slot
Summary:
In our CHERI fork we use BUNDLE instructions to ensure that a
three-instruction sequence to generate a program-counter-relative value is
emitted without reordering or insertions (since that would break the 32-bit
offset computation). This sequence is created in MipsExpandPseudo and we use
finalizeBundle() to create the BUNDLE instruction.

However, the delay slot filler currently breaks this pattern since the BUNDLE
will be removed and so all instructions are moved into the delay slot.
Since the delay slot only executes the first instruction, this results in
incorrect computations (and run-time crashes) if the branch is taken.

The original test case uses CHERI instructions, so for the test case here
I simply filled a BUNDLE with a no-op pair of DADDiu $sp_64, -16 and DADDiu $sp_64, 16.

Reviewers: atanasyan

Reviewed By: atanasyan

Subscribers: merge_guards_bot, sdardis, hiraditya, jrtc27, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70944
2019-12-04 11:30:00 +00:00
David Stuttard 46db606834 AMDGPU: Avoid folding 2 constant operands into an SALU operation
Summary:
Catch the (admittedly unusual) case where SIFoldOperands attempts to fold 2
constant operands into the same SALU operation, with neither operand able to be
encoded as an inline constant.

Change-Id: Ibc48d662c9ffd8bbacd154976b0b1c257ace0927

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, tpr, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70896
2019-12-04 10:25:34 +00:00
Ulrich Weigand c3d05c1b52 [SelectionDAG] Expand nnan FMINNUM/FMAXNUM to select sequence
InstCombine may synthesize FMINNUM/FMAXNUM nodes from fcmp+select
sequences (where the fcmp is marked nnan).  Currently, if the
target does not otherwise handle these nodes, they'll get expanded
to libcalls to fmin/fmax.  However, these functions may reside in
libm, which may introduce a library dependency that was not originally
present in the source code, potentially resulting in link failures.

To fix this problem, add code to TargetLowering::expandFMINNUM_FMAXNUM
to expand FMINNUM/FMAXNUM to a compare+select sequence instead of the
libcall. This is done only if the node is marked as "nnan"; in this case,
the expansion to compare+select is always correct. This also suffices to
catch all cases where FMINNUM/FMAXNUM was synthesized as above.
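For reference, a scalar C++ sketch of why the expansion is valid under nnan (my illustration, not code from the patch): with NaNs excluded, fmin/fmax agree with a plain compare+select, so no libm call is needed.

  // Equivalent of FMINNUM/FMAXNUM when neither input can be NaN.
  float fmin_nnan(float A, float B) { return A < B ? A : B; }
  float fmax_nnan(float A, float B) { return A > B ? A : B; }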

Differential Revision: https://reviews.llvm.org/D70965
2019-12-04 10:32:35 +01:00
QingShan Zhang d84b320dfd [MacroFusion] Limit the max fused number to 2 to reduce the dependency
This is the example:

int foo(int a, int b, int c, int d) {
  return a + b + c + d;
}

And this is the Dependency Graph:
+------+       +------+       +------+       +------+
|  A   |       |  B   |       |  C   |       |  D   |
+--+--++       +---+--+       +--+---+       +--+---+
   ^  ^            ^  ^          ^              ^
   |  |            |  |          |              |
   |  |            |  |New1      +--------------+
   |  |            |  |          |
   |  |            |  |       +--+---+
   |  |New2        |  +-------+ ADD1 |
   |  |            |          +--+---+
   |  |            |    Fuse     ^
   |  |            +-------------+
   |  +------------+
   |               |
   |   Fuse     +--+---+
   +----------->+ ADD2 |
   |            +------+
+--+---+
| ADD3 |
+------+

If https://reviews.llvm.org/D69998 lands, we would also create an artificial
edge from ADD1 to A, which would force Node A to be scheduled before ADD1 and
ADD2. But in fact it is fine to schedule Node A in between ADD3 and ADD2, as
ADD3 and ADD2 are NOT a fusion pair: ADD2 has already been matched to ADD1.
We were creating unnecessary dependency edges that override the scheduling
heuristics.
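A minimal sketch of the resulting policy (hypothetical names, not the actual MacroFusion API): once a node already belongs to a fused pair, it is not considered for another fusion, so fused groups never grow beyond two nodes and no further artificial edges are created for a third candidate.

  #include <optional>

  struct FusionNode {
    std::optional<unsigned> FusedWith; // index of the partner, if already paired
  };

  // Only fuse when neither node is already part of a pair.
  bool mayFuse(const FusionNode &First, const FusionNode &Second) {
    return !First.FusedWith && !Second.FusedWith;
  }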

Differential Revision: https://reviews.llvm.org/D70066
2019-12-04 05:05:35 +00:00
czhengsz f0ba1aec35 [PowerPC] folding rlwinm + rlwinm to rlwinm
For example:
    x3 = rlwinm x3, 27, 5, 31
    x3 = rlwinm x3, 19, 0, 12
  can be combined to
    x3 = rlwinm x3, 14, 0, 12
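A small standalone C++ model of the arithmetic behind this fold (my illustration, not the PPCInstrInfo code): the two rotate amounts add modulo 32, and the second mask is applied last, so when the first mask keeps every bit that the final mask selects after the second rotation, the pair collapses into a single rlwinm.

  #include <cassert>
  #include <cstdint>

  // rlwinm: rotate left word immediate, then AND with a mask.
  // PowerPC numbers bits 0..31 starting at the most-significant bit.
  uint32_t rlwinm(uint32_t Val, unsigned SH, unsigned MB, unsigned ME) {
    SH &= 31;
    uint32_t Rot = SH ? (Val << SH) | (Val >> (32 - SH)) : Val;
    uint32_t Mask = 0;
    for (unsigned I = MB; I <= ME; ++I)
      Mask |= 1u << (31 - I);
    return Rot & Mask;
  }

  int main() {
    for (uint64_t X = 0; X <= 0xFFFFFFFF; X += 0x10001) {
      uint32_t V = static_cast<uint32_t>(X);
      // (27 + 19) % 32 == 14, and every bit kept by mask 0..12 after the
      // second rotation was already kept by the inner mask 5..31.
      assert(rlwinm(rlwinm(V, 27, 5, 31), 19, 0, 12) == rlwinm(V, 14, 0, 12));
    }
    return 0;
  }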

Reviewed by: steven.zhang, lkail

Differential Revision: https://reviews.llvm.org/D70374
2019-12-03 21:51:19 -05:00
Wang, Pengfei c8995de069 [X86] Model DAZ and FTZ
Summary: This is a follow-up to D70881. It models DAZ and FTZ for related instructions.

Reviewers: craig.topper, RKSimon, andrew.w.kaylor

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70938
2019-12-04 08:22:45 +08:00
Wang, Pengfei c1c673303d [X86] Model MXCSR for all AVX512 instructions
Summary: Model MXCSR for all AVX512 instructions

Reviewers: craig.topper, RKSimon, andrew.w.kaylor

Subscribers: hiraditya, llvm-commits, LuoYuanke, LiuChen3

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70881
2019-12-04 08:07:38 +08:00
Craig Topper f586fd44e4 [FPEnv] [PowerPC] Lowering ppc_fp128 StrictFP Nodes to libcalls
This is an alternative to D64662 that shares more code between
strict and non-strict nodes. It's modeled after the implementation
that I did for softening.

Differential Revision: https://reviews.llvm.org/D70867
2019-12-03 14:11:21 -08:00
James Clarke da7b129b1b [RISCV] Don't force Local Exec TLS for non-PIC
Summary:
Forcing Local Exec TLS requires the use of copy relocations. Copy
relocations need special handling in the runtime linker when being used
against TLS symbols, which is present in glibc, but not in FreeBSD nor
musl, and so cannot be relied upon. Moreover, copy relocations are a
hack that embeds the size of an object in the ABI when it otherwise
wouldn't be, and breaks protected symbols (which are expected to be DSO
local), whilst also wasting space; thus they should be avoided whenever
possible. As discussed in D70398, RISC-V should move away from forcing
Local Exec, and instead use Initial Exec like other targets, with
possible linker relaxation to follow. The RISC-V GCC maintainers also
intend to adopt this more-conventional behaviour (see
https://github.com/riscv/riscv-elf-psabi-doc/issues/122).
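A small C++-level illustration of the user-visible effect, under my assumptions about typical usage (the variable names are made up): for non-PIC code referencing thread-local data that may be defined in a shared object, accesses now default to Initial Exec rather than forced Local Exec (which would have needed a TLS copy relocation), and code that knows its TLS data is local can still request Local Exec explicitly.

  // Defined in a shared library; Initial Exec avoids a TLS copy relocation.
  extern thread_local int tls_counter;
  int read_counter() { return tls_counter; }

  // Opt back in to Local Exec where it is known to be safe (GCC/Clang extension).
  extern thread_local int local_counter __attribute__((tls_model("local-exec")));
  int read_local() { return local_counter; }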

Reviewers: asb, MaskRay

Reviewed By: MaskRay

Subscribers: emaste, krytarowski, hiraditya, rbar, johnrusso, simoncook, sabuasal, niosHD, kito-cheng, shiva0217, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, rkruppe, PkmX, jocewei, psnobl, benna, Jim, lenary, s.egerton, pzheng, sameer.abuasal, apazos, llvm-commits, bsdjhb

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70649
2019-12-03 22:04:54 +00:00
Aditya Nandakumar 6da7dbb806 [GlobalISel]: Allow targets to override how to widen constants during legalization
https://reviews.llvm.org/D70922

This adds a hook to allow targets to define exactly what extension
operation should be performed when widening constants. This handles cases
like widening an i1 true, which would end up becoming -1, which in turn
affects code quality during combines.
Additionally, in order to stay consistent with how DAG is promoting
constants, we now signextend for byte sized types and zero extend
otherwise (by default). Targets can of course override this if
necessary.
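A hypothetical sketch of that default policy (names here are illustrative, not the actual hook added by the patch): byte-sized or smaller constants are sign-extended, so an i1 true widens to -1 just as SelectionDAG's constant promotion does, and everything else is zero-extended unless a target overrides the behaviour.

  enum class WidenExt { SExt, ZExt };

  // Default extension used when widening a constant of the given bit width.
  WidenExt extensionForWidenedConstant(unsigned SizeInBits) {
    return SizeInBits <= 8 ? WidenExt::SExt : WidenExt::ZExt;
  }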
2019-12-03 10:41:10 -08:00
Sanne Wouda f2e7de81c6 [AArch64] Fix over-eager fusing of NEON SIMD MUL/ADD
Summary:
The ISel pattern for SIMD MLA is a bit too eager: it replaces the ADD with an
MLA even when the MUL cannot be eliminated, e.g. when it has another use.  An
MLA usually has a higher latency than an ADD (and there are fewer pipes
available that can execute it), so trading an ADD for an MLA is not great.

ISel is not taking the number of uses of the MUL result into account, nor any
other factors such as the length of the critical path or other resource pressure.

The MachineCombiner is able to make these judgments so this patch ports the ISel
pattern for MUL/ADD fusing to the MachineCombiner.

Similarly for MUL/SUB -> MLS, as well as the indexed variants.

The change has no impact on SPEC CPU® intrate or fprate.
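A scalar C++ analogue of the trade-off (my illustration; the patch itself deals with NEON SIMD nodes): when the multiply has a single use, folding the add into a multiply-accumulate removes an instruction, but when the multiply result is needed anyway the fold only swaps a cheap ADD for a longer-latency MLA.

  // Single use of the multiply: MUL+ADD can profitably become one MLA.
  int fold_profitable(int a, int b, int c) { return a * b + c; }

  // The multiply has another use, so a MUL is emitted regardless; turning
  // the final ADD into an MLA saves nothing and lengthens the critical path.
  int fold_unprofitable(int a, int b, int c, int *out) {
    int m = a * b;
    *out = m;
    return m + c;
  }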

Reviewers: dmgreen, SjoerdMeijer, fhahn, Gerolf

Subscribers: kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70673
2019-12-03 15:48:37 +00:00
Sander de Smalen 79f2422d6a [Aarch64][SVE] Add intrinsics for gather loads (vector + imm)
This patch adds intrinsics for SVE gather loads from memory addresses generated by a vector base plus immediate index:
  * @llvm.aarch64.sve.ld1.gather.imm

This intrinsic maps 1-1 to the corresponding SVE instruction (example for half-words):
  * ld1h { z0.d }, p0/z, [z0.d, #16]

Committed on behalf of Andrzej Warzynski (andwar)

Reviewers: sdesmalen, huntergr, kmclaughlin, eli.friedman, rengolin, rovka, dancgr, mgudim, efriedma

Reviewed By: sdesmalen

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70806
2019-12-03 15:19:16 +00:00
Sanne Wouda 970d9719ea Precommit tests for D70673 2019-12-03 14:56:34 +00:00
Sander de Smalen 8bf31e28d7 [Aarch64][SVE] Add intrinsics for gather loads with 32-bit offsets
This patch adds intrinsics for SVE gather loads for which the offsets are 32 bits wide and are:
* unscaled
  * @llvm.aarch64.sve.ld1.gather.sxtw
  * @llvm.aarch64.sve.ld1.gather.uxtw
* scaled (offsets become indices)
  * @llvm.aarch64.sve.ld1.gather.sxtw.index
  * @llvm.aarch64.sve.ld1.gather.uxtw.index
The offsets are either zero (uxtw) or sign (sxtw) extended to 64 bits.

These intrinsics map 1-1 to the corresponding SVE instructions (examples for half-words):
* unscaled
  * ld1h { z0.s }, p0/z, [x0, z0.s, sxtw]
  * ld1h { z0.s }, p0/z, [x0, z0.s, uxtw]
* scaled
  * ld1h { z0.s }, p0/z, [x0, z0.s, sxtw #1]
  * ld1h { z0.s }, p0/z, [x0, z0.s, uxtw #1]

Committed on behalf of Andrzej Warzynski (andwar)

Reviewers: sdesmalen, kmclaughlin, eli.friedman, rengolin, rovka, huntergr, dancgr, mgudim, efriedma

Reviewed By: sdesmalen

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70782
2019-12-03 14:48:29 +00:00
Kerry McLaughlin 8881ac9c39 [AArch64][SVE2] Implement remaining SVE2 floating-point intrinsics
Summary:
Adds the following intrinsics:
  - faddp
  - fmaxp, fminp, fmaxnmp & fminnmp
  - fmlalb, fmlalt, fmlslb & fmlslt
  - flogb

Reviewers: huntergr, sdesmalen, dancgr, efriedma

Reviewed By: sdesmalen

Subscribers: efriedma, tschuett, kristof.beyls, hiraditya, cameron.mcinally, cfe-commits, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70253
2019-12-03 13:29:41 +00:00
Sander de Smalen 6e51ceba53 [AArch64][SVE] Add intrinsics for gather loads with 64-bit offsets
This patch adds the following intrinsics for gather loads with 64-bit offsets:
      * @llvm.aarch64.sve.ld1.gather (unscaled offset)
      * @llvm.aarch64.sve.ld1.gather.index (scaled offset)

These intrinsics map 1-1 to the following AArch64 instructions respectively (examples for half-words):
      * ld1h { z0.d }, p0/z, [x0, z0.d]
      * ld1h { z0.d }, p0/z, [x0, z0.d, lsl #1]

Committing on behalf of Andrzej Warzynski (andwar)

Reviewers: sdesmalen, huntergr, rovka, mgudim, dancgr, rengolin, efriedma

Reviewed By: efriedma

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70542
2019-12-03 12:55:03 +00:00
Kerry McLaughlin 7483eb656f [AArch64][SVE] Implement shift intrinsics
Summary:
Adds the following intrinsics:
- asr & asrd
- insr
- lsl & lsr

This patch also adds a new AArch64ISD node (INSR) to represent the int_aarch64_sve_insr intrinsic.

Reviewers: huntergr, sdesmalen, dancgr, mgudim, rengolin, efriedma

Reviewed By: sdesmalen

Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, cameron.mcinally, cfe-commits, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70437
2019-12-03 11:47:12 +00:00
Sam Parker 26bf2a510f Fix for buildbots
Change pass name in pipeline test.
2019-12-03 11:30:38 +00:00
Sam Parker bc76dadb3c [CodeGen] Move ARMCodegenPrepare to TypePromotion
Convert ARMCodeGenPrepare into a generic type promotion pass by:
- Removing the insertion of arm specific intrinsics to handle narrow
  types as we weren't using this.
- Removing ARMSubtarget references.
- Now query a generic TLI object to know which types should be
  promoted and what they should be promoted to.
- Moving all codegen tests into the Transforms folder and testing with
  opt rather than llc, which is how they should have been written in the
  first place...

The pass searches up from icmp operands in an attempt to safely
promote types so we can avoid generating unnecessary unsigned extends
during DAG ISel.

Differential Revision: https://reviews.llvm.org/D69556
2019-12-03 11:12:52 +00:00
Jonas Paulsson 7b63e27cc0 Temporarily run machine-verifier once in test/CodeGen/SPARC/fp128.ll, so that
it also XFAILs without expensive checks.

See https://reviews.llvm.org/D63973
2019-12-03 11:21:52 +01:00
Jonas Paulsson f8c0cfc24e ImplicitNullChecks: Don't add a dead definition of DepMI as live-in
This is one of the fixes needed to reapply D68267 which improves verification
of live-in lists.

Review: craig.topper
https://reviews.llvm.org/D70434
2019-12-03 11:02:53 +01:00
Jonas Paulsson 4fd8f11901 [MachineVerifier] Improve checks of target instructions operands.
While working with a patch for instruction selection, the splitting of a
large immediate ended up being treated incorrectly by the backend. Where a
register operand should have been created, it instead became an immediate. To
my surprise the machine verifier failed to report this, which at the time
would have been helpful.

This patch improves the verifier so that it will report this type of error.

This patch XFAILs CodeGen/SPARC/fp128.ll, which has been reported at
https://bugs.llvm.org/show_bug.cgi?id=44091

Review: thegameg, arsenm, fhahn
https://reviews.llvm.org/D63973
2019-12-03 10:20:52 +01:00
Wang, Pengfei cf81714a7e [X86] Model MXCSR for AVX instructions other than AVX512
Summary: Model MXCSR for AVX instructions other than AVX512

Reviewers: craig.topper, RKSimon

Subscribers: hiraditya, llvm-commits, LuoYuanke, LiuChen3

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70875
2019-12-03 08:53:47 +08:00
Amaury Séchet 77b7b23ca1 Automatically generated arm64-abi-varargs.ll . NFC 2019-12-02 22:54:51 +01:00
Volkan Keles 3d02fa6da7 [GlobalISel] CombinerHelper: Fix a bug in matchCombineCopy
Summary:
When combining COPY instructions, we were replacing the destination registers
with the source register without checking register constraints. This patch adds
a simple logic to check if the constraints match before replacing registers.

Reviewers: qcolombet, aditya_nandakumar, aemerson, paquette, dsanders, Petar.Avramovic

Reviewed By: aditya_nandakumar

Subscribers: rovka, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70616
2019-12-02 12:05:09 -08:00
David Green 57d96ab593 [ARM] Add some VCMP folding and canonicalisation
The VCMP instructions in MVE can accept a register or ZR, but only as
the right hand operator. Most of the time this will already be correct
because the icmp will have been canonicalised that way already. There
are some cases in the lowering of float conditions that this will not
apply to though. This code should fix up those cases.

Differential Revision: https://reviews.llvm.org/D70822
2019-12-02 19:57:12 +00:00
David Green 63aff5cd3c [ARM] More reversed vcmp tests. NFC 2019-12-02 19:57:12 +00:00
Taewook Oh 2da205d43e Reland "b19ec1eb3d0c [BPI] Improve unreachable/ColdCall heuristics to handle loops."
Summary: b19ec1eb3d has been reverted because of the test failures
with PowerPC targets. This patch addresses the issues from the previous
commit.

Test Plan: ninja check-all. Confirmed that CodeGen/PowerPC/pr36292.ll
and CodeGen/PowerPC/sms-cpy-1.ll pass

Subscribers: llvm-commits
2019-12-02 10:28:40 -08:00
Simon Tatham d173fb5d28 [ARM,MVE] Add intrinsics to deal with predicates.
Summary:
This commit adds the `vpselq` intrinsics which take an MVE predicate
word and select lanes from two vectors; the `vctp` intrinsics which
create a tail predicate word suitable for processing the first m
elements of a vector (e.g. in the last iteration of a loop); and
`vpnot`, which simply complements a predicate word and is just
syntactic sugar for the `~` operator.

The `vctp` ACLE intrinsics are lowered to the IR intrinsics we've
already added (and which D70592 just reorganized). I've filled in the
missing isel rule for VCTP64, and added another set of rules to
generate the predicated forms.

I needed one small tweak in MveEmitter to allow the `unpromoted` type
modifier to apply to predicates as well as integers, so that `vpnot`
doesn't pointlessly convert its input integer to an `<n x i1>` before
complementing it.
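A usage sketch of the new ACLE intrinsics (assumes an MVE-enabled target and <arm_mve.h>; the function and variable names are mine): build a tail predicate covering the first n lanes and use it to select between two vectors.

  #include <arm_mve.h>

  int32x4_t select_first_n(int32x4_t a, int32x4_t b, uint32_t n) {
    mve_pred16_t p = vctp32q(n);   // predicate covering the first n lanes
    return vpselq_s32(a, b, p);    // predicated lanes from a, the rest from b
  }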

Reviewers: ostannard, MarkMurrayARM, dmgreen

Reviewed By: dmgreen

Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D70485
2019-12-02 16:20:30 +00:00
Simon Tatham 48cce077ef [ARM,MVE] Rename and clean up VCTP IR intrinsics.
Summary:
D65884 added a set of Arm IR intrinsics for the MVE VCTP instruction,
to use in tail predication. But the 64-bit one doesn't work properly:
its predicate type is `<2 x i1>` / `v2i1`, which isn't a legal MVE
type (due to not having a full set of instructions that manipulate it
usefully). The test of `vctp64` in `basic-tail-pred.ll` goes through
`opt` fine, as the test expects, but if you then feed it to `llc` it
causes a type legality failure at isel time.

The usual workaround we've been using in the rest of the MVE
intrinsics family is to bodge `v2i1` into `v4i1`. So I've adjusted the
`vctp64` IR intrinsic to do that, and completely removed the code (and
test) that uses that intrinsic for 64-bit tail predication. That will
allow me to add isel rules (upcoming in D70485) that actually generate
the VCTP64 instruction.

Also renamed all four of these IR intrinsics so that they have `mve`
in the name, since its absence was confusing.

Reviewers: ostannard, MarkMurrayARM, dmgreen

Reviewed By: MarkMurrayARM

Subscribers: samparker, kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70592
2019-12-02 16:20:30 +00:00
Simon Tatham 01aefae4a1 [ARM,MVE] Add an InstCombine rule permitting VPNOT.
Summary:
If a user writing C code using the ACLE MVE intrinsics generates a
predicate and then complements it, then the resulting IR will use the
`pred_v2i` IR intrinsic to turn some `<n x i1>` vector into a 16-bit
integer; complement that integer; and convert back. This will generate
machine code that moves the predicate out of the `P0` register,
complements it in an integer GPR, and moves it back in again.

This InstCombine rule replaces `i2v(~v2i(x))` with a direct complement
of the original predicate vector, which we can already instruction-
select as the VPNOT instruction which complements P0 in place.

Reviewers: ostannard, MarkMurrayARM, dmgreen

Reviewed By: dmgreen

Subscribers: kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70484
2019-12-02 16:20:30 +00:00
Nemanja Ivanovic 241cbf201a [PowerPC] Fix crash in peephole optimization
When converting reg+reg shifts to reg+imm rotates, we neglect to consider the
CodeGenOnly versions of the 32-bit shift mnemonics. This means we produce a
rotate with missing operands, which causes a crash.

Committing this fix without review since it is non-controversial that the list
of mnemonics to consider should include the 64-bit aliases for the exact
mnemonics.

Fixes PR44183.
2019-12-02 08:56:04 -06:00
Victor Campos dcf11c5e86 [ARM][AArch64] Complex addition Neon intrinsics for Armv8.3-A
Summary:
Add support for vcadd_* family of intrinsics. This set of intrinsics is
available in Armv8.3-A.

The fp16 versions require the FP16 extension, which has been available
(opt-in) since Armv8.2-A.

Reviewers: t.p.northover

Reviewed By: t.p.northover

Subscribers: t.p.northover, kristof.beyls, hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D70862
2019-12-02 14:38:39 +00:00
Mark Murray 510792a2e0 [ARM][MVE][Intrinsics] Add VMINQ/VMAXQ/VMINNMQ/VMAXNMQ intrinsics.
Summary: Add VMINQ/VMAXQ/VMINNMQ/VMAXNMQ intrinsics and their predicated versions. Add unit tests.

Subscribers: kristof.beyls, hiraditya, dmgreen, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D70829
2019-12-02 11:18:53 +00:00
David Green e9e1daf2b9 [ARM] Remove VHADD patterns
These instructions do not work quite like I expected them to. They
perform the addition and then the shift in a higher-precision integer, so they
do not match up with the patterns that we added.

For example with s8s, adding 100 and 100 should wrap leaving the shift
to work on a negative number. VHADD will instead do the arithmetic in
higher precision, giving 100 overall. The vhadd gives a "better" result,
but not one that matches up with the input.
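A standalone C++ illustration of that mismatch for signed bytes (my example, matching the numbers above): the pattern being matched wraps in 8 bits before shifting, whereas VHADD halves the full-precision sum.

  #include <cstdint>
  #include <cstdio>

  int main() {
    int8_t a = 100, b = 100;
    // Pattern the backend was matching: the add wraps in i8, then shifts.
    int8_t wrapped = static_cast<int8_t>(static_cast<int8_t>(a + b) >> 1);    // -56 >> 1 == -28
    // What VHADD actually computes: add in wider precision, then shift.
    int8_t halved = static_cast<int8_t>((static_cast<int16_t>(a) + b) >> 1);  // 200 >> 1 == 100
    std::printf("%d vs %d\n", wrapped, halved);  // prints: -28 vs 100
    return 0;
  }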

I am just removing the patterns here. We might be able to re-add them in
the future by checking for wrap flags or changing bitwidths. But for the
moment just remove them to remove the problem cases.
2019-12-02 10:38:14 +00:00
Austin Kerbow cfbbdc83b4 AMDGPU/GlobalISel: Add AGPR bank and RegBankSelect mfma intrinsics
Differential Revision: https://reviews.llvm.org/D70871
2019-12-01 22:15:48 -08:00
Craig Topper 67298d683c [X86][InstCombine] Move non-X86 specific instcombine test from test/CodeGen/X86/ to test/Transforms/InstCombine/ 2019-12-01 10:31:04 -08:00
Craig Topper 3dd93dc2a1 [X86][InstCombine] Move instcombine test from test/CodeGen/X86 to test/Transforms/InstCombine/ and replace grep with FileCheck 2019-12-01 10:31:04 -08:00
Craig Topper 40dfc6dff1 [X86] Add floating point execution domain to comi/ucomi/cvtss2si/cvtsd2si/cvttss2si/cvttsd2si/cvtsi2ss/cvtsi2sd instructions. 2019-11-30 11:26:28 -08:00
Hans Wennborg c2443155a0 Revert 651f07908a "[AArch64] Don't combine callee-save and local stack adjustment when optimizing for size"
This caused asserts (and perhaps also miscompiles) while building for Windows
on AArch64. See the discussion on D68530 for details and reproducer.

Reverting until this can be investigated and fixed.

> For arm64, D18619 introduced the ability to combine bumping the stack pointer
> upfront in case it needs to be bumped for both the callee-save area as well as
> the local stack area.
>
> That diff already remarks that "This change can cause an increase in
> instructions", but argues that even when that happens, it should be still be a
> performance benefit because the number of micro-ops is reduced.
>
> We have observed that this code-size increase can be significant in practice.
> This diff disables combining stack bumping for methods that are marked as
> optimize-for-size.
>
> Example of a prologue with the behavior before this diff (combining stack bumping when possible):
>   sub        sp, sp, #0x40
>   stp        d9, d8, [sp, #0x10]
>   stp        x20, x19, [sp, #0x20]
>   stp        x29, x30, [sp, #0x30]
>   add        x29, sp, #0x30
>   [... compute x8 somehow ...]
>   stp        x0, x8, [sp]
>
> And after this  diff, if the method is marked as optimize-for-size:
>   stp        d9, d8, [sp, #-0x30]!
>   stp        x20, x19, [sp, #0x10]
>   stp        x29, x30, [sp, #0x20]
>   add        x29, sp, #0x20
>   [... compute x8 somehow ...]
>   stp        x0, x8, [sp, #-0x10]!
>
> Note that without combining the stack bump there are two auto-decrements,
> nicely folded into the stp instructions, whereas otherwise there is a single
> sub sp, ... instruction, but not folded.
>
> Patch by Nikolai Tillmann!
>
> Differential Revision: https://reviews.llvm.org/D68530
2019-11-30 14:20:11 +01:00
Sean Fertile 26ab827c24 [PowerPC][AIX] Add support for lowering int/float/double formal arguments.
This patch adds LowerFormalArguments_AIX. Support is added for lowering
int, float, and double formal arguments into general purpose and
floating point registers only.

The AIX calling convention test cases have been redone to test for caller
and callee functionality in the same lit test.

Patch by Zarko Todorovski!

Differential Revision: https://reviews.llvm.org/D69578
2019-11-29 12:46:53 -05:00
Carey Williams 76fd58d0fe Revert "[ARM] Allocatable Global Register Variables for ARM"
This reverts commit 2d739f98d8.
2019-11-29 17:01:05 +00:00
Victor Campos e478385e77 [ARM] Fix instruction selection for ARMISD::CMOV with f16 type
Summary:
In the cases where the CMOV (f16) SDNode is used with condition codes
LT, LE, VC or NE, it is successfully selected into a VSEL instruction.

In the remaining cases, however, instruction selection fails since VSEL
does not support other condition codes.

This patch handles such cases by using the single-precision version of
the VMOV instruction.

Reviewers: ostannard, dmgreen

Reviewed By: dmgreen

Subscribers: kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70667
2019-11-29 10:40:37 +00:00
Amaury Séchet ca818f4550 [DAGCombiner] Peek through vector concats when trying to combine shuffles.
Summary: This combine showed up as needed when exploring the regression that appears when processing the DAG in topological order.

Reviewers: craig.topper, efriedma, RKSimon, lebedev.ri

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68195
2019-11-28 23:57:29 +01:00
Austin Kerbow 256ad954a9 AMDGPU: Reuse carry out register during FI elimination
Summary:
Pre-gfx9 we need to scavenge a 64-bit SGPR to use as the carry out for an add.
If only one SGPR was available, this crashed when trying to scavenge another
32-bit SGPR to materialize the offset.

Instead, reuse a 32-bit SGPR from the carry out as the offset register.

Also prefer to use vcc for the unused carry out when it is available.

Reviewers: arsenm, rampitec

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70614
2019-11-28 10:13:48 -08:00
David Stuttard 943d8326dd AMDGPU: Fix lit test checks with dag option
Summary:
I was seeing some failures on a test with slightly different instruction
ordering. Adding in some DAG directives solved the issue.

Change-Id: If5a3d3969055fb19279943bd45161bb70a3dabce

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, tpr, t-tye, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70531
2019-11-28 10:01:06 +00:00
Wang, Pengfei 1bc5c52afd [X86][NFC] Rename test file for following changes. 2019-11-28 15:03:56 +08:00
Craig Topper ed521fef03 [LegalTypes][X86] Add SoftenFloatOperand support for STRICT_FP_TO_SINT/STRICT_FP_TO_UINT. 2019-11-27 21:16:13 -08:00
Craig Topper 1727c4f1a2 [LegalizeTypes][X86] Add ExpandIntegerResult support for STRICT_FP_TO_SINT/STRICT_FP_TO_UINT. 2019-11-27 18:41:45 -08:00