Commit Graph

30684 Commits

Author SHA1 Message Date
Amara Emerson 02bcc86b08 [GlobalISel] Fix insertion point of new instructions to be after PHIs.
For some reason we sometimes insert new instructions one instruction before
the first non-PHI when legalizing. This can result in having non-PHI
instructions before PHIs, which mean that PHI elimination doesn't catch them.

Differential Revision: https://reviews.llvm.org/D67570

llvm-svn: 371901
2019-09-13 21:49:24 +00:00
Jessica Paquette 727328ab63 [AArch64][GlobalISel] Tail call memory intrinsics
Because memory intrinsics are handled differently than other calls, we need to
check them for tail call eligiblity in the legalizer. This allows us to still
inline them when it's beneficial to do so, but also tail call when possible.

This adds simple tail calling support for when the intrinsic is followed by a
return.

It ports the attribute checks from `TargetLowering::isInTailCallPosition` into
a similarly-named function in LegalizerHelper.cpp. The target-specific
`isUsedByReturnOnly` hook is not ported here.

Update tailcall-mem-intrinsics.ll to show that GlobalISel can now tail call
memory intrinsics.

Update legalize-memcpy-et-al.mir to have a case where we don't tail call.

Differential Revision: https://reviews.llvm.org/D67566

llvm-svn: 371893
2019-09-13 20:25:58 +00:00
Alexander Timofeev 9ff70132bf Revert for: [AMDGPU]: PHI Elimination hooks added for custom COPY insertion.
llvm-svn: 371873
2019-09-13 17:37:30 +00:00
Jessica Paquette 14bfb56b1a [AArch64][GlobalISel] Add support for sibcalling callees with varargs
This adds support for tail calling callees with varargs, equivalent to how it
is done in AArch64ISelLowering.

This only works for sibling calls, and does not add the necessary support for
musttail with varargs. (See r345641 for equivalent ISelLowering support.) This
should be implemented when we stop falling back on musttail.

Update call-translator-tail-call.ll to show that we can now tail call varargs.

Differential Revision: https://reviews.llvm.org/D67518

llvm-svn: 371868
2019-09-13 16:10:19 +00:00
Jinsong Ji 455a0db01a [PowerPC][NFC] Move codegen tests to PowerPC from MIR/PowerPC
All tests with -run-pass !=none should not in MIR/, See MIR/README.

```
Tests for codegen passes should NOT be here but in
test/CodeGen/sometarget. As
a rule of thumb this directory should only contain tests using
'llc -run-pass none'.
```

llvm-svn: 371857
2019-09-13 14:18:36 +00:00
Sjoerd Meijer b55456aaa0 [AArch64] More @llvm.fma.f16 tests
Follow up of rL371321 that added FMA FP16 patterns. This adds more tests
for @llvm.fma.f16. This probably shows we miss one fmsub optimisation
opportunity, which I will look into.

llvm-svn: 371833
2019-09-13 09:44:13 +00:00
Sam Tebbs 1572b68509 [ARM] Add support for MVE vmaxv and vminv
This patch adds vecreduce_smax, vecredude_umax, vecreduce_smin, vecreduce_umin and selection for vmaxv and minv.

Differential Revision: https://reviews.llvm.org/D66413

llvm-svn: 371827
2019-09-13 09:11:46 +00:00
Matt Arsenault a4be3eff5c AMDGPU/GlobalISel: Legalize s32->s16 G_SITOFP/G_UITOFP
llvm-svn: 371811
2019-09-13 04:04:55 +00:00
Shiva Chen a49a16ddd0 [RISCV] Support stack offset exceed 32-bit for RV64
Differential Revision: https://reviews.llvm.org/D61884

llvm-svn: 371810
2019-09-13 04:03:32 +00:00
Shiva Chen ea530ba3ed Revert "[RISCV] Support stack offset exceed 32-bit for RV64"
This reverts commit 1c340c62058d4115d21e5fa1ce3a0d094d28c792.

llvm-svn: 371809
2019-09-13 04:03:24 +00:00
Matt Arsenault 67d9349dad AMDGPU/GlobalISel: Fix RegBankSelect for amdgcn.else
llvm-svn: 371808
2019-09-13 03:55:49 +00:00
Matt Arsenault 638f802381 AMDGPU/GlobalISel: Select 16-bit VALU bit ops
llvm-svn: 371807
2019-09-13 03:55:43 +00:00
Shiva Chen eaa230fe3c [RISCV] Support stack offset exceed 32-bit for RV64
Differential Revision: https://reviews.llvm.org/D61884

llvm-svn: 371806
2019-09-13 02:50:13 +00:00
Matt Arsenault f457dd2bd4 AMDGPU/GlobalISel: Legalize G_FFLOOR
llvm-svn: 371803
2019-09-13 01:48:15 +00:00
Tim Shen a31c521f5e Temporarily revert r371640 "LiveIntervals: Split live intervals on multiple dead defs".
It reveals a miscompile on Hexagon. See PR43302 for details.

llvm-svn: 371802
2019-09-13 01:34:25 +00:00
Matt Arsenault 4d33918034 AMDGPU/GlobalISel: Legalize G_FMAD
Unlike SelectionDAG, treat this as a normally legalizable operation.
In SelectionDAG this is supposed to only ever formed if it's legal,
but I've found that to be restricting. For AMDGPU this is contextually
legal depending on whether denormal flushing is allowed in the use
function.

Technically we currently treat the denormal mode as a subtarget
feature, so custom lowering could be avoided. However I consider this
to be a defect, and this should be contextually dependent on the
controllable rounding mode of the parent function.

llvm-svn: 371800
2019-09-13 00:44:35 +00:00
Matt Arsenault 4a73c6eada AMDGPU/GlobalISel: Select G_CTPOP
llvm-svn: 371798
2019-09-13 00:11:20 +00:00
Matt Arsenault b85c8c4bbd LiveIntervals: Remove assertion
This testcase is invalid, and caught by the verifier. For the verifier
to catch it, the live interval computation needs to complete. Remove
the assert so the verifier catches this, which is less confusing.

In this testcase there is an undefined use of a subregister, and lanes
which aren't used or defined. An equivalent testcase with the
super-register shrunk to have no untouched lanes already hit this
verifier error.

llvm-svn: 371792
2019-09-12 23:46:51 +00:00
Matt Arsenault 8382ce5f1b AMDGPU: Inline constant when materalizing FI with add on gfx9
This was relying on the SGPR usable for the carry out clobber to also
be used for the input. There was no carry out on gfx9. With no carry
out clobber to worry about, so the literal can just be directly used
with a VOP2 add.

llvm-svn: 371791
2019-09-12 23:46:46 +00:00
Philip Reames 4a8916cf1a [Test] Restructure check lines to show differences between modes more clearly
With the landing of the previous patch (in particular D66318) there are a lot fewer diffs now.  I added an experimental O0 line, and updated all the tests to group experimental and non-experimental O0/O3 together.

Skimming the remaining diffs, there's only a few which are obviously incorrect.  There's a large number which are questionable, so more todo.

llvm-svn: 371790
2019-09-12 23:22:37 +00:00
Jessica Paquette 0c283cb504 [AArch64][GlobalISel] Support tail calling with swiftself parameters
Swiftself uses a callee-saved register. We can tail call when the register used
in the caller and callee is the same.

This behaviour is equivalent to that in `TargetLowering::parametersInCSRMatch`.

Update call-translator-tail-call.ll to verify that we can do this. When we
support inline assembly, we can write a check similar to the one in the
general swiftself.ll. For now, we need to verify that we get the correct COPY
instruction after call lowering.

Differential Revision: https://reviews.llvm.org/D67511

llvm-svn: 371788
2019-09-12 23:00:59 +00:00
Philip Reames 079e210463 [SDAG] Update generic code to conservatively check for isAtomic in addition to isVolatile
This is the first sweep of generic code to add isAtomic bailouts where appropriate. The intention here is to have the switch from AtomicSDNode to LoadSDNode/StoreSDNode be close to NFC; that is, I'm not looking to allow additional optimizations at this time. That will come later.  See D66309 for context.

Differential Revision: https://reviews.llvm.org/D66318

llvm-svn: 371786
2019-09-12 22:49:17 +00:00
Jessica Paquette a42070a6aa [AArch64][GlobalISel] Support sibling calls with outgoing arguments
This adds support for lowering sibling calls with outgoing arguments.

e.g

```
define void @foo(i32 %a)
```

Support is ported from AArch64ISelLowering's `isEligibleForTailCallOptimization`.
The only thing that is missing is a full port of
`TargetLowering::parametersInCSRMatch`. So, if we're using swiftself,
we'll never tail call.

- Rename `analyzeCallResult` to `analyzeArgInfo`, since the function is now used
  for both outgoing and incoming arguments
- Teach `OutgoingArgHandler` about tail calls. Tail calls use frame indices for
  stack arguments.
- Teach `lowerFormalArguments` to set the bytes in the caller's stack argument
  area. This is used later to check if the tail call's parameters will fit on
  the caller's stack.
- Add `areCalleeOutgoingArgsTailCallable` to perform the eligibility check on
  the callee's outgoing arguments.

For testing:

- Update call-translator-tail-call to verify that we can now tail call with
  outgoing arguments, use G_FRAME_INDEX for stack arguments, and respect the
  size of the caller's stack
- Remove GISel-specific check lines from speculation-hardening.ll, since GISel
  now tail calls like the other selectors
- Add a GISel test line to tailcall-string-rvo.ll since we can tail call in that
  test now
- Add a GISel test line to tailcall_misched_graph.ll since we tail call there
  now. Add specific check lines for GISel, since the debug output from the
  machine-scheduler differs with GlobalISel. The dependency still holds, but
  the output comes out in a different order.

Differential Revision: https://reviews.llvm.org/D67471

llvm-svn: 371780
2019-09-12 22:10:36 +00:00
Craig Topper 36e04d14e9 [PowerPC] Remove the SPE4RC register class and instead add f32 to the GPRC register class.
Summary:
Since the SPE4RC register class contains an identical set of registers
and an identical spill size to the GPRC class its slightly confusing
the tablegen emitter. It's preventing the GPRC_and_GPRC_NOR0 synthesized
register class from inheriting VTs and AltOrders from GPRC or GPRC_NOR0.
This is because SPE4C is found first in the super register class list
when inheriting these properties and it doesn't set the VTs or
AltOrders the same way as GPRC or GPRC_NOR0.

This patch replaces all uses of GPE4RC with GPRC and allows GPRC and
GPRC_NOR0 to contain f32.

The test changes here are because the AltOrders are being inherited
to GPRC_NOR0 now.

Found while trying to determine if getCommonSubClass needs to take
a VT argument. It was originally added to support fp128 on x86-64,
I've changed some things about that so that it might be needed
anymore. But a PowerPC test crashed without it and I think its
due to this subclass issue.

Reviewers: jhibbits, nemanjai, kbarton, hfinkel

Subscribers: wuzish, nemanjai, mehdi_amini, hiraditya, kbarton, MaskRay, dexonsmith, jsji, shchenz, steven.zhang, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67513

llvm-svn: 371779
2019-09-12 22:07:35 +00:00
Craig Topper efe6724b9f [DAGCombiner][X86] Pass the CmpOpVT to reduceSelectOfFPConstantLoads so X86 can exclude fp128 compares.
The X86 decision assumes the compare will produce a result in an XMM
register, but that can't happen for an fp128 compare since those
go to a libcall the returns an i32. Pass the VT so X86 can check
the type.

llvm-svn: 371775
2019-09-12 21:30:18 +00:00
Petar Avramovic ff6ac1eb5f [MIPS GlobalISel] Select indirect branch
Select G_BRINDIRECT for MIPS32.

Differential Revision: https://reviews.llvm.org/D67441

llvm-svn: 371730
2019-09-12 11:44:36 +00:00
Petar Avramovic 646e1f7b7f [MIPS GlobalISel] Lower G_DYN_STACKALLOC
IRTranslator creates G_DYN_STACKALLOC instruction during expansion of
alloca when argument that tells number of elements to allocate on stack
is a virtual register. Use default lowering for MIPS32.

Differential Revision: https://reviews.llvm.org/D67440

llvm-svn: 371728
2019-09-12 11:39:50 +00:00
Petar Avramovic 75e43a607c [MIPS GlobalISel] Select G_IMPLICIT_DEF
G_IMPLICIT_DEF is used for both integer and floating point implicit-def.
Handle G_IMPLICIT_DEF as ambiguous opcode in MipsRegisterBankInfo.
Select G_IMPLICIT_DEF for MIPS32.

Differential Revision: https://reviews.llvm.org/D67439

llvm-svn: 371727
2019-09-12 11:32:38 +00:00
Tim Northover f1c2892912 AArch64: support arm64_32, an ILP32 slice for watchOS.
This is the main CodeGen patch to support the arm64_32 watchOS ABI in LLVM.
FastISel is mostly disabled for now since it would generate incorrect code for
ILP32.

llvm-svn: 371722
2019-09-12 10:22:23 +00:00
Kai Luo cfaf2b6cfa [PowerPC][MCP][NFC] Pre-commit test cases for https://reviews.llvm.org/D65267
llvm-svn: 371717
2019-09-12 09:00:44 +00:00
Qiu Chaofan b7fb5d0f6f [DAGCombiner] Improve division estimation of floating points.
Current implementation of estimating divisions loses precision since it
estimates reciprocal first and does multiplication.  This patch is to re-order
arithmetic operations in the last iteration in DAGCombiner to improve the
accuracy.

Reviewed By: Sanjay Patel, Jinsong Ji

Differential Revision: https://reviews.llvm.org/D66050

llvm-svn: 371713
2019-09-12 07:51:24 +00:00
Craig Topper 635d383fad [X86] Enable -mprefer-vector-width=256 by default for Skylake-avx512 and later Intel CPUs.
AVX512 instructions can cause a frequency drop on these CPUs. This
can negate the performance gains from using wider vectors. Enabling
prefer-vector-width=256 will prevent generation of zmm registers
unless explicit 512 bit operations are used in the original source
code.

I believe gcc and icc both do something similar to this by default.

Differential Revision: https://reviews.llvm.org/D67259

llvm-svn: 371694
2019-09-11 23:54:36 +00:00
Amara Emerson 55d86f04c7 [AArch64][GlobalISel] Fall back on attempts to allocate split types on the stack.
First we were asserting that the ValNo of a VA was the wrong value. It doesn't actually
make a difference for us in CallLowering but fix that anyway to silence the assert.

The bigger issue was that after fixing the assert we were generating invalid MIR
because the merging/unmerging of values split across multiple registers wasn't
also implemented for memory locs. This happens when we run out of registers and
have to pass the split types like i128 -> i64 x 2 on the stack. This is do-able, but
for now just fall back.

llvm-svn: 371693
2019-09-11 23:53:23 +00:00
Jessica Paquette e297ad1bd9 [GlobalISel][AArch64] Check caller for swifterror params in tailcall eligibility
Before, we only checked the callee for swifterror. However, we should also be
checking the caller to see if it has a swifterror parameter.

Since we don't currently handle outgoing arguments, this didn't show up in the
swifterror.ll testcase.

Also, remove the swifterror checks from call-translator-tail-call.ll, since
they are covered by the existing swifterror testing. Better to have it all in
one place.

Differential Revision: https://reviews.llvm.org/D67465

llvm-svn: 371692
2019-09-11 23:44:16 +00:00
Reid Kleckner ff45955fc8 [X86] Fix latent bugs in 32-bit CMPXCHG8B inserter
I found three issues:
1. the loop over E[ABCD]X copies run over BB start
2. the direct address of cmpxchg8b could be a frame index
3. the displacement of cmpxchg8b could be a global instead of an
   immediate

These were all introduced together in r287875, and should be fixed with
this change.

Issue reported by Zachary Turner.

llvm-svn: 371678
2019-09-11 21:56:17 +00:00
Craig Topper 5278b0a04e [X86] Add test case for v16i64->v16i32 truncate on min-legal-vector-width=256.
I think this case would crash before I added back the -x86-experimental-vector-widening command line option. Adding this test case to prevent breaking it again when we remove the option.

llvm-svn: 371673
2019-09-11 21:30:42 +00:00
Craig Topper 08474ca091 [X86] Move x86_64 fp128 conversion to libcalls from type legalization to DAG legalization
fp128 is considered a legal type for a register, but has almost no legal operations so everything needs to be converted to a libcall. Previously this was implemented by tricking type legalization into softening the operations with various checks for "is legal in hardware register" to change the behavior to still use f128 as the resulting type instead of converting to i128.

This patch abandons this approach and instead moves the libcall conversions to LegalizeDAG. This is the approach taken by AArch64 where they also have a legal fp128 type, but no legal operations. I think this is more in spirit with how SelectionDAG's phases are supposed to work.

I had to make some hacks for STRICT_FP_ROUND because some of the strict FP handling checks if ISD::FP_ROUND is Legal for a given result type, but I had to make ISD::FP_ROUND Custom to allow making a libcall when the input is f128. For all other types the Custom handler just returns the original node. These hacks are incomplete and don't work for a strict truncate from f128, but I don't think it worked before either since LegalizeFloatTypes doesn't know about strict ops yet. I've also raised PR43209 against AArch64 which currently crashes on a strict ftrunc from f64->f32 because of FP_ROUND being marked Custom for the same reason there.

Differential Revision: https://reviews.llvm.org/D67128

llvm-svn: 371672
2019-09-11 21:30:09 +00:00
Austin Kerbow 666af6714c AMDGPU: Move m0 initializations earlier
Summary:
After hoisting and merging m0 initializations schedule them as early as
possible in the MBB. This helps the scheduler avoid hazards in some
cases.

Reviewers: rampitec, arsenm

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, arphaman, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67450

llvm-svn: 371671
2019-09-11 21:28:41 +00:00
Michael Liao 7957d4c015 [AMDGPU] Fix crash in phi-elimination hook.
Summary: - Pre-check in case there's just a single PHI insn.

Reviewers: alex-t, rampitec, arsenm

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, dstuttard, tpr, t-tye, hiraditya, llvm-commits, yaxunl

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67451

llvm-svn: 371649
2019-09-11 19:55:20 +00:00
Matt Arsenault 81196a595c LiveIntervals: Split live intervals on multiple dead defs
If there are multiple dead defs of the same virtual register, these
are required to be split into multiple virtual registers with separate
live intervals to avoid a verifier error.

llvm-svn: 371640
2019-09-11 17:59:21 +00:00
Guillaume Chatelet 48904e9452 [Alignment] Use llvm::Align in MachineFunction and TargetLowering - fixes mir parsing
Summary:
This catches malformed mir files which specify alignment as log2 instead of pow2.
See https://reviews.llvm.org/D65945 for reference,

This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: MatzeB, qcolombet, dschuff, arsenm, sdardis, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, jsji, Petar.Avramovic, asbirlea, s.egerton, pzheng, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67433

llvm-svn: 371608
2019-09-11 11:16:48 +00:00
Simon Atanasyan d811d9115b [mips][msa] Fix infinite loop for mips.nori.b intrinsic
When value of immediate in `mips.nori.b` is 255 (which has all ones in
binary form as 8bit integer) DAGCombiner and Legalizer would fall in an
infinite loop. DAGCombiner would try to simplify `or %value, -1` by
turning `%value` into UNDEF. Legalizer will turn it back into `Constant<0>`
which would then be again turned into UNDEF by DAGCombiner. To avoid this
loop we make UNDEF legal for MSA int types on Mips.

Patch by Mirko Brkusanin.

Differential Revision: https://reviews.llvm.org/D67280

llvm-svn: 371607
2019-09-11 11:16:06 +00:00
Sam Parker 17ea9b463c [NFC][ARM] Add and modify tests
Add test for ParallelDSP.

llvm-svn: 371594
2019-09-11 08:17:48 +00:00
Jessica Paquette 469d42fcf6 [GlobalISel] When a tail call is emitted in a block, stop translating it
This fixes a crash in tail call translation caused by assume and lifetime_end
intrinsics.

It's possible to have instructions other than a return after a tail call which
will still have `Analysis::isInTailCallPosition` return true. (Namely,
lifetime_end and assume intrinsics.)

If we emit a tail call, we should stop translating instructions in the block.
Otherwise, we can end up emitting an extra return, or dead instructions in
general. This makes the verifier unhappy, and is generally unfortunate for
codegen.

This also removes the code from AArch64CallLowering that checks if we have a
tail call when lowering a return. This is covered by the new code now.

Also update call-translator-tail-call.ll to show that we now properly tail call
in the presence of lifetime_end and assume.

Differential Revision: https://reviews.llvm.org/D67415

llvm-svn: 371572
2019-09-10 23:34:45 +00:00
Jessica Paquette 2af5b193d5 [AArch64][GlobalISel] Support sibling calls with mismatched calling conventions
Add support for sibcalling calls whose calling convention differs from the
caller's.

- Port over `CCState::resultsCombatible` from CallingConvLower.cpp into
  CallLowering. This is used to verify that the way the caller and callee CC
  handle incoming arguments matches up.

- Add `CallLowering::analyzeCallResult`. This is basically a port of
  `CCState::AnalyzeCallResult`, but using `ArgInfo` rather than `ISD::InputArg`.

- Add `AArch64CallLowering::doCallerAndCalleePassArgsTheSameWay`. This checks
  that the calling conventions are compatible, and that the caller and callee
  preserve the same registers.

For testing:

- Update call-translator-tail-call.ll to show that we can now handle this.

- Add a GISel line to tailcall-ccmismatch.ll to show that we will not tail call
  when the regmasks don't line up.

Differential Revision: https://reviews.llvm.org/D67361

llvm-svn: 371570
2019-09-10 23:25:12 +00:00
Sanjay Patel 4d2b4077e7 [x86] add test for false dependency with AVX; NFC
Goes with D67363

llvm-svn: 371551
2019-09-10 20:04:10 +00:00
Matt Arsenault 4a23ae5e78 GlobalISel/TableGen: Handle REG_SEQUENCE patterns
The scalar f64 patterns don't work yet because they fail on multiple
results from the unused implicit def of scc in the result bit
operation.

llvm-svn: 371542
2019-09-10 17:57:33 +00:00
Guozhi Wei b329e0728b [BPI] Adjust the probability for floating point unordered comparison
Since NaN is very rare in normal programs, so the probability for floating point unordered comparison should be extremely small. Current probability is 3/8, it is too large, this patch changes it to a tiny number.

Differential Revision: https://reviews.llvm.org/D65303

llvm-svn: 371541
2019-09-10 17:25:11 +00:00
Matt Arsenault e1895aba3d AMDGPU/GlobalISel: Select G_FABS/G_FNEG
f64 doesn't work yet because tablegen currently doesn't handlde
REG_SEQUENCE.

This does regress some multi use VALU fneg cases since now the
immediate remains in an SGPR, and more moves are used for legalizing
the xor. This is a SIFixSGPRCopies deficiency.

llvm-svn: 371540
2019-09-10 17:19:46 +00:00
Matt Arsenault 7df5b3fd26 AMDGPU/GlobalISel: Select cvt pk intrinsics
llvm-svn: 371539
2019-09-10 17:17:05 +00:00
Matt Arsenault 37d1bda4f6 AMDGPU/GlobalISel: Select llvm.amdgcn.sffbh
llvm-svn: 371538
2019-09-10 17:16:59 +00:00
Matt Arsenault da027275c6 AMDGPU/GlobalISel: RegBankSelect for G_ZEXTLOAD/G_SEXTLOAD
llvm-svn: 371536
2019-09-10 16:42:37 +00:00
Matt Arsenault ad6a8b83cd AMDGPU/GlobalISel: Legalize constant 32-bit loads
Legalize by casting to a 64-bit constant address. This isn't how the
DAG implements it, but it should.

llvm-svn: 371535
2019-09-10 16:42:31 +00:00
Sam Elliott d57de491be [RISCV] Support llvm-objdump -M no-aliases and -M numeric
Summary:
Now that llvm-objdump allows target-specific options, we match the
`no-aliases` and `numeric` options for RISC-V, as supported by GNU objdump.

This is done by overriding the variables used for the command-line options, so
that the command-line options are still supported.

This patch updates all tests using `llvm-objdump -riscv-no-aliases` to use
`llvm-objdump -M no-aliases`.

Reviewers: luismarques, asb

Reviewed By: luismarques, asb

Subscribers: pzheng, hiraditya, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, kito-cheng, shiva0217, jrtc27, MaskRay, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, rkruppe, PkmX, jocewei, psnobl, benna, Jim, s.egerton, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66139

llvm-svn: 371534
2019-09-10 16:24:03 +00:00
Matt Arsenault c0ceca5883 AMDGPU/GlobalISel: First pass at attempting to legalize load/stores
There's still a lot more to do, but this handles decomposing due to
alignment. I've gotten it to the point where nothing crashes or
infinite loops the legalizer.

llvm-svn: 371533
2019-09-10 16:20:14 +00:00
Sanjay Patel 8812157b11 [x86] add a test for BreakFalseDeps; NFC
As discussed in D67363

llvm-svn: 371528
2019-09-10 15:42:22 +00:00
Sanjay Patel d2434e65fa [ARM] add test for BreakFalseDeps with minsize attribute; NFC
llvm-svn: 371526
2019-09-10 14:29:02 +00:00
Simon Pilgrim 937ca68157 [X86] Add AVX partial dependency tests as noted on D67363
llvm-svn: 371525
2019-09-10 14:28:29 +00:00
Sanjay Patel 3b0b3def86 [ARM] auto-generate complete test checks; NFC
llvm-svn: 371524
2019-09-10 14:26:25 +00:00
Alexander Timofeev c2d292f839 [AMDGPU]: PHI Elimination hooks added for custom COPY insertion.
Reviewers: rampitec, vpykhtin

  Differential Revision: https://reviews.llvm.org/D67101

llvm-svn: 371508
2019-09-10 10:58:57 +00:00
Dmitri Gribenko 2bf8d77453 Revert "Reland "r364412 [ExpandMemCmp][MergeICmps] Move passes out of CodeGen into opt pipeline.""
This reverts commit r371502, it broke tests
(clang/test/CodeGenCXX/auto-var-init.cpp).

llvm-svn: 371507
2019-09-10 10:39:09 +00:00
Clement Courbet 612c260ec3 Reland "r364412 [ExpandMemCmp][MergeICmps] Move passes out of CodeGen into opt pipeline."
With a fix for sanitizer breakage (see explanation in D60318).

llvm-svn: 371502
2019-09-10 09:18:00 +00:00
Craig Topper e8b432fa0e [LegalizeTypes] Teach SoftenFloatOp_SELECT_CC to handle operand 2 or 3 being softened.
This can only happen on X86 when fp128 is a legal type, but we
go through softening to generate libcalls. This causes fp128 to
be softened to fp128 instead of an integer type. This can be
removed if D67128 lands.

llvm-svn: 371493
2019-09-10 07:56:02 +00:00
Craig Topper 0e533ca4bb [X86] Add broadcast load unfolding support for VCMPPS/PD.
llvm-svn: 371487
2019-09-10 05:49:53 +00:00
Craig Topper 7c2fdf2779 [X86] Add broadcast load unfold tests for VCMPPS/PD.
llvm-svn: 371486
2019-09-10 05:49:48 +00:00
Kai Luo 73da43aeb3 [PowerPC][NFC] Update test assertions using update_llc_test_checks.py
Summary:
This patch is made due to https://reviews.llvm.org/rL371289 where typo
fixes failed.

Differential Revision: https://reviews.llvm.org/D67317

llvm-svn: 371483
2019-09-10 02:28:24 +00:00
Matt Arsenault a91f017ae3 AMDGPU/GlobalISel: Fix insert point when lowering fminnum/fmaxnum
llvm-svn: 371471
2019-09-09 23:30:11 +00:00
Reid Kleckner bf02399a85 [Windows] Replace TrapUnreachable with an int3 insertion pass
This is an alternative to D66980, which was reverted. Instead of
inserting a pseudo instruction that optionally expands to nothing, add a
pass that inserts int3 when appropriate after basic block layout.

Reviewers: hans

Differential Revision: https://reviews.llvm.org/D67201

llvm-svn: 371466
2019-09-09 23:04:25 +00:00
Philip Reames 48453bb8ed [Tests] Add anyextend tests for unordered atomics
Motivated by work on changing our representation of unordered atomics in SelectionDAG, but as an aside, all our lowerings for O3 are terrible.  Even the ones which ignore the atomicity.  

llvm-svn: 371449
2019-09-09 20:26:52 +00:00
Douglas Yung 4bd6eb8ff2 Relax opcode checks in test to check for only a number instead of a specific number.
llvm-svn: 371447
2019-09-09 20:12:29 +00:00
Philip Reames 20aafa3156 Introduce infrastructure for an incremental port of SelectionDAG atomic load/store handling
This is the first patch in a large sequence. The eventual goal is to have unordered atomic loads and stores - and possibly ordered atomics as well - handled through the normal ISEL codepaths for loads and stores. Today, there handled w/instances of AtomicSDNodes. The result of which is that all transforms need to be duplicated to work for unordered atomics. The benefit of the current design is that it's harder to introduce a silent miscompile by adding an transform which forgets about atomicity.  See the thread on llvm-dev titled "FYI: proposed changes to atomic load/store in SelectionDAG" for further context.

Note that this patch is NFC unless the experimental flag is set.

The basic strategy I plan on taking is:

    introduce infrastructure and a flag for testing (this patch)
    Audit uses of isVolatile, and apply isAtomic conservatively*
    piecemeal conservative* update generic code and x86 backedge code in individual reviews w/tests for cases which didn't check volatile, but can be found with inspection
    flip the flag at the end (with minimal diffs)
    Work through todo list identified in (2) and (3) exposing performance ops

(*) The "conservative" bit here is aimed at minimizing the number of diffs involved in (4). Ideally, there'd be none. In practice, getting it down to something reviewable by a human is the actual goal. Note that there are (currently) no paths which produce LoadSDNode or StoreSDNode with atomic MMOs, so we don't need to worry about preserving any behaviour there.

We've taken a very similar strategy twice before with success - once at IR level, and once at the MI level (post ISEL). 

Differential Revision: https://reviews.llvm.org/D66309

llvm-svn: 371441
2019-09-09 19:23:22 +00:00
Matt Arsenault a0933e6df7 AMDGPU/GlobalISel: Legalize G_BUILD_VECTOR v2s16
Handle it the same way as G_BUILD_VECTOR_TRUNC. Arguably only
G_BUILD_VECTOR_TRUNC should be legal for this, but G_BUILD_VECTOR will
probably be more convenient in most cases.

llvm-svn: 371440
2019-09-09 18:57:51 +00:00
Matt Arsenault 8bc05d7d60 AMDGPU: Make VReg_1 size be 1
This was getting chosen as the preferred 32-bit register class based
on how TableGen selects subregister classes.

llvm-svn: 371438
2019-09-09 18:43:29 +00:00
Matt Arsenault 77e3e9cafd AMDGPU/GlobalISel: Select llvm.amdgcn.class
Also fixes missing SubtargetPredicate on f16 class instructions.

llvm-svn: 371436
2019-09-09 18:29:45 +00:00
Matt Arsenault d6c1f5bb15 AMDGPU/GlobalISel: Select fmed3
llvm-svn: 371435
2019-09-09 18:29:37 +00:00
Eli Friedman 79f0d3a6e5 [IfConversion] Correctly handle cases where analyzeBranch fails.
If analyzeBranch fails, on some targets, the out parameters point to
some blocks in the function. But we can't use that information, so make
sure to clear it out.  (In some places in IfConversion, we assume that
any block with a TrueBB is analyzable.)

The change to the testcase makes it trigger a bug on builds without this
fix: IfConvertDiamond tries to perform a followup "merge" operation,
which isn't legal, and we somehow end up with a branch to a deleted MBB.
I'm not sure how this doesn't crash the compiler.

Differential Revision: https://reviews.llvm.org/D67306

llvm-svn: 371434
2019-09-09 18:29:27 +00:00
Sanjay Patel c195bde3d4 [x86] add test for false dependency with minsize (PR43239); NFC
llvm-svn: 371433
2019-09-09 18:14:10 +00:00
Matt Arsenault 6ebf605851 AMDGPU: Use PatFrags to allow selecting custom nodes or intrinsics
This enables GlobalISel to handle various intrinsics. The custom node
pattern will be ignored, and the intrinsic will work. This will also
allow SelectionDAG to directly select the intrinsics, but as they are
all custom lowered to the nodes, this ends up leaving dead code in the
table.

Eventually either GlobalISel should add the equivalent of custom nodes
equivalent, or intrinsics should be directly used. These each have
different tradeoffs.

There are a few more to handle, but these are easy to handle
ones. Some others fail for other reasons.

llvm-svn: 371432
2019-09-09 18:10:31 +00:00
Craig Topper ce2cb0f09e [X86] Allow _MM_FROUND_CUR_DIRECTION and _MM_FROUND_NO_EXC to be used together on instructions that only support SAE and not embedded rounding.
Current for SAE instructions we only allow _MM_FROUND_CUR_DIRECTION(bit 2) or _MM_FROUND_NO_EXC(bit 3) to be used as the immediate passed to the inrinsics. But these instructions don't perform rounding so _MM_FROUND_CUR_DIRECTION is just sort of a default placeholder when you don't want to suppress exceptions. Using _MM_FROUND_NO_EXC by itself is really bit equivalent to (_MM_FROUND_NO_EXC | _MM_FROUND_TO_NEAREST_INT) since _MM_FROUND_TO_NEAREST_INT is 0. Since we aren't rounding on these instructions we should also accept (_MM_FROUND_CUR_DIRECTION | _MM_FROUND_NO_EXC) as equivalent to (_MM_FROUND_NO_EXC). icc allows this, but gcc does not.

Differential Revision: https://reviews.llvm.org/D67289

llvm-svn: 371430
2019-09-09 17:48:05 +00:00
Matt Arsenault d2a9516a6d AMDGPU: Move MnemonicAlias out of instruction def hierarchy
Unfortunately MnemonicAlias defines a "Predicates" field just like an
instruction or pattern, with a somewhat different interpretation.

This ends up overriding the intended Predicates set by
PredicateControl on the pseudoinstruction defintions with an empty
list. This allowed incorrectly selecting instructions that should have
been rejected due to the SubtargetPredicate from patterns on the
instruction definition.

This does remove the divergent predicate from the 64-bit shift
patterns, which were already not used for the 32-bit shift, so I'm not
sure what the point was. This also removes a second, redundant copy of
the 64-bit divergent patterns.

llvm-svn: 371427
2019-09-09 17:25:35 +00:00
Jessica Paquette bfb00e3d53 [GlobalISel][AArch64] Handle tail calls with non-void return types
Just return once you emit the call, which is exactly what SelectionDAG does in
this situation.

Update call-translator-tail-call.ll.

Also update dllimport.ll to show that we tail call here in GISel again. Add
-verify-machineinstrs to the GISel line too, to defend against verifier
failures.

Differential revision: https://reviews.llvm.org/D67282

llvm-svn: 371425
2019-09-09 17:15:56 +00:00
Matt Arsenault 64ecca90d4 AMDGPU/GlobalISel: Implement LDS G_GLOBAL_VALUE
Handle the simple case that lowers to a constant.

llvm-svn: 371424
2019-09-09 17:13:44 +00:00
Matt Arsenault 182f9248e8 AMDGPU/GlobalISel: Legalize G_BUILD_VECTOR_TRUNC
Treat this as legal on gfx9 since it can use S_PACK_* instructions for
this.

This isn't used by anything yet. The same will probably apply to
16-bit G_BUILD_VECTOR without the trunc.

llvm-svn: 371423
2019-09-09 17:04:18 +00:00
Dmitri Gribenko d9c4060bd5 Revert "[MachineCopyPropagation] Remove redundant copies after TailDup via machine-cp"
This reverts commit 371359. I'm suspecting a miscompile, I posted a
reproducer to https://reviews.llvm.org/D65267.

llvm-svn: 371421
2019-09-09 16:46:45 +00:00
David Green 2b7089949e [ARM] Fix loads and stores for predicate vectors
These predicate vectors can usually be loaded and stored with a single
instruction, a VSTR_P0. However this instruction will store the entire P0
predicate, 16 bits, zeroextended to 32bits. Each lane of the the
v4i1/v8i1/v16i1 representing 4/2/1 bits.

As far as I understand, when llvm says "store this v4i1", it really does need
to store 4 bits (or 8, that being the size of a byte, with this bottom 4 as the
interesting bits). For example a bitcast from a v8i1 to a i8 is defined as a
store followed by a load, which is how the code is expanded.

So this instead lowers the v4i1/v8i1 load/store through some shuffles to get
the bits into the correct positions. This, as you might imagine, is not as
efficient as a single instruction. But I believe it is needed for correctness.
v16i1 equally should not load/store 32bits, only storing the 16bits of data.
Stack loads/stores are still using the VSTR_P0 (as can be seen by the test not
changing). This is fine as they are self-consistent, it is only "externally
observable loads/stores" (from our point of view) that need to be corrected.

Differential revision: https://reviews.llvm.org/D67085

llvm-svn: 371419
2019-09-09 16:35:49 +00:00
Matt Arsenault 63e6d8db1c AMDGPU/GlobalISel: Select atomic loads
A new check for an explicitly atomic MMO is needed to avoid
incorrectly matching pattern for non-atomic loads

llvm-svn: 371418
2019-09-09 16:18:07 +00:00
Matt Arsenault d8409b178e AMDGPU/GlobalISel: Fix RegBankSelect for unaligned, uniform constant loads
llvm-svn: 371416
2019-09-09 16:06:37 +00:00
Matt Arsenault 02eb308387 AMDGPU/GlobalISel: Fix regbankselect for uniform extloads
There are no scalar extloads.

llvm-svn: 371414
2019-09-09 16:03:45 +00:00
Matt Arsenault ebbd6e4976 AMDGPU: Remove code address space predicates
Fixes 8-byte, 8-byte aligned LDS loads. 16-byte case still broken due
to not be reported as legal.

llvm-svn: 371413
2019-09-09 16:02:07 +00:00
Matt Arsenault c34b4036ff AMDGPU/GlobalISel: Select G_PTR_MASK
llvm-svn: 371412
2019-09-09 15:46:13 +00:00
Matt Arsenault fdb7030117 AMDGPU/GlobalISel: Fix reg bank for uniform LDS loads
The pointer is always a VGPR. Also fix hardcoding the pointer size to
64.

llvm-svn: 371411
2019-09-09 15:44:16 +00:00
Matt Arsenault 2dd088ec7d AMDGPU/GlobalISel: Use known bits for selection
llvm-svn: 371409
2019-09-09 15:39:32 +00:00
Matt Arsenault 8e3bc9b572 AMDGPU/GlobalISel: Legalize wavefrontsize intrinsic
llvm-svn: 371407
2019-09-09 15:20:49 +00:00
James Molloy b6c7fce67a [DFAPacketizer] Reapply: Track resources for packetized instructions
Reapply with fix to reduce resources required by the compiler - use
unsigned[2] instead of std::pair. This causes clang and gcc to compile
the generated file multiple times faster, and hopefully will reduce
the resource requirements on Visual Studio also. This fix is a little
ugly but it's clearly the same issue the previous author of
DFAPacketizer faced (the previous tables use unsigned[2] rather uglily
too).

This patch allows the DFAPacketizer to be queried after a packet is formed to work out which
resources were allocated to the packetized instructions.

This is particularly important for targets that do their own bundle packing - it's not
sufficient to know simply that instructions can share a packet; which slots are used is
also required for encoding.

This extends the emitter to emit a side-table containing resource usage diffs for each
state transition. The packetizer maintains a set of all possible resource states in its
current state. After packetization is complete, all remaining resource states are
possible packetization strategies.

The sidetable is only ~500K for Hexagon, but the extra tracking is disabled by default
(most uses of the packetizer like MachinePipeliner don't care and don't need the extra
maintained state).

Differential Revision: https://reviews.llvm.org/D66936

llvm-svn: 371399
2019-09-09 13:17:55 +00:00
Sam Parker 1ad508e8e2 [ARM][MVE] VCTP instruction selection
Add codegen support for vctp{8,16,32}.

Differential Revision: https://reviews.llvm.org/D67344

llvm-svn: 371395
2019-09-09 12:54:47 +00:00
Simon Pilgrim 462e3d8050 Revert rL371198 from llvm/trunk: [DFAPacketizer] Track resources for packetized instructions
This patch allows the DFAPacketizer to be queried after a packet is formed to work out which
resources were allocated to the packetized instructions.

This is particularly important for targets that do their own bundle packing - it's not
sufficient to know simply that instructions can share a packet; which slots are used is
also required for encoding.

This extends the emitter to emit a side-table containing resource usage diffs for each
state transition. The packetizer maintains a set of all possible resource states in its
current state. After packetization is complete, all remaining resource states are
possible packetization strategies.

The sidetable is only ~500K for Hexagon, but the extra tracking is disabled by default
(most uses of the packetizer like MachinePipeliner don't care and don't need the extra
maintained state).

Differential Revision: https://reviews.llvm.org/D66936
........
Reverted as this is causing "compiler out of heap space" errors on MSVC 2017/19 NDEBUG builds

llvm-svn: 371393
2019-09-09 12:33:22 +00:00
Cullen Rhodes 55244beeee [AArch64][SVE] Implement abs and neg intrinsics
Summary:
This patch implements two arithmetic intrinsics:

      * int_aarch64_sve_abs
      * int_aarch64_sve_neg

testing the support for scalable vector types in intrinsics added in D65930.

Reviewed By: greened

Differential Revision: https://reviews.llvm.org/D65931

llvm-svn: 371388
2019-09-09 11:21:14 +00:00
Tim Northover 36147adc0b GlobalISel: add combiner to form indexed loads.
Loosely based on DAGCombiner version, but this part is slightly simpler in
GlobalIsel because all address calculation is performed by G_GEP. That makes
the inc/dec distinction moot so there's just pre/post to think about.

No targets can handle it yet so testing is via a special flag that overrides
target hooks.

llvm-svn: 371384
2019-09-09 10:04:23 +00:00
Sam Parker c363deb575 [ARM][ParallelDSP] Fix for sext input
The incoming accumulator value can be discovered through a sext, in
which case there will be a mismatch between the input and the result.
So sign extend the accumulator input if we're performing a 64-bit mac.

Differential Revision: https://reviews.llvm.org/D67220

llvm-svn: 371370
2019-09-09 08:39:14 +00:00
Craig Topper a88f58ff0e [X86] Add broadcast load unfolding support for vpcmpeq/vpcmpgt/vpcmp/vpcmpu.
llvm-svn: 371368
2019-09-09 07:46:11 +00:00