Commit Graph

32711 Commits

Author SHA1 Message Date
David Green 0ac4f6b627 [ARM] MVE vector reduce MLA tests. NFC. 2020-02-17 11:54:04 +00:00
Kerry McLaughlin 633db60f3e [AArch64][SVE] Add SVE index intrinsic
Summary:
Implements the @llvm.aarch64.sve.index intrinsic, which
takes a scalar base and step value.

This patch also adds the printSImm function to AArch64InstPrinter
to ensure that immediates of type i8 & i16 are printed correctly.

Reviewers: sdesmalen, andwar, efriedma, dancgr, cameron.mcinally, rengolin

Reviewed By: cameron.mcinally

Subscribers: tatyana-krasnukha, tschuett, kristof.beyls, hiraditya, rkruppe, arphaman, psnobl, cfe-commits, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D74550
2020-02-17 10:30:11 +00:00
Sjoerd Meijer a02056c960 [X86] New test to check rev16 patterns, prep step for D74032. NFC. 2020-02-17 09:13:21 +00:00
Kang Zhang f4e920720d [NFC][PowerPC] Update the test case scalar-equal.ll
Modify the command option to add --enable-no-nans-fp-math
2020-02-17 08:34:56 +00:00
QingShan Zhang 113df90388 [PowerPC] Add the missing InstrAliasing for 64-bit rotate instructions
We have the InstAlias rules for 32-bit rotate but missing the 64-bit one.
Rotate left immediate rotlwi ra,rs,n rlwinm ra,rs,n,0,31
Rotate left rotlw ra,rs,rb rlwnm ra,rs,rb,0,31

Differential Revision: https://reviews.llvm.org/D72676
2020-02-17 05:42:49 +00:00
Kang Zhang 1ae05a3c66 [NFC][PowerPC] Add a new test case scalar-equal.ll 2020-02-17 05:27:36 +00:00
Craig Topper dd0b18e1ec [X86] Disable load folding for X86ISD::ADD with 128 as an immediate.
It can be turned into a sub with -128 instead as long as the
carry flag isn't used.
2020-02-16 20:52:51 -08:00
Matt Arsenault 295bbea3ed AMDGPU/GlobalISel: Fix non-power-of-2 G_SITOFP/G_UITOFP
This wouldn't work for s33-s63 sources.
2020-02-16 22:48:57 -05:00
Matt Arsenault 24c156194b AMDGPU/GlobalISel: Add some missing tests for non-power-of-2 cases 2020-02-16 22:48:42 -05:00
Zheng Chen 04377a81ae [Powerpc] set instruction count as lsr first priority of lsr.
On Powerpc, set instruction count as lsr first priority of lsr by default.
Add an option ppc-lsr-no-insns-cost to return back to default lsr cost model.

Reviewed By: steven.zhang, jsji

Differential Revision: https://reviews.llvm.org/D72683
2020-02-16 21:04:55 -05:00
Simon Pilgrim b85df2e185 [X86] combineX86ShuffleChain - add support for combining 512-bit shuffles to PALIGNR 2020-02-16 16:13:26 +00:00
Simon Pilgrim c9c1c2b335 [X86] combineX86ShuffleChain - add support for combining 512-bit shuffles to bit shifts 2020-02-16 16:13:25 +00:00
Sanjay Patel e48b536be6 [x86] form broadcast of scalar memop even with >1 use
The unseen logic diff occurs because MayFoldLoad() is defined like this:

static bool MayFoldLoad(SDValue Op) {
  return Op.hasOneUse() && ISD::isNormalLoad(Op.getNode());
}

The test diffs here all seem ok to me on screen/paper, but it's hard to know
if that will lead to universally better perf for all targets. For example,
if a target implements broadcast from mem as multiple uops, we would have to
weigh the potential reduction of instructions and register pressure vs.
possible increase in number of uops. I don't know if we can make a truly
informed decision on this at compile-time.

The motivating case that I'm looking at in PR42024:
https://bugs.llvm.org/show_bug.cgi?id=42024
...resembles the diff in extract-concat.ll, but we're not going to change the
larger example there without at least 1 other fix.

Differential Revision: https://reviews.llvm.org/D74088
2020-02-16 10:32:56 -05:00
Simon Pilgrim 5d22b6a87f [X86] Add test cases showing failure to simplify target shuffles to bit shifts 2020-02-15 23:34:31 +00:00
Simon Pilgrim c1186d50f9 [X86][AVX512] Split AVX512F and AVX512BW shuffle combining tests
Split off shuffle combine tests that use AVX512F intrinsics, so we can test it with/without AVX512BW support.
2020-02-15 22:48:52 +00:00
Fangrui Song 46788a21f9 [X86][AsmPrinter] PrintSymbolOperand: prefer to lower ELF MO_GlobalAddress to .Lfoo$local 2020-02-15 13:45:29 -08:00
Simon Pilgrim 34a054ce71 [X86] combineX86ShuffleChain - add support for combining to X86ISD::ROTLI
Refactors matchShuffleAsBitRotate to allow use by both lowerShuffleAsBitRotate and matchUnaryPermuteShuffle.
2020-02-15 20:04:54 +00:00
Simon Pilgrim 4abbaceea0 [X86] Add test showing failure to combine shuffle to bit rotation 2020-02-15 19:23:00 +00:00
Craig Topper 3f7649799b [X86] Move combineIncDecVector logic from Select to PreprocessISelDAG.
This allows it to work properly with masked inc/dec for avx512. Those
would have a vselect as the root node so didn't get a chance to call
combineIncDecVector.

This also simplifies the logic because we don't have to manage
the topological ordering.
2020-02-15 09:59:12 -08:00
David Green da147ef0a5 [AArch64] Fixup kill flags on BSL generation
This hopefully fixes up the expensive checks bot.
2020-02-15 11:44:23 +00:00
Fangrui Song 6b14814e10 [AsmPrinter] Omit unique ID for .stack_sizes
Follow-up for D74006.
2020-02-14 21:25:06 -08:00
Fangrui Song 895cad1a13 [AsmPrinter][XRay] Omit unique ID for xray_instr_map and xray_fn_idx
Follow-up for D74006.
2020-02-14 21:10:46 -08:00
Diogo Sampaio 8bc790f9e6 [AArch64][FPenv] Update chain of int to fp conversion
Summary:
When using strict fp, it is required to update the
chain when performing integer type promotion of a
operand to a integer to floating point conversion.

Reviewers: craig.topper, john.brawn

Reviewed By: craig.topper

Subscribers: kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D74597
2020-02-15 05:07:34 +00:00
Fangrui Song f554e27224 [AsmPrinter] Omit unique ID for __patchable_function_entries sections
Follow-up for D74006.

When the integrated assembler is used, we use SHF_LINK_ORDER.  The
linked-to symbol is part of ELFSectionKey, thus we can omit the unique
ID.
2020-02-14 20:54:54 -08:00
Fangrui Song 0fbe221543 [MC][ELF] Make linked-to symbol name part of ELFSectionKey
https://bugs.llvm.org/show_bug.cgi?id=44775

This rule has been implemented by GNU as https://sourceware.org/ml/binutils/2020-02/msg00028.html (binutils >= 2.35)

It allows us to simplify

```
.section .foo,"o",foo,unique,0
.section .foo,"o",bar,unique,1  # different section
```

to

```
.section .foo,"o",foo
.section .foo,"o",bar  # different section
```

We consider the two `.foo` different even if the linked-to symbols foo and bar
are defined in the same section.  This is a deliberate choice so that we don't
need to know the section where foo and bar are defined beforehand.

Differential Revision: https://reviews.llvm.org/D74006
2020-02-14 20:03:04 -08:00
Matt Arsenault 8d8d46b57a AMDGPU/GlobalISel: Fix missing impdef of scc on boolean bit ops 2020-02-14 22:35:30 -05:00
Shiva Chen 1cae2f9d19 [RISCV] Correct the CallPreservedMask for the function call in an interrupt handler
CallPreservedMask is used to describe the register liveness after a
function call. The function call in an interrupt handler should use the same
CallPreservedMask as normal functions. So that only callee save registers
can live through the function call.
2020-02-15 09:14:04 +08:00
Matt Arsenault 65dbdc329f AMDGPU: Don't preserve analyses with div64 IR expansion
The dominator tree needs to be updated, but that isn't handled now.
2020-02-14 20:06:02 -05:00
Matt Arsenault dc3e499dd4 AMDGPU/GlobalISel: Fix G_EXTRACT of 96-bit results
This would assert on an unhandled size in getRegSplitParts.
2020-02-14 15:57:40 -08:00
Matt Arsenault 630b47e518 AMDGPU: Use generated checks for memcpy expansion 2020-02-14 15:57:40 -08:00
Matt Arsenault 60fea2713d AMDGPU/GlobalISel: Improve 16-bit bswap
Match the new DAG behavior and use v_perm_b32 when available. Also
does better on SI/CI by expanding 16-bit swaps. Also fix
non-power-of-2 cases.
2020-02-14 15:57:39 -08:00
Matt Arsenault 9ec668606b AMDGPU: Add option to disable CGP division expansion
The division expansions in AMDGPUCodeGenPrepare can't be relied on for
correctness, since they punt to later optimization and possibly
legalization in some cases. We still need a way to be able to write
tests for the legalizer versions of the expansion. This is mostly for
GlobalISel, since the expected optimzations is expecting aren't
implemented.

The interaction with the flag to expand 64-bit division in the IR is
pretty confusing, but these flags have different purposes.
2020-02-14 11:37:07 -08:00
Sanjay Patel 63ed0eceaf [x86] remove stray test assertions; NFC
I updated the prefix and forgot to manually remove the old names
as part of rG6071fc57a45.f
2020-02-14 14:28:50 -05:00
Sanjay Patel 6071fc57a4 [x86] regenerate complete test checks for sqrt{est}; NFC
The existing checks were trying to test both CPU-specific
codegen and generic codegen with explicit attributes for
the various sqrt estimate possibilities, but that was hard
to decipher and update (D69989).

Instead generate the complete results for various CPUs,
and that makes it clear which models have slow/fast sqrt
attributes along with all of the other potential diffs
(FMA, AVX2, scheduling).

Also, explicitly add the function attributes corresponding
to whether DAZ/FTZ denorm settings are expected.
2020-02-14 14:21:28 -05:00
Matt Arsenault 34d9a16e54 AMDGPU: Add option to expand 64-bit integer division in IR
I didn't realize we were already expanding 24/32-bit division here
already. Use the available IntegerDivision utilities. This uses loops,
so produces significantly smaller code than the inline DAG expansion.

This now requires width reductions of 64-bit divisions before
introducing the expanded loops.

This helps work around missing legalization in GlobalISel for
division, which are the only remaining core instructions that didn't
work at all.

I think this is plausibly a better implementation than exists in the
DAG, although turning it on by default misses out on the constant
value optimizations and also needs benchmarking.
2020-02-14 11:16:08 -08:00
Craig Topper 391cc4dd41 [X86] Use ZERO_EXTEND instead of SIGN_EXTEND in the fast isel handling of convert_from_fp16. 2020-02-14 10:57:12 -08:00
Craig Topper fc0c72b2df [X86] Add AVX512 support to the fast isel code for Intrinsic::convert_from_fp16/convert_to_fp16. 2020-02-14 10:57:11 -08:00
Matt Arsenault bfbfa18591 GlobalISel: Lower s64->s16 G_FPTRUNC
This is more or less directly ported from the AMDGPU custom lowering
for FP_TO_FP16. I made a few minor fixups (using G_UNMERGE_VALUES
instead of creating shift/trunc to extract the two halves, and zexting
an inverted compare instead of select_cc).

This also does not include the fast math expansion the DAG which
converts to f32 and then to f16. I think that belongs in a
pre-legalize combine instead.
2020-02-14 10:46:58 -08:00
Volkan Keles 187686a22f [GlobalISel] LegalizationArtifactCombiner: Fix a bug in tryCombineMerges
Like COPY instructions explained in D70616, we don't check the constraints
when combining G_UNMERGE_VALUES. Use the same logic used in D70616 to check
if registers can be replaced, or a COPY instruction needs to be built.

https://reviews.llvm.org/D70564
2020-02-14 10:45:58 -08:00
Brian Cain bf3b86bc2f [Hexagon] v67+ HVX register pairs should support either direction
Assembler now permits pairs like 'v0:1', which are encoded
differently from the odd-first pairs like 'v1:0'.

The compiler will require more work to leverage these new register
pairs.
2020-02-14 12:43:43 -06:00
Matt Arsenault 8c2c0b3637 AMDGPU: Improve i16/v2i16 bswap 2020-02-14 09:53:22 -08:00
Matt Arsenault e0fd2d6d62 AMDGPU: Add baseline tests for 16-bit bswap 2020-02-14 09:34:13 -08:00
Matt Arsenault a257bde420 AMDGPU/GlobalISel: Handle G_BSWAP 2020-02-14 09:09:44 -08:00
Pavel Iliin b6a9fe2099 [AArch64] Add BIT/BIF support.
This patch added generation of SIMD bitwise insert BIT/BIF instructions.
In the absence of GCC-like functionality for optimal constraints satisfaction
during register allocation the bitwise insert and select patterns are matched
by pseudo bitwise select BSP instruction with not tied def.
It is expanded later after register allocation with def tied
to BSL/BIT/BIF depending on operands registers.
This allows to get rid of redundant moves.

Reviewers: t.p.northover, samparker, dmgreen

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D74147
2020-02-14 14:19:39 +00:00
Simon Pilgrim 2492075add [X86][SSE] lowerShuffleAsBitRotate - lower to vXi8 shuffles to ROTL on pre-SSSE3 targets
Without PSHUFB we are better using ROTL (expanding to OR(SHL,SRL)) than using the generic v16i8 shuffle lowering - but if we can widen to v8i16 or more then the existing shuffles are still the better option.

REAPPLIED: Original commit rG11c16e71598d was reverted at rGde1d90299b16 as it wasn't accounting for later lowering. This version emits ROTLI or the OR(VSHLI/VSRLI) directly to avoid the issue.
2020-02-14 11:55:18 +00:00
Kazushi (Jam) Marukawa 60431bd728 [VE] Support for PIC (global data and calls)
Summary: Support for PIC with tests for global variables and function calls.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D74536
2020-02-14 09:50:02 +01:00
Liu, Chen3 ec89335c47 [X86] Fix the bug that _mm_mask_cvtsepi64_epi32 generates result without
zero the upper 64bit.

Differential Revision : https://reviews.llvm.org/D74552
2020-02-14 09:26:06 +08:00
Thomas Lively 918e90559b [WebAssembly] Make stack pointer args inhibit tail calls
Summary:
Also make return calls terminator instructions so epilogues are
inserted before them rather than after them. Together, these changes
make WebAssembly's tail call optimization more stack-safe.

Reviewers: aheejin, dschuff

Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73943
2020-02-13 16:43:53 -08:00
Pavel Iliin b23ec43973 [AArch64][NFC] Update test checks.
This NFC commit updates several llc tests checks by automatically generated ones.
2020-02-14 00:13:15 +00:00
Craig Topper c2e8a421ac [X86] Don't widen 128/256-bit strict compares with vXi1 result to 512-bits on KNL.
If we widen the compare we might trigger a spurious exception from
the garbage data.

We have two choices here. Explicitly force the upper bits to zero.
Or use a legacy VEX vcmpps/pd instruction and convert the XMM/YMM
result to mask register.

I've chosen to go with the second option. I'm not sure which is
really best. In some cases we could get rid of the zeroing since
the producing instruction probably already zeroed it. But we lose
the ability to fold a load. So which is best is dependent on
surrounding code.

Differential Revision: https://reviews.llvm.org/D74522
2020-02-13 13:26:40 -08:00