Commit Graph

27157 Commits

Author SHA1 Message Date
Sanjay Patel 625d5aef62 [DAGCombiner] fold insert_subvector of insert_subvector
This pattern:

    t33: v8i32 = insert_subvector undef:v8i32, t35, Constant:i64<0>
  t21: v16i32 = insert_subvector undef:v16i32, t33, Constant:i64<0>

...shows up in PR33758:
https://bugs.llvm.org/show_bug.cgi?id=33758
...although this patch doesn't make any difference to the final result on that yet.

In the affected tests here, it looks like it just makes RA wiggle. But we might 
as well squash this to prevent it interfering with other pattern-matching.

Differential Revision:
https://reviews.llvm.org/D56604

llvm-svn: 351008
2019-01-12 15:12:28 +00:00
Nikita Popov 537b319860 [X86] Add more usub.sat vector tests; NFC
Add additional vXi32 and vXi64 tests.

llvm-svn: 351003
2019-01-12 11:43:04 +00:00
Simon Pilgrim a21e2bd682 [X86] Improve vXi64 ISD::ABS codegen with SSE41+
Make use of vblendvpd to select on the signbit

Differential Revision: https://reviews.llvm.org/D56544

llvm-svn: 350999
2019-01-12 10:28:12 +00:00
Simon Pilgrim ca0de0363b [X86][AARCH64] Improve ISD::ABS support
This patch takes some of the code from D49837 to allow us to enable ISD::ABS support for all SSE vector types.

Differential Revision: https://reviews.llvm.org/D56544

llvm-svn: 350998
2019-01-12 09:59:32 +00:00
Craig Topper 33b2cf50e3 [X86] Add ISD node for masked version of CVTPS2PH.
The 128-bit input produces 64-bits of output and fills the upper 64-bits with 0. The mask only applies to the lower elements. But we can't represent this with a vselect like we normally do.

This also avoids the need to have a special X86ISD::SELECT when avx512bw isn't enabled since vselect v8i16 isn't legal there.

Fixes another instruction for PR34877.

llvm-svn: 350994
2019-01-12 08:05:12 +00:00
Alex Bradbury 61aa940074 [RISCV] Introduce codegen patterns for RV64M-only instructions
As discussed on llvm-dev
<http://lists.llvm.org/pipermail/llvm-dev/2018-December/128497.html>, we have
to be careful when trying to select the *w RV64M instructions. i32 is not a
legal type for RV64 in the RISC-V backend, so operations have been promoted by
the time they reach instruction selection. Information about whether the
operation was originally a 32-bit operations has been lost, and it's easy to
write incorrect patterns.

Similarly to the variable 32-bit shifts, a DAG combine on ANY_EXTEND will
produce a SIGN_EXTEND if this is likely to result in sdiv/udiv/urem being
selected (and so save instructions to sext/zext the input operands).

Differential Revision: https://reviews.llvm.org/D53230

llvm-svn: 350993
2019-01-12 07:43:06 +00:00
Alex Bradbury d05eae7a7b [RISCV] Add patterns for RV64I SLLW/SRLW/SRAW instructions
This restores support for selecting the SLLW/SRLW/SRAW instructions, which was
removed in rL348067 as the previous patterns made some unsafe assumptions.
Also see the related llvm-dev discussion
<http://lists.llvm.org/pipermail/llvm-dev/2018-December/128497.html>

Ultimately I didn't introduce a custom SelectionDAG node, but instead added a
DAG combine that inserts an AssertZext i5 on the shift amount for an i32
variable-length shift and also added an ANY_EXTEND DAG-combine which will
instead produce a SIGN_EXTEND for an i32 variable-length shift, increasing the
opportunity to safely select SLLW/SRLW/SRAW.

There are obviously different ways of addressing this (a number discussed in
the llvm-dev thread), so I'd welcome further feedback and comments.

Note that there are now some cases in
test/CodeGen/RISCV/rv64i-exhaustive-w-insts.ll where sraw/srlw/sllw is
selected even though sra/srl/sll could be used without any extra instructions.
Given both are semantically equivalent, there doesn't seem a good reason to
prefer one vs the other. Given that would require more logic to still select
sra/srl/sll in those cases, I've left it preferring the *w variants.

Differential Revision: https://reviews.llvm.org/D56264

llvm-svn: 350992
2019-01-12 07:32:31 +00:00
Craig Topper bf61525e8c [X86] When lowering v1i1/v2i1/v4i1/v8i1 load/store with avx512f, but not avx512dq, use v16i1 as the intermediate mask type instead of v8i1.
We still use i8 for the load/store type. So we need to convert to/from i16 to around the mask type.

By doing this we get an i8->i16 extload which we can then pattern match to a KMOVW if the access is aligned.

llvm-svn: 350989
2019-01-12 02:22:10 +00:00
Craig Topper abe6ef8d09 [X86] Add ISD nodes for masked truncate so we can properly represent when the output has more elements than the input due to needing to be 128 bits.
We can't properly represent this with a vselect since the upper elements of the result are supposed to be zeroed regardless of the mask.

This also reuses the new nodes even when the result type fits in 128 bits if the input is q/d and the result is w/b since vselect w/b using k-register condition isn't legal without avx512bw. Currently we're doing this even when avx512bw is enabled, but I might change that.

This fixes some of PR34877

llvm-svn: 350985
2019-01-12 00:55:27 +00:00
Alex Bradbury eea0b07028 [RISCV][NFC] Add CHECK lines for atomic operations on RV64I
As or RV32I, we include these for completeness. Committing now to make it
easier to review the RV64A patch.

llvm-svn: 350962
2019-01-11 19:46:48 +00:00
Evandro Menezes 0674762112 [AArch64] Create feature set for Exynos M4
Complete the feature set for Exynos M4 and update test cases.

llvm-svn: 350953
2019-01-11 18:54:25 +00:00
Pirama Arumuga Nainar cc07dabdaa [Legalizer] Use correct ValueType of SELECT_CC node during Float promotion
Summary:
When legalizing the result of a SELECT_CC node by promoting the
floating-point type, use the promoted-to type rather than the original
type.

Fix PR40273.

Reviewers: efriedma, majnemer

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D56566

llvm-svn: 350951
2019-01-11 18:46:02 +00:00
Sanjay Patel 40cd4b77e9 [x86] allow insert/extract when matching horizontal ops
Previously, we limited this transform to cases where the
extraction into the build vector happens from vectors of
the same type as the build vector, but that's not required.

There's a slight potential regression seen in the AVX512
result for phadd -- we're using the 256-bit flavor of the
instruction now even though the 128-bit subset is sufficient.
The same problem could already be seen in the AVX2 result.
Follow-up patches will attempt to narrow that back down.

llvm-svn: 350928
2019-01-11 14:27:59 +00:00
Craig Topper b97885cc2e [X86] Change vXi1 extract_vector_elt lowering to be legal if the index is 0. Add DAG combine to turn scalar_to_vector+extract_vector_elt into extract_subvector.
We were lowering the last step extract_vector_elt to a bitcast+truncate. Change it to use an extract_vector_elt of index 0 instead. Add isel patterns to do the equivalent of what the bitcast would have done. Plus an isel pattern for an any_extend+extract to prevent some regressions.

Finally add a DAG combine to turn v1i1 scalar_to_vector+extract_vector_elt of 0 into an extract_subvector.

This fixes some of the regressions from D350800.

llvm-svn: 350918
2019-01-11 05:44:56 +00:00
Heejin Ahn e73c7a1ab2 [WebAssembly] Fix stack pointer store check in RegStackify
Summary:
We now use __stack_pointer global and global.get/global.set instruction.
This fixes the checking routine for stack_pointer writes accordingly.

This also fixes the existing __stack_pointer test in reg-stackify.ll:
That test used to pass not because of __stack_pointer clashes but
because the function `stackpointer_callee` was not marked as `readnone`,
so it was assumed to possibly write to memory arbitraily, and
`global.set` instruction was marked as `mayStore` in the .td definition,
so they were identified as intervening writes. After we added `readnone`
to its attribute, this test fails without this patch.

Reviewers: dschuff, sunfish

Subscribers: jgravelle-google, sbc100, llvm-commits

Differential Revision: https://reviews.llvm.org/D56094

llvm-svn: 350906
2019-01-10 23:12:07 +00:00
Anton Korobeynikov 29ffb6d558 [MSP430] Add missing instruction forms
* Add missing mm, [r|m]n, [r|m]p instruction forms.
* Fix bit16mc instruction.

Patch by Kristina Bessonova!

Differential Revision: https://reviews.llvm.org/D56546

llvm-svn: 350902
2019-01-10 22:54:53 +00:00
Thomas Lively 64a39a1c4e [WebAssembly] Add unimplemented-simd128 subtarget feature
Summary:
This is a third attempt, but this time we have vetted it on Windows
first. The previous errors were due to an uninitialized class member.

Reviewers: aheejin

Subscribers: dschuff, sbc100, jgravelle-google, sunfish, jfb, llvm-commits

Differential Revision: https://reviews.llvm.org/D56560

llvm-svn: 350901
2019-01-10 22:32:11 +00:00
Craig Topper 844f989608 [X86] Call SimplifyDemandedBits on conditions of X86ISD::SHRUNKBLEND
This extends to combineVSelectToShrunkBlend to be able to resimplify SHRUNKBLENDS that have already been created.

This should help some of the regressions from D56387

Differential Revision: https://reviews.llvm.org/D56421

llvm-svn: 350875
2019-01-10 19:05:34 +00:00
Neil Henning e85d45a699 [AMDGPU] Fix dwordx3/southern-islands failures.
This commit fixes the dwordx3/southern-islands failures that were found
in bugzilla https://bugs.llvm.org/show_bug.cgi?id=40129, by not
generating the dwordx3 variants of load/store instructions that were
added to the ISA after southern islands.

Differential Revision: https://reviews.llvm.org/D56434

llvm-svn: 350838
2019-01-10 16:21:08 +00:00
Sanjay Patel 87ae1460f7 [x86] fix remaining miscompile bug in horizontal binop matching (PR40243)
When we use the partial-matching function on a 128-bit chunk, we must 
account for the possibility that we've matched undef halves of the
original source vectors, so the outputs may need to be reset.

This should allow closing PR40243:
https://bugs.llvm.org/show_bug.cgi?id=40243

llvm-svn: 350830
2019-01-10 15:27:23 +00:00
Sanjay Patel ed5cfc6792 [x86] fix horizontal binop matching for 256-bit vectors (PR40243)
This is a partial fix for:
https://bugs.llvm.org/show_bug.cgi?id=40243
...as seen in the integer test, we still need to correct the result when using the 
existing (old) horizontal op matching function because it does not model the way 
x86 256-bit horizontal ops return results (each 128-bit half is its own horizontal-op). 
A potential follow-up change for that is discussed in the bug report - see also D56490.

This generally duplicates a lot of the existing matching code, but we can't just remove 
that without introducing regressions, so the existing code is renamed and used less often. 
Follow-ups may try to reduce that overlap.

Differential Revision: https://reviews.llvm.org/D56450

llvm-svn: 350826
2019-01-10 15:04:52 +00:00
Bryan Chan 7ce5775e62 [AArch64] Fix operation actions for FP16 vector intrinsics
Summary:
This patch changes the legalization action for some half-precision floating-
point vector intrinsics (FSIN, FLOG, etc.) from Promote to Expand. These ops
are not supported in hardware for half-precision vectors, but promotion is
not always possible (for v8f16 operands). Changing the action to Expand fixes
an assertion failure in the legalizer when the frontend produces such ops.
In addition, a quick microbenchmark shows that, in the v4f16 case,
expanding introduces fewer spills and is therefore slightly faster than
promoting.

Reviewers: t.p.northover, SjoerdMeijer

Reviewed By: SjoerdMeijer

Subscribers: javed.absar, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D56296

llvm-svn: 350825
2019-01-10 15:02:37 +00:00
George Rimar 70d197d466 [llvm-objdump] - Implement -z/--disassemble-zeroes.
This is https://bugs.llvm.org/show_bug.cgi?id=37151,

GNU objdump spec says that "Normally the disassembly output will skip blocks of zeroes.",
but currently, llvm-objdump prints them.

The patch implements the -z/--disassemble-zeroes option and switches the default to always
skip blocks of zeroes.

Differential revision: https://reviews.llvm.org/D56083

llvm-svn: 350823
2019-01-10 14:55:26 +00:00
Simon Pilgrim 8c221b3f76 [X86] Add SSE41 vector abs tests
llvm-svn: 350822
2019-01-10 14:26:15 +00:00
Sam Parker 7208221452 [ARM] Size reduce teq to eors
Add t2TEQrr to the map of instructions with can be reduced down into
a T1 instruction. This is a special case because TEQ just sets the
CPSR and doesn't write to a GPR, which is not the case for EOR. So,
we need to ensure that the EOR can write to the first operand.

Differential Revision: https://reviews.llvm.org/D56255

llvm-svn: 350801
2019-01-10 08:36:33 +00:00
Craig Topper 5d20eb240f [X86] Disable DomainReassignment pass when AVX512BW is disabled to avoid injecting VK32/VK64 references into the MachineIR
Summary:
This pass replaces GR8/GR16/GR32/GR64 with their equivalent sized mask register classes. But VK32/VK64 aren't legal without AVX512BW. Apparently this mostly appears to work if the register coalescer is able to remove the VK32/VK64 register class reference. Or if we don't ever spill it. But there's no guarantee of that.

Another Intel employee managed to trigger a crash due to this with ISPC. Unfortunately, I've lost the test case he sent me at the time. I'm trying to get him to reproduce it for me. I'd like to get this in before 8.0 branches since its a little scary.

The regressions here are unfortunate, but I think we can make some improvements to DAG combine, load folding, etc. to fix them. Just not sure if we can get that done for 8.0.

Fixes PR39741

Reviewers: RKSimon, spatel

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D56460

llvm-svn: 350800
2019-01-10 07:43:54 +00:00
Zi Xuan Wu 64c956eea8 Recommit "[PowerPC] Fix assert from machine verify pass that unmatched register class about fcmp selection in fast-isel"
This re-commit r350685.

Differential Revision: https://reviews.llvm.org/D55686

llvm-svn: 350799
2019-01-10 06:20:14 +00:00
Mandeep Singh Grang 859cb2e35d [AArch64] Emit the correct MCExpr relocations specifiers like VK_ABS_G0, etc
Summary:
D55896 and D56029 add support to emit fixups for :abs_g0: , :abs_g1_s: , etc.
This patch adds the necessary enums and MCExpr needed for lowering these.

Reviewers: rnk, mstorsjo, efriedma

Reviewed By: efriedma

Subscribers: javed.absar, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D56037

llvm-svn: 350798
2019-01-10 04:59:44 +00:00
Thomas Lively fdd4999b86 Revert "[WebAssembly] Add simd128-unimplemented subtarget feature"
This reverts rL350791.

llvm-svn: 350795
2019-01-10 04:09:25 +00:00
Thomas Lively eb6f9abd41 [WebAssembly] Add simd128-unimplemented subtarget feature
This is a second attempt at r350778, which was reverted in
r350789. The only change is that the unimplemented-simd128 feature has
been renamed simd128-unimplemented, since naming it
unimplemented-simd128 somehow made the simd128 feature flag enable the
unimplemented-simd128 feature on Windows.

llvm-svn: 350791
2019-01-10 02:55:52 +00:00
Thomas Lively fdca5fab60 Revert "[WebAssembly] Add unimplemented-simd128 subtarget feature"
This reverts L350778.

llvm-svn: 350789
2019-01-10 01:37:44 +00:00
Thomas Lively 2eeade1814 [WebAssembly] Add unimplemented-simd128 subtarget feature
Summary:
This replaces the old ad-hoc -wasm-enable-unimplemented-simd
flag. Also makes the new unimplemented-simd128 feature imply the
simd128 feature.

Reviewers: aheejin, dschuff

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits, alexcrichton

Differential Revision: https://reviews.llvm.org/D56501

llvm-svn: 350778
2019-01-09 23:59:37 +00:00
Florian Hahn 7c122c1255 [AArch64] Add test for constant shrinking with multiple users (NFC).
Test to avoid regression fixed by rL350684.

llvm-svn: 350762
2019-01-09 21:04:36 +00:00
Francis Visoiu Mistrih ac6454a7f6 [CodeGen] Ignore return sext/zext attributes of unused results for tail calls
If the caller's return type does not have a zeroext attribute but the
callee does a tail call zeroext, we won't consider the tail call during
CodeGenPrepare because the attributes don't match.

However, if the result of the tail call has no uses, it makes sense to
drop the sext/zext attributes.

Differential Revision: https://reviews.llvm.org/D56486

llvm-svn: 350753
2019-01-09 19:46:15 +00:00
Thomas Lively edb54b22d3 [WebAssembly] Standardize order of SIMD bitselect arguments
Summary:
For some reason the backend assumed that the condition mask would be
the first argument to the LLVM intrinsic, but everywhere else the
condition mask is the third argument.

Reviewers: aheejin

Subscribers: dschuff, sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D56412

llvm-svn: 350746
2019-01-09 18:13:11 +00:00
Sanjay Patel 15f2a4d1b9 [x86] use 'nounwind' to remove test noise; NFC
llvm-svn: 350745
2019-01-09 17:29:18 +00:00
Valery Pykhtin b7a459547d Revert "[AMDGPU] Fix DPP combiner"
This reverts commit e3e2923a39cbec3b3bc3a7d3f0e9a77a4115080e, svn revision rL350721

llvm-svn: 350730
2019-01-09 15:21:53 +00:00
Kristof Beyls c650ff77eb Initial AArch64 SLH implementation.
This is an initial implementation for Speculative Load Hardening for
AArch64. It builds on top of the recently introduced
AArch64SpeculationHardening pass.
This doesn't implement (yet) some of the optimizations implemented for
the X86SpeculativeLoadHardening pass. I thought introducing the
optimizations incrementally in follow-up patches should make this easier
to review.

Differential Revision: https://reviews.llvm.org/D55929

llvm-svn: 350729
2019-01-09 15:13:34 +00:00
Nico Weber b61910bbe8 Fix typo in comment
llvm-svn: 350725
2019-01-09 14:20:20 +00:00
Simon Pilgrim fcabfc8666 [X86][SSE] Cleanup shuffle combining test check prefixes
Share prefixes whenever possible, use X86 instead of X32.

llvm-svn: 350722
2019-01-09 13:46:14 +00:00
Valery Pykhtin 1e0b5c719b [AMDGPU] Fix DPP combiner
Fixed issue with identity values and other cases, f32/f16 identity values to be added later. fma/mac instructions is disabled for now.
Test is fully reworked, added comments. Other fixes:

1. dpp move with uses and old reg initializer should be in the same BB.
2. bound_ctrl:0 is only considered when bank_mask and row_mask are fully enabled (0xF). Othervise the old register value is checked for identity.
3. Added add, subrev, and, or instructions to the old folding function.
4. Kill flag is cleared for the src0 (DPP register) as it may be copied into more than one user.

Differential revision: https://reviews.llvm.org/D55444

llvm-svn: 350721
2019-01-09 13:43:32 +00:00
Simon Pilgrim 5a7132ff0f [X86] Enable combining shuffles to PACKSS/PACKUS for 256/512-bit vectors
llvm-svn: 350716
2019-01-09 13:23:28 +00:00
Anton Korobeynikov c18e90369d [MSP430] Optimize 'shl x, 8[+ N] -> swpb(zext(x)) [<< N]' for i16
Perform additional simplification to reduce shift amount.

Patch by Kristina Bessonova!

Differential Revision: https://reviews.llvm.org/D56016

llvm-svn: 350712
2019-01-09 13:03:01 +00:00
Anton Korobeynikov 9222ed4485 [MSP430] Fix crash while lowering llvm.stacksave/stackrestore
Perform the usual expansion of stacksave / restore intrinsics.
Patch by Kristina Bessonova!

Differential Revision: https://reviews.llvm.org/D54890

llvm-svn: 350710
2019-01-09 12:52:15 +00:00
Simon Pilgrim 17eace47cb [X86] Add extra test coverage for combining shuffles to PACKSS/PACKUS
llvm-svn: 350707
2019-01-09 12:34:10 +00:00
Matt Arsenault 3dddb163dd GlobalISel: Implement fewerElements for implicit_def
llvm-svn: 350697
2019-01-09 07:51:52 +00:00
Matt Arsenault befee402ff GlobalISel: Implement widenScalar for implicit_def
llvm-svn: 350695
2019-01-09 07:34:14 +00:00
Zi Xuan Wu f2a75eef41 Revert "[PowerPC] Fix assert from machine verify pass that unmatched register class about fcmp selection in fast-isel"
This reverts commit r350685.

See compile assert in compiler-rt.

llvm-svn: 350693
2019-01-09 06:12:24 +00:00
Zi Xuan Wu 9479f6d72e [PowerPC] Fix assert from machine verify pass that unmatched register class about fcmp selection in fast-isel
Bad machine code: Illegal virtual register for instruction

function: TestULE
basic block: %bb.0 entry (0x1000a39b158)
instruction: %2:crrc = FCMPUD %1:vsfrc, %3:f8rc
operand 1: %1:vsfrc

Fix assert about missing match between fcmp instruction and register class. 
We should use vsx related cmp instruction xvcmpudp instead of fcmpu when vsx is opened.

add -verifymachineinstrs option into related test cases to enable the verify pass.


Differential Revision: https://reviews.llvm.org/D55686

llvm-svn: 350685
2019-01-09 02:31:10 +00:00
Matt Arsenault 0ad1b71fe3 RegisterCoalescer: Assume CR_Replace for SubRangeJoin
Currently it's possible for following
check on V.WriteLanes (which is not really meaningful
during SubRangeJoin) to pass for one half of the pair,
and then fall through to to one of the impossible
or unresolved states. This then fails as inconsistent
on the other half.

During the main range join, the check between V.WriteLanes
and OtherV.ValidLanes must have passed, meaning this
should be a CR_Replace.

Fixes most of the testcases in bugs 39542 and 39602

llvm-svn: 350678
2019-01-08 23:22:18 +00:00