46356 Commits

Author SHA1 Message Date
Matt Arsenault 5320ee4a05 AMDGPU/GlobalISel: Define instruction mapping for G_OR
Patch by Tom Stellard

llvm-svn: 326489
2018-03-01 21:25:25 +00:00
Matt Arsenault e65404f5c5 AMDGPU/GlobalISel: Remove default register mapping
This crashes for some opcodes, which prevents the SelectionDAG
fallback from working.

Patch by Tom Stellard

llvm-svn: 326487
2018-03-01 21:20:44 +00:00
Evandro Menezes 2bbb4a7c93 [AArch64] Clean up code (NFC)
Clean up a couple of functions in `AArch64TargetLowering` by removing
redundant statements.

llvm-svn: 326486
2018-03-01 21:17:36 +00:00
Matt Arsenault 1422a19a88 AMDGPU/GlobalISel: Use a more correct getValueMapping
This was finding the wrong size registers for anything with
more than 2 components.

Patch by Tom Stellard

llvm-svn: 326483
2018-03-01 21:08:51 +00:00
Matt Arsenault 62669ede94 AMDGPU/GlobalISel: Define instruction mapping for G_BITCAST
Patch by Tom Stellard

llvm-svn: 326482
2018-03-01 20:59:44 +00:00
Matt Arsenault 0529a8e2de AMDGPU/GlobalISel: Mark i32->i64 zext as legal
llvm-svn: 326481
2018-03-01 20:56:21 +00:00
Martin Storsjo c61ff3bef1 [AArch64] Add support for secrel add/load/store relocations for COFF
Differential Revision: https://reviews.llvm.org/D43288

llvm-svn: 326480
2018-03-01 20:42:28 +00:00
Matt Arsenault 36b99e1937 AMDGPU/GlobalISel: InstrMapping for llvm.amdgcn.exp.compr
Patch by Tom Stellard

llvm-svn: 326479
2018-03-01 20:40:55 +00:00
Matt Arsenault 8931bbf8df AMDGPU/GlobalISel: Define instruction mapping for @llvm.amdgcn.exp
Patch by Tom Stellard

llvm-svn: 326477
2018-03-01 20:24:37 +00:00
Matt Arsenault 50721ab325 AMDGPU/GlobalISel: Define InstrMappings for G_ICMP
Patch by Tom Stellard

llvm-svn: 326472
2018-03-01 19:27:10 +00:00
Matt Arsenault dc14ec05d4 AMDGPU/GlobalISel: Make i32 mul legal
llvm-svn: 326471
2018-03-01 19:22:05 +00:00
Matt Arsenault 06cbb27a79 AMDGPU/GlobalISel: Define instruction mapping for G_IMPLICIT_DEF
Patch by Tom Stellard

llvm-svn: 326470
2018-03-01 19:16:52 +00:00
Matt Arsenault e3d9ecf2b9 AMDGPU/GlobalISel: Define instruction mapping for G_FCONSTANT
Patch by Tom Stellard

llvm-svn: 326468
2018-03-01 19:13:30 +00:00
Matt Arsenault 51b0b20023 AMDGPU/GlobalISel: Add copyCost for VGPR->SGPR copies
Patch by Tom Stellard

llvm-svn: 326467
2018-03-01 19:09:25 +00:00
Matt Arsenault 3f6a204eaa AMDGPU/GlobalISel: Make i32 xor legal
llvm-svn: 326466
2018-03-01 19:09:21 +00:00
Matt Arsenault 8e80a5fbca AMDGPU/GlobalISel: Mark 32/64-bit G_FCMP as legal
Patch by Tom Stellard

llvm-svn: 326465
2018-03-01 19:09:16 +00:00
Matt Arsenault dd022ce064 AMDGPU/GlobalISel: Mark 32-bit G_FPTOSI as legal
Patch by Tom Stellard

llvm-svn: 326464
2018-03-01 19:04:25 +00:00
Sam Clegg 503fdea3cb [WebAssembly] Fix broken gcc build after rL326454
The gcc builders were broken by rL326454
See: https://reviews.llvm.org/D43921

llvm-svn: 326460
2018-03-01 18:48:08 +00:00
Artem Belevich 8c9749b1dc [NVPTX] use pattern matching to lower int_nvvm_match_all_sync*.
Now that patterns can handle intrinsics returning multiple results,
use tablegen'ed pattern matching instead of custom lowering.

Differential Revision: https://reviews.llvm.org/D43890

llvm-svn: 326457
2018-03-01 18:28:45 +00:00
Sam Clegg 03e101f1b0 [WebAssembly] Use uint8_t for single byte values to match the spec
The original BinaryEncoding.md document used to specify that
these values were `varint7`, but the official spec lists them
explicitly as single byte values and not LEB.

A similar change for wabt is in flight:
 https://github.com/WebAssembly/wabt/pull/782

Differential Revision: https://reviews.llvm.org/D43921

llvm-svn: 326454
2018-03-01 18:06:21 +00:00
Alexander Timofeev 0081d23fd8 [AMDGPU] : fix for the crash in SIRegisterInfo when the register class not found
Differential revision: https://reviews.llvm.org./D43334

llvm-svn: 326451
2018-03-01 17:36:43 +00:00
Krzysztof Parzyszek 22a21d4c5d [Hexagon] Add guest registers
llvm-svn: 326450
2018-03-01 17:03:26 +00:00
Stefan Pintilie e894e0ff6f [Power9] Add missing instructions to the Power 9 scheduler
Adding more instructions using InstRW so that we can move away from ItinRW
and ultimately have a complete Power 9 scheduler.

Differential Revision: https://reviews.llvm.org/D43899

llvm-svn: 326447
2018-03-01 16:16:08 +00:00
Sebastian Pop c33af715d7 [AArch64] generate vuzp instead of mov
When a BUILD_VECTOR is created out of a sequence of EXTRACT_VECTOR_ELT with a
specific pattern sequence, either <0, 2, 4, ...> or <1, 3, 5, ...>, replace the
BUILD_VECTOR with either vuzp1 or vuzp2.

With this patch LLVM generates the following code for the first function fun1 in the testcase:
adrp x8, .LCPI0_0
ldr  q0, [x8, :lo12:.LCPI0_0]
tbl  v0.16b, { v0.16b }, v0.16b
ext  v1.16b, v0.16b, v0.16b, #8
uzp1 v0.8b, v0.8b, v1.8b
str  d0, [x8]
ret

Without this patch LLVM currently generates this code:
adrp    x8, .LCPI0_0
ldr     q0, [x8, :lo12:.LCPI0_0]
tbl     v0.16b, { v0.16b }, v0.16b
mov     v1.16b, v0.16b
mov     v1.b[1], v0.b[2]
mov     v1.b[2], v0.b[4]
mov     v1.b[3], v0.b[6]
mov     v1.b[4], v0.b[8]
mov     v1.b[5], v0.b[10]
mov     v1.b[6], v0.b[12]
mov     v1.b[7], v0.b[14]
str     d1, [x8]
ret

llvm-svn: 326443
2018-03-01 15:47:39 +00:00
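
The index pattern described in the [AArch64] vuzp commit above can be sketched as a standalone check. This is only an illustration under assumptions, not the code from the patch: `classifyUzp` and `UzpKind` are hypothetical names, and the input is simply the list of element indices read by the EXTRACT_VECTOR_ELT sequence.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical result type: which unzip variant (if any) matches.
enum class UzpKind { None, Uzp1, Uzp2 };

// Returns Uzp1 for <0, 2, 4, ...>, Uzp2 for <1, 3, 5, ...>, None otherwise.
UzpKind classifyUzp(const std::vector<int> &Indices) {
  if (Indices.empty())
    return UzpKind::None;
  int First = Indices[0];
  if (First != 0 && First != 1)
    return UzpKind::None;
  // Every following index must advance by exactly two.
  for (std::size_t I = 0; I < Indices.size(); ++I)
    if (Indices[I] != First + 2 * static_cast<int>(I))
      return UzpKind::None;
  return First == 0 ? UzpKind::Uzp1 : UzpKind::Uzp2;
}
```

The even sequence corresponds to the `uzp1` seen in the assembly above; the odd sequence would correspond to `uzp2`.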
Craig Topper cb7881c649 [X86] Stop passing two arguments by reference. NFC
I think these used to be out parameters, but they haven't been for a while.

llvm-svn: 326417
2018-03-01 06:25:13 +00:00
Craig Topper ccfa5257a6 [X86] Make sure we don't combine (fneg (fma X, Y, Z)) to a target specific node when there are no FMA instructions.
This would cause a 'cannot select' error at isel when we should have emitted a lib call and an xor.

Fixes PR36553.

llvm-svn: 326393
2018-03-01 00:08:38 +00:00
Justin Lebar faaf2d298e [NVPTX] Lower loads from global constants using ld.global.nc (aka LDG).
Summary:
After D43914, loads from global variables in addrspace(1) happen with
ld.global.  But since they're constants, even better would be to use
ld.global.nc, aka ldg.

Reviewers: tra

Subscribers: jholewinski, sanjoy, hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D43915

llvm-svn: 326390
2018-02-28 23:58:05 +00:00
Justin Lebar 5a7de898d2 [NVPTX] Use addrspacecast instead of target-specific intrinsics in NVPTXGenericToNVVM.
Summary:
NVPTXGenericToNVVM was using target-specific intrinsics to do address
space casts.  Using the addrspacecast instruction is (a lot) simpler.
But it also has the advantage of being understandable to other passes.
In particular, InferAddrSpaces is able to understand these address space
casts and remove them in most cases.

Reviewers: tra

Subscribers: jholewinski, sanjoy, hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D43914

llvm-svn: 326389
2018-02-28 23:57:48 +00:00
Craig Topper e31b9d1e5f [X86] Lower extract_element from k-registers by bitcasting from v16i1 to i16 and extending/truncating.
This is equivalent to what isel was doing anyway but by canonicalizing earlier we can remove some patterns.

llvm-svn: 326375
2018-02-28 22:23:55 +00:00
Simon Pilgrim 72b86586b0 [X86][AVX512] Improve support for signed saturation truncation stores
Matches what we already manage for unsigned saturation truncation stores

Differential Revision: https://reviews.llvm.org/D43629

llvm-svn: 326372
2018-02-28 21:42:19 +00:00
Krzysztof Parzyszek b1cdb60e75 [Hexagon] Implement target feature +reserved-r19
llvm-svn: 326364
2018-02-28 20:29:36 +00:00
Tim Renouf 2a99fa2c08 [AMDGPU] added writelane intrinsic
Summary:
For use by LLPC SPV_AMD_shader_ballot extension.

The v_writelane instruction was already implemented for use by SGPR
spilling, but I had to add an extra dummy operand tied to the
destination, to represent that all lanes except the selected one keep
the old value of the destination register.

.ll test changes were due to schedule changes caused by that new
operand.

Differential Revision: https://reviews.llvm.org/D42838

llvm-svn: 326353
2018-02-28 19:10:32 +00:00
Artem Belevich 18a7c51520 [NVPTX] Removed always-true predicates in NVPTX.
NVPTX stopped supporting GPUs older than sm_20 (Fermi) quite a while back.
Removal of support of pre-Fermi GPUs made a lot of predicates in the NVPTX
backend pointless as they can't ever be false any more.
It's time to retire them. NFC intended.

Differential Revision: https://reviews.llvm.org/D43843

llvm-svn: 326349
2018-02-28 18:51:22 +00:00
Chih-Hung Hsieh 9f9e4681ac [TLS] use emulated TLS if the target supports only this mode
Emulated TLS is enabled by llc flag -emulated-tls,
which is passed by clang driver.
When llc is called explicitly or from other drivers like LTO,
a missing -emulated-tls flag would generate wrong TLS code for targets
that support only this mode.
Now use useEmulatedTLS() instead of Options.EmulatedTLS to decide whether
emulated TLS code should be generated.
Unit tests are modified to run with and without the -emulated-tls flag.

Differential Revision: https://reviews.llvm.org/D42999

llvm-svn: 326341
2018-02-28 17:48:55 +00:00
Pablo Barrio 512f7ee315 [ARM] Lower lower saturate to 0 and lower saturate to -1 using bit-operations
Summary:
Expressions of the form x < 0 ? 0 : x and x < -1 ? -1 : x can be lowered using bit operations instead of branches or conditional moves.

In Thumb mode this results in a two-instruction sequence, a shift followed by a bic or orr, while in ARM/Thumb2 mode, which has a flexible second operand, the shift can be folded into a single bic/orr instruction. In most cases this results in smaller code and possibly fewer branches, and in no case is the code larger than before.

Patch by Martin Svanfeldt

Reviewers: fhahn, pbarrio, rogfer01

Reviewed By: pbarrio, rogfer01

Subscribers: chrib, yroux, eugenis, efriedma, rogfer01, aemerson, javed.absar, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D42574

llvm-svn: 326333
2018-02-28 17:13:07 +00:00
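
The two saturation forms named in the [ARM] commit above can be written as a shift plus one bit operation. The following is a minimal sketch of the arithmetic identity only, not the backend lowering itself; the function names are hypothetical, and it assumes that `>>` on a negative `int32_t` is an arithmetic shift (guaranteed since C++20, and the behaviour of mainstream compilers before that).

```cpp
#include <cstdint>

// x < 0 ? 0 : x, expressed as a shift followed by a bit-clear (BIC-style).
int32_t saturateLowToZero(int32_t X) {
  int32_t Sign = X >> 31; // all ones if X is negative, zero otherwise
  return X & ~Sign;       // clears every bit when X is negative
}

// x < -1 ? -1 : x, expressed as a shift followed by an OR (ORR-style).
int32_t saturateLowToMinusOne(int32_t X) {
  int32_t Sign = X >> 31; // all ones if X is negative, zero otherwise
  return X | Sign;        // forces every bit to one when X is negative
}
```

With a flexible second operand, the shift in each function is the part that would be folded into the bit operation.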
Simon Dardis 4529aac2de [mips] Begin reworking instruction predicates for ISAs/encodings (1/N)
The MIPS backend has inconsistent usage of instruction predicates
for assembly and code generation. The issue arises from supporting three
encodings, two (MIPS and microMIPS) of which have a near 1:1 instruction
mapping across ISA revisions and a third encoding with a more restricted
set of instructions (MIPS16e).

To enforce consistent usage, each of the ISA_* adjectives has (or will
have) the relevant encoding attached to it along the relevant ISA revision
where the instruction is defined.

Each instruction, pattern or alias will then have the correct ISA adjective
attached to it, and the base instruction description classes will have any
predicates relating to ISA encoding or revision removed.

Pseudo instructions will also be guarded for the encoding or ABI that they are
supported in.

Finally, the hasStandardEncoding() / inMicroMipsMode() / inMips16Mode() methods
of MipsSubtarget will be changed such that only one can be true at any one time.

The result of this is that code generation and assembly will produce the
correct encoding up front, while code generated from pseudo instructions
and other inserted sequences of instructions will be able to rely on the mapping
tables to produce the correct encoding. This should fix numerous bugs where
the result 'happens' to be correct but has edge cases where microMIPS and MIPS
have subtle differences (e.g. microMIPSR6 using 'j', 'jal' instructions.)

This patch starts the process by changing most of the ISA adjectives to make
use of the EncodingPredicate member of PredicateControl. Follow on patches
will annotate instructions with their correct ISA adjective and eliminate
the usage of "let Predicates = [..]", "let AdditionalPredicates = [..]" and
"isCodeGenOnly = 1" in the cases where it was used to control instruction
availability.

Contributions from Nitesh Jain.

Reviewers: atanasyan

Differential Revision: https://reviews.llvm.org/D41434

llvm-svn: 326322
2018-02-28 13:02:44 +00:00
Alexander Ivchenko c01f750480 [GlobalIsel][X86] Support G_INTTOPTR instruction.
Add legalization/selection for x86/x86_64 and
corresponding tests.

Reviewed By: igorb

Differential Revision: https://reviews.llvm.org/D43622

llvm-svn: 326320
2018-02-28 12:11:53 +00:00
Alexander Ivchenko 46e07e3623 [GlobalIsel][X86] Support G_PTRTOINT instruction.
Add legalization/selection for x86/x86_64 and
corresponding tests.

Reviewed By: igorb

Differential Revision: https://reviews.llvm.org/D43617

llvm-svn: 326311
2018-02-28 09:18:47 +00:00
Craig Topper 48d5ed265c [X86] Don't use EXTRACT_ELEMENT from v1i1 with i8/i32 result type when we need to guarantee zeroes in the upper bits of return.
An extract_element where the result type is larger than the scalar element type is semantically an any_extend from the scalar element type to the result type. If we expect zeroes in the upper bits of the i8/i32 we need to make sure those zeroes are explicit in the DAG.

For these cases the best way to accomplish this is to use an insert_subvector to pad zeroes to the upper bits of the v1i1 first. We extend to either v16i1 (for i32) or v8i1 (for i8). Then bitcast that to a scalar and finish with a zero_extend up to i32 if necessary. We can't extend past v16i1 because that's the largest mask size on KNL. But isel is smart enough to know that a zext of a bitcast from v16i1 to i16 can use a KMOVW instruction. The insert_subvectors will be dropped during isel because we can determine that the producing instruction already zeroed the upper bits of the k-register.

llvm-svn: 326308
2018-02-28 08:14:28 +00:00
Craig Topper ac799b05d4 [X86] Change the masked FPCLASS implementation to use AND instead of OR to combine the mask results.
While the description for the instruction does mention OR, it's talking about how the individual classification test results are ORed together.

The incoming mask is used as a zeroing write mask. If the bit is 1, the classification is written to the output. If the bit is 0, the output is 0. This is equivalent to an AND.

Here is pseudocode from the intrinsics guide

FOR j := 0 to 1
        i := j*64
        IF k1[j]
                k[j] := CheckFPClass_FP64(a[i+63:i], imm8[7:0])
        ELSE
                k[j] := 0
        FI
ENDFOR
k[MAX:2] := 0

llvm-svn: 326306
2018-02-28 06:19:55 +00:00
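
The equivalence claimed in the FPCLASS commit above can be checked with a small model. This is a sketch under assumptions: `ClassBits` is a hypothetical stand-in for the per-lane results of `CheckFPClass_FP64`, masks are limited to 8 lanes, and both function names are made up for illustration.

```cpp
#include <cstdint>

// Per-lane form mirroring the pseudocode: a zeroing write mask.
uint8_t maskedSelect(uint8_t K1, uint8_t ClassBits, unsigned NumLanes) {
  uint8_t K = 0;
  for (unsigned J = 0; J < NumLanes; ++J) {
    // If the mask bit is set, take the classification bit; otherwise write 0.
    bool Bit = ((K1 >> J) & 1) ? ((ClassBits >> J) & 1) : false;
    K = static_cast<uint8_t>(K | (Bit << J));
  }
  return K; // lanes at or above NumLanes stay zero
}

// Closed form: a plain AND, restricted to the active lanes.
uint8_t maskedAnd(uint8_t K1, uint8_t ClassBits, unsigned NumLanes) {
  uint8_t LaneMask = static_cast<uint8_t>((1u << NumLanes) - 1);
  return K1 & ClassBits & LaneMask;
}
```

An OR would instead set output bits for lanes whose classification is false but whose mask bit is set, which the pseudocode never does.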
Andrew Zhogin f8e88af11d [ARM] Cortex-A57 scheduler fix for ARM backend (missed 16-bit, v8.1/v8.2/v8.3, thumb and pseudo instructions)
Added missed scheduling info for ARM Cortex A57 (AArch32) to have CompleteModel with this checkCompleteness fix: https://reviews.llvm.org/D43235.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D43808

llvm-svn: 326304
2018-02-28 05:53:18 +00:00
Krzysztof Parzyszek 2373f8fcf3 [Hexagon] Recognize more sign-extensions as inputs to 32x32-bit multiply
llvm-svn: 326263
2018-02-27 22:44:41 +00:00
Konstantin Zhuravlyov 40b09e86b9 AMDGPU: Add fast fmaf feature to gfx702
Differential Revision: https://reviews.llvm.org/D43790

llvm-svn: 326252
2018-02-27 21:46:15 +00:00
Sjoerd Meijer fc0d02cbbf [ARM] Another f16 litpool fix
We were always setting the block alignment to 2 bytes in Thumb mode
and 4 bytes in ARM mode (r325754 and r325012), but this could
reduce the block alignment when the block had already been aligned (e.g.
in Thumb mode when the block is a CPE that was already 4-byte aligned).

Patch by Momchil Velikov, I've only added a test.

Differential Revision: https://reviews.llvm.org/D43777

llvm-svn: 326232
2018-02-27 19:26:02 +00:00
Craig Topper 688d1eb919 Revert r326225 "[X86] Move the load folding tables to a separate .inc file"
The bots don't seem to like the .inc file. I must be missing some cmake incantation.

llvm-svn: 326228
2018-02-27 19:15:40 +00:00
Peter Collingbourne e8436e8631 ARM: Don't rewrite add reg, $sp, 0 -> mov reg, $sp if the add defines CPSR.
Differential Revision: https://reviews.llvm.org/D43807

llvm-svn: 326226
2018-02-27 19:00:59 +00:00
Craig Topper c0a1291478 [X86] Move the load folding tables to a separate .inc file
These tables add 3000 lines to X86InstrInfo.cpp. And if we ever manage to auto generate them they'll be a separate file anyway.

Differential Revision: https://reviews.llvm.org/D43806

llvm-svn: 326225
2018-02-27 18:46:11 +00:00
Krzysztof Parzyszek d70f5a0eb4 [Hexagon] Add patterns for compares of i1 values
llvm-svn: 326220
2018-02-27 18:31:46 +00:00
Simon Pilgrim ba43ec8702 [X86][AVX] combineLoopMAddPattern - support 256-bit cases on AVX1 via SplitBinaryOpsAndApply
llvm-svn: 326189
2018-02-27 12:20:37 +00:00
Jonas Paulsson f268cd0aad [SystemZ] Make sure SelectCode() is not called on a target opcode.
Since getNode() might not always return the requested opcode, for instance if
called with (ISD::AND, -1) arguments, there should be a check so that
SelectCode() is only called when appropriate.

Review: Ulrich Weigand
llvm-svn: 326178
2018-02-27 07:53:23 +00:00
Craig Topper 264707bae4 [X86] Simplify if condition. NFC
SSE2 implies SSE1 and we already covered f32 in the SSE1 check so we don't need to check f32 in the SSE2 check.

llvm-svn: 326170
2018-02-27 06:00:38 +00:00
Craig Topper fcaa0323ec [X86] Replace an impossible if condition with an assert.
llvm-svn: 326167
2018-02-27 03:50:00 +00:00
Aditya Nandakumar 599990530e [GISel]: Don't assert when constraining RegisterOperands which are uses.
Currently we assert that only non target specific opcodes can have
missing RegisterClass constraints in the MCDesc. The backend can have
instructions with register operands that don't have RegisterClass
constraints (say using unknown_class), in which case the instruction
defining the register will constrain it.
Change the assert to only fire if a def has no regclass.

https://reviews.llvm.org/D43409

llvm-svn: 326142
2018-02-26 22:56:21 +00:00
Simon Pilgrim 9929f90740 [X86][SSE] Reduce FADD/FSUB/FMUL costs on later targets (PR36280)
Agner's tables indicate that for SSE42+ targets (Core2 and later) we can reduce the FADD/FSUB/FMUL costs down to 1, which should fix the Himeno benchmark.

Note: the AVX512 FDIV costs look rather dodgy, but this isn't part of this patch.

Differential Revision: https://reviews.llvm.org/D43733

llvm-svn: 326133
2018-02-26 22:10:17 +00:00
Craig Topper e5d39e42b9 [X86] Add constant folding to combineMOVMSK.
There's still some shortcoming in our ability to combine binops of constants with different sizes separated by an extend. I'll try to look at that next.

llvm-svn: 326128
2018-02-26 21:17:33 +00:00
Craig Topper 5e0ceb8865 [X86] Add a custom legalization for (i16 (bitcast v16i1)) and (i32 (bitcast v32i1)) without AVX512 to prevent scalarization
Summary:
We have an early DAG combine to turn these patterns into MOVMSK, but that combine doesn't work if the vXi1 type has more elements than the widest legal vXi8 type. Type legalization will eventually split it down to v16i1 or v32i1 and then the bitcast gets legalized to a truncstore and a scalar load. The truncstore will get lowered to a series of extracts and bit math.

This patch adds a custom legalization to use a sign extend and MOVMSK instead. This prevents the eventual scalarization.

Reviewers: spatel, RKSimon, zvi

Reviewed By: RKSimon

Subscribers: mgorny, llvm-commits

Differential Revision: https://reviews.llvm.org/D43593

llvm-svn: 326119
2018-02-26 20:32:27 +00:00
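
The sign-extend-plus-MOVMSK idea from the custom legalization commit above can be modelled in scalar code. This is only a sketch of the data flow for the v16i1 to i16 case, with hypothetical helper names; it models a byte-wise MOVMSK as "collect the top bit of each byte lane" and is not the DAG code from the patch.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Sign-extend each i1 lane to an i8 lane: false -> 0x00, true -> 0xFF.
std::array<uint8_t, 16> signExtendLanes(const std::array<bool, 16> &Mask) {
  std::array<uint8_t, 16> Bytes{};
  for (std::size_t I = 0; I < 16; ++I)
    Bytes[I] = Mask[I] ? 0xFF : 0x00;
  return Bytes;
}

// Model of a byte-wise MOVMSK: gather the top bit of every byte lane.
uint16_t moveMask(const std::array<uint8_t, 16> &Bytes) {
  uint16_t Result = 0;
  for (std::size_t I = 0; I < 16; ++I)
    Result = static_cast<uint16_t>(Result | ((Bytes[I] >> 7) << I));
  return Result;
}

// (i16 (bitcast v16i1)) expressed as sign extend followed by MOVMSK.
uint16_t bitcastV16i1ToI16(const std::array<bool, 16> &Mask) {
  return moveMask(signExtendLanes(Mask));
}
```

Because the sign extension copies each mask bit into the top bit of its byte, the gathered result has bit I set exactly when lane I of the mask is set.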
Simon Pilgrim db0ed7d724 [X86][AVX] createPSADBW - support 256-bit cases on AVX1 via SplitBinaryOpsAndApply
llvm-svn: 326104
2018-02-26 18:17:25 +00:00
Matt Arsenault 2a26a286db AMDGPU/GlobalISel: Make f64 constants legal
llvm-svn: 326101
2018-02-26 17:20:43 +00:00
Tim Renouf 832f90fa0c [AMDGPU] Scratch setup fix on AMDPAL gfx9+ merge shader
Summary:
With OS type AMDPAL, the scratch descriptor is hardwired to be loaded
from offset 0 of the global information table, whose low pointer is
passed in s0. For a merge shader on gfx9+, it needs to be s8 instead, as
the hardware reserves s0-s7.

Reviewers: kzhuravl

Subscribers: arsenm, nhaehnle, dstuttard, llvm-commits, t-tye, yaxunl, wdng, kzhuravl

Differential Revision: https://reviews.llvm.org/D42203

llvm-svn: 326088
2018-02-26 14:46:43 +00:00
Benjamin Kramer b84e158df7 [WebAssembly] Relax constexpr for old standard libraries.
This will still be constexpr when the standard library supports it, but
doesn't force constexpr. Old libraries will get a global constructor,
which is not too bad.

llvm-svn: 326080
2018-02-26 11:07:25 +00:00
Jonas Paulsson b1e81479e9 [XCore] Return true in enableMultipleCopyHints().
Enable multiple COPY hints to eliminate more COPYs during register allocation.

Note that this is something all targets should do, see
https://reviews.llvm.org/D38128.

Review: Robert Lytton
llvm-svn: 326069
2018-02-26 08:03:32 +00:00
Craig Topper 5c980eba47 [X86] Don't use getZExtValue when we have no idea how large the input elements are.
llvm-svn: 326066
2018-02-26 04:43:24 +00:00
Craig Topper 2286058f46 [X86] Use SelectionDAG::SplitVectorOperand to simplify some code. NFC
llvm-svn: 326065
2018-02-26 02:16:34 +00:00
Craig Topper 2bf8e3e0e1 [X86] Simplify the ReplaceNodeResults code for X86ISD::AVG.
This code seemed to try to widen to 128, 256, or 512 bit vectors, but we only create X86ISD::AVG with a power of 2 number of elements. This means the only nodes that need to be legalized are less than 128-bits and need to be widened up to 128 bits.

llvm-svn: 326064
2018-02-26 02:16:33 +00:00
Craig Topper 79d189f597 [X86] Remove VT.isSimple() check from detectAVGPattern.
Which types are considered 'simple' is a function of the requirements of all targets that LLVM supports. That shouldn't directly affect what types we are able to handle. The remainder of this code checks that the number of elements is a power of 2 and takes care of splitting down to a legal size.

llvm-svn: 326063
2018-02-26 02:16:31 +00:00
Craig Topper 6694df14e6 [X86] Use SDNode instead of SDPatternOperator. NFC
llvm-svn: 326048
2018-02-25 06:21:04 +00:00
Craig Topper 81c0eaf4c8 [X86] Allow int_x86_sse2_cvtps2dq and int_x86_avx_cvt_ps2dq_256 to select EVEX encoded instructions.
llvm-svn: 326041
2018-02-24 18:58:07 +00:00
Simon Pilgrim a4fb569483 [X86][SSE] combineSubToSubus - support v8i64 handling from SSSE3
Our UMIN/UMAX, vector truncation and shuffle combining is good enough to efficiently handle v8i64 with the number of leading zeros that are necessary for PSUBUS.

llvm-svn: 326034
2018-02-24 14:06:39 +00:00
Simon Pilgrim 8ad91261e8 [X86][SSE] combineSubToSubus - support v8i32 handling from SSSE3 (not SSE41)
Now that UMIN etc are Legal/Custom for SSE2+, we can efficiently match SUBUS v8i32 cases from SSSE3 which can perform efficient truncation with PSHUFB.

llvm-svn: 326033
2018-02-24 13:39:13 +00:00
Simon Pilgrim 744f008a75 [X86][SSE] combineSubToSubus - begun generalizing to work with any type sizes with SplitBinaryOpsAndApply
llvm-svn: 326030
2018-02-24 12:44:12 +00:00
Simon Pilgrim 51ce2ed367 Fix spelling in comment. NFCI.
llvm-svn: 326029
2018-02-24 12:27:02 +00:00
Jonas Paulsson 8ff0773b13 [Sparc] Return true in enableMultipleCopyHints().
Enable multiple COPY hints to eliminate more COPYs during register allocation.

Note that this is something all targets should do, see
https://reviews.llvm.org/D38128.

Review: James Y Knight
llvm-svn: 326028
2018-02-24 08:24:31 +00:00
Craig Topper 161c805da4 [X86] Use SelectionDAG::getNot instead of implementing manually. NFC
llvm-svn: 326020
2018-02-24 03:15:54 +00:00
Stanislav Mekhanoshin fa48c496e2 [AMDGPU] Shrinking V_SUBBREV_U32
V_SUBBREV_U32 is a commute opcode for V_SUBB_U32. However, when
we try to commute V_SUBB_U32 in order to shrink it, we do not then
process V_SUBBREV_U32, and it stays VOP3. This is fixed.

Differential Revision: https://reviews.llvm.org/D43699

llvm-svn: 326011
2018-02-24 01:32:32 +00:00
Heejin Ahn 9386bde11b [WebAssembly] Add exception handling option and feature
Summary:
Add a llc command line option and WebAssembly architecture feature for
exception handling.

Reviewers: dschuff

Subscribers: jfb, sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D43683

llvm-svn: 326004
2018-02-24 00:40:50 +00:00
Craig Topper 7bcac492d4 [X86] Remove checks for '(scalar_to_vector (i8 (trunc GR32:)))' from scalar masked move patterns.
This portion can be matched by other patterns. We don't need it to make the larger pattern valid. It's sufficient to have a v1i1 mask input without caring where it came from.

llvm-svn: 325999
2018-02-24 00:15:05 +00:00
Yonghong Song 60fed1fef0 bpf: New optimization pass for eliminating unnecessary i32 promotions
This pass performs peephole optimizations to clean up ugly code sequences at
the MachineInstruction layer.

Currently, the only optimization in this pass is to eliminate type promotion
sequences for zero extending 32-bit subregisters to 64-bit registers.

If the compiler can prove the zero-extended source comes from a 32-bit
subregister, then it is safe to erase those promotion sequences, because the
upper half of the underlying 64-bit register was already zeroed implicitly.

Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Yonghong Song <yhs@fb.com>
llvm-svn: 325991
2018-02-23 23:49:32 +00:00
Yonghong Song ae961bb061 bpf: New decoder namespace for 32-bit subregister load/store
When -mattr=+alu32 is passed to the disassembler, use the decoder namespace for
32-bit subregisters.

This is to disassemble load and store instructions in preferred B format
as described in previous commit:

      w = *(u8 *) (r + off) // BPF_LDX | BPF_B
      w = *(u16 *)(r + off) // BPF_LDX | BPF_H
      w = *(u32 *)(r + off) // BPF_LDX | BPF_W

      *(u8 *) (r + off) = w // BPF_STX | BPF_B
      *(u16 *)(r + off) = w // BPF_STX | BPF_H
      *(u32 *)(r + off) = w // BPF_STX | BPF_W

NOTE: all other instructions should still use the default decoder
      namespace.

Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Yonghong Song <yhs@fb.com>
llvm-svn: 325990
2018-02-23 23:49:31 +00:00
Yonghong Song ca31c3bb3f bpf: Enable 32-bit subregister support for -mattr=+alu32
After all those preparation patches, we can now enable 32-bit subregister
support once -mattr=+alu32 is specified.

Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Yonghong Song <yhs@fb.com>
llvm-svn: 325989
2018-02-23 23:49:30 +00:00
Yonghong Song fcd1e0f625 bpf: Support 32-bit subregister in various InstrInfo hooks
This patch supports 32-bit subregisters in three InstrInfo hooks, i.e.
copyPhysReg, loadRegFromStackSlot and storeRegToStackSlot.

Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Yonghong Song <yhs@fb.com>
llvm-svn: 325988
2018-02-23 23:49:29 +00:00
Yonghong Song b1a52bd756 bpf: New instruction patterns for 32-bit subregister load and store
The instruction mappings between eBPF/arm64/x86_64 are:

         eBPF              arm64        x86_64
LD1   BPF_LDX | BPF_B       ldrb        movzbl
LD2   BPF_LDX | BPF_H       ldrh        movzwl
LD4   BPF_LDX | BPF_W       ldr         movl

movzbl/movzwl/movl on x86_64 accept a 32-bit sub-register, for example %eax,
and the same holds for ldrb/ldrh on arm64, which accept a 32-bit "w" register.
In fact, these instructions only accept sub-registers. There is no point
in having LD1/2/4 (unsigned) for a 64-bit register, because on these arches the
upper 32 bits are guaranteed to be zeroed by hardware or the VM, so loading into
the smallest available register class is the best choice for maintaining
type information.

For eBPF we should adopt the same philosophy and change the current
format (A):

  r = *(u8 *) (r + off) // BPF_LDX | BPF_B
  r = *(u16 *)(r + off) // BPF_LDX | BPF_H
  r = *(u32 *)(r + off) // BPF_LDX | BPF_W

  *(u8 *) (r + off) = r // BPF_STX | BPF_B
  *(u16 *)(r + off) = r // BPF_STX | BPF_H
  *(u32 *)(r + off) = r // BPF_STX | BPF_W

into B:

  w = *(u8 *) (r + off) // BPF_LDX | BPF_B
  w = *(u16 *)(r + off) // BPF_LDX | BPF_H
  w = *(u32 *)(r + off) // BPF_LDX | BPF_W

  *(u8 *) (r + off) = w // BPF_STX | BPF_B
  *(u16 *)(r + off) = w // BPF_STX | BPF_H
  *(u32 *)(r + off) = w // BPF_STX | BPF_W

There is no change to the encoding or to how the instructions should be
interpreted. Everything stays as it is: load the specified length, write it
into the low bits of the register, then zero all remaining high bits.

The only change is their associated register class and how the compiler views
them.

Format A still needs to be kept, because the eBPF LLVM backend doesn't support
sub-registers by default, but once 32-bit subregister support is enabled, it
should use format B.

This patch implements this together with all the necessary extended load
and truncated store patterns.

Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Yonghong Song <yhs@fb.com>
llvm-svn: 325987
2018-02-23 23:49:28 +00:00
Yonghong Song 63cf273f55 bpf: Support i32 in getScalarShiftAmountTy method
The getScalarShiftAmountTy method should be implemented for the eBPF backend to
make sure the shift amount still gets the correct type once 32-bit subregister
support is enabled.

Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Yonghong Song <yhs@fb.com>
llvm-svn: 325986
2018-02-23 23:49:26 +00:00
Yonghong Song 59fc805c7e bpf: Support condition comparison on i32
We need to support condition comparison on i32. All these comparisons are
supposed to be combined into BPF_J* instructions which only support i64.

For ISD::BR_CC we need to promote it to i64 first, then do custom lowering.

For ISD::SET_CC, just expand to SELECT_CC like what's been done for i64.

For ISD::SELECT_CC, we also want to do custom lowering for i32. However, once
32-bit subregister support is enabled, it is possible that the comparison
operands are i32 while the selected values are i64, or that the comparison
operands are i64 while the selected values are i32. We need to define extra
instruction patterns and support them in the custom instruction inserter.

Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Yonghong Song <yhs@fb.com>
llvm-svn: 325985
2018-02-23 23:49:25 +00:00
Yonghong Song 219156cff0 bpf: Handle i32 for ALU operations without ISA support
There is no eBPF ISA support for BSWAP, ROTR, ROTL, SREM, SDIVREM, MULHU,
ADDC, ADDE etc on i32.

They can be emulated by other basic BPF_ALU operations, so we set their
lowering action the same as for i64.

Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Yonghong Song <yhs@fb.com>
llvm-svn: 325984
2018-02-23 23:49:24 +00:00
Yonghong Song 07a7a41753 bpf: New calling convention for 32-bit subregisters
This patch adds new calling conventions to allow GPR32RegClass as a valid
register class for arguments and return types.

The new calling convention will only be chosen when -mattr=+alu32 is specified.

Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Yonghong Song <yhs@fb.com>
llvm-svn: 325983
2018-02-23 23:49:23 +00:00
Yonghong Song 42389377d8 bpf: New target attribute "alu32" for 32-bit subregister support
This new attribute controls whether 32-bit subregister support is enabled
in the eBPF backend.

The interface is named "alu32" because we particularly want to enable
the generation of BPF_ALU32 instructions by enabling subregister support.

This attribute could be used in the following format with llc:

  llc -mtriple=bpf -mattr=[+|-]alu32

It is disabled by default.

Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Yonghong Song <yhs@fb.com>
llvm-svn: 325982
2018-02-23 23:49:22 +00:00
Yonghong Song 0252f35362 bpf: Define instruction patterns for extensions and truncations between i32 to i64
For transformations between i32 and i64, if it is explicit signed extension:
  - first cast the operand to i64
  - then use SLL + SRA to finish the extension.

if it is explicit zero extension:
  - first cast the operand to i64
  - then use SLL + SRL to finish the extension.

if it is explicit any extension:
  - just refer to 64-bit register.

if it is explicit truncation:
  - just refer to 32-bit subregister.

NOTE: Some of the zero extension sequences might be unnecessary; they will be
removed by a peephole pass on the MachineInstruction layer.
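
As an illustration of the shift pairs described above, here is a minimal
Python sketch that models a 64-bit register as an integer. The helper names
(sll, srl, sra, sext_i32_to_i64, zext_i32_to_i64) and the shift amount of 32
are assumptions made for this example; they are not taken from the patch.

```python
MASK64 = (1 << 64) - 1  # model a 64-bit register as a Python int in [0, 2**64)

def sll(x, n):
    """Logical shift left; bits shifted past bit 63 are discarded."""
    return (x << n) & MASK64

def srl(x, n):
    """Logical shift right; vacated high bits are filled with zeros."""
    return (x & MASK64) >> n

def sra(x, n):
    """Arithmetic shift right; vacated high bits copy bit 63."""
    x &= MASK64
    if x & (1 << 63):
        x -= 1 << 64  # reinterpret as negative so Python's >> sign-fills
    return (x >> n) & MASK64

def sext_i32_to_i64(reg):
    """Sign-extend the low 32 bits of reg: SLL by 32, then SRA by 32."""
    return sra(sll(reg, 32), 32)

def zext_i32_to_i64(reg):
    """Zero-extend the low 32 bits of reg: SLL by 32, then SRL by 32."""
    return srl(sll(reg, 32), 32)

if __name__ == "__main__":
    reg = 0xDEADBEEF80000001  # upper half holds stale bits
    print(hex(sext_i32_to_i64(reg)))
    print(hex(zext_i32_to_i64(reg)))
```

The left shift discards whatever was in the upper half of the register, and
the choice of right shift decides whether the upper half is refilled with
copies of bit 31 or with zeros.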

Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Yonghong Song <yhs@fb.com>
llvm-svn: 325981
2018-02-23 23:49:21 +00:00
Yonghong Song 3a564a8f6e bpf: Tighten the immediate predication for 32-bit alu instructions
These 32-bit ALU insn patterns that take an immediate as one operand were
initially added to enable AsmParser support, and the AsmMatcher uses the "ins"
and "outs" fields to deduce the operand constraints.

However, the instruction selector doesn't work the same way as the AsmMatcher.
The selector uses the "pattern" field, for which we are not setting the
predicate for immediate operands correctly.

Without this patch, i32 ends up meaning that all i32 operands are valid,
both imm and gpr, while these patterns should allow imm only.

Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Yonghong Song <yhs@fb.com>
llvm-svn: 325980
2018-02-23 23:49:19 +00:00
Yonghong Song ec84e2f1b0 bpf: Use markSuperRegs to mark reserved registers
markSuperRegs is the canonical helper function used to mark reserved
registers. It marks any overlapping sub-registers automatically.

Reviewed-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
llvm-svn: 325979
2018-02-23 23:49:18 +00:00
Nemanja Ivanovic bcc82c9a78 [PowerPC] Disable shrink-wrapping when getting PC address through the LR
The instruction sequence used to get the address of the PC into a GPR requires
that we clobber the link register. Doing so without having first saved it in
the prologue leaves the function unable to return. Currently, this sequence is
emitted into the entry block. To ensure the prologue is inserted before this
sequence, disable shrink-wrapping.

This fixes PR33547.

Differential Revision: https://reviews.llvm.org/D43677

llvm-svn: 325972
2018-02-23 23:08:34 +00:00
Eric Christopher a70ec1308a Sink the verification code around the assert where it's handled and wrap in NDEBUG.
This has the advantage of making release-only builds more warning-free,
and there's no need to make this routine a class function if
it isn't using class members anyhow.

llvm-svn: 325967
2018-02-23 22:32:05 +00:00
Sriraman Tallam 609f8c013c Intrinsics calls should avoid the PLT when "RtLibUseGOT" metadata is present.
Differential Revision: https://reviews.llvm.org/D42216

llvm-svn: 325962
2018-02-23 21:32:06 +00:00
Craig Topper 16b20245ba [X86] Add assembler/disassembler support for blendm with zero masking and broadcast.
Fixes PR31617

llvm-svn: 325957
2018-02-23 20:48:44 +00:00
Stefan Pintilie 626b651016 [Power9] Add missing instructions to the Power 9 scheduler
This is the first in a series of patches that will define more
instructions using InstRW so that we can move away from ItinRW
and ultimately have a complete Power 9 scheduler.

Differential Revision: https://reviews.llvm.org/D43635

llvm-svn: 325956
2018-02-23 20:37:10 +00:00
Krzysztof Parzyszek 96690ceceb [Hexagon] Recognize non-immediate constants in HexagonConstPropagation
llvm-svn: 325954
2018-02-23 20:33:26 +00:00
Simon Pilgrim 69b8fa8391 Fixed unused variable warning. NFCI.
llvm-svn: 325950
2018-02-23 20:16:18 +00:00
Craig Topper 61d6ddbf0a [X86] Add DAG combine to remove (and X, 1) from in front of a v1i1 scalar to vector.
These can be created by type legalization promoting the inputs to select to match scalar boolean contents.

We were trying to pattern match them away during isel, but it's better to just remove them from the DAG.

I've cleaned up some patterns to not check for this 'and' anymore. But I suspect this has also opened up opportunities for pattern removal.

llvm-svn: 325949
2018-02-23 20:13:42 +00:00
Benjamin Kramer ae87f86ec4 [WebAssembly] Fix macro metaprogram to not duplicate code as much.
No functionality change intended.

llvm-svn: 325947
2018-02-23 20:13:03 +00:00
Simon Pilgrim 425965be0f [X86][SSE] Generalize x > C-1 ? x+-C : 0 --> subus x, C combine for non-uniform constants
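
As a rough check of the identity named in the title, the following Python
sketch compares the select form against an unsigned saturating subtract on
8-bit lanes with a different constant per lane. The function names, the 8-bit
lane width, and the restriction to C >= 1 are assumptions for this example
only; it does not model the actual DAG combine.

```python
def subus(x, c):
    """Unsigned saturating subtract: clamps at zero instead of wrapping."""
    return max(x - c, 0)

def select_form(x, c, bits=8):
    """x > C-1 ? x + (-C) : 0, with the add done in wrapping unsigned arithmetic.

    Assumes 1 <= c < 2**bits so that C-1 does not wrap (an assumption of
    this sketch, not a statement about the combine itself).
    """
    mask = (1 << bits) - 1
    neg_c = (-c) & mask  # two's-complement encoding of -C
    return (x + neg_c) & mask if x > c - 1 else 0

def lanes_match(xs, cs):
    """Compare both forms lane by lane, one constant per lane (non-uniform)."""
    return all(select_form(x, c) == subus(x, c) for x, c in zip(xs, cs))

if __name__ == "__main__":
    print(lanes_match([10, 200, 3, 255], [20, 100, 3, 1]))
```

When x >= C the wrapping add of -C equals the plain difference, and when
x < C both forms produce zero, which is why each lane can carry its own
constant.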
llvm-svn: 325944
2018-02-23 19:58:44 +00:00
Evandro Menezes 1afffac05b [PATCH] [AArch64] Add new target feature to fuse conditional select
This feature enables fusion of the comparison and the conditional select
instructions.

Differential revision: https://reviews.llvm.org/D42392

llvm-svn: 325939
2018-02-23 19:27:43 +00:00