llvm-project

Commit Graph

Author	SHA1	Message	Date
Matt Arsenault	5320ee4a05	AMDGPU/GlobalISel: Define instruction mapping for G_OR Patch by Tom Stellard llvm-svn: 326489	2018-03-01 21:25:25 +00:00
Matt Arsenault	e65404f5c5	AMDGPU/GlobalISel: Remove default register mapping This crashes for some opcodes, which prevents the SelectionDAG fallback from working. Patch by Tom Stellard llvm-svn: 326487	2018-03-01 21:20:44 +00:00
Evandro Menezes	2bbb4a7c93	[AArch64] Clean up code (NFC) Clean up a couple of functions in `AArch64TargetLowering` by removing redundant statements. llvm-svn: 326486	2018-03-01 21:17:36 +00:00
Matt Arsenault	1422a19a88	AMDGPU/GlobalISel: Use a more correct getValueMapping This was finding the wrong size registers for anything with more than 2 components. Patch by Tom Stellard llvm-svn: 326483	2018-03-01 21:08:51 +00:00
Matt Arsenault	62669ede94	AMDGPU/GlobalISel: Define instruction mapping for G_BITCAST Patch by Tom Stellard llvm-svn: 326482	2018-03-01 20:59:44 +00:00
Matt Arsenault	0529a8e2de	AMDGPU/GlobalISel: Mark i32->i64 zext as legal llvm-svn: 326481	2018-03-01 20:56:21 +00:00
Martin Storsjo	c61ff3bef1	[AArch64] Add support for secrel add/load/store relocations for COFF Differential Revision: https://reviews.llvm.org/D43288 llvm-svn: 326480	2018-03-01 20:42:28 +00:00
Matt Arsenault	36b99e1937	AMDGPU/GlobalISel: InstrMapping for llvm.amdgcn.exp.compr Patch by Tom Stellard llvm-svn: 326479	2018-03-01 20:40:55 +00:00
Matt Arsenault	8931bbf8df	AMDGPU/GlobalISel: Define instruction mapping for @llvm.amdgcn.exp Patch by Tom Stellard llvm-svn: 326477	2018-03-01 20:24:37 +00:00
Matt Arsenault	50721ab325	AMDGPU/GlobalISel: Define InstrMappings for G_ICMP Patch by Tom Stellard llvm-svn: 326472	2018-03-01 19:27:10 +00:00
Matt Arsenault	dc14ec05d4	AMDGPU/GlobalISel: Make i32 mul legal llvm-svn: 326471	2018-03-01 19:22:05 +00:00
Matt Arsenault	06cbb27a79	AMDGPU/GlobalISel: Define instruction mapping for G_IMPLICIT_DEF Patch by Tom Stellard llvm-svn: 326470	2018-03-01 19:16:52 +00:00
Matt Arsenault	e3d9ecf2b9	AMDGPU/GlobalISel: Define instruction mapping for G_FCONSTANT Patch by Tom Stellard llvm-svn: 326468	2018-03-01 19:13:30 +00:00
Matt Arsenault	51b0b20023	AMDGPU/GlobalISel: Add copyCost for VGPR->SGPR copies Patch by Tom Stellard llvm-svn: 326467	2018-03-01 19:09:25 +00:00
Matt Arsenault	3f6a204eaa	AMDGPU/GlobalISel: Make i32 xor legal llvm-svn: 326466	2018-03-01 19:09:21 +00:00
Matt Arsenault	8e80a5fbca	AMDGPU/GlobalISel: Mark 32/64-bit G_FCMP as legal Patch by Tom Stellard llvm-svn: 326465	2018-03-01 19:09:16 +00:00
Matt Arsenault	dd022ce064	AMDGPU/GlobalISel: Mark 32-bit G_FPTOSI as legal Patch by Tom Stellard llvm-svn: 326464	2018-03-01 19:04:25 +00:00
Sam Clegg	503fdea3cb	[WebAssembly] Fix broken gcc build after rL326454 The gcc builders were broken by rL326454 See: https://reviews.llvm.org/D43921 llvm-svn: 326460	2018-03-01 18:48:08 +00:00
Artem Belevich	8c9749b1dc	[NVPTX] use pattern matching to lower int_nvvm_match_all_sync*. Now that patterns can handle intrinsics returning multiple results, use tablegen'ed pattern matching instead of custom lowering. Differential Revision: https://reviews.llvm.org/D43890 llvm-svn: 326457	2018-03-01 18:28:45 +00:00
Sam Clegg	03e101f1b0	[WebAssembly] Use uint8_t for single byte values to match the spec The original BinaryEncoding.md document used to specify that these values were `varint7`, but the official spec lists them explicitly as single byte values and not LEB. A similar change for wabt is in flight: https://github.com/WebAssembly/wabt/pull/782 Differential Revision: https://reviews.llvm.org/D43921 llvm-svn: 326454	2018-03-01 18:06:21 +00:00
Alexander Timofeev	0081d23fd8	[AMDGPU] : fix for the crash in SIRegisterInfo when the register class not found Differential revision: https://reviews.llvm.org./D43334 llvm-svn: 326451	2018-03-01 17:36:43 +00:00
Krzysztof Parzyszek	22a21d4c5d	[Hexagon] Add guest registers llvm-svn: 326450	2018-03-01 17:03:26 +00:00
Stefan Pintilie	e894e0ff6f	[Power9] Add missing instructions to the Power 9 scheduler Adding more instructions using InstRW so that we can move away from ItinRW and ultimately have a complete Power 9 scheduler. Differential Revision: https://reviews.llvm.org/D43899 llvm-svn: 326447	2018-03-01 16:16:08 +00:00
Sebastian Pop	c33af715d7	[AArch64] generate vuzp instead of mov when a BUILD_VECTOR is created out of a sequence of EXTRACT_VECTOR_ELT with a specific pattern sequence, either <0, 2, 4, ...> or <1, 3, 5, ...>, replace the BUILD_VECTOR with either vuzp1 or vuzp2. With this patch LLVM generates the following code for the first function fun1 in the testcase: adrp x8, .LCPI0_0 ldr q0, [x8, :lo12:.LCPI0_0] tbl v0.16b, { v0.16b }, v0.16b ext v1.16b, v0.16b, v0.16b, #8 uzp1 v0.8b, v0.8b, v1.8b str d0, [x8] ret Without this patch LLVM currently generates this code: adrp x8, .LCPI0_0 ldr q0, [x8, :lo12:.LCPI0_0] tbl v0.16b, { v0.16b }, v0.16b mov v1.16b, v0.16b mov v1.b[1], v0.b[2] mov v1.b[2], v0.b[4] mov v1.b[3], v0.b[6] mov v1.b[4], v0.b[8] mov v1.b[5], v0.b[10] mov v1.b[6], v0.b[12] mov v1.b[7], v0.b[14] str d1, [x8] ret llvm-svn: 326443	2018-03-01 15:47:39 +00:00
Craig Topper	cb7881c649	[X86] Stop passing two arguments by reference. NFC I think these used to be out parameters, but they haven't been for a while. llvm-svn: 326417	2018-03-01 06:25:13 +00:00
Craig Topper	ccfa5257a6	[X86] Make sure we don't combine (fneg (fma X, Y, Z)) to a target specific node when there are no FMA instructions. This would cause a 'cannot select' error at isel when we should have emitted a lib call and an xor. Fixes PR36553. llvm-svn: 326393	2018-03-01 00:08:38 +00:00
Justin Lebar	faaf2d298e	[NVPTX] Lower loads from global constants using ld.global.nc (aka LDG). Summary: After D43914, loads from global variables in addrspace(1) happen with ld.global. But since they're constants, even better would be to use ld.global.nc, aka ldg. Reviewers: tra Subscribers: jholewinski, sanjoy, hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D43915 llvm-svn: 326390	2018-02-28 23:58:05 +00:00
Justin Lebar	5a7de898d2	[NVPTX] Use addrspacecast instead of target-specific intrinsics in NVPTXGenericToNVVM. Summary: NVPTXGenericToNVVM was using target-specific intrinsics to do address space casts. Using the addrspacecast instruction is (a lot) simpler. But it also has the advantage of being understandable to other passes. In particular, InferAddrSpaces is able to understand these address space casts and remove them in most cases. Reviewers: tra Subscribers: jholewinski, sanjoy, hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D43914 llvm-svn: 326389	2018-02-28 23:57:48 +00:00
Craig Topper	e31b9d1e5f	[X86] Lower extract_element from k-registers by bitcasting from v16i1 to i16 and extending/truncating. This is equivalent to what isel was doing anyway but by canonicalizing earlier we can remove some patterns. llvm-svn: 326375	2018-02-28 22:23:55 +00:00
Simon Pilgrim	72b86586b0	[X86][AVX512] Improve support for signed saturation truncation stores Matches what we already manage for unsigned saturation truncation stores Differential Revision: https://reviews.llvm.org/D43629 llvm-svn: 326372	2018-02-28 21:42:19 +00:00
Krzysztof Parzyszek	b1cdb60e75	[Hexagon] Implement target feature +reserved-r19 llvm-svn: 326364	2018-02-28 20:29:36 +00:00
Tim Renouf	2a99fa2c08	[AMDGPU] added writelane intrinsic Summary: For use by LLPC SPV_AMD_shader_ballot extension. The v_writelane instruction was already implemented for use by SGPR spilling, but I had to add an extra dummy operand tied to the destination, to represent that all lanes except the selected one keep the old value of the destination register. .ll test changes were due to schedule changes caused by that new operand. Differential Revision: https://reviews.llvm.org/D42838 llvm-svn: 326353	2018-02-28 19:10:32 +00:00
Artem Belevich	18a7c51520	[NVPTX] Removed always-true predicates in NVPTX. NVPTX stopped supporting GPUs older than sm_20 (Fermi) quite a while back. Removal of support of pre-Fermi GPUs made a lot of predicates in the NVPTX backend pointless as they can't ever be false any more. It's time to retire them. NFC intended. Differential Revision: https://reviews.llvm.org/D43843 llvm-svn: 326349	2018-02-28 18:51:22 +00:00
Chih-Hung Hsieh	9f9e4681ac	[TLS] use emulated TLS if the target supports only this mode Emulated TLS is enabled by llc flag -emulated-tls, which is passed by clang driver. When llc is called explicitly or from other drivers like LTO, missing -emulated-tls flag would generate wrong TLS code for targets that supports only this mode. Now use useEmulatedTLS() instead of Options.EmulatedTLS to decide whether emulated TLS code should be generated. Unit tests are modified to run with and without the -emulated-tls flag. Differential Revision: https://reviews.llvm.org/D42999 llvm-svn: 326341	2018-02-28 17:48:55 +00:00
Pablo Barrio	512f7ee315	[ARM] Lower lower saturate to 0 and lower saturate to -1 using bit-operations Summary: Expressions of the form x < 0 ? 0 : x; and x < -1 ? -1 : x can be lowered using bit-operations instead of branching or conditional moves In thumb-mode this results in a two-instruction sequence, a shift followed by a bic or or while in ARM/thumb2 mode that has flexible second operand the shift can be folded into a single bic/or instructions. In most cases this results in smaller code and possibly less branches, and in no case larger than before. Patch by Martin Svanfeldt Reviewers: fhahn, pbarrio, rogfer01 Reviewed By: pbarrio, rogfer01 Subscribers: chrib, yroux, eugenis, efriedma, rogfer01, aemerson, javed.absar, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D42574 llvm-svn: 326333	2018-02-28 17:13:07 +00:00
Simon Dardis	4529aac2de	[mips] Begin reworking instruction predicates for ISAs/encodings (1/N) The MIPS backend has inconsistent usage of instruction predicates for assembly and code generation. The issue arises from supporting three encodings, two (MIPS and microMIPS) of which have a near 1:1 instruction mapping across ISA revisions and a third encoding with a more restricted set of instructions (MIPS16e). To enforce consistent usage, each of the ISA_* adjectives has (or will have) the relevant encoding attached to it along the relevant ISA revision where the instruction is defined. Each instruction, pattern or alias will then have the correct ISA adjective attached to it, and the base instruction description classes will have any predicates relating to ISA encoding or revision removed. Pseudo instructions will also be guarded for the encoding or ABI that they are supported in. Finally, the hasStandardEncoding() / inMicroMipsMode() / inMips16Mode() methods of MipsSubtarget will be changed such that only one can be true at any one time. The result of this is that code generation and assembly will produce the correct encoding up front, while code generated from pseudo instructions and other inserted sequences of instructions will be able to rely on the mapping tables to produce the correct encoding. This should fix numerous bugs where the result 'happens' to be correct but has edge cases where microMIPS and MIPS have subtle differences (e.g. microMIPSR6 using 'j', 'jal' instructions.) This patch starts the process by changing most of the ISA adjectives to make use of the EncodingPredicate member of PredicateControl. Follow on patches will annotate instructions with their correct ISA adjective and eliminate the usage of "let Predicates = [..]", "let AdditionalPredicates = [..]" and "isCodeGenOnly = 1" in the cases where it was used to control instruction availability. Contributions from Nitesh Jain. Reviewers: atanasyan Differential Revision: https://reviews.llvm.org/D41434 llvm-svn: 326322	2018-02-28 13:02:44 +00:00
Alexander Ivchenko	c01f750480	[GlobalIsel][X86] Support G_INTTOPTR instruction. Add legalization/selection for x86/x86_64 and corresponding tests. Reviewed By: igorb Differential Revision: https://reviews.llvm.org/D43622 llvm-svn: 326320	2018-02-28 12:11:53 +00:00
Alexander Ivchenko	46e07e3623	[GlobalIsel][X86] Support G_PTRTOINT instruction. Add legalization/selection for x86/x86_64 and corresponding tests. Reviewed By: igorb Differential Revision: https://reviews.llvm.org/D43617 llvm-svn: 326311	2018-02-28 09:18:47 +00:00
Craig Topper	48d5ed265c	[X86] Don't use EXTRACT_ELEMENT from v1i1 with i8/i32 result type when we need to guarantee zeroes in the upper bits of return. An extract_element where the result type is larger than the scalar element type is semantically an any_extend from the scalar element type to the result type. If we expect zeroes in the upper bits of the i8/i32 we need to make sure those zeroes are explicit in the DAG. For these cases the best way to accomplish this is use an insert_subvector to pad zeroes to the upper bits of the v1i1 first. We extend to either v16i1(for i32) or v8i1(for i8). Then bitcast that to a scalar and finish with a zero_extend up to i32 if necessary. We can't extend past v16i1 because that's the largest mask size on KNL. But isel is smart enough to know that a zext of a bitcast from v16i1 to i16 can use a KMOVW instruction. The insert_subvectors will be dropped during isel because we can determine that the producing instruction already zeroed the upper bits of the k-register. llvm-svn: 326308	2018-02-28 08:14:28 +00:00
Craig Topper	ac799b05d4	[X86] Change the masked FPCLASS implementation to use AND instead of OR to combine the mask results. While the description for the instruction does mention OR, it's talking about how the individual classification test results are ORed together. The incoming mask is used as a zeroing write mask. If the bit is 1 the classification is written to the output. If the bit is 0 the output is 0. This is equivalent to an AND. Here is pseudocode from the intrinsics guide FOR j := 0 to 1 i := j*64 IF k1[j] k[j] := CheckFPClass_FP64(a[i+63:i], imm8[7:0]) ELSE k[j] := 0 FI ENDFOR k[MAX:2] := 0 llvm-svn: 326306	2018-02-28 06:19:55 +00:00
Andrew Zhogin	f8e88af11d	[ARM] Cortex-A57 scheduler fix for ARM backend (missed 16-bit, v8.1/v8.2/v8.3, thumb and pseudo instructions) Added missed scheduling info for ARM Cortex A57 (AArch32) to have CompleteModel with this checkCompleteness fix: https://reviews.llvm.org/D43235. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D43808 llvm-svn: 326304	2018-02-28 05:53:18 +00:00
Krzysztof Parzyszek	2373f8fcf3	[Hexagon] Recognize more sign-extensions as inputs to 32x32-bit multiply llvm-svn: 326263	2018-02-27 22:44:41 +00:00
Konstantin Zhuravlyov	40b09e86b9	AMDGPU: Add fast fmaf feature to gfx702 Differential Revision: https://reviews.llvm.org/D43790 llvm-svn: 326252	2018-02-27 21:46:15 +00:00
Sjoerd Meijer	fc0d02cbbf	[ARM] Another f16 litpool fix We were always setting the block alignment to 2 bytes in Thumb mode and 4-bytes in ARM mode (r325754, and r325012), but this could cause reducing the block alignment when it already had been aligned (e.g. in Thumb mode when the block is a CPE that was already 4-byte aligned). Patch by Momchil Velikov, I've only added a test. Differential Revision: https://reviews.llvm.org/D43777 llvm-svn: 326232	2018-02-27 19:26:02 +00:00
Craig Topper	688d1eb919	Revert r326225 "[X86] Move the load folding tables to a separate .inc file" The bots don't seem to like the .inc file. I must be missing some cmake incantation. llvm-svn: 326228	2018-02-27 19:15:40 +00:00
Peter Collingbourne	e8436e8631	ARM: Don't rewrite add reg, $sp, 0 -> mov reg, $sp if the add defines CPSR. Differential Revision: https://reviews.llvm.org/D43807 llvm-svn: 326226	2018-02-27 19:00:59 +00:00
Craig Topper	c0a1291478	[X86] Move the load folding tables to a separate .inc file These tables add 3000 lines to X86InstrInfo.cpp. And if we ever manage to auto generate them they'll be a separate file anyway. Differential Revision: https://reviews.llvm.org/D43806 llvm-svn: 326225	2018-02-27 18:46:11 +00:00
Krzysztof Parzyszek	d70f5a0eb4	[Hexagon] Add patterns for compares of i1 values llvm-svn: 326220	2018-02-27 18:31:46 +00:00
Simon Pilgrim	ba43ec8702	[X86][AVX] combineLoopMAddPattern - support 256-bit cases on AVX1 via SplitBinaryOpsAndApply llvm-svn: 326189	2018-02-27 12:20:37 +00:00
Jonas Paulsson	f268cd0aad	[SystemZ] Make sure SelectCode() is not called on a target opcode. Since getNode() might not always return the requested opcode, for instance if called with (ISD::AND, -1) arguments, there should be a check so that SelectCode() is only called when appropriate. Review: Ulrich Weigand llvm-svn: 326178	2018-02-27 07:53:23 +00:00
Craig Topper	264707bae4	[X86] Simplify if condition. NFC SSE2 implies SSE1 and we already covered f32 in the SSE1 check so we don't need to check f32 in the SSE2 check. llvm-svn: 326170	2018-02-27 06:00:38 +00:00
Craig Topper	fcaa0323ec	[X86] Replace an impossible if condition with an assert. llvm-svn: 326167	2018-02-27 03:50:00 +00:00
Aditya Nandakumar	599990530e	[GISel]: Don't assert when constraining RegisterOperands which are uses. Currently we assert that only non target specific opcodes can have missing RegisterClass constraints in the MCDesc. The backend can have instructions with register operands but don't have RegisterClass constraints (say using unknown_class) in which case the instruction defining the register will constrain it. Change the assert to only fire if a def has no regclass. https://reviews.llvm.org/D43409 llvm-svn: 326142	2018-02-26 22:56:21 +00:00
Simon Pilgrim	9929f90740	[X86][SSE] Reduce FADD/FSUB/FMUL costs on later targets (PR36280) Agner's tables indicate that for SSE42+ targets (Core2 and later) we can reduce the FADD/FSUB/FMUL costs down to 1, which should fix the Himeno benchmark. Note: the AVX512 FDIV costs look rather dodgy, but this isn't part of this patch. Differential Revision: https://reviews.llvm.org/D43733 llvm-svn: 326133	2018-02-26 22:10:17 +00:00
Craig Topper	e5d39e42b9	[X86] Add constant folding to combineMOVMSK. There's still some shortcoming in our ability to combine binops of constants with different sizes separated by an extend. I'll try to look at that next. llvm-svn: 326128	2018-02-26 21:17:33 +00:00
Craig Topper	5e0ceb8865	[X86] Add a custom legalization for (i16 (bitcast v16i1)) and (i32 (bitcast v32i1)) without AVX512 to prevent scalarization Summary: We have an early DAG combine to turn these patterns into MOVMSK, but that combine doesn't work if the vXi1 type has more elements than the widest legal vXi8 type. Type legalization will eventually split it down to v16i1 or v32i1 and then the bitcast gets legalized to a truncstore and a scalar load. The truncstore will get lowered to a series of extracts and bit math. This patch adds a custom legalization to use a sign extend and MOVMSK instead. This prevents the eventual scalarization. Reviewers: spatel, RKSimon, zvi Reviewed By: RKSimon Subscribers: mgorny, llvm-commits Differential Revision: https://reviews.llvm.org/D43593 llvm-svn: 326119	2018-02-26 20:32:27 +00:00
Simon Pilgrim	db0ed7d724	[X86][AVX] createPSADBW - support 256-bit cases on AVX1 via SplitBinaryOpsAndApply llvm-svn: 326104	2018-02-26 18:17:25 +00:00
Matt Arsenault	2a26a286db	AMDGPU/GlobalISel: Make f64 constants legal llvm-svn: 326101	2018-02-26 17:20:43 +00:00
Tim Renouf	832f90fa0c	[AMDGPU] Scratch setup fix on AMDPAL gfx9+ merge shader Summary: With OS type AMDPAL, the scratch descriptor is hardwired to be loaded from offset 0 of the global information table, whose low pointer is passed in s0. For a merge shader on gfx9+, it needs to be s8 instead, as the hardware reserves s0-s7. Reviewers: kzhuravl Subscribers: arsenm, nhaehnle, dstuttard, llvm-commits, t-tye, yaxunl, wdng, kzhuravl Differential Revision: https://reviews.llvm.org/D42203 llvm-svn: 326088	2018-02-26 14:46:43 +00:00
Benjamin Kramer	b84e158df7	[WebAssembly] Relax constexpr for old standard libraries. This will still be constexpr when the standard library supports it, but doesn't force constexpr. Old libraries will get a global constructor, which is not too bad. llvm-svn: 326080	2018-02-26 11:07:25 +00:00
Jonas Paulsson	b1e81479e9	[XCore] Return true in enableMultipleCopyHints(). Enable multiple COPY hints to eliminate more COPYs during register allocation. Note that this is something all targets should do, see https://reviews.llvm.org/D38128. Review: Robert Lytton llvm-svn: 326069	2018-02-26 08:03:32 +00:00
Craig Topper	5c980eba47	[X86] Don't use getZExtValue when we have no idea how large the input elements are. llvm-svn: 326066	2018-02-26 04:43:24 +00:00
Craig Topper	2286058f46	[X86] Use SelectionDAG::SplitVectorOperand to simplify some code. NFC llvm-svn: 326065	2018-02-26 02:16:34 +00:00
Craig Topper	2bf8e3e0e1	[X86] Simplify the ReplaceNodeResults code for X86ISD::AVG. This code seemed to try to widen to 128, 256, or 512 bit vectors, but we only create X86ISD::AVG with a power of 2 number of elements. This means the only nodes that need to be legalized are less than 128-bits and need to be widened up to 128 bits. llvm-svn: 326064	2018-02-26 02:16:33 +00:00
Craig Topper	79d189f597	[X86] Remove VT.isSimple() check from detectAVGPattern. Which types are considered 'simple' is a function of the requirements of all targets that LLVM supports. That shouldn't directly affect what types we are able to handle. The remainder of this code checks that the number of elements is a power of 2 and takes care of splitting down to a legal size. llvm-svn: 326063	2018-02-26 02:16:31 +00:00
Craig Topper	6694df14e6	[X86] Use SDNode instead of SDPatternOperator. NFC llvm-svn: 326048	2018-02-25 06:21:04 +00:00
Craig Topper	81c0eaf4c8	[X86] Allow int_x86_sse2_cvtps2dq and int_x86_avx_cvt_ps2dq_256 to select EVEX encoded instructions. llvm-svn: 326041	2018-02-24 18:58:07 +00:00
Simon Pilgrim	a4fb569483	[X86][SSE] combineSubToSubus - support v8i64 handling from SSSE3 Our UMIN/UMAX, vector truncation and shuffle combining is good enough to efficiently handle v8i64 with the number of leading zeros that are necessary for PSUBUS. llvm-svn: 326034	2018-02-24 14:06:39 +00:00
Simon Pilgrim	8ad91261e8	[X86][SSE] combineSubToSubus - support v8i32 handling from SSSE3 (not SSE41) Now that UMIN etc are Legal/Custom for SSE2+, we can efficiently match SUBUS v8i32 cases from SSSE3 which can perform efficient truncation with PSHUFB. llvm-svn: 326033	2018-02-24 13:39:13 +00:00
Simon Pilgrim	744f008a75	[X86][SSE] combineSubToSubus - begun generalizing to work with any type sizes with SplitBinaryOpsAndApply llvm-svn: 326030	2018-02-24 12:44:12 +00:00
Simon Pilgrim	51ce2ed367	Fix spelling in comment. NFCI. llvm-svn: 326029	2018-02-24 12:27:02 +00:00
Jonas Paulsson	8ff0773b13	[Sparc] Return true in enableMultipleCopyHints(). Enable multiple COPY hints to eliminate more COPYs during register allocation. Note that this is something all targets should do, see https://reviews.llvm.org/D38128. Review: James Y Knight llvm-svn: 326028	2018-02-24 08:24:31 +00:00
Craig Topper	161c805da4	[X86] Use SelectionDAG::getNot instead of implementing manually. NFC llvm-svn: 326020	2018-02-24 03:15:54 +00:00
Stanislav Mekhanoshin	fa48c496e2	[AMDGPU] Shrinking V_SUBBREV_U32 V_SUBBREV_U32 is a commute opcode for V_SUBB_U32. However, when we try to commute V_SUBB_U32 in order to shrink it we do not then process V_SUBBREV_U32 and it stay VOP3. This is fixed. Differential Revision: https://reviews.llvm.org/D43699 llvm-svn: 326011	2018-02-24 01:32:32 +00:00
Heejin Ahn	9386bde11b	[WebAssembly] Add exception handling option and feature Summary: Add a llc command line option and WebAssembly architecture feature for exception handling. Reviewers: dschuff Subscribers: jfb, sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D43683 llvm-svn: 326004	2018-02-24 00:40:50 +00:00
Craig Topper	7bcac492d4	[X86] Remove checks for '(scalar_to_vector (i8 (trunc GR32:)))' from scalar masked move patterns. This portion can be matched by other patterns. We don't need it to make the larger pattern valid. It's sufficient to have a v1i1 mask input without caring where it came from. llvm-svn: 325999	2018-02-24 00:15:05 +00:00
Yonghong Song	60fed1fef0	bpf: New optimization pass for eliminating unnecessary i32 promotions This pass performs peephole optimizations to cleanup ugly code sequences at MachineInstruction layer. Currently, the only optimization in this pass is to eliminate type promotion sequences for zero extending 32-bit subregisters to 64-bit registers. If the compiler could prove the zero extended source comes from a 32-bit subregister then it is safe to erase those promotion sequences, because the upper half of the underlying 64-bit registers were zeroed implicitly already. Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Reviewed-by: Yonghong Song <yhs@fb.com> llvm-svn: 325991	2018-02-23 23:49:32 +00:00
Yonghong Song	ae961bb061	bpf: New decoder namespace for 32-bit subregister load/store When -mattr=+alu32 passed to the disassembler, use decoder namespace for 32-bit subregister. This is to disassemble load and store instructions in preferred B format as described in previous commit: w = (u8 ) (r + off) // BPF_LDX | BPF_B w = (u16 )(r + off) // BPF_LDX | BPF_H w = (u32 )(r + off) // BPF_LDX | BPF_W (u8 ) (r + off) = w // BPF_STX | BPF_B (u16 )(r + off) = w // BPF_STX | BPF_H (u32 )(r + off) = w // BPF_STX | BPF_W NOTE: all other instructions should still use the default decoder namespace. Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Reviewed-by: Yonghong Song <yhs@fb.com> llvm-svn: 325990	2018-02-23 23:49:31 +00:00
Yonghong Song	ca31c3bb3f	bpf: Enable 32-bit subregister support for -mattr=+alu32 After all those preparation patches, now we could enable 32-bit subregister support once -mattr=+alu32 specified. Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Reviewed-by: Yonghong Song <yhs@fb.com> llvm-svn: 325989	2018-02-23 23:49:30 +00:00
Yonghong Song	fcd1e0f625	bpf: Support 32-bit subregister in various InstrInfo hooks This patch support 32-bit subregister in three InstrInfo hooks, i.e. copyPhysReg, loadRegFromStackSlot and storeRegToStackSlot, Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Reviewed-by: Yonghong Song <yhs@fb.com> llvm-svn: 325988	2018-02-23 23:49:29 +00:00
Yonghong Song	b1a52bd756	bpf: New instruction patterns for 32-bit subregister load and store The instruction mapping between eBPF/arm64/x86_64 are: eBPF arm64 x86_64 LD1 BPF_LDX | BPF_B ldrb movzbl LD2 BPF_LDX | BPF_H ldrh movzwl LD4 BPF_LDX | BPF_W ldr movl movzbl/movzwl/movl on x86_64 accept 32-bit sub-register, for example %eax, the same for ldrb/ldrh on arm64 which accept 32-bit "w" register. And actually these instructions only accept sub-registers. There is no point to have LD1/2/4 (unsigned) for 64-bit register, because on these arches, upper 32-bits are guaranteed to be zeroed by hardware or VM, so load into the smallest available register class is the best choice for maintaining type information. For eBPF we should adopt the same philosophy, to change current format (A): r = (u8 ) (r + off) // BPF_LDX | BPF_B r = (u16 )(r + off) // BPF_LDX | BPF_H r = (u32 )(r + off) // BPF_LDX | BPF_W (u8 ) (r + off) = r // BPF_STX | BPF_B (u16 )(r + off) = r // BPF_STX | BPF_H (u32 )(r + off) = r // BPF_STX | BPF_W into B: w = (u8 ) (r + off) // BPF_LDX | BPF_B w = (u16 )(r + off) // BPF_LDX | BPF_H w = (u32 )(r + off) // BPF_LDX | BPF_W (u8 ) (r + off) = w // BPF_STX | BPF_B (u16 )(r + off) = w // BPF_STX | BPF_H (u32 )(r + off) = w // BPF_STX | BPF_W There is no change on encoding nor how should they be interpreted, everything is as it is, load the specified length, write into low bits of the register then zeroing all remaining high bits. The only change is their associated register class and how compiler view them. Format A still need to be kept, because eBPF LLVM backend doesn't support sub-registers at default, but once 32-bit subregister is enabled, it should use format B. This patch implemented this together with all those necessary extended load and truncated store patterns. Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Reviewed-by: Yonghong Song <yhs@fb.com> llvm-svn: 325987	2018-02-23 23:49:28 +00:00
Yonghong Song	63cf273f55	bpf: Support i32 in getScalarShiftAmountTy method getScalarShiftAmount method should be implemented for eBPF backend to make sure shift amount could still get correct type once 32-bit subregisters support are enabled. Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Reviewed-by: Yonghong Song <yhs@fb.com> llvm-svn: 325986	2018-02-23 23:49:26 +00:00
Yonghong Song	59fc805c7e	bpf: Support condition comparison on i32 We need to support condition comparison on i32. All these comparisons are supposed to be combined into BPF_J* instructions which only support i64. For ISD::BR_CC we need to promote it to i64 first, then do custom lowering. For ISD::SET_CC, just expand to SELECT_CC like what's been done for i64. For ISD::SELECT_CC, we also want to do custom lower for i32. However, after 32-bit subregister support enabled, it is possible the comparison operands are i32 while the selected value are i64, or the comparison operands are i64 while the selected value are i32. We need to define extra instruction pattern and support them in custom instruction inserter. Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Reviewed-by: Yonghong Song <yhs@fb.com> llvm-svn: 325985	2018-02-23 23:49:25 +00:00
Yonghong Song	219156cff0	bpf: Handle i32 for ALU operations without ISA support There is no eBPF ISA support for BSWAP, ROTR, ROTL, SREM, SDIVREM, MULHU, ADDC, ADDE etc on i32. They could be emulated by other basic BPF_ALU operations, we'd set their lowering action the same as i64. Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Reviewed-by: Yonghong Song <yhs@fb.com> llvm-svn: 325984	2018-02-23 23:49:24 +00:00
Yonghong Song	07a7a41753	bpf: New calling convention for 32-bit subregisters This patch adds new calling conventions to allow GPR32RegClass as valid register class for arguments and return types. New calling convention will only be chosen when -mattr=+alu32 specified. Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Reviewed-by: Yonghong Song <yhs@fb.com> llvm-svn: 325983	2018-02-23 23:49:23 +00:00
Yonghong Song	42389377d8	bpf: New target attribute "alu32" for 32-bit subregister support This new attribute aims to control the enablement of 32-bit subregister support on eBPF backend. Name the interface as "alu32" is because we in particular want to enable the generation of BPF_ALU32 instructions by enable subregister support. This attribute could be used in the following format with llc: llc -mtriple=bpf -mattr=[+|-]alu32 It is disabled at default. Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Reviewed-by: Yonghong Song <yhs@fb.com> llvm-svn: 325982	2018-02-23 23:49:22 +00:00
Yonghong Song	0252f35362	bpf: Define instruction patterns for extensions and truncations between i32 to i64 For transformations between i32 and i64, if it is explicit signed extension: - first cast the operand to i64 - then use SLL + SRA to finish the extension. if it is explicit zero extension: - first cast the operand to i64 - then use SLL + SRL to finish the extension. if it is explicit any extension: - just refer to 64-bit register. if it is explicit truncation: - just refer to 32-bit subregister. NOTE: Some of the zero extension sequences might be unnecessary, they will be removed by an peephole pass on MachineInstruction layer. Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Reviewed-by: Yonghong Song <yhs@fb.com> llvm-svn: 325981	2018-02-23 23:49:21 +00:00
Yonghong Song	3a564a8f6e	bpf: Tighten the immediate predication for 32-bit alu instructions These 32-bit ALU insn patterns which takes immediate as one operand were initially added to enable AsmParser support, and the AsmMatcher uses "ins" and "outs" fields to deduct the operand constraint. However, the instruction selector doesn't work the same as AsmMatcher. The selector will use the "pattern" field for which we are not setting the predication for immediate operands correctly. Without this patch, i32 would eventually means all i32 operands are valid, both imm and gpr, while these patterns should allow imm only. Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Reviewed-by: Yonghong Song <yhs@fb.com> llvm-svn: 325980	2018-02-23 23:49:19 +00:00
Yonghong Song	ec84e2f1b0	bpf: Use markSuperRegs to mark reserved registers markSuperRegs is the canonical helper function used to mark reserved registers. It could mark any overlapping sub-registers automatically. Reviewed-by: Yonghong Song <yhs@fb.com> Signed-off-by: Jiong Wang <jiong.wang@netronome.com> llvm-svn: 325979	2018-02-23 23:49:18 +00:00
Nemanja Ivanovic	bcc82c9a78	[PowerPC] Disable shrink-wrapping when getting PC address through the LR The instruction sequence used to get the address of the PC into a GPR requires that we clobber the link register. Doing so without having first saved it in the prologue leaves the function unable to return. Currently, this sequence is emitted into the entry block. To ensure the prologue is inserted before this sequence, disable shrink-wrapping. This fixes PR33547. Differential Revision: https://reviews.llvm.org/D43677 llvm-svn: 325972	2018-02-23 23:08:34 +00:00
Eric Christopher	a70ec1308a	Sink the verification code around the assert where it's handled and wrap in NDEBUG. This has the advantage of making release only builds more warning free and there's no need to make this routine a class function if it isn't using class members anyhow. llvm-svn: 325967	2018-02-23 22:32:05 +00:00
Sriraman Tallam	609f8c013c	Intrinsics calls should avoid the PLT when "RtLibUseGOT" metadata is present. Differential Revision: https://reviews.llvm.org/D42216 llvm-svn: 325962	2018-02-23 21:32:06 +00:00
Craig Topper	16b20245ba	[X86] Add assembler/disassembler support for blendm with zero masking and broadcast. Fixes PR31617 llvm-svn: 325957	2018-02-23 20:48:44 +00:00
Stefan Pintilie	626b651016	[Power9] Add missing instructions to the Power 9 scheduler This is the first in a series of patches that will define more instructions using InstRW so that we can move away from ItinRW and ultimately have a complete Power 9 scheduler. Differential Revision: https://reviews.llvm.org/D43635 llvm-svn: 325956	2018-02-23 20:37:10 +00:00
Krzysztof Parzyszek	96690ceceb	[Hexagon] Recognize non-immediate constants in HexagonConstPropagation llvm-svn: 325954	2018-02-23 20:33:26 +00:00
Simon Pilgrim	69b8fa8391	Fixed unused variable warning. NFCI. llvm-svn: 325950	2018-02-23 20:16:18 +00:00
Craig Topper	61d6ddbf0a	[X86] Add DAG combine to remove (and X, 1) from in front of a v1i1 scalar to vector. These can be created by type legalization promoting the inputs to select to match scalar boolean contents. We were trying to pattern match them away during isel, but its better to just remove them from the DAG. I've cleaned up some patterns to not check for this 'and' anymore. But I suspect this has also opened up opportunities for pattern removal. llvm-svn: 325949	2018-02-23 20:13:42 +00:00
Benjamin Kramer	ae87f86ec4	[WebAssembly] Fix macro metaprogram to not duplicate code as much. No functionality change intended. llvm-svn: 325947	2018-02-23 20:13:03 +00:00
Simon Pilgrim	425965be0f	[X86][SSE] Generalize x > C-1 ? x+-C : 0 --> subus x, C combine for non-uniform constants llvm-svn: 325944	2018-02-23 19:58:44 +00:00
Evandro Menezes	1afffac05b	[PATCH] [AArch64] Add new target feature to fuse conditional select This feature enables the fusion of the comparison and the conditional select instructions together. Differential revision: https://reviews.llvm.org/D42392 llvm-svn: 325939	2018-02-23 19:27:43 +00:00
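
Note on commit 512f7ee315 ([ARM] lower saturate to 0 and to -1 using bit-operations): the message describes replacing x < 0 ? 0 : x and x < -1 ? -1 : x with a shift followed by a bic or an or. The following is a minimal C++ sketch of that arithmetic identity only, not the code from the patch. The function names are hypothetical, and it assumes that right-shifting a negative int32_t is an arithmetic shift (guaranteed from C++20, implementation-defined but typical before that).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: x < 0 ? 0 : x without a branch.
// (x >> 31) is all ones when x is negative and 0 otherwise (assuming an
// arithmetic shift), so clearing those bits leaves either 0 or x.
// This mirrors the "shift followed by bic" shape named in the commit message.
inline int32_t saturateLowToZero(int32_t x) {
    return x & ~(x >> 31);
}

// Hypothetical helper: x < -1 ? -1 : x without a branch.
// OR-ing with the sign mask turns every negative value into -1 and
// leaves non-negative values unchanged ("shift followed by or").
inline int32_t saturateLowToMinusOne(int32_t x) {
    return x | (x >> 31);
}
```

For example, saturateLowToZero(-7) gives 0 and saturateLowToMinusOne(-7) gives -1, while both return 42 for an input of 42.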


46356 Commits