llvm-project

Commit Graph

Author	SHA1	Message	Date
Craig Topper	e60f1472f1	[X86] Stop swapping the operands of AVX512 setge. We swapped the operands and used setle, but I don't see any reason to do that. I think this is a holdover from SSE where we swap and then invert to use pcmpgt. But with AVX512 we don't want an invert so we won't use pcmpgt. So there's no need to swap. llvm-svn: 325527	2018-02-19 19:23:35 +00:00
Craig Topper	9471a7c898	[X86] Reduce the number of isel pattern variations needed for VPTESTM/VPTESTNM matching. Canonicalize EQ/NE PCMPM to have build vector all zeros on the RHS so we don't have to pattern match it in both locations. This significantly reduces the number of isel patterns needed since we also had to multiply it out with loads being in either operand of the 'and' input node and in the 'and' masking node. This removes over 24000 bytes from the isel table. llvm-svn: 325526	2018-02-19 19:23:31 +00:00
Mark Searles	65207923f6	[AMDGPU] Make note of existing waitcnt instrs; this is add-on work related to suppression of redundant waitcnt instrs. It is necessary to make note of these existing waitcnt instrs so that we do not fall into an infinite loop when handling loops. Also, [NFC] some minor code clean-up. llvm-svn: 325524	2018-02-19 19:19:59 +00:00
Simon Pilgrim	70eb508605	[SelectionDAG] ComputeKnownBits - add support for SMIN+SMAX clamp patterns If we have a clamp pattern, SMIN(SMAX(X, LO),HI) or SMAX(SMIN(X, HI),LO) then we can deduce that the number of signbits (zeros/ones) will be at least the minimum of the LO and HI constants. ComputeKnownBits equivalent of D43338. Differential Revision: https://reviews.llvm.org/D43463 llvm-svn: 325521	2018-02-19 18:08:16 +00:00
Mark Searles	419bdab759	[AMDGPU] Increased vector length for global/constant loads. Summary: GCN ISA supports instructions that can read 16 consecutive dwords from memory through the scalar data cache; loadstoreVectorizer should take advantage of the wider vector length and pack 16/8 elements of dwords/quadwords. Author: FarhanaAleen Reviewed By: rampitec Subscribers: llvm-commits, AMDGPU Differential Revision: https://reviews.llvm.org/D43275 llvm-svn: 325518	2018-02-19 16:42:49 +00:00
Rafael Espindola	c7e51805ff	Bring back r323297. It was reverted because it broke the grub build. The reason the grub build broke is because grub does its own relocation processing and was not handling R_386_PLT32. Since grub has no dynamic linker, the fix is trivial: handle R_386_PLT32 exactly like R_386_PC32. On the report it was noted that they are using -fno-integrated-assembler. The upstream GAS (starting with 451875b4f976a527395e9303224c7881b65e12ed) will already be producing a R_386_PLT32 anyway, so they have to update their code one way or the other. Original message: Don't assume a null GV is local for ELF and MachO. This is already a simplification, and should help with avoiding a plt reference when calling an intrinsic with -fno-plt. With this change we return false for null GVs, so the caller only needs to check the new metadata to decide if it should use foo@plt or *foo@got. llvm-svn: 325514	2018-02-19 16:02:38 +00:00
Francis Visoiu Mistrih	7f0f8bb4bd	[CodeGen] Fix tests breaking after r325505 llvm-svn: 325512	2018-02-19 15:51:17 +00:00
Simon Pilgrim	c302a581a0	[X86][SSE] combineTruncateWithSat - use truncateVectorWithPACK down to 64-bit subvectors Add support for chaining PACKSS/PACKUS down to 64-bit vectors by using only a single 128-bit input. llvm-svn: 325494	2018-02-19 13:29:20 +00:00
Amara Emerson	7e9f348b2d	[AArch64][GlobalISel] Fix an assert fail/miscompile when fp16 types are copied to gpr register banks. PR36345. rdar://36478867 Differential Revision: https://reviews.llvm.org/D43310 llvm-svn: 325463	2018-02-18 17:10:49 +00:00
Amara Emerson	bc03baef77	[AArch64][GlobalISel] Support G_INSERT/G_EXTRACT of types < s32 bits. These are needed for operations on fp16 types in a later patch. llvm-svn: 325462	2018-02-18 17:03:02 +00:00
Haicheng Wu	aed6e52b3c	[AArch64] Coalesce Copy Zero during instruction selection Add special case for copy of zero to avoid a double copy. Differential Revision: https://reviews.llvm.org/D36104 llvm-svn: 325459	2018-02-18 13:51:33 +00:00
Jonas Paulsson	891789c299	[BPF] Return true in enableMultipleCopyHints(). Enable multiple COPY hints to eliminate more COPYs during register allocation. Note that this is something all targets should do, see https://reviews.llvm.org/D38128. Review: Yonghong Song llvm-svn: 325457	2018-02-18 10:09:54 +00:00
Craig Topper	1040f236a3	[X86] Make masked pcmpeq commutable during isel so we can fold loads in other operand to the shorter encoding. Previously we used the immediate encoding if the load was in operand 0 and the short encoding if the load was in operand 1. This added an insane number of bytes to the size of the isel table. I'm wondering if we should always use the immediate form during isel and change to the short form during emission. This would remove the need to pattern match every combination for both the immediate form and the short form during isel. We could do the same with vpcmpgt llvm-svn: 325456	2018-02-18 02:37:33 +00:00
Craig Topper	b824050658	[X86] Add -show-mc-encoding to the avx512-vec-cmp.ll test and add test case to show that we're failing to use the shorter pcmpeq encoding when the memory argument is the first argument. This can't be spotted without showing the encodings since they have the same mnemonic. llvm-svn: 325455	2018-02-18 02:37:32 +00:00
Simon Pilgrim	7fae42eb27	[SelectionDAG] ComputeNumSignBits - add support for SMIN+SMAX clamp patterns If we have a clamp pattern, SMIN(SMAX(X, LO),HI) or SMAX(SMIN(X, HI),LO) then we can deduce that the number of signbits will be at least the minimum of the LO and HI constants. I haven't bothered with the UMIN/UMAX equivalent as (1) we don't have any current use cases and (2) I wonder if we'd be better off immediately falling back for ComputeKnownBits for UMIN/UMAX which already has optimization patterns useful for unsigned cases. Differential Revision: https://reviews.llvm.org/D43338 llvm-svn: 325450	2018-02-17 22:19:50 +00:00
Simon Pilgrim	8da142bff1	[SelectionDAG] SimplifyDemandedVectorElts - add support for VECTOR_INSERT_ELT Differential Revision: https://reviews.llvm.org/D43431 llvm-svn: 325449	2018-02-17 21:49:40 +00:00
Sjoerd Meijer	c9bde5404a	[ARM] Add LLVM tests for the vcvtr builtins Follow up of Clang commit r325351; this adds the LLVM tests, which were also missing. Differential Revision: https://reviews.llvm.org/D43395 llvm-svn: 325443	2018-02-17 19:59:29 +00:00
Alex Bradbury	2cd14e16a6	[RISCV] Revert r324172 now r323991 was reverted This fixes the build, now that r325421 was committed to revert r323991. llvm-svn: 325441	2018-02-17 18:17:47 +00:00
Sander de Smalen	d01cb72f7e	Made test dbg_value_fastisel.ll specific to AArch64 fast-isel. Some buildbots failed on this test (rL325438) because they don't build all targets. I set the triple to aarch64 and moved the test to test/CodeGen/AArch64/fast-isel-dbg-value.ll. llvm-svn: 325440	2018-02-17 17:43:24 +00:00
Sander de Smalen	47952b0c03	[DebugInfo][FastISel] Fix dropping dbg.value() Summary: https://llvm.org/PR36263 shows that when compiling at -O0 a dbg.value() instruction (that remains from an original dbg.declare()) is dropped by FastISel. Since FastISel selects instructions by iterating a basic block backwards, it drops the dbg.value if one of its operands is not yet instantiated by a previously selected instruction. Instead of calling 'lookUpRegForValue()' we can call 'getRegForValue()' instead that will insert a placeholder for the operand to be filled in when continuing the instruction selection. Reviewers: aprantl, dblaikie, probinson Reviewed By: aprantl Subscribers: llvm-commits, dstenb, JDevlieghere Differential Revision: https://reviews.llvm.org/D43386 llvm-svn: 325438	2018-02-17 16:42:54 +00:00
Martin Storsjo	a63a5b993e	[AArch64] Implement dynamic stack probing for windows This makes sure that alloca() function calls properly probe the stack as needed. Differential Revision: https://reviews.llvm.org/D42356 llvm-svn: 325433	2018-02-17 14:26:32 +00:00
Jonas Paulsson	b51a9bc358	[AMDGPU] Return true in enableMultipleCopyHints(). Enable multiple COPY hints to eliminate more COPYs during register allocation. Note that this is something all targets should do, see https://reviews.llvm.org/D38128. Review: Stanislav Mekhanoshin, Tom Stellard. llvm-svn: 325425	2018-02-17 10:00:28 +00:00
Quentin Colombet	48abac82b8	Revert "[MachineCopyPropagation] Extend pass to do COPY source forwarding" This reverts commit r323991. This commit breaks targets that don't model all the register constraints in TableGen. So far the workaround was to set the hasExtraXXXRegAllocReq, but it proves that it doesn't cover all the cases. For instance, when mutating an instruction (like in the lowering of COPYs) the isRenamable flag is not properly updated. The same problem will happen when attaching machine operand from one instruction to another. Geoff Berry is working on a fix in https://reviews.llvm.org/D43042. llvm-svn: 325421	2018-02-17 03:05:33 +00:00
Chandler Carruth	a1d6107b14	[DAG, X86] Revert r324797, r324491, and r324359. Sadly, r324359 caused at least PR36312. There is a patch out for review but it seems to be taking a bit and we've already had these crashers in tree for too long. We're hitting this PR in real code now and are blocked on shipping new compilers as a consequence so I'm reverting us back to green. Sorry for the churn due to the stacked changes that I had to revert. =/ llvm-svn: 325420	2018-02-17 02:26:25 +00:00
Craig Topper	0bcdd399e7	[X86] Turn selects with constant condition into vector shuffles during DAG combine Summary: Currently we convert to shuffles during lowering. This moves it to DAG combine so hopefully we can get it done before type legalization has to extend the condition. I believe in some cases we're creating SHRUNKBLENDs that end up with constant conditions because we see the extend on the condition and think it's a dynamic select before DAG combine gets a chance to constant fold the extend. We could add combines to turn SHRUNKBLENDs with constant condition back to vselect. But it seemed like it might be better to just send them to shuffles as early as possible so they never get a chance to become SHRUNKBLENDs. This is the reason some tests went from blends controlled by a constant pool load to just move. Some of the constant pool entries changed because the sign_extend introduced by type legalization turned undef elements in select condition into 0s. While the select->shuffle used -1 in the shuffle mask. So now the shuffle lowering can do what it wants with them. I'll remove the lowering code as a follow up. We might be able to simplify some of the pre-checks for SHRUNKBLEND as the FIXME there says. Reviewers: spatel, RKSimon, efriedma, zvi, andreadb Reviewed By: spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D43367 llvm-svn: 325417	2018-02-17 00:30:30 +00:00
Eric Christopher	9beff6d4e7	Run these tests, the errors were old and not valid anymore. llvm-svn: 325407	2018-02-16 23:02:28 +00:00
Aditya Nandakumar	b63e763847	[GISel]: Make GlobalISelEmitter rule prioritization compatible with selectionDAG This patch changes GlobalISelEmitter to rank patterns similar to how the DAG does it (ie it computes a score for a pattern and adds the added complexity to it). This is so that the decision tree for GISelSelector remains compatible with that of SelectionDAG. https://reviews.llvm.org/D43270 llvm-svn: 325401	2018-02-16 22:37:15 +00:00
Konstantin Zhuravlyov	9122a63143	AMDGPU: Bring elf flags in sync with the spec - Add MACH flags - Add XNACK flag - Add reserved flags - Minor cleanups in docs Differential Revision: https://reviews.llvm.org/D43356 llvm-svn: 325399	2018-02-16 22:33:59 +00:00
Konstantin Zhuravlyov	331f97e171	AMDGPU: Bring processors and features in sync with the spec - Remove gfx800 - Make iceland gfx802 - Add xnack to gfx902 Differential Revision: https://reviews.llvm.org/D43355 llvm-svn: 325393	2018-02-16 21:26:25 +00:00
Evandro Menezes	10ae20d80c	[AArch64] Fix BITCAST lowering crash The data type is assumed to be a vector, but sometimes it is not, leading to an assertion. Add simple test-case to verify this. Differential revision: https://reviews.llvm.org/D42599 llvm-svn: 325378	2018-02-16 20:00:57 +00:00
Changpeng Fang	ba92059ca9	AMDGPU/SI: Extend promoting alloca to vector to arrays of up to 16 elements Summary: This patch extends the promotion of alloca to vector to the arrays of up to 16 elements. Also we introduce an option, -disable-promote-alloca-to-vector, to switch promotion to vector off, if needed. Reviewers: arsenm Differential Revision: https://reviews.llvm.org/D33559 llvm-svn: 325372	2018-02-16 19:14:17 +00:00
Craig Topper	de565fc73e	[X86] Only reorder srl/and on last DAG combiner run This seems to interfere with a target independent brcond combine that looks for the (srl (and X, C1), C2) pattern to enable TEST instructions. Once we flip, that combine doesn't fire and we end up exposing it to the X86 specific BT combine which causes us to emit a BT instruction. BT has lower throughput than TEST. We could try to make the brcond combine aware of the alternate pattern, but since the flip was just a code size reduction and not likely to enable other combines, it seemed easier to just delay it until after lowering. Differential Revision: https://reviews.llvm.org/D43201 llvm-svn: 325371	2018-02-16 18:51:09 +00:00
Changpeng Fang	da38b5fd49	AMDGPU/SI: Turn off GPR Indexing Mode immediately after the interested instruction. Summary: In the current implementation of GPR Indexing Mode when the index is non-uniform, the s_set_gpr_idx_off instruction is incorrectly inserted after the loop. This will lead the instructions with vgpr operands (v_readfirstlane for example) to read incorrect vgpr. In this patch, we fix the issue by inserting s_set_gpr_idx_on/off immediately around the interested instruction. Reviewers: rampitec Differential Revision: https://reviews.llvm.org/D43297 llvm-svn: 325355	2018-02-16 16:31:30 +00:00
Simon Pilgrim	ff53a4a234	[SelectionDAG] Enable SimplifyDemandedVectorElts support for simplifying shuffle masks Based off the DemandedElts mask and the UNDEF elements returned from the SimplifyDemandedVectorElts calls to the shuffle operands, we can attempt to simplify the shuffle mask. I had to be very conservative here as accepting post-legalized shuffle masks could cause problems for targets that legalize UNDEF mask elements back to inrange values (PowerPC), similarly combining to identity shuffle masks could cause too much UNDEF information to disappear for later combines. llvm-svn: 325354	2018-02-16 16:22:14 +00:00
Simon Pilgrim	4e2f757dc1	[X86][SSE] Allow float domain crossing if we are merging 2 or more shuffles and the root started as a float domain shuffle llvm-svn: 325349	2018-02-16 14:57:25 +00:00
Simon Dardis	b8ae30ecec	[mips] Remove codegen support from some 16 bit instructions These instructions conflict with their full length variants for the purposes of FastISel as they cannot be distinguished based on the number and type of operands and predicates. Reviewers: atanasyan Differential Revision: https://reviews.llvm.org/D41285 llvm-svn: 325341	2018-02-16 13:34:23 +00:00
Simon Pilgrim	0ffde50f9c	[SelectionDAG] Add initial SimplifyDemandedVectorElts support for simplifying VSELECT operands This just adds a basic pass through - we can add constant selection mask handling in a future patch to fully match InstCombine. llvm-svn: 325338	2018-02-16 12:21:08 +00:00
Jonas Paulsson	995ba6e42c	[ARM] Return true in enableMultipleCopyHints(). Enable multiple COPY hints to eliminate more COPYs during register allocation. Note that this is something all targets should do, see https://reviews.llvm.org/D38128. Review: Eli Friedman llvm-svn: 325327	2018-02-16 09:51:01 +00:00
Mikhail Maltsev	0a7e107e77	[LegalizeDAG] Fix legalization of SETCC Summary: Currently when expanding a SETCC node into a SELECT_CC, LLVM uses an incorrect type for determining BooleanContent of the result. This patch fixes the issue. Fixes PR36079. Reviewers: rogfer01, javed.absar, efriedma Reviewed By: efriedma Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D43282 llvm-svn: 325325	2018-02-16 09:35:16 +00:00
Roger Ferrer Ibanez	d41059a9f6	[ARM] Materialise some boolean values to avoid a branch This patch combines some cases of ARMISD::CMOV for integers that arise in comparisons of the form a != b ? x : 0 a == b ? 0 : x and that currently (e.g. in Thumb1) are emitted as branches. Differential Revision: https://reviews.llvm.org/D34515 llvm-svn: 325323	2018-02-16 09:23:59 +00:00
Craig Topper	2e4b838c06	[X86] Allow CMOVs of constants to be sign extended from i32. Sign extending i32 constants only requires a REX prefix as does widening the CMOV. This is cheaper than the explicit sign extend op. llvm-svn: 325318	2018-02-16 07:16:15 +00:00
Craig Topper	5d9e301042	[X86] Don't zero_extend cmov up to i64, stop at i32. Zero extend from i32 to i64 is free. So extend from i16 to i32, and then use a free zero extend to finish. llvm-svn: 325317	2018-02-16 06:52:43 +00:00
Craig Topper	da9c122203	[X86] Add the test cases that were supposed to go with r325287. llvm-svn: 325306	2018-02-16 00:39:05 +00:00
Stanislav Mekhanoshin	ff2763a658	[AMDGPU] Combine adjacent waitcounts in a single strongest wait Differential Revision: https://reviews.llvm.org/D43350 llvm-svn: 325299	2018-02-15 22:03:55 +00:00
Craig Topper	f3f35efe5c	[X86] Enable BT to be used in place of TEST for single bit checks under optsize We already do this for 64-bit when it won't fit into a 64-bit AND/TEST's immediate field. This adds an additional qualifier to do it for any single bit constant larger than 8-bits under optsize Differential Revision: https://reviews.llvm.org/D43346 llvm-svn: 325290	2018-02-15 20:27:30 +00:00
Craig Topper	dac3c1f5c8	[DAGCombiner] Call ExtendUsesToFormExtLoad in (zext (and (load)))->(and (zextload)) even when the and does not have multiple uses Same for the sign extend case. Currently we check for multiple uses on the binop. Then we call ExtendUsesToFormExtLoad to capture SetCCs that use the load. So we only end up finding any setccs when the and has additional uses and the load is used by a setcc. I don't think the and having multiple uses is relevant here. I think we should only be checking for the load having multiple uses. This changes an NVPTX test because we now find that the load has a second use by a truncate, but ExtendUsesToFormExtLoad only looks at setccs it can extend. All other operations just check isTruncateFree. Maybe we should allow widening of an existing truncate even if it's not free? Differential Revision: https://reviews.llvm.org/D43063 llvm-svn: 325289	2018-02-15 20:20:32 +00:00
Pablo Barrio	fa6f1c0130	[ARM] Fix redirect in inline assembly test Summary: Fix silly mistake in a test Reviewers: gkistanova, apilipenko Subscribers: javed.absar, eraman, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D43342 llvm-svn: 325283	2018-02-15 19:17:55 +00:00
Craig Topper	81631a2609	[X86] Add test cases for opportunities for using BT instead of TEST under optsize. llvm-svn: 325277	2018-02-15 19:00:11 +00:00
Simon Pilgrim	689d8137ce	[X86][SSE] Add saturated truncation tests for storing illegal v8i8 types Tests showing missing opportunities to use PACK instructions in cases where we need to truncate to illegal types for stores llvm-svn: 325270	2018-02-15 17:48:34 +00:00
Yonghong Song	920df52a93	bpf: fix a bug in dag2dag optimization for loads from readonly section The reference '&' is missing in the function parameter. If there are back-to-back optimizations in terms of dag node list like below: t29: i64,ch = load<LD4[bitcast (%struct.test_t* @test.t to i8)+12](dereferenceable), zext from i32> t3, t43, undef:i64 t34: i64,ch = load<LD4[bitcast (%struct.test_t @test.t to i8*)](dereferenceable), zext from i32> t3, t41, undef:i64 The bug will trigger a segfault for the added test case remove_truncate_5.ll: LLVMSymbolizer: error reading file: No such file or directory #0 0x000000000241c4d9 (llc+0x241c4d9) #1 0x000000000241c56a (llc+0x241c56a) #2 0x000000000241aa50 (llc+0x241aa50) ... #22 0x0000000000fd5edf (llc+0xfd5edf) #23 0x00007f0fe03bec05 __libc_start_main (/lib64/libc.so.6+0x21c05) #24 0x0000000000fd3e69 (llc+0xfd3e69) ... Segmentation fault Signed-off-by: Yonghong Song <yhs@fb.com> llvm-svn: 325267	2018-02-15 17:06:45 +00:00

23501 Commits
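
The clamp reasoning in the ComputeNumSignBits entry (7fae42eb27) can be checked with a small model. The following is a minimal Python sketch, not LLVM code: the function names are hypothetical, values are modelled as plain Python integers in an assumed 8-bit two's complement width, and the check simply enumerates every input to confirm that SMIN(SMAX(X, LO), HI) never has fewer sign bits than the smaller of the sign-bit counts of LO and HI.

```python
def num_sign_bits(value: int, width: int) -> int:
    """Count the leading bits of a width-bit two's complement value
    that are equal to its sign bit (the sign bit itself included)."""
    assert -(1 << (width - 1)) <= value < (1 << (width - 1))
    bits = value & ((1 << width) - 1)      # two's complement bit pattern
    sign = (bits >> (width - 1)) & 1
    count = 0
    for i in range(width - 1, -1, -1):     # walk from the top bit down
        if (bits >> i) & 1 != sign:
            break
        count += 1
    return count


def clamp(x: int, lo: int, hi: int) -> int:
    """Model of SMIN(SMAX(X, LO), HI) on signed integers."""
    return min(max(x, lo), hi)


def clamp_sign_bits_lower_bound(lo: int, hi: int, width: int) -> int:
    """Lower bound on the sign bits of the clamped value, as described
    in the commit message: the minimum over the two constants."""
    return min(num_sign_bits(lo, width), num_sign_bits(hi, width))


def check_exhaustively(lo: int, hi: int, width: int = 8) -> bool:
    """Verify the bound for every representable input (assumes lo <= hi)."""
    bound = clamp_sign_bits_lower_bound(lo, hi, width)
    inputs = range(-(1 << (width - 1)), 1 << (width - 1))
    return all(num_sign_bits(clamp(x, lo, hi), width) >= bound for x in inputs)


if __name__ == "__main__":
    print(clamp_sign_bits_lower_bound(-8, 7, 8))
    print(check_exhaustively(-8, 7))
```

Exhaustive enumeration is only practical for narrow widths; it is used here to make the argument easy to verify, not to mirror how the DAG analysis is implemented.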