llvm-project

Commit Graph

Author	SHA1	Message	Date
Craig Topper	b17e5ec61b	[X86] Don't form masked vpcmp/vcmp/vptestm operations if the setcc node has more than one use. We're better off emitting a single compare + kand rather than a compare for the other use and a masked compare. I'm looking into using custom instruction selection for VPTESTM to reduce the ridiculous number of permutations of patterns in the isel table. Putting a one use check on all masked compare folding makes load fold matching in the custom code easier. llvm-svn: 358358	2019-04-14 18:26:06 +00:00
Craig Topper	476dd06854	[X86] Update bool_reduction_v8f32 test cases from vector-compare-any_of.ll and vector-compare-all_of.ll to be proper reductions. One of the shuffles was used twice, while the intended shuffle wasn't connected. llvm-svn: 358346	2019-04-14 04:20:42 +00:00
Bill Wendling	191f1487b6	[X86] Use PC-relative mode for the kernel code model Summary: The Linux kernel uses PC-relative mode, so allow that when the code model is "kernel". Reviewers: craig.topper Reviewed By: craig.topper Subscribers: llvm-commits, kees, nickdesaulniers Tags: #llvm Differential Revision: https://reviews.llvm.org/D60643 llvm-svn: 358343	2019-04-13 21:39:28 +00:00
Amara Emerson	93e58d2396	[AArch64][GlobalISel] Enable copy elision in the pre-legalizer combine and fix a crash. This enables the simple copy combine that already exists in the CombinerHelper. However, it exposed a bug in the GISelChangeObserver where it wouldn't clear a set of MIs to process, and so would end up causing a crash when deleted MIs were being added to the combiner worklist again. Differential Revision: https://reviews.llvm.org/D60579 llvm-svn: 358318	2019-04-13 00:33:25 +00:00
Amara Emerson	bdb5e4e4ca	[GlobalISel] Fix a crash when handling an invalid MVT during call lowering. This crash was introduced in r358032 as we try to construct an EVT from an MVT in order to find the register type for the calling conv. Fall back instead of trying to do this with an invalid MVT coming from i256. llvm-svn: 358314	2019-04-12 22:05:46 +00:00
Amara Emerson	2806fd01a1	[AArch64][GlobalISel] Fix a crash when selecting shufflevectors with an undef mask element. If a shufflevector's mask vector has an element with "undef" then the generic instruction defining that element register is a G_IMPLICIT_DEF instead of G_CONSTANT. This fixes the selector to handle this case, and for now assumes that undef just means zero. In future we'll optimize this case properly. llvm-svn: 358312	2019-04-12 21:31:21 +00:00
Thomas Lively	9e27514996	[WebAssembly] Add mutable-globals to bleeding-edge CPU Summary: This brings the backend in line with Clang. Reviewers: aheejin, dschuff Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60594 llvm-svn: 358310	2019-04-12 20:39:53 +00:00
Brendon Cahoon	4df216cd62	[Hexagon] Fix reuse bug in Vector Loop Carried Reuse pass The Hexagon Vector Loop Carried Reuse pass was allowing reuse between two shufflevectors with different masks. The reason is that the masks are not instruction objects, so the code that checks each operand just skipped over the operands. This patch fixes the bug by checking if the operands are the same when they are not instruction objects. If the objects are not the same, then the code assumes that reuse cannot occur. Differential Revision: https://reviews.llvm.org/D60019 llvm-svn: 358292	2019-04-12 16:37:12 +00:00
Sanjay Patel	5e4ad39af7	[DAGCombiner] narrow shuffle of concatenated vectors // shuffle (concat X, undef), (concat Y, undef), Mask --> // concat (shuffle X, Y, Mask0), (shuffle X, Y, Mask1) The ARM changes with 'vtrn' and narrowed 'vuzp' are improvements. The x86 changes look neutral or better. There's one test with an extra instruction, but that could be reversed for a subtarget with the right attributes. But by default, we want to avoid the 256-bit op when possible (in my motivating benchmark, a handful of ymm ops sprinkled into a sequence of xmm ops are triggering frequency throttling on Haswell resulting in significantly worse perf). Differential Revision: https://reviews.llvm.org/D60545 llvm-svn: 358291	2019-04-12 16:31:56 +00:00
Simon Pilgrim	6c8f4ada36	[X86][SSE] Recognise vXi1 boolean anyof/allof reduction patterns Currently combineHorizontalPredicateResult only handles anyof/allof reduction patterns of legal types, which can be tricky to match as type legalization of bools can introduce bitcasts/truncs/extensions. This patch extends combineHorizontalPredicateResult to recognise vXi1 bool reductions as well and uses the existing combineBitcastvxi1 helper to create the MOVMSK necessary to then compare the signmask result. This ensures the accuracy of the reduction costs added in D60403 which assume the MOVMSK generation. Differential Revision: https://reviews.llvm.org/D60610 llvm-svn: 358286	2019-04-12 14:22:57 +00:00
Hans Wennborg	4e6b857922	Revert r358268 "[DebugInfo] DW_OP_deref_size in PrologEpilogInserter." It causes clang to crash while building Chromium. See https://crbug.com/952230 for reproducer. > The PrologEpilogInserter needs to insert a DW_OP_deref_size before > prepending a memory location expression to an already implicit > expression to avoid having the existing expression act on the memory > address instead of the value behind it. > > The reason for using DW_OP_deref_size and not plain DW_OP_deref is that > big-endian targets need to read the right size as simply truncating a > larger read would yield the wrong result (LSB bytes are not at the lower > address). > > Differential Revision: https://reviews.llvm.org/D59687 llvm-svn: 358281	2019-04-12 12:54:52 +00:00
Kang Zhang	2446f843ae	[PowerPC] Add initialization for some ppc passes Summary: Some llc debug options need pass-name as the parameters. But if we use the pass-name ppc-early-ret, we will get below error: llc test.ll -stop-after ppc-early-ret LLVM ERROR: "ppc-early-ret" pass is not registered. Below pass-names have the pass is not registered error: ppc-ctr-loops ppc-ctr-loops-verify ppc-loop-preinc-prep ppc-toc-reg-deps ppc-vsx-copy ppc-early-ret ppc-vsx-fma-mutate ppc-vsx-swaps ppc-reduce-cr-ops ppc-qpx-load-splat ppc-branch-coalescing ppc-branch-select Reviewed By: jsji Differential Revision: https://reviews.llvm.org/D60248 llvm-svn: 358271	2019-04-12 09:59:40 +00:00
Markus Lavin	138c76129b	[DebugInfo] DW_OP_deref_size in PrologEpilogInserter. The PrologEpilogInserter needs to insert a DW_OP_deref_size before prepending a memory location expression to an already implicit expression to avoid having the existing expression act on the memory address instead of the value behind it. The reason for using DW_OP_deref_size and not plain DW_OP_deref is that big-endian targets need to read the right size as simply truncating a larger read would yield the wrong result (LSB bytes are not at the lower address). Differential Revision: https://reviews.llvm.org/D59687 llvm-svn: 358268	2019-04-12 08:23:55 +00:00
Eric Christopher	b6926bdcff	Revert "[PowerPC] Add initialization for some ppc passes" This reverts commit `6f8f98ce8d` as it is breaking nearly every bot. llvm-svn: 358260	2019-04-12 07:16:58 +00:00
Craig Topper	3b1239d2a8	[TargetLowering][X86] Teach SimplifyDemandedBits to use ShrinkDemandedOp on ISD::SHL nodes. If the upper bits of the SHL result aren't used, we might be able to use a narrower shift. For example, on X86 this can turn a 64-bit into 32-bit enabling a smaller encoding. Differential Revision: https://reviews.llvm.org/D60358 llvm-svn: 358257	2019-04-12 06:49:28 +00:00
Kang Zhang	6f8f98ce8d	[PowerPC] Add initialization for some ppc passes Summary: Some llc debug options need pass-name as the parameters. But if we use the pass-name ppc-early-ret, we will get below error: llc test.ll -stop-after ppc-early-ret LLVM ERROR: "ppc-early-ret" pass is not registered. Below pass-names have the pass is not registered error: ppc-ctr-loops ppc-ctr-loops-verify ppc-loop-preinc-prep ppc-toc-reg-deps ppc-vsx-copy ppc-early-ret ppc-vsx-fma-mutate ppc-vsx-swaps ppc-reduce-cr-ops ppc-qpx-load-splat ppc-branch-coalescing ppc-branch-select Reviewed By: jsji Differential Revision: https://reviews.llvm.org/D60248 llvm-svn: 358256	2019-04-12 06:35:15 +00:00
Zi Xuan Wu	ac79ef8f0e	[PowerPC] More precise exploitation of P9 maddld instruction when operands are constant There are 3 operands of maddld, (add (mul %1, %2), %3) and sometimes they are constant. If there is constant operand, it takes extra li to materialize the operand, and one more extra register too. So it's not profitable to use maddld to optimize mul-add pattern. Differential Revision: https://reviews.llvm.org/D60181 llvm-svn: 358253	2019-04-12 05:21:31 +00:00
Brendon Cahoon	57c3d4bed3	[Pipeliner] Fix incorrect loop carried dependence calculation The isLoopCarriedDep function does not correctly compute loop carried dependences when the array index offset is negative or the stride is smaller than the access size. Patch by Denis Antrushin. Differential Revision: https://reviews.llvm.org/D60135 llvm-svn: 358233	2019-04-11 21:57:51 +00:00
Amara Emerson	7e9355f870	[AArch64][GlobalISel] Flesh out vector load/store support for more types. Some of these were legalizing into smaller vector types unnecessarily, others were simply not supported yet. llvm-svn: 358223	2019-04-11 20:40:01 +00:00
Amara Emerson	b956051415	[AArch64][GlobalISel] Legalization and ISel support for load/stores of vectors of pointers. Loads and store of values with type like <2 x p0> currently don't get imported because SelectionDAG has no knowledge of pointer types. To leverage the existing support for vector load/stores, we can bitcast the value to have s64 element types instead. We do this as a custom legalization. This patch also adds support for general loads of <2 x s64>, and relaxes some type conditions on selecting G_BITCAST. Differential Revision: https://reviews.llvm.org/D60534 llvm-svn: 358221	2019-04-11 20:32:24 +00:00
Aaron Smith	994023a3f1	[DebugInfo] Combine Trivial and NonTrivial flags Summary: Companion to https://reviews.llvm.org/D59347 Reviewers: rnk, zturner, probinson, dblaikie, deadalnix Subscribers: aprantl, jdoerfert, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59348 llvm-svn: 358220	2019-04-11 20:25:10 +00:00
Craig Topper	68a5d619a4	[X86] Restrict vselect handling in scalarizeExtEltFP to only case to pre type legalization where the setcc result type is vXi1. If the vector setcc has been legalized then we will need to convert a vector boolean of 0 or -1 to a scalar boolean of 0 or 1. The added test case previously crashed in 32-bit mode by creating a setcc with an i64 condition that type legalization couldn't expand. llvm-svn: 358218	2019-04-11 19:57:44 +00:00
Craig Topper	a3635b94c4	[X86] Add 32-bit command line to extractelement-fp.ll so I can add a test case for a 32-bit only crasher. NFC This is a bit ugly for ABI reasons about how floats/doubles are returned. llvm-svn: 358217	2019-04-11 19:57:24 +00:00
Craig Topper	586fad50ac	[X86] Add patterns for using movss/movsd for atomic load/store of f32/64. Remove atomic fadd pseudos use isel patterns instead. This patch adds patterns for turning bitcasted atomic load/store into movss/sd. It also removes the pseudo instructions for atomic RMW fadd. Instead just adding isel patterns for folding an atomic load into addss/sd. And relying on the new movss/sd store pattern to handle the write part. This also makes the fadd patterns use VEX and EVEX instructions when AVX or AVX512F are enabled. Differential Revision: https://reviews.llvm.org/D60394 llvm-svn: 358215	2019-04-11 19:19:52 +00:00
Craig Topper	f7e548c076	Recommit r358211 "[X86] Use FILD/FIST to implement i64 atomic load on 32-bit targets with X87, but no SSE2" With correct test checks this time. If we have X87, but not SSE2 we can atomically load an i64 value into the significand of an 80-bit extended precision x87 register using fild. We can then use a fist instruction to convert it back to an i64 integer. This matches what gcc and icc do for this case and removes an existing FIXME. llvm-svn: 358214	2019-04-11 19:19:42 +00:00
Craig Topper	8200880c9a	Revert r358211 "[X86] Use FILD/FIST to implement i64 atomic load on 32-bit targets with X87, but no SSE2" I seem to have messed up the test checks. llvm-svn: 358212	2019-04-11 19:04:38 +00:00
Craig Topper	1c2dfc3100	[X86] Use FILD/FIST to implement i64 atomic load on 32-bit targets with X87, but no SSE2 If we have X87, but not SSE2 we can atomically load an i64 value into the significand of an 80-bit extended precision x87 register using fild. We can then use a fist instruction to convert it back to an i64 integer and store it to a stack temporary. From there we can do two 32-bit loads to get the value into integer registers without worrying about atomicness. This matches what gcc and icc do for this case and removes an existing FIXME. Differential Revision: https://reviews.llvm.org/D60156 llvm-svn: 358211	2019-04-11 18:40:21 +00:00
Craig Topper	1fe5a9963d	[X86] Pre-commit i64 volatile test case for D60156. NFC llvm-svn: 358210	2019-04-11 18:40:08 +00:00
Simon Pilgrim	40b647ae8e	[X86] SimplifyDemandedVectorElts - add X86ISD::VPERMV3 mask support Completes SimplifyDemandedVectorElts's basic variable shuffle mask support which should help D60512 + D60562 llvm-svn: 358186	2019-04-11 15:29:15 +00:00
Simon Pilgrim	a41275a398	[X86][AVX] Tweak X86ISD::VPERMV3 demandedelts test Original test was too dependent on the order of the combines that could cause the inserted element being demanded after all llvm-svn: 358182	2019-04-11 15:09:03 +00:00
Simon Pilgrim	34686b6e97	[X86][AVX] Add X86ISD::VPERMV3 demandedelts test llvm-svn: 358175	2019-04-11 14:48:46 +00:00
Simon Pilgrim	8a25154fa7	[X86] SimplifyDemandedVectorElts - add X86ISD::VPERMV mask support llvm-svn: 358174	2019-04-11 14:35:45 +00:00
Simon Pilgrim	b237b54c2d	[X86][AVX] Add X86ISD::VPERMV demandedelts test llvm-svn: 358173	2019-04-11 14:26:32 +00:00
Sanjay Patel	c0f4a35e68	[DAGCombiner][x86] scalarize inserted vector FP ops // bo (build_vec ...undef, x, undef...), (build_vec ...undef, y, undef...) --> // build_vec ...undef, (bo x, y), undef... The lifetime of the nodes in these examples is different for variables versus constants, but they are all build vectors briefly, so I'm proposing to catch them in this form to handle all of the leading examples in the motivating test file. Before we have build vectors, we might have insert_vector_element. After that, we might have scalar_to_vector and constant pool loads. It's going to take more work to ensure that FP vector operands are getting simplified with undef elements, so this transform can apply more widely. In a non-loose FP environment, we are likely simplifying FP elements to NaN values rather than undefs. We also need to allow more opcodes down this path. Eg, we don't handle FP min/max flavors yet. Differential Revision: https://reviews.llvm.org/D60514 llvm-svn: 358172	2019-04-11 14:21:57 +00:00
Diogo N. Sampaio	8ddfd46c61	[AArch64] Add lowering pattern for llvm.aarch64.neon.vcvtfxs2fp.f16.i64 Summary: Add lowering pattern for llvm.aarch64.neon.vcvtfxs2fp.f16.i64 Reviewers: pbarrio, DavidSpickett, LukeGeeson Reviewed By: LukeGeeson Subscribers: javed.absar, kristof.beyls, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60259 llvm-svn: 358171	2019-04-11 14:19:43 +00:00
Simon Pilgrim	6f3866c6fb	[X86] SimplifyDemandedVectorElts - add X86ISD::VPERMILPV mask support llvm-svn: 358170	2019-04-11 14:15:01 +00:00
Simon Pilgrim	886e32e0f2	[X86][AVX] Add X86ISD::VPERMILPV demandedelts tests llvm-svn: 358168	2019-04-11 14:09:35 +00:00
Simon Pilgrim	cb5218ad48	[X86] SimplifyDemandedVectorElts - add X86ISD::VPERMIL2 mask support llvm-svn: 358167	2019-04-11 14:04:19 +00:00
Simon Pilgrim	7021dec26e	[X86][XOP] Add X86ISD::VPERMIL2 demandedelts test llvm-svn: 358166	2019-04-11 13:52:43 +00:00
Simon Pilgrim	e468cc7f14	[X86] SimplifyDemandedVectorElts - add VPPERM support We need to add support for all variable shuffle mask ops, but VPPERM is the only one that already has test coverage. llvm-svn: 358165	2019-04-11 13:30:38 +00:00
Shiva Chen	7cc03bd064	[RISCV] Put data smaller than eight bytes to small data section Because of gp = sdata_start_address + 0x800, gp with signed twelve-bit offset could cover most of the small data section. Linker relaxation could transfer the multiple data accessing instructions to a gp base with signed twelve-bit offset instruction. Differential Revision: https://reviews.llvm.org/D57493 llvm-svn: 358150	2019-04-11 04:59:13 +00:00
Amara Emerson	213e0bde04	[AArch64][GlobalISel] Make <2 x p0> = G_BUILD_VECTOR legal. The existing isel support already works for p0 once the legalizer accepts it. llvm-svn: 358144	2019-04-10 23:06:14 +00:00
Amara Emerson	a7ff111b04	[AArch64][GlobalISel] Add legalizer support for <8 x s16> and <16 x s8> G_ADD. llvm-svn: 358143	2019-04-10 23:06:11 +00:00
Amara Emerson	ae878dab03	[AArch64][GlobalISel] Scalarize vector SDIV. llvm-svn: 358142	2019-04-10 23:06:08 +00:00
Craig Topper	10048060f6	[X86] Add SSE1 command line to atomic-fp.ll and atomic-non-integer.ll. NFC llvm-svn: 358141	2019-04-10 22:35:32 +00:00
Craig Topper	a3ee7e2b3e	[X86] Autogenerate complete checks. NFC llvm-svn: 358140	2019-04-10 22:35:24 +00:00
Craig Topper	61f31cbcb2	[X86] Teach foldMaskedShiftToScaledMask to look through an any_extend from i32 to i64 between the and & shl foldMaskedShiftToScaledMask tries to reorder and & shl to enable the shl to fold into an LEA. But if there is an any_extend between them it doesn't work. This patch modifies the code to look through any_extend from i32 to i64 when the and mask only uses bits that weren't from the extended part. This will prevent a regression from D60358 caused by 64-bit SHL being narrowed to 32-bits when their upper bits aren't demanded. Differential Revision: https://reviews.llvm.org/D60532 llvm-svn: 358139	2019-04-10 21:42:08 +00:00
David Green	deb3342018	[ARM] Add an extra test for constant hoist. NFC llvm-svn: 358128	2019-04-10 19:18:58 +00:00
Craig Topper	cacb70c94b	[X86] Add test case for LEA formation regression seen with D60358. NFC If we have an (add X, (and (aext (shl Y, C1)), C2)), we can pull the shift through and+aext to fold into an LEA, assuming C1 is small enough and C2 masks off all of the extend bits. This pattern showed up in D60358. And we need to handle it to prevent a regression. llvm-svn: 358124	2019-04-10 19:09:06 +00:00
David Green	4e3fd7757a	[ARM] Add an extra constant hoisting test. NFC This adds a simple extra test for constant hoisting to show its usefulness with constant addresses like those seen in memory mapped registers in embedded systems. llvm-svn: 358114	2019-04-10 18:05:57 +00:00


28441 Commits