llvm-project

Commit Graph

Author	SHA1	Message	Date
Craig Topper	789cc8170d	[X86] Add -x86-experimental-vector-widening command lines to pmulh.ll I've only added sse2 and sse4.1 variants as I'm only interested in the two v4i16 tests and I don't expect that to differ with AVX other than a v prefix. llvm-svn: 346834	2018-11-14 07:51:26 +00:00
Cameron McInally	cbde0d9c7b	[IR] Add a dedicated FNeg IR Instruction The IEEE-754 Standard makes it clear that fneg(x) and fsub(-0.0, x) are two different operations. The former is a bitwise operation, while the latter is an arithmetic operation. This patch creates a dedicated FNeg IR Instruction to model that behavior. Differential Revision: https://reviews.llvm.org/D53877 llvm-svn: 346774	2018-11-13 18:15:47 +00:00
Craig Topper	333ab7d08b	[X86] Add more tests for -x86-experimental-vector-widening-legalization I'm looking into whether we can make this the default legalization strategy. Adding these tests to help cover the changes that will be necessary. This patch adds copies of some tests with the command line switch enabled. By making copies it's easier to compare the two legalization strategies. I've also removed RUN lines from some of these tests that already had -x86-experimental-vector-widening-legalization llvm-svn: 346745	2018-11-13 07:47:52 +00:00
Simon Pilgrim	e565e5a962	[X86][SSE] Add lowerVectorShuffleAsByteRotateAndPermute (PR39387) This patch adds the ability to use a PALIGNR to rotate a pair of inputs to select a range containing all the referenced elements, followed by a single input permute to put them in the right location. Differential Revision: https://reviews.llvm.org/D54267 llvm-svn: 346706	2018-11-12 21:12:38 +00:00
Craig Topper	c48712b341	[X86] In LowerMULH, use generic truncate and vector shuffle nodes instead of directly emitting PACKUS. Truncate and shuffle lowering are already capable of matching to PACKUS using known bits analysis. This features one test change where we now prefer to extend v16i16->v16i32 then trunc v16i32->v16i8 over extract_subvector+packus when avx512f is available, but avx512bw is not. llvm-svn: 346697	2018-11-12 19:37:29 +00:00
Paul Robinson	5b302bfc8e	[DWARFv5] Emit split type units in .debug_info.dwo. Differential Revision: https://reviews.llvm.org/D54350 llvm-svn: 346674	2018-11-12 16:55:11 +00:00
Sanjay Patel	622b71d40a	[x86] auto-generate complete checks; NFC llvm-svn: 346609	2018-11-11 14:57:26 +00:00
Craig Topper	2eab39f77b	[X86] Use DAG.getConstant instead of getZeroVector. llvm-svn: 346605	2018-11-11 07:24:36 +00:00
Sanjay Patel	0a515595a7	[x86] allow vector load narrowing with multi-use values This is a long-awaited follow-up suggested in D33578. Since then, we've picked up even more opportunities for vector narrowing from changes like D53784, so there are a lot of test diffs. Apart from 2-3 strange cases, these are all wins. I've structured this to be no-functional-change-intended for any target except for x86 because I couldn't tell if AArch64, ARM, and AMDGPU would improve or not. All of those targets have existing regression tests (4, 4, 10 files respectively) that would be affected. Also, Hexagon overrides the shouldReduceLoadWidth() hook, but doesn't show any regression test diffs. The trade-off is deciding if an extra vector load is better than a single wide load + extract_subvector. For x86, this is almost always better (on paper at least) because we often can fold loads into subsequent ops and not increase the official instruction count. There's also some unknown -- but potentially large -- benefit from using narrower vector ops if wide ops are implemented with multiple uops and/or frequency throttling is avoided. Differential Revision: https://reviews.llvm.org/D54073 llvm-svn: 346595	2018-11-10 20:05:31 +00:00
Roman Lebedev	b428b8b214	[X86][BdVer2] Fix loads/stores throughput for Piledriver (PR39465) There are two AGU units, and per 1cy, there can be either two loads, or a load and a store; but not two stores, or two loads and a store. Additionally, loads shouldn't affect the store scheduler and vice versa. (but should affect the PdEX scheduler.) Required rL346545. Fixes https://bugs.llvm.org/show_bug.cgi?id=39465 llvm-svn: 346587	2018-11-10 14:31:43 +00:00
Craig Topper	a1b6667c6a	[X86] Use a MOVSX instruction instead of a MOVZX instruction in isel for an any_extend of the remainder from an 8-bit sdivrem. The sdivrem will emit its own MOVSX to move %ah to the low byte of a register. By using a MOVSX for an any_extend this allows a post-isel peephole to merge them. llvm-svn: 346581	2018-11-10 06:04:33 +00:00
Craig Topper	dc12535e00	[X86] Add a test case to show scalarized vector srem to demonstrate unnecessary instructions. NFC After the division %ah is being sign extended to move it to lower byte of a register while avoiding a partial register read. We then zero extend the low byte to the full 32 bit register. But we don't use any of the zero extended bits. In the DAG the zero extend was really an any_extend so the sign extend should have been enough. llvm-svn: 346580	2018-11-10 06:04:09 +00:00
Matthias Braun	0261d6e36a	test/CodeGen/X86: Relax test case No need to hardcode register or expecting totally unnecessary spills from the allocator. llvm-svn: 346575	2018-11-10 00:34:09 +00:00
Craig Topper	0364085281	[X86] In LowerHorizontalByteSum, emit vector_shuffle nodes instead of directly using X86ISD::UNPCKL/X86ISD::UNPCKH. This gives shuffle lowering the freedom to use zero_extend_vector_inreg for the unpckl shuffle. Shuffle combining usually makes this swap later, but not when AVX512 is enabled it seems. While there also use DAG.getConstant to create a 0 vector instead of using the helper that forces a specific BUILD_VECTOR. I don't think that helper is usually needed. We're basically free to create a constant build_vector anytime and it will be legalized on its own. llvm-svn: 346574	2018-11-10 00:26:42 +00:00
Craig Topper	17d64c71c5	[X86] Move the promotion of v16i16->v16i8 for avx512f but not avx512bw from lowering to isel. Change to use vpmovzx instead of vpmovsx. With avx512f but not avx512bw we need to extend to v16i32 then truncate that to v16i8. Previously we emitted both nodes during lowering, but I'm trying to switch to using target independent nodes and with that switched the extend+truncate wou This patch changes the implementation to what will be necessary with that patch which helps minimize test diffs. llvm-svn: 346552	2018-11-09 20:09:53 +00:00
Paul Robinson	ddbde9a4ad	[DWARFv5] Emit normal type units in .debug_info comdats. Differential Revision: https://reviews.llvm.org/D54282 llvm-svn: 346540	2018-11-09 19:06:09 +00:00
Craig Topper	731ea7dbc1	[X86] Turn X86ISD::VSEXT into X86ISD::VZEXT if the upper bits aren't demanded. This makes X86ISD::VSEXT more similar to ISD::SIGN_EXTEND and ISD::ZERO_EXTEND. I'm hoping to replace X86ISD::VSEXT/VZEXT with target independent nodes. Making the target specific nodes similar to the target independent nodes helps minimize test diffs in that patch. llvm-svn: 346539	2018-11-09 19:05:51 +00:00
Craig Topper	9a7e19b8f2	[DAGCombiner][X86][Mips] Enable combineShuffleOfScalars to run between vector op legalization and DAG legalization. Fix bad one use check in combineShuffleOfScalars It's possible for vector op legalization to generate a shuffle. If that happens we should give a chance for DAG combine to combine that with a build_vector input. I also fixed a bug in combineShuffleOfScalars that was considering the number of uses on a undef input to a shuffle. We don't care how many times undef is used. Differential Revision: https://reviews.llvm.org/D54283 llvm-svn: 346530	2018-11-09 18:04:34 +00:00
Sanjay Patel	fa1c0fe478	[x86] try to form broadcast before widening shuffle elements I noticed that we weren't generating broadcasts as much as I thought we would with D54271, and this is part of the problem. Widening the shuffle elements means adding bitcasts and hiding the relationship between a splatted scalar and the vector. If we can form a broadcast, do that before going through the rest of the shuffle lowering because broadcasts should be cheap and can often be load-folded. Differential Revision: https://reviews.llvm.org/D54280 llvm-svn: 346498	2018-11-09 14:54:58 +00:00
Clement Courbet	e6b727e552	[X86] Fix VZEROUPPER scheduling info on SNB,HSW,BDW,SXL,SKX. Summary: Starting from SNB, VZEROUPPER is handled by the renamer and uses no proc resources. After HSW, it also has zero latency. This fixes PR35606. To reproduce: Uops: llvm-exegesis -mode=uops -opcode-name=VZEROUPPER Latency: echo -e '#LLVM-EXEGESIS-DEFREG XMM0 1\n#LLVM-EXEGESIS-DEFREG XMM1 1\nvzeroupper' | /tmp/llvm-exegesis -mode=latency -snippets-file=- echo -e '#LLVM-EXEGESIS-DEFREG XMM0 1\n#LLVM-EXEGESIS-DEFREG XMM1 1\nvzeroupper\naddps %xmm0, %xmm1' | /tmp/llvm-exegesis -mode=latency -snippets-file=- Reviewers: RKSimon, craig.topper, andreadb Subscribers: gbedwell, llvm-commits Differential Revision: https://reviews.llvm.org/D54107 llvm-svn: 346482	2018-11-09 09:49:06 +00:00
Carlos Alberto Enciso	fa9cf89734	[DebugInfo][Dexter] Unreachable line stepped onto after SimplifyCFG. In SimplifyCFG when given a conditional branch that goes to BB1 and BB2, the hoisted common terminator instruction in the two blocks, caused debug line records associated with subsequent select instructions to become ambiguous. It causes the debugger to display unreachable source lines. Differential Revision: https://reviews.llvm.org/D53390 llvm-svn: 346481	2018-11-09 09:42:10 +00:00
Simon Pilgrim	0b01062dba	[X86] Regenerate loaduse test llvm-svn: 346434	2018-11-08 19:42:11 +00:00
Sanjay Patel	b5535dc7b3	[x86] use shuffles for scalar insertion into high elements of a constant vector As discussed in D54073, we have a potential regression from more aggressive vector narrowing here, so let's try to avoid that by changing build-vector lowering slightly. Insert-vector-element lowering always does this since there's no "pinsr" for ymm/zmm: // If the vector is wider than 128 bits, extract the 128-bit subvector, insert // into that, and then insert the subvector back into the result. ...but we can sometimes do better for insert-into-constant-vector by using shuffle lowering. Differential Revision: https://reviews.llvm.org/D54271 llvm-svn: 346433	2018-11-08 19:16:27 +00:00
Nirav Dave	6ce9f72f76	[DAGCombine] Improve alias analysis for chain of independent stores. FindBetterNeighborChains simultaneously improves the chain dependencies of a chain of related stores avoiding the generation of extra token factors. For chains longer than the GatherAllAliasDepths, stores further down in the chain will necessarily fail, a potentially significant waste and preventing otherwise trivial parallelization. This patch directly parallelizes the chains of stores before improving each store. This generally improves DAG-level parallelism. Reviewers: courbet, spatel, RKSimon, bogner, efriedma, craig.topper, rnk Subscribers: sdardis, javed.absar, hiraditya, jrtc27, atanasyan, llvm-commits Differential Revision: https://reviews.llvm.org/D53552 llvm-svn: 346432	2018-11-08 19:14:20 +00:00
Sanjay Patel	c4f719feb0	[x86] add RUNs for AVX1; NFC Differences in splat-ability might be reason to differentiate some cases. llvm-svn: 346426	2018-11-08 18:18:20 +00:00
Simon Pilgrim	b917740ac3	[X86][SSE] Add PR39387 shuffle test case llvm-svn: 346402	2018-11-08 14:07:17 +00:00
Simon Pilgrim	1ef4af5278	[X86][AVX] Tidyup prefixes and regenerate interleaved tests Share common AVX prefix and split off AVX2OR512 prefix instead llvm-svn: 346399	2018-11-08 12:14:10 +00:00
Than McIntosh	5bcdea5118	[X86] improve split-stack machine BB placement Summary: The conditional branch created to support -fsplit-stack for X86 is left unbiased/unhinted, resulting in less than ideal block placement: the __morestack call block is kept on the main hot path. Bias the branch to ensure that the stack allocation block is treated as a "cold" block during machine basic block placement. Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D54123 llvm-svn: 346336	2018-11-07 17:41:57 +00:00
James Y Knight	72f76bf230	Add support for llvm.is.constant intrinsic (PR4898) This adds the llvm-side support for post-inlining evaluation of the __builtin_constant_p GCC intrinsic. Also fixed SCCPSolver::visitCallSite to not blow up when seeing a call to a function where canConstantFoldTo returns true, and one of the arguments is a struct. Updated from patch initially by Janusz Sobczak. Differential Revision: https://reviews.llvm.org/D4276 llvm-svn: 346322	2018-11-07 15:24:12 +00:00
Matthias Braun	5b7c90b4e2	RegAllocFast: Leave unassigned virtreg entries in map Set `LiveReg::PhysReg` to zero when freeing a register instead of removing it from the entry from `LiveRegMap`. This way no iterators get invalidated and we can avoid passing around and updating iterators all over the place. This does not change any allocator decisions. It is not completely NFC because the arbitrary iteration order through `LiveRegMap` in `spillAll()` changes so we may get a different order in those spill sequences (the amount of spills does not change). This is in preparation of https://reviews.llvm.org/D52010. llvm-svn: 346298	2018-11-07 06:57:03 +00:00
Craig Topper	6428a2cd9a	[X86] Add custom promotion of v2i8/v2i16 fp_to_sint to avoid over promotion to v2i64 which would force scalarization. llvm-svn: 346259	2018-11-06 19:24:21 +00:00
Craig Topper	17057b52fe	[X86] Autogenerate complete checks. NFC llvm-svn: 346188	2018-11-06 00:31:27 +00:00
Craig Topper	ab896b08d4	[X86] Regenerate test checks in preparation for a patch. NFC I'm preparing a patch to avoid creating critical edges in cmov expansion. Updating these tests to make the changes by the next patch easier to see. llvm-svn: 346161	2018-11-05 19:45:37 +00:00
Cameron McInally	9757d5d6c1	[FPEnv] Add constrained CEIL/FLOOR/ROUND/TRUNC intrinsics Differential Revision: https://reviews.llvm.org/D53411 llvm-svn: 346141	2018-11-05 15:59:49 +00:00
Cameron McInally	51a91e86e1	[NFCI][FPEnv] Split constrained intrinsic tests The constrained intrinsic tests have grown in number. Split off the FMA tests into their own file to reduce double coverage. Differential Revision: https://reviews.llvm.org/D53932 llvm-svn: 346137	2018-11-05 15:28:10 +00:00
Roman Lebedev	7db25f2b38	[NFC][x86][AArch64] extract-bits.ll: add test with 'ashr'. llvm-svn: 346121	2018-11-05 09:20:08 +00:00
Craig Topper	30b627e5c9	[X86] Custom type legalize v2i8/v2i16/v2i32 mul to use pmuludq. v2i8/v2i16/v2i32 are promoted to v2i64. pmuludq takes a v2i64 input and produces a v2i64 output. Since we don't care about the upper bits of the type legalized multiply we can use the pmuludq to produce the multiply result for the bits we do care about. llvm-svn: 346115	2018-11-05 05:02:12 +00:00
Craig Topper	60789b34e0	[X86] Fix typo in test comment. NFC llvm-svn: 346110	2018-11-05 01:21:52 +00:00
Craig Topper	6d3c713689	[X86] Add nounwind to some tests to remove cfi directives from checks. NFC llvm-svn: 346106	2018-11-04 21:37:45 +00:00
Craig Topper	a3210b2713	[X86] Regenerate test checks to merge 32 and 64 bit. Remove stale check prefixes. NFC llvm-svn: 346105	2018-11-04 21:37:43 +00:00
Craig Topper	ed6a0a817f	[X86] Add vector shift by immediate to SimplifyDemandedBitsForTargetNode. Summary: This also enables some constant folding from KnownBits propagation. This helps on some cases vXi64 case in 32-bit mode where constant vectors appear as vXi32 and a bitcast. This can prevent getNode from constant folding sra/shl/srl. Reviewers: RKSimon, spatel Reviewed By: spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D54069 llvm-svn: 346102	2018-11-04 17:31:27 +00:00
Craig Topper	f7108aef14	[X86] In LowerEXTEND_VECTOR_INREG, emit a vector shuffle instead of directly using X86ISD::UNPCKL The majority of the changes are because the rest of shuffle lowering/combining prefers to replace the undef input with the other operand. Using UNPCKL directly seemed to avoid this and just grabbed a randomish register for the undef which can create false dependencies. llvm-svn: 346050	2018-11-02 22:48:02 +00:00
Craig Topper	60c202a494	[X86] Don't emit *_extend_vector_inreg nodes when both the input and output types are legal with AVX1 We already have custom lowering for the AVX case in LegalizeVectorOps. So it's better to keep the regular extend op around as long as possible. I had to qualify one place in DAG combine that created illegal vector extending load operations. This change by itself had no effect on any tests which is why it's included here. I've made a few cleanups to the custom lowering. The sign extend code no longer creates an identity shuffle with undef elements. The zero extend code now emits a zero_extend_vector_inreg instead of an unpckl with a zero vector. For the high half of the custom lowering of zero_extend/any_extend, we're now using an unpckh with a zero vector or undef. Previously we used a pshufd to move the upper 64-bits to the lower 64-bits and then used a zero_extend_vector_inreg. I think the zero vector should require less execution resources and be smaller code size. Differential Revision: https://reviews.llvm.org/D54024 llvm-svn: 346043	2018-11-02 21:09:49 +00:00
Simon Pilgrim	88e8763bae	[X86][AVX512] Change mask ops on vpermi2var tests to not use zeroinitializer. This is necessary as I'm wanting to remove the 'Constant Pool' shuffle decoding from getTargetShuffleMask - but using getTargetShuffleMaskIndices allows the shuffle combiner to realize that these calls are really broadcasts..... As with a lot of the X86ISD::VPERMV3 code this causes some vperm2i/vperm2t shuffles to flip depending on optimal commutation. llvm-svn: 346032	2018-11-02 19:39:41 +00:00
Jeremy Morse	d538352b3e	[MachineSink][DebugInfo] Correctly sink DBG_VALUEs As reported in PR38952, postra-machine-sink relies on DBG_VALUE insns being adjacent to the def of the register that they reference. This is not always true, leading to register copies being sunk but not the associated DBG_VALUEs, which gives the debugger a bad variable location. This patch collects DBG_VALUEs as we walk through a BB looking for copies to sink, then passes them down to performSink. Compile-time impact should be negligible. Differential Revision: https://reviews.llvm.org/D53992 llvm-svn: 345996	2018-11-02 16:52:48 +00:00
Simon Pilgrim	cdcbeb4997	[DAGCombiner] Remove reduceBuildVecConvertToConvertBuildVec and rely on the vectorizers instead (PR35732) reduceBuildVecConvertToConvertBuildVec vectorizes int2float in the DAGCombiner, which means that even if the LV/SLP has decided to keep scalar code using the cost models, this will override this. While there are cases where vectorization is necessary in the DAG (mainly due to legalization artefacts), I don't think this is the case here, we should assume that the vectorizers know what they are doing. Differential Revision: https://reviews.llvm.org/D53712 llvm-svn: 345964	2018-11-02 11:06:18 +00:00
Craig Topper	e2483020f2	[DAGCombiner] Make the isTruncateOf call from visitZERO_EXTEND work for vectors. Remove FIXME. I'm having trouble creating a test case for the ISD::TRUNCATE part of this that shows any codegen differences. But I was able to test the setcc path which is what the test changes here cover. llvm-svn: 345908	2018-11-01 23:21:45 +00:00
Craig Topper	7a782cce35	[X86] Add test cases for adding vector support to isTruncateOf in DAGCombiner::visitZERO_EXTEND llvm-svn: 345907	2018-11-01 23:21:42 +00:00
Sanjay Patel	c5fe3ce2ec	[DAGCombiner] make sure we have a whole-number extract before trying to narrow a vector op (PR39511) The test causes a crash because we were trying to extract v4f32 to v3f32, and the narrowing factor was then 4/3 = 1 producing a bogus narrow type. This should fix: https://bugs.llvm.org/show_bug.cgi?id=39511 llvm-svn: 345842	2018-11-01 15:41:12 +00:00
Simon Pilgrim	1f0a8421ad	[X86][SSE] Move 2-input limit up from getFauxShuffleMask to resolveTargetShuffleInputs (reapplied) Reapplying an updated version of rL345395 (reverted in rL345451), now the issues noticed in PR39483 have been fixed. This patch allows resolveTargetShuffleInputs to remove UNDEF inputs from cases where we have more than 2 inputs. llvm-svn: 345824	2018-11-01 11:52:09 +00:00


12775 Commits