llvm-project

Commit Graph

Author	SHA1	Message	Date
Craig Topper	b34eef7b41	[X86] Remove another weird scalar sqrt/rcp/rsqrt pattern. This pattern turned a vector sqrt/rcp/rsqrt operation of sse_load_f32/f64 into the scalar instruction for the operation and put undef into the upper bits. For correctness, the resulting code should still perform the sqrt/rcp/rsqrt on the upper bits after the load is extended since that's what the operation asked for. In particular, when the upper bits are 0 we need to calculate the sqrt/rcp/rsqrt of the zeroes and keep the result in the upper bits. This implies we should be using the packed instruction still. The only test case for this pattern is one I just added so there was no coverage of this. llvm-svn: 288784	2016-12-06 08:08:12 +00:00
Craig Topper	26ce4267ef	[X86] Add test case demonstrating a case where a vector sqrt being passed (scalar_to_vector loadf64) uses a scalar sqrt instruction. This occurs due to a pattern that uses sse_load_f32/f64 with vector sqrt/rcp/rsqrt operations and turns them into scalar instructions. Perhaps for the case where the upper bits come from undef this is ok. I believe a (vzmovl load64) would do the same thing but those seem to become vzload instead and selectScalarSSELoad doesn't handle that today. In that case we should be performing the vector operation on the zeros in the upper bits which is not equivalent to using a scalar instruction. I will remove this pattern in a follow up patch. There appears to be no other test content for it. llvm-svn: 288783	2016-12-06 08:08:09 +00:00
Craig Topper	aa2c38378c	[X86] Regenerate a test using update_llc_test_checks.py llvm-svn: 288782	2016-12-06 08:08:07 +00:00
Craig Topper	683470bf1b	[X86] Remove bad pattern that caused 128-bit loads being used by scalar sqrt/rcp/rsqrt intrinsics to select the memory form of the corresponding instruction and violate the semantics of the intrinsic. The intrinsics are supposed to pass the upper bits straight through to their output register. This means we need to make sure we still perform the 128-bit load to get those upper bits to pass to the instruction since the memory form of the instruction only reads 32 or 64 bits. llvm-svn: 288781	2016-12-06 08:08:04 +00:00
Craig Topper	125939ff65	[X86] Add test case that shows a scalar sqrtsd intrinsic of a 128-bit vector load using the load form of the sqrtsd instruction which violates the intrinsic semantics. The sqrtsd instruction only loads 64-bits and writes bits 63:0 with the sqrt result. Bits 127:64 are preserved in the destination register. The semantics of the intrinsic indicate bits 127:64 should come from the intrinsic argument which in this case is a 128-bit load. So the generated code should have a 128-bit load and use a register form of sqrtsd. llvm-svn: 288780	2016-12-06 08:08:01 +00:00
Craig Topper	5fc7bc91f9	[X86] Correct pattern for VSQRTSSr_Int, VSQRTSDr_Int, VRCPSSr_Int, and VRSQRTSSr_Int to not have an IMPLICIT_DEF on the first input. The semantics of the intrinsic are clear and not undefined. The intrinsic takes one argument, the lower bits are affected by the operation and the upper bits should be passed through. The instruction itself takes two operands, the high bits of the first operand are passed through and the low bits of the second operand are modified by the operation. To match this to the intrinsic we should pass the single intrinsic input to both operands. I had to remove the stack folding test for these instructions since they depended on the incorrect behavior. The same register is now used for both inputs so the load can't be folded. llvm-svn: 288779	2016-12-06 08:07:58 +00:00
Craig Topper	6413f8a8f2	[X86] Remove scalar logical op alias instructions. Just use COPY_FROM/TO_REGCLASS and the normal packed instructions instead Summary: This patch removes the scalar logical operation alias instructions. We can just use reg class copies and use the normal packed instructions instead. This removes the need for putting these instructions in the execution domain fixing tables as was done recently. I removed the loadf64_128 and loadf32_128 patterns as DAG combine creates a narrower load for (extractelt (loadv4f32)) before we ever get to isel. I plan to add similar patterns for AVX512DQ in a future commit to allow use of the larger register class when available. Reviewers: spatel, delena, zvi, RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D27401 llvm-svn: 288771	2016-12-06 04:58:39 +00:00
Michael Kuperstein	e3036abcf9	[X86] Fix non-intrinsic roundss/roundsd to not read the destination register This changes the scalar non-intrinsic non-avx roundss/sd instruction definitions not to read their destination register - allowing partial dependency breaking. This fixes PR31143. Differential Revision: https://reviews.llvm.org/D27323 llvm-svn: 288703	2016-12-05 20:57:37 +00:00
Adrian Prantl	941fa7588b	[DIExpression] Introduce a dedicated DW_OP_LLVM_fragment operation so we can stop using DW_OP_bit_piece with the wrong semantics. The entire back story can be found here: http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20161114/405934.html The gist is that in LLVM we've been misinterpreting DW_OP_bit_piece's offset field to mean the offset into the source variable rather than the offset into the location at the top of the DWARF expression stack. In order to be able to fix this in a subsequent patch, this patch introduces a dedicated DW_OP_LLVM_fragment operation with the semantics that we used to apply to DW_OP_bit_piece, which is what we actually need while inside of LLVM. This patch is complete with a bitcode upgrade for expressions using the old format. It does not yet fix the DWARF backend to use DW_OP_bit_piece correctly. Implementation note: We discussed several options for implementing this, including reserving a dedicated field in DIExpression for the fragment size and offset, but using a custom operator at the end of the expression works just fine and is more efficient because we then only pay for it when we need it. Differential Revision: https://reviews.llvm.org/D27361 rdar://problem/29335809 llvm-svn: 288683	2016-12-05 18:04:47 +00:00
Sanjay Patel	1f158d6955	[TargetLowering] add special-case for demanded bits analysis of 'not' We treat bitwise 'not' as a special operation and try not to reduce its all-ones mask. Presumably, this is because a 'not' may be cheaper than a generic 'xor' or it may get folded into another logic op if the target has those. However, if we can remove a logic instruction by changing the xor's constant mask value, that should always be a win. Note that the IR version of SimplifyDemandedBits() does not treat 'not' as a special-case currently (although that's marked with a FIXME). So if you run this IR through -instcombine, you should get the same end result. I'm hoping to add a different backend transform that will expose this problem though, so I need to solve this first. Differential Revision: https://reviews.llvm.org/D27356 llvm-svn: 288676	2016-12-05 15:58:21 +00:00
Sanjay Patel	f807f6a05f	[x86] fold fand (fxor X, -1) Y --> fandn X, Y I noticed this gap in the scalar FP-logic matching with: D26712 and: rL287171 Differential Revision: https://reviews.llvm.org/D27385 llvm-svn: 288675	2016-12-05 15:45:27 +00:00
Simon Pilgrim	b08c98f125	[X86][SSE] Add support for combining target shuffles to UNPCKL/UNPCKH. llvm-svn: 288663	2016-12-05 11:25:13 +00:00
Craig Topper	db8467ae26	[AVX-512] Teach fast isel to handle 512-bit vector bitcasts. llvm-svn: 288641	2016-12-05 05:50:51 +00:00
Craig Topper	7ef6ea324a	[AVX-512] Teach fast isel to use masked compare and movss for handling scalar cmp and select sequence when AVX-512 is enabled. This matches the behavior of normal isel. llvm-svn: 288636	2016-12-05 04:51:31 +00:00
Craig Topper	227d4279a8	[AVX-512] Add avx512f command lines to fast isel SSE select test. Currently the fast isel code emits an avx1 instruction sequence even with avx512. This is different than normal isel. A follow up commit will fix this. llvm-svn: 288635	2016-12-05 04:51:28 +00:00
Simon Pilgrim	6133fc3aa2	[X86][XOP] Add target shuffle tests showing missing UNPCKL combine. llvm-svn: 288628	2016-12-04 22:55:57 +00:00
Simon Pilgrim	38d245197e	[X86][AVX512] Add target shuffle tests showing missing UNPCK combines. llvm-svn: 288627	2016-12-04 22:54:21 +00:00
Matt Arsenault	92fede361f	DAG: Fold out out of bounds insert_vector_elt getNode already prevents formation of out of bounds constant extract_vector_elts. Do the same for insert_vector_elt. llvm-svn: 288603	2016-12-03 23:03:26 +00:00
Craig Topper	9d16bfa0f5	[AVX-512] Add many of the VPERM instructions to the load folding table. Move VPERMPDZri to the correct table. llvm-svn: 288591	2016-12-03 19:37:39 +00:00
Craig Topper	c210827b53	[AVX-512] Add EVEX VPMADDUBSW and VPMADDWD to the load folding tables. llvm-svn: 288587	2016-12-03 17:19:15 +00:00
Craig Topper	8e7498976a	[X86] Fix VEX encoded VPMADDUBSW to not be marked commutable. This was accidentally broken in r285515 when we started lowering the intrinsic to an ISD node. Should fix PR31241. llvm-svn: 288578	2016-12-03 05:35:44 +00:00
Craig Topper	da73a09fcd	[X86] Add test cases demonstrating where we incorrectly commute VEX VPMADDUBSW due to a bug introduced in r285515. I believe this is the cause of PR31241. llvm-svn: 288577	2016-12-03 05:35:38 +00:00
Sanjay Patel	a5dbdf342b	[x86] add common check prefix to reduce duplication; NFC llvm-svn: 288522	2016-12-02 17:58:26 +00:00
Sanjay Patel	c731187732	fix check-label llvm-svn: 288517	2016-12-02 17:50:14 +00:00
Sanjay Patel	91d1ed5ee6	[x86] add tests to show missing demanded bits analysis; NFC llvm-svn: 288515	2016-12-02 17:48:48 +00:00
Nicolai Haehnle	33ca182c91	[DAGCombiner] do not fold (fmul (fadd X, 1), Y) -> (fmad X, Y, Y) by default Summary: When X = 0 and Y = inf, the original code produces inf, but the transformed code produces nan. So this transform (and its relatives) should only be used when the no-infs-fp-math flag is explicitly enabled. Also disable the transform using fmad (intermediate rounding) when unsafe-math is not enabled, since it can reduce the precision of the result; consider this example with binary floating point numbers with two bits of mantissa: x = 1.01 y = 111 x * (y + 1) = 1.01 * 1000 = 1010 (this is the exact result; no rounding occurs at any step) x * y + x = 1000.11 + 1.01 =r 1000 + 1.01 = 1001.01 =r 1000 (with rounding towards zero) The example relies on rounding towards zero at least in the second step. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98578 Reviewers: RKSimon, tstellarAMD, spatel, arsenm Subscribers: wdng, llvm-commits Differential Revision: https://reviews.llvm.org/D26602 llvm-svn: 288506	2016-12-02 16:06:18 +00:00
Simon Pilgrim	3a19863f1c	[X86][SSE] Renamed shuffle combine test. We're trying to combine to vpunpckhbw not vpunpckhwd llvm-svn: 288501	2016-12-02 14:43:39 +00:00
Simon Pilgrim	cbf5f97018	[X86][SSE] Add support for extracting constant bit data from broadcasted constants llvm-svn: 288499	2016-12-02 13:16:08 +00:00
Craig Topper	4961fa9bba	[AVX-512] Add EVEX vpshuflw/vpshufhw/vpshufd instructions to load folding tables. llvm-svn: 288484	2016-12-02 07:57:11 +00:00
Craig Topper	17ddb521ef	[AVX-512] Add EVEX PSHUFB instructions to load folding tables. llvm-svn: 288482	2016-12-02 07:06:30 +00:00
Craig Topper	f7866fad54	[AVX-512] Add masked VINSERTF/VINSERTI instructions to load folding tables. llvm-svn: 288481	2016-12-02 06:24:38 +00:00
Chih-Hung Hsieh	76b913c470	[SelectionDAG] getRawSubclassData should not return HasDebugValue. This change fixes a regression in r279537 and makes getRawSubclassData behave like r279536. Without this change, the fp128-g.ll test case will have an infinite loop involving SoftenFloatRes_LOAD. Differential Revision: http://reviews.llvm.org/D26942 llvm-svn: 288420	2016-12-01 21:56:33 +00:00
Simon Pilgrim	5fe6236035	[X86][SSE] Classify AND bitmasks as variable shuffle masks They are loading the bitmasks from the constant pool so the cost is similar to loading a shuffle mask. llvm-svn: 288367	2016-12-01 16:00:14 +00:00
Simon Pilgrim	1e4d870999	[X86][SSE] Add support for combining AND bitmasks to shuffles. llvm-svn: 288365	2016-12-01 15:41:40 +00:00
Simon Pilgrim	2cff28dd27	[X86][SSE] Tidied up filecheck prefixes for uitofp fast-math tests. They should be in 'narrowing' order from common to more specific test prefixes. llvm-svn: 288338	2016-12-01 14:56:48 +00:00
Simon Pilgrim	55066e5622	[X86][SSE] Add support for combining target shuffles to AND bitmasks. llvm-svn: 288335	2016-12-01 13:47:02 +00:00
Simon Pilgrim	947650e99d	[X86][SSE] Add support for combining ISD::AND with shuffles. Attempts to convert an AND with a vector of 255 or 0 values into a shuffle (blend) mask. llvm-svn: 288333	2016-12-01 11:52:37 +00:00
Simon Pilgrim	ed4ede0c29	[X86][SSE] Added tests showing missed combines of shuffles with ANDs. llvm-svn: 288330	2016-12-01 11:26:07 +00:00
Kostya Serebryany	b66cb88c2e	revert r288283 as it causes debug info (line numbers) to be lost in instrumented code. also revert r288299 which was a workaround for the problem. llvm-svn: 288300	2016-12-01 02:06:56 +00:00
Matthias Braun	39c3c89cdc	MCStreamer: Use "cfi" for CFI related temp labels. Choosing a "cfi" name makes the intent a bit clearer in an assembly dump and more importantly the assembly dumps are slightly more stable as the numbers don't move around anymore when unrelated code calls createTempSymbol() more or less often. As they are temp labels the name doesn't influence the generated object code. Differential Revision: https://reviews.llvm.org/D27244 llvm-svn: 288290	2016-11-30 23:48:26 +00:00
Paul Robinson	37a13ddb4b	Recommit r288212: Emit 'no line' information for interesting 'orphan' instructions. The LLDB tests are now ready for this patch. DWARF specifies that "line 0" really means "no appropriate source location" in the line table. Use this for branch targets and some other cases that have no specified source location, to prevent inheriting unfortunate line numbers from physically preceding instructions (which might be from completely unrelated source). Differential Revision: http://reviews.llvm.org/D24180 llvm-svn: 288283	2016-11-30 22:49:55 +00:00
Simon Pilgrim	4ae3834792	[X86][SSE] Added tests showing missed combines of ANDs with shuffles. llvm-svn: 288259	2016-11-30 18:15:10 +00:00
Simon Pilgrim	288c088c17	[X86][SSE] Add support for target shuffle constant folding Initial support for target shuffle constant folding in cases where all shuffle inputs are constant. We may be able to relax this and merge shuffles with only some constant inputs in the future. I've added the helper function getTargetConstantBitsFromNode (based off a similar function in X86ShuffleDecodeConstantPool.cpp) that could be reused for other cases requiring constant vector extraction. Differential Revision: https://reviews.llvm.org/D27220 llvm-svn: 288250	2016-11-30 16:33:46 +00:00
Simon Pilgrim	6c38b7548d	Updated test with -verify-machineinstrs to check for PR21931 llvm-svn: 288242	2016-11-30 13:21:12 +00:00
Simon Pilgrim	b099d16516	[X86][SSE] Add tests demonstrating missed opportunities to combine 64-bit element unpacks with horizontal pair ops. llvm-svn: 288240	2016-11-30 11:30:33 +00:00
Paul Robinson	957ba405e8	Revert r288212 due to lldb failure. llvm-svn: 288216	2016-11-29 23:20:35 +00:00
Paul Robinson	96de8c778b	Emit 'no line' information for interesting 'orphan' instructions. DWARF specifies that "line 0" really means "no appropriate source location" in the line table. Use this for branch targets and some other cases that have no specified source location, to prevent inheriting unfortunate line numbers from physically preceding instructions (which might be from completely unrelated source). Differential Revision: http://reviews.llvm.org/D24180 llvm-svn: 288212	2016-11-29 22:41:16 +00:00
Simon Pilgrim	17062a2bf6	[X86][AVX512VL] Improved testing of vcvtpd2ps, vcvtpd2dq/vcvtpd2udq and vcvttpd2dq/vcvttpd2udq implicit zeroing of upper 64-bits of xmm result Ensure that masked instruction doesn't assume implicit zeroing. llvm-svn: 288211	2016-11-29 22:38:30 +00:00
Simon Pilgrim	849473ae99	[X86][AVX512DQVL] Improved testing of vcvtqq2ps/vcvtuqq2ps implicit zeroing of upper 64-bits of xmm result Ensure that masked instruction doesn't assume implicit zeroing. llvm-svn: 288209	2016-11-29 22:36:28 +00:00
Simon Pilgrim	35c47c494d	[X86][SSE] Add initial support for combining target shuffles to (V)PMOVZX. We can only handle 128-bit vectors until we support target shuffle inputs of different size to the output. llvm-svn: 288140	2016-11-29 14:18:51 +00:00
Simon Pilgrim	c17fb85090	[X86][SSE] Added tests showing missed combines to (V)PMOVZX llvm-svn: 288136	2016-11-29 13:16:11 +00:00
Reid Kleckner	c68a6c4ca9	Recognize ${:uid} escapes in intel syntax inline asm It looks like this logic was duplicated long ago and the GCC side of things has grown additional functionality. We need ${:uid} at least to generate unique MS inline asm labels (PR23715), so expose these. llvm-svn: 288092	2016-11-29 00:29:27 +00:00
Joerg Sonnenberger	caaa82d90d	Revert r287553: [CodeGenPrep] Skip merging empty case blocks It results in assertions in lib/Analysis/BlockFrequencyInfoImpl.cpp line 670 ("Expected irreducible CFG"). llvm-svn: 288052	2016-11-28 18:56:54 +00:00
Simon Pilgrim	2228f70a85	[X86][SSE] Add initial support for combining (V)PMOVZX with shuffles. llvm-svn: 288049	2016-11-28 17:58:19 +00:00
Simon Pilgrim	3f10e66981	[X86][SSE] Added support for combining bit-shifts with shuffles. Bit-shifts by a whole number of bytes can be represented as a shuffle mask suitable for combining. Added a 'getFauxShuffleMask' function to allow us to create shuffle masks from other suitable operations. llvm-svn: 288040	2016-11-28 16:25:01 +00:00
Simon Pilgrim	3def9e11e2	[X86][SSE] Added tests showing missed combines of shifts with shuffles. llvm-svn: 288037	2016-11-28 15:50:39 +00:00
Craig Topper	ff9d45875a	[X86][FMA4] Add load folding support for FMA4 scalar intrinsic instructions. llvm-svn: 288009	2016-11-27 21:37:00 +00:00
Craig Topper	b00872b983	[X86][FMA4] Add test cases to demonstrate missed folding opportunities for FMA4 scalar intrinsics. llvm-svn: 288008	2016-11-27 21:36:58 +00:00
Simon Pilgrim	91d6f5fbc1	[X86][SSE] Add support for combining target shuffles to 128/256-bit PSLL/PSRL bit shifts llvm-svn: 288006	2016-11-27 21:08:19 +00:00
Craig Topper	4fab487265	[AVX-512] Add integer and fp unpck instructions to load folding tables. llvm-svn: 288004	2016-11-27 19:51:41 +00:00
Simon Pilgrim	4571157d2d	[X86][SSE] Added tests showing missed combines for shuffle to shifts. llvm-svn: 288000	2016-11-27 18:25:02 +00:00
Craig Topper	c3b3926f8b	[AVX-512] Add masked EVEX vpmovzx/sx instructions to load folding tables. llvm-svn: 287995	2016-11-27 08:55:31 +00:00
Craig Topper	991d1ca3ba	[X86] Add a hasOneUse check to selectScalarSSELoad to keep the same load from being folded multiple times. Summary: When selectScalarSSELoad is looking for a scalar_to_vector of a scalar load, it makes sure the load is only used by the scalar_to_vector. But it doesn't make sure the scalar_to_vector is only used once. This can cause the same load to be folded multiple times. This can be bad for performance. This also causes the chain output to be duplicated, but not connected to anything so chain dependencies will not be satisfied. Reviewers: RKSimon, zvi, delena, spatel Subscribers: andreadb, llvm-commits Differential Revision: https://reviews.llvm.org/D26790 llvm-svn: 287983	2016-11-26 17:29:25 +00:00
Craig Topper	10d5eec1a1	[AVX-512] Add unmasked EVEX vpmovzx/sx instructions to load folding tables. llvm-svn: 287975	2016-11-26 08:21:52 +00:00
Craig Topper	97169ea5f9	[AVX-512] Add masked 128/256-bit integer add/sub instructions to load folding tables. llvm-svn: 287974	2016-11-26 08:21:48 +00:00
Craig Topper	53b33de1e3	[AVX-512] Add masked 512-bit integer add/sub instructions to load folding tables. llvm-svn: 287972	2016-11-26 07:21:00 +00:00
Craig Topper	6677bb4e50	[AVX-512] Teach LowerFormalArguments to use the extended register class when available. Fix the avx512vl stack folding tests to clobber more registers or otherwise they use xmm16 after this change. llvm-svn: 287971	2016-11-26 07:20:57 +00:00
Craig Topper	39265bb1ce	[AVX-512] Add VLX versions of VDIVPD/PS and VMULPD/PS to load folding tables. llvm-svn: 287970	2016-11-26 07:20:53 +00:00
Craig Topper	88071b37ab	[AVX-512] Add support for changing VSHUFF64x2 to VSHUFF32x4 when it's feeding a vselect with 32-bit element size. Summary: Shuffle lowering may have widened the element size of an i32 shuffle to i64 before selecting X86ISD::SHUF128. If this shuffle was used by a vselect this can prevent us from selecting masked operations. This patch detects this and changes the element size to match the vselect. I don't handle changing integer to floating point or vice versa as it's not clear if it's better to push such a bitcast to the inputs of the shuffle or to the user of the vselect. So I'm ignoring that case for now. Reviewers: delena, zvi, RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D27087 llvm-svn: 287939	2016-11-25 16:48:05 +00:00
Craig Topper	1e48829747	[AVX-512] Add VPERMT2* and VPERMI2* instructions to load folding tables. llvm-svn: 287937	2016-11-25 16:33:53 +00:00
Simon Pilgrim	84b6f26eca	[X86][SSE] Added knownbits through bitcast test llvm-svn: 287928	2016-11-25 15:07:15 +00:00
Simon Pilgrim	8424b03d00	[X86][SSE] Added v16i8 shuffle test case from PR31151 llvm-svn: 287919	2016-11-25 11:10:43 +00:00
Craig Topper	d621e3a25b	[X86] Modify two tests that passed undef to both sides of a vselect to instead pass unique values. I'd like to teach DAG combine to remove vselects where both sides are identical and these tests were in the way of that. llvm-svn: 287903	2016-11-24 21:48:50 +00:00
Craig Topper	00758090ca	[AVX-512] Add tests demonstrating failure to generate masked instructions for VSHUFF32x4 and VSHUFI32x4 due to shuffle lowering widening elements. llvm-svn: 287897	2016-11-24 18:24:46 +00:00
Simon Pilgrim	9c71e07276	[X86][SSE] Improve UINT_TO_FP v2i32 -> v2f64 Vectorize UINT_TO_FP v2i32 -> v2f64 instead of scalarization (albeit still on the SIMD unit). The codegen matches that generated by legalization (and is in fact used by AVX for UINT_TO_FP v4i32 -> v4f64), but has to be done in the x86 backend to account for legalization via 4i32. Differential Revision: https://reviews.llvm.org/D26938 llvm-svn: 287886	2016-11-24 15:12:56 +00:00
Simon Pilgrim	841d7ca463	[X86][AVX512] Add support for v2i64 fptosi/fptoui/sitofp/uitofp on AVX512DQ-only targets Use 512-bit instructions with subvector insertion/extraction like we do in a number of similar circumstances llvm-svn: 287882	2016-11-24 14:46:55 +00:00
Simon Pilgrim	7c26a6f9ef	[X86][AVX512DQVL] Add awareness of vcvtqq2ps and vcvtuqq2ps implicit zeroing of upper 64-bits of xmm result llvm-svn: 287878	2016-11-24 14:02:30 +00:00
Simon Pilgrim	ab323ec411	[X86][AVX512DQVL] Add support for v2i64 -> v2f32 SINT_TO_FP/UINT_TO_FP lowering llvm-svn: 287877	2016-11-24 13:38:59 +00:00
Simon Pilgrim	1e7a846d5f	[X86][AVX512DQVL] Add v2i64 -> v2f32 + zero codegen tests llvm-svn: 287876	2016-11-24 13:26:51 +00:00
Nikolai Bozhenov	3a8d108b2b	[x86] Fixing PR28755 by precomputing the address used in CMPXCHG8B The bug arises during register allocation on i686 for CMPXCHG8B instruction when base pointer is needed. CMPXCHG8B needs 4 implicit registers (EAX, EBX, ECX, EDX) and a memory address, plus ESI is reserved as the base pointer. With such constraints the only way register allocator would do its job successfully is when the addressing mode of the instruction requires only one register. If that is not the case - we are emitting additional LEA instruction to compute the address. It fixes PR28755. Patch by Alexander Ivchenko <alexander.ivchenko@intel.com> Differential Revision: https://reviews.llvm.org/D25088 llvm-svn: 287875	2016-11-24 13:23:35 +00:00
Craig Topper	f23b995f78	[AVX-512] Fix some mask shuffle tests to actually test the case they were supposed to test. llvm-svn: 287854	2016-11-24 05:36:50 +00:00
Craig Topper	993c7416d3	[AVX-512] Move a 16 x float shuffle test to the v16 test file and add an integer variant. llvm-svn: 287853	2016-11-24 05:36:47 +00:00
Simon Pilgrim	3ce6a545c7	[X86][SSE] Add awareness of (v)cvtpd2dq and vcvtpd2udq implicit zeroing of upper 64-bits of xmm result We've already added the equivalent for (v)cvttpd2dq (rL284459) and vcvttpd2udq llvm-svn: 287835	2016-11-23 22:35:06 +00:00
Simon Pilgrim	eda1193456	[X86][AVX512VL] Add v2f64 -> v2i32/v2f32 + zero codegen tests llvm-svn: 287821	2016-11-23 22:01:50 +00:00
Simon Pilgrim	9234ff26d9	[X86][SSE] Add v2i64 -> v2i32 + zero codegen test llvm-svn: 287813	2016-11-23 21:19:57 +00:00
Michael Kuperstein	47eb85a003	[X86] Allow folding of stack reloads when loading a subreg of the spilled reg We did not support subregs in InlineSpiller:foldMemoryOperand() because targets may not deal with them correctly. This adds a target hook to let the spiller know that a target can handle subregs, and actually enables it for x86 for the case of stack slot reloads. This fixes PR30832. Differential Revision: https://reviews.llvm.org/D26521 llvm-svn: 287792	2016-11-23 18:33:49 +00:00
John Brawn	150addb45c	[DAGCombiner] Fix infinite loop in vector mul/shl combining We have the following DAGCombiner transformations: (mul (shl X, c1), c2) -> (mul X, c2 << c1) (mul (shl X, C), Y) -> (shl (mul X, Y), C) (shl (mul x, c1), c2) -> (mul x, c1 << c2) Usually the constant shift is optimised by SelectionDAG::getNode when it is constructed, by SelectionDAG::FoldConstantArithmetic, but when we're dealing with vectors and one of those vector constants contains an undef element FoldConstantArithmetic does not fold and we enter an infinite loop. Fix this by making FoldConstantArithmetic use getNode to decide how to fold each vector element, the same as FoldConstantVectorArithmetic does, and rather than adding the constant shift to the work list instead only apply the transformation if it's already been folded into a constant, as if it's not we're going to loop endlessly. Additionally add missing NoOpaques to one of those transformations, which I noticed when writing the tests for this. Differential Revision: https://reviews.llvm.org/D26605 llvm-svn: 287766	2016-11-23 16:05:51 +00:00
Simon Pilgrim	4e9b9cbee9	[X86][AVX512] Add support for v4i64 fptosi/fptoui/sitofp/uitofp on AVX512DQ-only targets Use 512-bit instructions with subvector insertion/extraction like we do in a number of similar circumstances llvm-svn: 287762	2016-11-23 14:01:18 +00:00
Elena Demikhovsky	09375d98b8	Type legalization for compressstore and expandload intrinsics. Implemented widening (v2f32) and splitting (v16f64). On splitting, I use "popcnt" to calculate memory increment. More type legalization work will come in the next patches. llvm-svn: 287761	2016-11-23 13:58:24 +00:00
Craig Topper	f57e17def0	[AVX-512] Remove intrinsics for valignd/q and autoupgrade them to native shuffles. llvm-svn: 287744	2016-11-23 06:54:55 +00:00
Kuba Mracek	06995e866b	[xray] Add XRay support for Mach-O in CodeGen Currently, XRay only supports emitting the XRay table (xray_instr_map) on ELF binaries. Let's add Mach-O support. Differential Revision: https://reviews.llvm.org/D26983 llvm-svn: 287734	2016-11-23 02:07:04 +00:00
Simon Pilgrim	eda365cf80	[X86][AVX512DQ] Add fp <-> int tests for AVX512DQ/AVX512DQ+VL llvm-svn: 287706	2016-11-22 22:04:50 +00:00
Simon Pilgrim	4aa876ca7c	[X86][SSE] Combine UNPCKL(FHADD,FHADD) -> FHADD for v2f64 shuffles. This occurs during UINT_TO_FP v2f64 lowering. We can easily generalize this to other horizontal ops (FHSUB, PACKSS, PACKUS) as required - we are doing something similar with PACKUS in lowerV2I64VectorShuffle llvm-svn: 287676	2016-11-22 17:50:06 +00:00
Simon Pilgrim	d70c03ad68	Fix line endings llvm-svn: 287638	2016-11-22 13:27:29 +00:00
Simon Pilgrim	72e43570b7	[SelectionDAG] ComputeNumSignBits of TRUNCATE operations Add basic ComputeNumSignBits support for TRUNCATE ops for cases where the source's number of sign bits overlaps with the truncated size. Improves X86 SIGN_EXTEND_IN_REG vector cases which were needlessly sign extending boolean vector results. Differential Revision: https://reviews.llvm.org/D26851 llvm-svn: 287635	2016-11-22 11:29:19 +00:00
Craig Topper	cada9f2275	[AVX-512] Add support for commuting VPERMT2(B/W/D/Q/PS/PD) to/from VPERMI2(B/W/D/Q/PS/PD). Summary: The index and one of the table operands can be swapped by changing the opcode to the other version. Neither of these operands is the one that can load from memory so this can't be used to increase memory folding opportunities. We need to handle the unmasked forms and the kz forms. Since the load operand isn't being commuted we can commute the load and broadcast instructions too. Reviewers: igorb, delena, Ayal, Farhana, RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D25652 llvm-svn: 287621	2016-11-22 04:57:34 +00:00
Craig Topper	da22267055	[AVX-512] Add support for changing the element size of PALIGNR/VALIGND/VALIGNQ shuffles if they feed a vselect with a different type Summary: Shuffle lowering widens the element size of a shuffle if elements are contiguous. This is sometimes helpful because wider element types have more shuffle options. If the shuffle is one of the arguments to a vselect this shuffle widening can introduce a bitcast between the vselect and the shuffle. This will prevent isel from selecting a masked operation. If the shuffle can be written equally efficiently with a different element size to match the vselect type we should change the shuffle type to allow masking. This patch does this conversion for all VALIGND/VALIGNQ sizes. It also supports turning 128-bit PALIGNR into VALIGND/VALIGNQ. This fixes the case shown in PR31018. I plan to add support for more operations in future patches. Reviewers: RKSimon, zvi, delena Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D26902 llvm-svn: 287612	2016-11-22 03:51:53 +00:00
Jun Bum Lim	82f55c5446	[CodeGenPrep] Skip merging empty case blocks Summary: Merging an empty case block into the header block of switch could cause ISel to add COPY instructions in the header of switch, instead of the case block, if the case block is used as an incoming block of a PHI. This could potentially increase dynamic instructions, especially when the switch is in a loop. I added a test case which was reduced from the benchmark I was targeting. Reviewers: t.p.northover, mcrosier, manmanren, wmi, davidxl Subscribers: qcolombet, danielcdh, hfinkel, mcrosier, llvm-commits Differential Revision: https://reviews.llvm.org/D22696 llvm-svn: 287553	2016-11-21 16:47:28 +00:00
Simon Pilgrim	6704059a0d	[X86][SSE] Add SSE reciprocal estimate tests llvm-svn: 287543	2016-11-21 15:28:21 +00:00
Simon Pilgrim	49d7eda968	[SelectionDAG] Add ComputeNumSignBits support for CONCAT_VECTORS opcode llvm-svn: 287541	2016-11-21 14:36:19 +00:00

8681 Commits