llvm-project

Commit Graph

Author	SHA1	Message	Date
Kevin B. Smith	47e64abe1a	[X86] More test updates to support fixup-byte-word-insts optimization either on or off. Differential Revision: http://reviews.llvm.org/D17458 llvm-svn: 261505	2016-02-22 01:27:56 +00:00
Simon Pilgrim	e9093adae0	[X86][AVX] Add shuffle masking support for EltsFromConsecutiveLoads Add support for the case where we have a consecutive load (which must include the first + last elements) with a mixture of undef/zero elements. We load the vector and then apply a shuffle to clear the zero'd elements. Differential Revision: http://reviews.llvm.org/D17297 llvm-svn: 261490	2016-02-21 19:15:48 +00:00
Sanjoy Das	aa63dc0e9a	Fix LLVM's handling and detection of skylake and cannonlake CPUs Summary: - Rename `"skylake"` == SkylakeServerProc to `"skylake-avx512"` - Change `"skylake"` to denote SkylakeClientProc - Fix the detection of cpu family 6 and model 94 to be SkylakeClientProc instead of SkylakeServerProc - Remove the `"cnl"` for CannonLake Reviewers: craig.topper, delena Subscribers: zansari, echristo, qcolombet, RKSimon, spatel, DavidKreitzer, mcrosier, llvm-commits Differential Revision: http://reviews.llvm.org/D17090 llvm-svn: 261482	2016-02-21 17:12:03 +00:00
David Majnemer	a3ea407d48	[X86] Use the correct alignment for COMDAT constant pool entries COFF doesn't have sections with mergeable contents. Instead, each constant pool entry ends up in a COMDAT section. The linker, when choosing between COMDAT sections, doesn't choose the max alignment of the two sections. You just get whatever alignment was on the section. If one constant needed a higher alignment in one object file than in another, then we will get into trouble if the linker chooses the lower alignment one. Instead, let's promote the alignment of the constant pool entry to make sure we don't use an under-aligned constant with an instruction which assumed otherwise. This fixes PR26680. llvm-svn: 261462	2016-02-21 01:30:30 +00:00
Simon Pilgrim	d765c0b8b9	[X86][AVX] Added test case for PR22359 llvm-svn: 261444	2016-02-20 19:21:20 +00:00
Simon Pilgrim	79a14dd3d1	[X86] Regenerated pr16360.ll llvm-svn: 261440	2016-02-20 17:56:45 +00:00
Simon Pilgrim	972d9fb76b	[X86][SSE41] More fast-isel intrinsics tests llvm-svn: 261439	2016-02-20 17:30:37 +00:00
Simon Pilgrim	19b3ce0f07	[X86][SSE41] Added fast-isel intrinsics tests As discussed on PR24580, this patch adds some (more to come) initial fast-isel codegen tests to match the IR generated in clang/test/CodeGen/sse41-builtins.c llvm-svn: 261438	2016-02-20 17:11:32 +00:00
Simon Pilgrim	ecb0433599	[X86][SSE] Fixed issue with commutation of 'faux unary' target shuffles (PR26667) Fixed a bug introduced by D16683 when a binary shuffle is simplified to a unary shuffle (with undef/zero sentinel mask indices) - if this resulted in only the second input being used combineX86ShuffleChain failed to take this into account and still referenced the first input. llvm-svn: 261434	2016-02-20 14:39:45 +00:00
Andrey Turetskiy	9994b8894a	[X86] Enable the LEA optimization pass by default. Differential Revision: http://reviews.llvm.org/D16877 llvm-svn: 261429	2016-02-20 11:11:55 +00:00
Andrey Turetskiy	0babd26626	[X86] PR26575: Fix LEA optimization pass (Part 2). Handle address displacement operands of a type other than Immediate or Global in LEAs and load/stores. Ref: https://llvm.org/bugs/show_bug.cgi?id=26575 Differential Revision: http://reviews.llvm.org/D17374 llvm-svn: 261428	2016-02-20 10:58:28 +00:00
Davide Italiano	228978c0dc	[X86ISelLowering] Fix TLSADDR lowering when shrink-wrapping is enabled. TLSADDR nodes are lowered into actual calls inside MC. In order to prevent shrink-wrapping from pushing prologue/epilogue past them (which results in TLS variables being accessed before the stack frame is set up), we put markers, so that the stack gets adjusted properly. Thanks to Quentin Colombet for guidance/help on how to fix this problem! llvm-svn: 261387	2016-02-20 00:44:47 +00:00
Quentin Colombet	e611698e84	[RegAllocFast] Properly track the physical register definitions on calls. PR26485 llvm-svn: 261384	2016-02-20 00:32:29 +00:00
Dimitry Andric	db417b6d40	Fix incorrect selection of AVX512 sqrt when OptForSize is on Summary: When optimizing for size, sqrt calls can be incorrectly selected as AVX512 VSQRT instructions. This is because X86InstrAVX512.td has a `Requires<[OptForSize]>` in its `avx512_sqrt_scalar` multiclass definition. Even if the target does not support AVX512, the class can apparently still be chosen, leading to an incorrect selection of `vsqrtss`. In PR26625, this led to an assertion: Reg >= X86::FP0 && Reg <= X86::FP6 && "Expected FP register!", because the `vsqrtss` instruction requires an XMM register, which is not available on i686 CPUs. Reviewers: grosbach, resistor, joker.eph Subscribers: spatel, emaste, llvm-commits Differential Revision: http://reviews.llvm.org/D17414 llvm-svn: 261360	2016-02-19 20:14:11 +00:00
Sanjoy Das	d2db73ba59	[StatepointLowering] Fix bug in allocateStackSlot allocateStackSlot did not consider the size of the value to be spilled before deciding to re-use a spill slot. This was originally okay (since originally we'd only ever spill pointers), but it became not okay when we changed our scheme to directly spill vectors of pointers. While this change fixes the bug pointed out, it has two performance caveats: - It matches spill slot and spillee size exactly, while in theory we can spill, e.g., an 8 byte pointer into a 16 byte slot. This is slightly complicated to fix since in the stackmaps section, we report the size of the spill slot as the size of the "indirect value"; and if they're no longer equivalent, we'll have to keep track of the (indirect) value size separately from the stack slot size. - It will "spuriously run out" of reusable slots, since we now have a second check in the search loop in addition to the availability check (e.g. you had two free scalar slots, and you first ask for a vector slot followed by a scalar slot). I'll fix this in a later commit. llvm-svn: 261336	2016-02-19 17:15:22 +00:00
Kevin B. Smith	652128d48c	[X86] Change fixup-bw-inst.ll to test output with this optimization on and off. Differential Revision: http://reviews.llvm.org/D17415 llvm-svn: 261332	2016-02-19 16:20:48 +00:00
Simon Pilgrim	9630a4ab15	[X86][AVX] Added fast-isel intrinsics tests As discussed on PR24580, this patch adds some (more to come) initial fast-isel codegen tests to match the IR generated in clang/test/CodeGen/avx-builtins.c llvm-svn: 261329	2016-02-19 14:38:09 +00:00
Justin Lebar	c75d566f56	When printing MIR, output to errs() rather than outs(). Summary: Without this, this command $ llvm-run llc -stop-after machine-cp -o - <( echo '' ) outputs an error, because we close stdout twice -- once when closing the file opened for "-o", and again when closing outs(). Also clarify in the outs() definition that you can't ever call it if you want to open your own raw_fd_ostream on stdout. Reviewers: jroelofs, tstellarAMD Subscribers: jholewinski, qcolombet, dsanders, llvm-commits Differential Revision: http://reviews.llvm.org/D17422 llvm-svn: 261286	2016-02-19 00:18:46 +00:00
David Majnemer	a822c880a9	[WinEH] Hoist state stores from successors If we know that all of our successors want to be in the exact same state, it makes sense to hoist the state transition into their common predecessor. Differential Revision: http://reviews.llvm.org/D17391 llvm-svn: 261262	2016-02-18 21:13:35 +00:00
Hans Wennborg	75734f87a6	Add more triples after r261235 Since the behaviour is now different between Darwin and non-Darwin, more triples are needed :-/ llvm-svn: 261238	2016-02-18 18:44:33 +00:00
Hans Wennborg	23cdc643b9	Revert to extend i8/i16 return values on Darwin (PR26665) In r260133, LLVM was changed to no longer extend i8/i16 return values, as it's not required by the ABI. However, code was found in the wild that relies on the old behaviour on Darwin, so this commit reverts back to that old behaviour for Darwin. On other platforms, it's less likely that code would be depending on the old behaviour, as GCC and MSVC haven't been extending such return values. llvm-svn: 261235	2016-02-18 18:17:05 +00:00
Simon Pilgrim	05e48b95eb	[X86][SSE] Improve PSHUFB shuffle mask decoding. In cases where the PSHUFB shuffle mask is shared it might not be bitcasted to a vXi8 byte vector. This patch adds support for decoding these wider shuffle masks from the ConstantPool. The test case in question makes use of this to recognise the shuffle mask is a unary UNPCKL pattern and simplifies accordingly. llvm-svn: 261201	2016-02-18 10:17:40 +00:00
Michael Zuckerman	724dc3b20c	[AVX512][PRORQ][PRORD] Change imm8 to int Differential Revision: http://reviews.llvm.org/D17024 llvm-svn: 261198	2016-02-18 09:52:12 +00:00
David Majnemer	7e5937b775	[WinEH] Optimize WinEH state stores 32-bit x86 Windows targets use a linked-list of nodes allocated on the stack, referenced to via thread-local storage. The personality routine interprets one of the fields in the node as a 'state number' which indicates where the personality routine should transfer control. State transitions are possible only before call-sites which may throw exceptions. Our previous scheme had us update the state number before all call-sites which may throw. Instead, we can try to minimize the number of times we need to store by reasoning about the nearest store which dominates the current call-site. If the last store agrees with the current call-site, then we know that the state-update is redundant and can be elided. This is largely straightforward: an RPO walk of the blocks allows us to correctly forward propagate the information when the function is a DAG. Currently, loops are not handled optimally and may trigger superfluous state stores. Differential Revision: http://reviews.llvm.org/D16763 llvm-svn: 261122	2016-02-17 18:37:11 +00:00
Simon Pilgrim	07d72f4f49	[X86][SSE] Update pshufb mask tests. We are getting better at combining constant pshufb masks - use a real input instead of undef. Add test for decoding multi-use bitcasted masks as well (actual support will come soon). llvm-svn: 261101	2016-02-17 15:52:39 +00:00
Simon Pilgrim	43bd887090	[X86][SSE] Update pshufb mask test to use a real input instead of undef We are getting better at combining constant pshufb masks - this test would've failed once we decode bitcasted masks as well. llvm-svn: 261095	2016-02-17 14:56:58 +00:00
Igor Breger	ac02f1bb62	AVX512: Fix LowerMSCATTER() return value. Bug description: The bug was discovered when a test was compiled with -O0. When the scatter result is the DAG root, VectorLegalizer failed (assert) because LowerMSCATTER() returned the kmask as its result. Change LowerMSCATTER() to return the chain, as the original node does. Differential Revision: http://reviews.llvm.org/D17331 llvm-svn: 261090	2016-02-17 14:04:33 +00:00
Simon Pilgrim	c5b5dcb985	[X86][AVX] Support bit-blend integer shuffles for 256-bit integer vectors AVX1 doesn't support the shuffling of 256-bit integer vectors. For 32/64-bit elements we get around this by shuffling as float/double but for 8/16-bit elements (assuming they can't widen) we currently just split, shuffle as 128-bit vectors and concatenate the results back. This patch adds the ability to lower using the bit-blend patterns before defaulting to the splitting behaviour. Part 2 of 2 Differential Revision: http://reviews.llvm.org/D17292 llvm-svn: 261082	2016-02-17 10:50:06 +00:00
Simon Pilgrim	a50e8d3627	[X86][AVX] Support bit-mask integer shuffles for 256-bit integer vectors AVX1 doesn't support the shuffling of 256-bit integer vectors. For 32/64-bit elements we get around this by shuffling as float/double but for 8/16-bit elements (assuming they can't widen) we currently just split, shuffle as 128-bit vectors and concatenate the results back. This patch adds the ability to lower using the bit-mask patterns before defaulting to the splitting behaviour. In some cases this ends up matching what AVX2 would do anyhow or what AVX1 does on the split vectors. Part 1 of 2 Differential Revision: http://reviews.llvm.org/D17292 llvm-svn: 261081	2016-02-17 10:37:49 +00:00
Hans Wennborg	84047896b9	Revert r260979 "[X86] Enable the LEA optimization pass by default." Asserts are still firing in Chromium builds. PR26575. llvm-svn: 261058	2016-02-17 02:49:59 +00:00
Reid Kleckner	8de35fef3d	[X86] Fix a shrink-wrapping miscompile around __chkstk __chkstk clobbers EAX. If EAX is live across the prologue, then we have to take extra steps to save it. We already had code to do this if EAX was a register parameter. This change adapts it to work when shrink wrapping is used. llvm-svn: 261039	2016-02-17 00:17:33 +00:00
Simon Pilgrim	cc8a282647	[X86][AVX] Regenerated vselect tests llvm-svn: 261026	2016-02-16 22:33:27 +00:00
Ahmed Bougacha	af60a429c9	[X86] Generalize logic blend of (x, -x) combine to match (-x, x). I suspect this is what let PR26110 lie dormant for so long. llvm-svn: 261024	2016-02-16 22:14:07 +00:00
Ahmed Bougacha	132fbf5476	[X86] Don't turn (c?-v:v) into (c?-v:0) by blindly using PSIGN. Currently, we sometimes miscompile this vector pattern: (c ? -v : v) We lower it to (because "c" is <4 x i1>, lowered as a vector mask): (~c & v) | (c & -v) When we have SSSE3, we incorrectly lower that to PSIGN, which does: (c < 0 ? -v : c > 0 ? v : 0) in other words, when c is either all-ones or all-zero: (c ? -v : 0) While this is an old bug, it rarely triggers because the PSIGN combine is too sensitive to operand order. This will be improved separately. Note that the PSIGN tests are also incorrect. Consider: %b.lobit = ashr <4 x i32> %b, <i32 31, i32 31, i32 31, i32 31> %sub = sub nsw <4 x i32> zeroinitializer, %a %0 = xor <4 x i32> %b.lobit, <i32 -1, i32 -1, i32 -1, i32 -1> %1 = and <4 x i32> %a, %0 %2 = and <4 x i32> %b.lobit, %sub %cond = or <4 x i32> %1, %2 ret <4 x i32> %cond if %b is zero: %b.lobit = <4 x i32> zeroinitializer %sub = sub nsw <4 x i32> zeroinitializer, %a %0 = <4 x i32> <i32 -1, i32 -1, i32 -1, i32 -1> %1 = <4 x i32> %a %2 = <4 x i32> zeroinitializer %cond = or <4 x i32> %a, zeroinitializer ret <4 x i32> %a whereas we currently generate: psignd %xmm1, %xmm0 retq which returns 0, as %xmm1 is 0. Instead, use a pure logic sequence, as described in: https://graphics.stanford.edu/~seander/bithacks.html#ConditionalNegate Fixes PR26110. Differential Revision: http://reviews.llvm.org/D17181 llvm-svn: 261023	2016-02-16 22:14:03 +00:00
Ahmed Bougacha	a87c3480b5	[X86] Extract PSIGN/BLENDVP tests into vector-blend.ll. NFC. We're going to stop generating PSIGN, so calling a test "psign" isn't ideal. Instead, call these tests what they really are: variable blends using logic. Also add a test to exhibit a case we're currently missing in the PSIGN combine. llvm-svn: 261022	2016-02-16 22:13:59 +00:00
Andrey Turetskiy	eab4e68650	[X86] Enable the LEA optimization pass by default. Differential Revision: http://reviews.llvm.org/D16877 llvm-svn: 260979	2016-02-16 16:41:38 +00:00
Andrey Turetskiy	1052ac2311	[X86] PR26575: Fix LEA optimization pass. Add a missing check for a type of address displacement operand of the load/store instruction being a candidate for LEA substitution. Ref: https://llvm.org/bugs/show_bug.cgi?id=26575 Differential Revision: http://reviews.llvm.org/D17261 llvm-svn: 260959	2016-02-16 12:47:45 +00:00
Zia Ansari	30a02384f7	Implemented stack symbol table ordering/packing optimization to improve data locality and code size from SP/FP offset encoding. Differential Revision: http://reviews.llvm.org/D15393 llvm-svn: 260917	2016-02-15 23:44:13 +00:00
Simon Pilgrim	7c920e611c	[X86][SSE2] Regenerated sse2 tests llvm-svn: 260900	2016-02-15 17:57:40 +00:00
Simon Pilgrim	766a659eb5	[X86] More thorough partial-register division checks For when grep counts are just not enough... llvm-svn: 260891	2016-02-15 14:09:35 +00:00
Simon Pilgrim	a62170834d	[X86] Regenerated 64/128 bit multiply tests llvm-svn: 260890	2016-02-15 14:04:05 +00:00
Simon Pilgrim	9513b3c4c7	[X86][SSE] More thorough testing of all-ones vectors re-materialization llvm-svn: 260889	2016-02-15 13:50:48 +00:00
Simon Pilgrim	02d3b6a82d	[X86][SSE] Regenerated uint2fp special case tests llvm-svn: 260888	2016-02-15 13:41:41 +00:00
Simon Pilgrim	4e4989a64a	[X86][SSE] Regenerated fast isel intrinsics tests llvm-svn: 260885	2016-02-15 12:32:16 +00:00
Igor Breger	4dc7d390db	AVX512: Change store size of kmask. The store sizes of v8i1, v4i1, v2i1 and i1 are changed to 16 bits. If KMOVB is not supported (it requires AVX512DQ), only KMOVW can be used, so the store size should be 2 bytes. Differential Revision: http://reviews.llvm.org/D17138 llvm-svn: 260878	2016-02-15 08:25:28 +00:00
Simon Pilgrim	834931554b	[X86][AVX] Fixed copy+paste typo in shuffle test llvm-svn: 260852	2016-02-14 18:11:52 +00:00
Simon Pilgrim	08ba012973	[X86][AVX] Lower shuffles as repeated lane shuffles then lane-crossing shuffles This patch attempts to represent a shuffle as a repeating shuffle (recognisable by is128BitLaneRepeatedShuffleMask) with the source input(s) in their original lanes, followed by a single permutation of the 128-bit lanes to their final destinations. On AVX2 we can additionally attempt to match using 64-bit sub-lane permutation. AVX2 can also now match a similar 'broadcasted' repeating shuffle. This patch has several benefits: * Avoids prematurely matching with lowerVectorShuffleByMerging128BitLanes which can require both inputs to have their input lanes permuted before shuffling. * Can replace PERMPS/PERMD instructions - although these are useful for cross-lane unary shuffling, they require their shuffle mask to be pre-loaded (and increase register pressure). * Matching the repeating shuffle makes use of a lot of existing shuffle lowering. There is an outstanding minor AVX1 regression (combine_unneeded_subvector1 in vector-shuffle-combining.ll) of a previously 128-bit shuffle + subvector splat being converted to a subvector splat + (2 instruction) 256-bit shuffle, I intend to fix this in a followup patch for review. Differential Revision: http://reviews.llvm.org/D16537 llvm-svn: 260834	2016-02-13 21:54:04 +00:00
Sanjay Patel	e9bf993cee	[x86-64] allow mfence even with -mno-sse (PR23203) As shown in: https://llvm.org/bugs/show_bug.cgi?id=23203 ...we currently die because lowering believes that mfence is allowed without SSE2 on x86-64, but the instruction def doesn't know that. I don't know if allowing mfence without SSE is right, but if not, at least now it's consistently wrong. :) Differential Revision: http://reviews.llvm.org/D17219 llvm-svn: 260828	2016-02-13 17:26:29 +00:00
Pirama Arumuga Nainar	7476bc89e9	Don't combine fp_round (fp_round x) if f80 to f16 is generated Summary: This patch skips DAG combine of fp_round (fp_round x) if it results in an fp_round from f80 to f16. fp_round from f80 to f16 always generates an expensive (and as yet, unimplemented) libcall to __truncxfhf2. This prevents selection of native f16 conversion instructions from f32 or f64. Moreover, the first (value-preserving) fp_round from f80 to either f32 or f64 may become a NOP in platforms like x86. Reviewers: ab Subscribers: srhines, llvm-commits Differential Revision: http://reviews.llvm.org/D17221 llvm-svn: 260769	2016-02-13 00:08:05 +00:00
Yunzhong Gao	0de36ec169	Disable the vzeroupper insertion pass on PS4. Differential Revision: http://reviews.llvm.org/D16837 llvm-svn: 260764	2016-02-12 23:37:57 +00:00
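
Note on commit 132fbf5476 (PR26110): the mismatch described in that message can be checked per 32-bit lane with a small model. The following is only an illustrative sketch in Python, not LLVM code; the function names are hypothetical, and it assumes the mask `c` is either 0 or -1 (all-ones), as the commit message states. It compares the logic blend `(~c & v) | (c & -v)`, the PSIGND lane behaviour quoted in the message, and the conditional-negate form `(v ^ c) - c` from the linked bithacks page.

```python
MASK32 = 0xFFFFFFFF


def to_signed(x):
    """Wrap an integer to a signed 32-bit lane value."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def select_negate(c, v):
    """Reference pattern from the commit message: (~c & v) | (c & -v)."""
    return to_signed((~c & v) | (c & -v))


def psign_lane(v, c):
    """PSIGND lane behaviour as quoted: c < 0 ? -v : c > 0 ? v : 0."""
    if c < 0:
        return to_signed(-v)
    if c > 0:
        return to_signed(v)
    return 0


def conditional_negate(c, v):
    """Pure logic sequence (assumed form): (v ^ c) - c, with c in {0, -1}."""
    return to_signed((v ^ c) - c)


if __name__ == "__main__":
    for v in (5, -7, 0, -(1 << 31)):
        for c in (0, -1):
            print(v, c, select_negate(c, v), psign_lane(v, c),
                  conditional_negate(c, v))
```

In this model the PSIGND lane agrees with the reference when `c` is -1 but yields 0 instead of `v` when `c` is 0, while the conditional-negate form agrees with the reference for both mask values.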

7037 Commits