llvm-project

Commit Graph

Author	SHA1	Message	Date
Simon Pilgrim	118da63a9d	[X86][SSE] Added test cases for missed opportunities to combine pshufb to pslldq/psrldq llvm-svn: 274631	2016-07-06 15:09:48 +00:00
Elena Demikhovsky	ad0a56f3da	Re-commit of 274613. The prev commit failed on compilation. A minor change in one pattern in lib/Target/X86/X86InstrAVX512.td fixes the failure. llvm-svn: 274626	2016-07-06 14:15:43 +00:00
Elena Demikhovsky	02ced295aa	Reverted 274613 due to compilation failure. llvm-svn: 274615	2016-07-06 09:11:49 +00:00
Elena Demikhovsky	5a4f2476fd	AVX-512: Optimization for patterns with i1 scalar type The patch removes redundant kmov instructions (not all, we still have a lot of work here) and redundant "and" instructions after "setcc". I use "AssertZero" marker between X86ISD::SETCC node and "truncate" to eliminate extra "and $1" instruction. I also changed zext, aext and trunc patterns in the .td file. It allows to remove extra "kmov" instruictions. This patch fixes https://llvm.org/bugs/show_bug.cgi?id=28173. Fast ISEL mode is not supported correctly for AVX-512. ICMP/FCMP scalar instruction should return result in k-reg. It will be fixed in one of the next patches. I redirected handling of "cmp" to the DAG builder mode. (The code looks worse in one specific test case, but without this fix the new patch fails). Differential revision: http://reviews.llvm.org/D21956 llvm-svn: 274613	2016-07-06 09:01:20 +00:00
Simon Pilgrim	bec6543d17	[X86][AVX2] Add support for target shuffle combining to BROADCAST Only support broadcast from vector register so far - memory folding support will have to wait. llvm-svn: 274572	2016-07-05 20:11:29 +00:00
Simon Pilgrim	48adedffb7	[X86][AVX512] Fixed decoding of permd/permpd variable mask shuffles + enabled them for target shuffle combining Corrected element mask masking to extract the bottom index bits (now matches the perm2 implementation but for unary inputs). llvm-svn: 274571	2016-07-05 18:31:17 +00:00
Simon Pilgrim	4e96fbf3c1	[X86][AVX512] Autoupgrade the BROADCAST intrinsics llvm-svn: 274550	2016-07-05 13:58:47 +00:00
Simon Pilgrim	1e91654b38	[X86][AVX512BW] Added BROADCAST intrinsics fast-isel generic IR tests llvm-svn: 274545	2016-07-05 13:16:05 +00:00
Simon Pilgrim	20ede63a33	[X86][AVX512] Added BROADCAST intrinsics fast-isel generic IR tests llvm-svn: 274537	2016-07-05 10:15:14 +00:00
Simon Pilgrim	dea33cc2f3	[X86][AVX512] Added VSHUFPD intrinsics fast-isel generic IR tests llvm-svn: 274534	2016-07-05 09:10:07 +00:00
Simon Pilgrim	8a01915bd2	[X86][AVX512VL] Added VSHUFPD/VSHUFPS intrinsics fast-isel generic IR tests llvm-svn: 274533	2016-07-05 09:09:41 +00:00
Simon Pilgrim	3ad040909a	[X86][AVX512] Add support for lowering shuffles to VSHUFPD llvm-svn: 274520	2016-07-04 20:41:24 +00:00
Simon Pilgrim	02d435d2f4	[X86][AVX512] Autoupgrade the VPERMPD/VPERMQ intrinsics llvm-svn: 274506	2016-07-04 14:19:05 +00:00
Simon Pilgrim	8b82fce537	[X86][AVX512] Added VPERMPD/VPERMQ intrinsics fast-isel generic IR tests llvm-svn: 274503	2016-07-04 13:43:10 +00:00
Simon Pilgrim	9fca300cbe	[X86][AVX512] Autoupgrade the VPERMILPD/VPERMILPS intrinsics llvm-svn: 274498	2016-07-04 12:40:54 +00:00
Simon Pilgrim	c8cf2ddb6d	[X86][AVX512] Added VPERMILPD/VPERMILPS intrinsics fast-isel generic IR tests Added PSHUFD tests as well llvm-svn: 274493	2016-07-04 11:07:50 +00:00
Craig Topper	d83f818a3e	[CodeGen] Make the code that detects if a shuffle is really a concatenation of the inputs more general purpose. We can now handle concatenation of each source multiple times. The previous code just checked for each source to appear once in either order. This also now handles an entire source vector sized piece having undef indices correctly. We now concat with UNDEF instead of using one of the sources. This is responsible for the test case change. llvm-svn: 274483	2016-07-04 06:19:35 +00:00
Simon Pilgrim	7f096de0b8	[X86][AVX512] Add support for 512-bit shuffle lowering to VPERMPD/VPERMQ llvm-svn: 274473	2016-07-03 19:50:06 +00:00
Craig Topper	d1eca0f32c	[CodeGen] Teach OR combine of shuffles involving zero vectors to better handle undef indices. Undef indices can now be treated as zeros. Or if its undef ORed with zero, we will keep the undef. llvm-svn: 274472	2016-07-03 19:37:12 +00:00
Craig Topper	8e826d5abe	[X86] Add tests to show that the DAG combine for OR of shuffles with zero vectors doesn't handle undefs as well as it could. Fix coming in another commit. llvm-svn: 274471	2016-07-03 19:37:10 +00:00
Haicheng Wu	b71b2f622a	[MBB] add a missing corner case in UpdateTerminator() After the block placement, if a block ends with a conditional branch, but the next block is not its successor. The conditional branch should be changed to unconditional branch. This patch fixes PR28307, PR28297, PR28402. Differential Revision: http://reviews.llvm.org/D21811 llvm-svn: 274470	2016-07-03 19:14:17 +00:00
Simon Pilgrim	68ea80649b	[X86][AVX512] Add support for VPERMPD/VPERMQ masked shuffle comments llvm-svn: 274469	2016-07-03 18:40:24 +00:00
Simon Pilgrim	a0d73835b2	[X86][AVX512] Add support for 512-bit shuffle decoding of VPERMPD/VPERMQ llvm-svn: 274468	2016-07-03 18:27:37 +00:00
Simon Pilgrim	dbd6db0dc7	[X86][AVX512] Add support for VPALIGNR/PSHUFD/PSHUFHW/PSHUFLW masked shuffle comments llvm-svn: 274466	2016-07-03 15:00:51 +00:00
Simon Pilgrim	598bdb6bfe	[X86][AVX512] Add support for UNPCK masked shuffle comments llvm-svn: 274464	2016-07-03 14:26:21 +00:00
Simon Pilgrim	1f59076196	[X86][AVX512] Add support for VPERM/VSHUF masked shuffle comments llvm-svn: 274462	2016-07-03 13:55:41 +00:00
Simon Pilgrim	68f438a036	[X86][AVX512] Add support for PMOVZX masked shuffle comments llvm-svn: 274461	2016-07-03 13:33:28 +00:00
Simon Pilgrim	7c2fbdc101	[X86][AVX512] Add support for masked shuffle comments This patch adds support for including the avx512 mask register information in the mask/maskz versions of shuffle instruction comments. This initial version just adds support for MOVDDUP/MOVSHDUP/MOVSLDUP to reduce the mass of test regenerations, other shuffle instructions can be added in due course. Differential Revision: http://reviews.llvm.org/D21953 llvm-svn: 274459	2016-07-03 13:08:29 +00:00
Simon Pilgrim	129b720c18	[X86][AVX512] Add support for lowering shuffles to VPERMILPS llvm-svn: 274458	2016-07-03 12:47:21 +00:00
Simon Pilgrim	99e8a1aa0b	[X86][AVX512] Add support for lowering shuffles to VPERMILPD llvm-svn: 274450	2016-07-02 20:20:12 +00:00
Simon Pilgrim	72052f6de9	[X86][AVX512VL] Add fast-isel MOVDDUP/MOVSLDUP/MOVSHDUP shuffle tests llvm-svn: 274448	2016-07-02 19:22:46 +00:00
Simon Pilgrim	cde7c54baa	[X86][AVX512] Add support for 512-bit PSHUFB lowering llvm-svn: 274444	2016-07-02 18:14:31 +00:00
Simon Pilgrim	77dda7c2e0	[X86][AVX512] Converted the MOVDDUP/MOVSLDUP/MOVSHDUP masked intrinsics to generic IR llvm-svn: 274443	2016-07-02 17:16:41 +00:00
Simon Pilgrim	19adee9d84	[X86][AVX512] Autoupgrade the MOVDDUP/MOVSLDUP/MOVSHDUP intrinsics llvm-svn: 274439	2016-07-02 14:42:35 +00:00
Simon Pilgrim	f040d8c061	[X86][AVX512] Add support for lowering shuffles to MOVDDUP/MOVSLDUP/MOVSHDUP llvm-svn: 274436	2016-07-02 12:45:03 +00:00
Simon Pilgrim	5e95390957	[X86][AVX512] Add test cases that should lower to MOVSLDUP/MOVSHDUP llvm-svn: 274435	2016-07-02 12:20:35 +00:00
Simon Pilgrim	a6f262a1f9	[X86][AVX512] Add fast-isel shuffle tests Its not worth trying to write out tests for all the avx512f builtins yet, just adding tests for lowering of generic IR as we transition to it (shuffles mainly right now). llvm-svn: 274434	2016-07-02 12:13:29 +00:00
Dehao Chen	7b2c997736	Specify mtriple for the frame-order.ll test. Summary: original test may have different behavior on different bots, specifically it broke llvm-clang-lld-x86_64-scei-ps4-ubuntu-fast Reviewers: majnemer Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D21931 llvm-svn: 274368	2016-07-01 17:35:13 +00:00
Dehao Chen	ad2b4e1334	Do not count debug instructions when counting number of uses to reorder frame objects. Summary: The code generation should be independent of the debug info. Reviewers: zansari, davidxl, mkuper, majnemer Subscribers: majnemer, llvm-commits Differential Revision: http://reviews.llvm.org/D21911 llvm-svn: 274357	2016-07-01 15:40:25 +00:00
Yunzhong Gao	b386955adc	Add an artificial line-0 debug location when the compiler emits a call to __stack_chk_fail(). This avoids a compiler crash. Differential Revision: http://reviews.llvm.org/D21818 llvm-svn: 274263	2016-06-30 18:49:04 +00:00
Ahmed Bougacha	15a2f6d58c	[X86] Lower blended PACKUSes using appropriate types. When lowering two blended PACKUS, we used to disregard the types of the PACKUS inputs, indiscriminately generating a v16i8 PACKUS. This leads to non-selectable things like: (v16i8 (PACKUS (v4i32 v0), (v4i32 v1))) Instead, check that the PACKUSes have the same type, and use that as the final result type. llvm-svn: 274138	2016-06-29 16:56:09 +00:00
Rafael Espindola	c4cabb8054	Update tests to use at least darwin9. llvm-svn: 274129	2016-06-29 14:51:10 +00:00
Simon Pilgrim	f9c5908ffd	[X86][SSE2] Added _mm_loadu_si64 test to match llvm\tools\clang\test\CodeGen\sse2-builtins.c llvm-svn: 274127	2016-06-29 14:05:33 +00:00
Simon Pilgrim	851019175b	[X86] Regenerated popcnt combine tests llvm-svn: 274124	2016-06-29 13:54:03 +00:00
Craig Topper	3a011de10c	[DAGCombine] Teach DAG combine to handle ORs of shuffles involving zero vectors where the zero vector is the first operand to the shuffle instead of the second. llvm-svn: 274097	2016-06-29 03:29:12 +00:00
Craig Topper	1e7e36e7e6	[DAGCombine] Add test cases to show that DAG combining an OR of two shuffles with zero vectors doesn't work if the zero vector is the first operand of the shuffle. Fix coming in a follow up patch. llvm-svn: 274096	2016-06-29 03:29:09 +00:00
Dehao Chen	8cd84aaa6f	Relax the clearance calculating for breaking partial register dependency. Summary: LLVM assumes that large clearance will hide the partial register spill penalty. But in our experiment, 16 clearance is too small. As the inserted XOR is normally fairly cheap, we should have a higher clearance threshold to aggressively insert XORs that is necessary to break partial register dependency. Reviewers: wmi, davidxl, stoklund, zansari, myatsina, RKSimon, DavidKreitzer, mkuper, joerg, spatel Subscribers: davidxl, llvm-commits Differential Revision: http://reviews.llvm.org/D21560 llvm-svn: 274068	2016-06-28 21:19:34 +00:00
Matthias Braun	0b9a07883d	X86FrameLowering: Check subregs when deciding prolog kill flags llvm-svn: 274057	2016-06-28 20:31:56 +00:00
Artur Pilipenko	7ad95ec22d	Support arbitrary addrspace pointers in masked load/store intrinsics This is a resubmittion of 263158 change after fixing the existing problem with intrinsics mangling (see LTO and intrinsics mangling llvm-dev thread for details). This patch fixes the problem which occurs when loop-vectorize tries to use @llvm.masked.load/store intrinsic for a non-default addrspace pointer. It fails with "Calling a function with a bad signature!" assertion in CallInst constructor because it tries to pass a non-default addrspace pointer to the pointer argument which has default addrspace. The fix is to add pointer type as another overloaded type to @llvm.masked.load/store intrinsics. Reviewed By: reames Differential Revision: http://reviews.llvm.org/D17270 llvm-svn: 274043	2016-06-28 18:27:25 +00:00
Michael Kuperstein	a118acb82f	[X86] Update a test with more explicit checks. NFC. llvm-svn: 274040	2016-06-28 17:42:13 +00:00
David Majnemer	1c7d532cde	[X86] Make WRPKRU/RDPKRU pass -verify-machineinstrs The original implementation attempted to zero registers using XOR %foo, %foo. This is problematic because it constitutes a read-modify-write of a register which might not be defined. Instead, use MOV32r0 to avoid these problems; expandPostRAPseudo does the right thing here. llvm-svn: 274024	2016-06-28 16:04:46 +00:00
Simon Pilgrim	5f71c909f0	[X86][AVX] Peek through bitcasts to find the source of broadcasts (reapplied) AVX1 can only broadcast vectors as floats/doubles, so for 256-bit vectors we insert bitcasts if we are shuffling v8i32/v4i64 types. Unfortunately the presence of these bitcasts prevents the current broadcast lowering code from peeking through cases where we have concatenated / extracted vectors to create the 256-bit vectors. This patch allows us to peek through bitcasts as long as the number of elements doesn't change (i.e. element bitwidth is the same) so the broadcast index is not affected. Note this bitcast peek is different from the stage later on which doesn't care about the type and is just trying to find a load node. As we're being more aggressive with bitcasts, we also need to ensure that the broadcast type is correctly bitcasted Differential Revision: http://reviews.llvm.org/D21660 llvm-svn: 274013	2016-06-28 13:24:05 +00:00
Simon Pilgrim	c15d217831	[X86][SSE] Added support for combining target shuffles to (V)PSHUFD/VPERMILPD/VPERMILPS immediate permutes This patch allows target shuffles to be combined to single input immediate permute instructions - (V)PSHUFD/VPERMILPD/VPERMILPS - allowing more general pattern matching than what we current do and improves the likelihood of memory folding compared to existing patterns which tend to reuse the input in multiple arguments. Further permute instructions (V)PSHUFLW/(V)PSHUFHW/(V)PERMQ/(V)PERMPD may be added in the future but its proven tricky to create tests cases for them so far. (V)PSHUFLW/(V)PSHUFHW is already handled quite well in combineTargetShuffle so it may be that removing some of that code may allow us to perform more of the combining in one place without duplication. Differential Revision: http://reviews.llvm.org/D21148 llvm-svn: 273999	2016-06-28 08:08:15 +00:00
Elena Demikhovsky	a727f3cfde	[X86 Target Lowering] Merged ICMP test. llvm-svn: 273995	2016-06-28 06:25:38 +00:00
Rafael Espindola	8121becac3	Teach shouldAssumeDSOLocal about tls. Fixes a fixme about handling other visibilities. llvm-svn: 273921	2016-06-27 20:19:14 +00:00
Elena Demikhovsky	ad3929cc64	X86 Lowering - Fixed a crash in ICMP scalar instruction Fixed a bug in EmitTest() function in combining shl + icmp. https://llvm.org/bugs/show_bug.cgi?id=28119 llvm-svn: 273899	2016-06-27 18:07:16 +00:00
Artur Pilipenko	72f76b8805	Revert -r273892 "Support arbitrary addrspace pointers in masked load/store intrinsics" since some of the clang tests don't expect to see the updated signatures. llvm-svn: 273895	2016-06-27 16:54:33 +00:00
Artur Pilipenko	a36aa41519	Support arbitrary addrspace pointers in masked load/store intrinsics This is a resubmittion of 263158 change after fixing the existing problem with intrinsics mangling (see LTO and intrinsics mangling llvm-dev thread for details). This patch fixes the problem which occurs when loop-vectorize tries to use @llvm.masked.load/store intrinsic for a non-default addrspace pointer. It fails with "Calling a function with a bad signature!" assertion in CallInst constructor because it tries to pass a non-default addrspace pointer to the pointer argument which has default addrspace. The fix is to add pointer type as another overloaded type to @llvm.masked.load/store intrinsics. Reviewed By: reames Differential Revision: http://reviews.llvm.org/D17270 llvm-svn: 273892	2016-06-27 16:29:26 +00:00
Simon Pilgrim	476e8ceed3	[X86][SSE] Added extra broadcast tests to cover PR28327 llvm-svn: 273891	2016-06-27 16:15:37 +00:00
Nico Weber	1e058160dd	Revert 273848, it caused PR28329 llvm-svn: 273879	2016-06-27 14:36:46 +00:00
Simon Pilgrim	9c2f378587	Removed duplicate assertions note llvm-svn: 273874	2016-06-27 13:06:18 +00:00
Simon Pilgrim	a45da385f8	[X86][AVX] Peek through bitcasts to find the source of broadcasts AVX1 can only broadcast vectors as floats/doubles, so for 256-bit vectors we insert bitcasts if we are shuffling v8i32/v4i64 types. Unfortunately the presence of these bitcasts prevents the current broadcast lowering code from peeking through cases where we have concatenated / extracted vectors to create the 256-bit vectors. This patch allows us to peek through bitcasts as long as the number of elements doesn't change (i.e. element bitwidth is the same) so the broadcast index is not affected. Note this bitcast peek is different from the stage later on which doesn't care about the type and is just trying to find a load node. Differential Revision: http://reviews.llvm.org/D21660 llvm-svn: 273848	2016-06-27 07:44:32 +00:00
Rafael Espindola	88ae09e9be	Use shouldAssumeDSOLocal in isOffsetFoldingLegal. This makes it slightly more powerful for dynamic-no-pic. llvm-svn: 273704	2016-06-24 18:48:36 +00:00
Kyle Butt	267164df0a	Codegen: Fix broken assumption in Tail Merge. Tail merge was making the assumption that a layout successor or predecessor was always a cfg successor/predecessor. Remove that assumption. Changes to tests are necessary because the errant cfg edges were preventing optimizations. llvm-svn: 273700	2016-06-24 18:16:36 +00:00
Rafael Espindola	955d3569e7	Use FileCheck. NFC. llvm-svn: 273699	2016-06-24 18:04:39 +00:00
Kyle Butt	991df7889b	Codegen: [X86] preserve memory refs for folded umul_lohi Memory references were not being propagated for this folded load. This prevented optimizations like LICM from hoisting the load. Added test to verify that this allows LICM to proceed. llvm-svn: 273617	2016-06-23 21:40:35 +00:00
Kyle Butt	178314ab52	Codegen: LICM Remove check for exactly 1 register def. When considering whether to split an instruction with a memory operand into an explicit load and a register-based instruction, we currently check that the resulting instruction has exactly 1 def. This prevents 2 important LICM optimizations: compares with memory operands, and double indirect calls. All the tests and the test-suite pass without the check. My guess as to original intent is to limit the additional register pressure created by the new instruction, but given that we only split out a single register, it is already limited. The licm-dominance test now checks actual memory loads for hoisting instead of undef, and it tests compares. hoist-invariant-load.ll now checks for 2 hoists, the intended hoist, and a bonus from calling a got-relative function in a loop. llvm-svn: 273616	2016-06-23 21:38:49 +00:00
Michael Kuperstein	0194d30e09	[X86] Extract HiPE prologue constants into metadata X86FrameLowering::adjustForHiPEPrologue() contains a hard-coded offset into an Erlang Runtime System-internal data structure (the PCB). As the layout of this data structure is prone to change, this poses problems for maintaining compatibility. To address this problem, the compiler can produce this information as module-level named metadata. For example (where P_NSP_LIMIT is the offending offset): !hipe.literals = !{ !2, !3, !4 } !2 = !{ !"P_NSP_LIMIT", i32 152 } !3 = !{ !"X86_LEAF_WORDS", i32 24 } !4 = !{ !"AMD64_LEAF_WORDS", i32 24 } Patch by Magnus Lang Differential Revision: http://reviews.llvm.org/D20363 llvm-svn: 273593	2016-06-23 18:17:25 +00:00
Simon Pilgrim	595dddb103	[X86][AVX512] Added AVX512F vector sign extend tests Now that Elena has confirmed that PR26474 has been fixed llvm-svn: 273560	2016-06-23 14:01:45 +00:00
Craig Topper	597aa42fec	[AVX512] Remove masked unpack intrinsics and autoupgrade to vectorshuffle and selects. llvm-svn: 273543	2016-06-23 07:37:33 +00:00
Sanjoy Das	e57bf680ec	[ImplicitNullChecks] Hoist trivial dependencies if possible When trying to convert a loading instruction into a FAULTING_LOAD, we sometimes face code like this: if %R10 is not null: %R9<def> = MOV32ri Immediate %R9<def, tied> = AND32rm %R9, 0x20(%R10) else: goto TRAP In these cases we would like to use the AND32rm instruction as the faulting operation by hoisting the "dependency" def-ing %R9 also above the control flow, transforming the program into: %R9<def> = MOV32ri Immediate %R9<def, tied> = FAULTING_LOAD_OP(AND32rm %R9, 0x20(%R10), FailPath: TRAP) This change teaches ImplicitNullChecks to do the above, when safe. llvm-svn: 273501	2016-06-22 22:16:51 +00:00
Artur Pilipenko	1cec4fdddf	Upgrade old memset/memcpy signatures (without isVolatile argument) in tests We no longer have corresponding code in autoupgrade and the vast majority of the tests were fixed long time ago. Fix the remaining few. One of the verifier test cases is marked as XFAIL because it was passing only because the signature was incorrect. llvm-svn: 273428	2016-06-22 15:16:06 +00:00
Simon Pilgrim	1536c19642	Regenerated test llvm-svn: 273404	2016-06-22 12:58:15 +00:00
Etienne Bergeron	f6be62f2c8	[StackProtector] Fix computation of GSCookieOffset and EHCookieOffset with SEH4 Summary: Fix the computation of the offsets present in the scopetable when using the SEH (__except_handler4). This patch added an intrinsic to track the position of the allocation on the stack of the EHGuard. This position is needed when producing the ScopeTable. ``` struct _EH4_SCOPETABLE { DWORD GSCookieOffset; DWORD GSCookieXOROffset; DWORD EHCookieOffset; DWORD EHCookieXOROffset; _EH4_SCOPETABLE_RECORD ScopeRecord[1]; }; struct _EH4_SCOPETABLE_RECORD { DWORD EnclosingLevel; long (FilterFunc)(); union { void (HandlerAddress)(); void (*FinallyFunc)(); }; }; ``` The code to generate the EHCookie is added in `X86WinEHState.cpp`. Which is adding these instructions when using SEH4. ``` Lfunc_begin0: # BB#0: # %entry pushl %ebp movl %esp, %ebp pushl %ebx pushl %edi pushl %esi subl $28, %esp movl %ebp, %eax <<-- Loading FramePtr movl %esp, -36(%ebp) movl $-2, -16(%ebp) movl $L__ehtable$use_except_handler4_ssp, %ecx xorl ___security_cookie, %ecx movl %ecx, -20(%ebp) xorl ___security_cookie, %eax <<-- XOR FramePtr and Cookie movl %eax, -40(%ebp) <<-- Storing EHGuard leal -28(%ebp), %eax movl $__except_handler4, -24(%ebp) movl %fs:0, %ecx movl %ecx, -28(%ebp) movl %eax, %fs:0 movl $0, -16(%ebp) calll _may_throw_or_crash LBB1_1: # %cont movl -28(%ebp), %eax movl %eax, %fs:0 addl $28, %esp popl %esi popl %edi popl %ebx popl %ebp retl ``` And the corresponding offset is computed: ``` Luse_except_handler4_ssp$parent_frame_offset = -36 .p2align 2 L__ehtable$use_except_handler4_ssp: .long -2 # GSCookieOffset .long 0 # GSCookieXOROffset .long -40 # EHCookieOffset <<---- .long 0 # EHCookieXOROffset .long -2 # ToState .long _catchall_filt # FilterFunction .long LBB1_2 # ExceptionHandler ``` Clang is not yet producing function using SEH4, but it's a work in progress. This patch is a step toward having a valid implementation of SEH4. Unfortunately, it is not yet fully working. 
The EH registration block is not allocated at the right offset on the stack. Reviewers: rnk, majnemer Subscribers: llvm-commits, chrisha Differential Revision: http://reviews.llvm.org/D21231 llvm-svn: 273281	2016-06-21 15:58:55 +00:00
Daniel Sanders	bf2c03ee69	[arm+x86] Make GNU variants behave like GNU w.r.t combining sin+cos into sincos. Summary: canCombineSinCosLibcall() would previously combine sin+cos into sincos for GNUX32/GNUEABI/GNUEABIHF regardless of whether UnsafeFPMath were set or not. However, GNU would only combine them for UnsafeFPMath because sincos does not set errno like sin and cos do. It seems likely that this is an oversight. Reviewers: t.p.northover Subscribers: t.p.northover, aemerson, llvm-commits, rengolin Differential Revision: http://reviews.llvm.org/D21431 llvm-svn: 273259	2016-06-21 12:29:03 +00:00
Craig Topper	283418fbb6	[AVX512] Add patterns for any-extending a mask that use the def of KMOVW/KMOVB without going through an EXTRACT_SUBREG and a MOVZX. llvm-svn: 273253	2016-06-21 07:37:32 +00:00
Craig Topper	9038aa3001	[AVX512] Use update_llc_test_checks.py to regenerate a test in preparation for a future commit. llvm-svn: 273252	2016-06-21 07:37:27 +00:00
James Y Knight	03c1415b8f	Revert "Change RelaxELFRelocations for llc." This reverts commit r273019. From email I sent to list: > I don't think this makes sense. Either the linker you're using supports > this feature, or it doesn't. Having it enabled for llc if your linker > doesn't support it is not fun. > > Further note that this also affects basically all other code using llvm > libraries -- other than Clang, which explicitly sets it back to false by > default, unless you set the ENABLE_X86_RELAX_RELOCATIONS cmake flag to > true. > > If you want to enable the relax mode across all llvm tools in some > circumstances, I think it should be via moving the cmake flag from clang > down into llvm. > > I'm going to revert this commit, since I both think it intrinsically > doesn't make sense to do this, and because it's breaking some of our > tools. llvm-svn: 273245	2016-06-21 05:40:41 +00:00
Craig Topper	0a0fb0fda1	[AVX512] Remove the masked vpcmpeq/vcmpgt intrinsics and autoupgrade them to native icmps. llvm-svn: 273240	2016-06-21 03:53:24 +00:00
Simon Pilgrim	225b2e37a0	[X86][X87] Fix issue with sitofp i64 -> fp128 on 32-bit targets Fix for PR27726 - sitofp i64 to fp128 was loading the merged load i64 to a x87 register preventing legalization for conversion to fp128. Added 32-bit tests for fp128 cast/conversions. llvm-svn: 273210	2016-06-20 22:41:17 +00:00
Simon Pilgrim	0a81b13f31	[X86][F16C] Added half <-> double conversion tests llvm-svn: 273153	2016-06-20 12:51:55 +00:00
Igor Breger	e59165ca63	[AVX512] [AVX512/AVX][Intrinsics] Fix Variable Bit Shift Right Arithmetic intrinsic lowering. Differential Revision: http://reviews.llvm.org/D20897 llvm-svn: 273138	2016-06-20 07:05:43 +00:00
Simon Pilgrim	0887d5b02e	[X86][AVX512] Added 512-bit BITREVERSE tests and enabled AVX512BW lowering support llvm-svn: 273125	2016-06-19 20:59:19 +00:00
Simon Pilgrim	3d881a0230	[X86][SSE] Allow target shuffle combining to match masks with SM_Sentinel values We currently only allow exact matches of shuffle mask patterns during target shuffle combining. This patch relaxes this to permit SM_SentinelUndef in the combined shuffle to always be accepted as well as allowing exact matching of the SM_SentinelZero value. I've adjusted some tests that were requiring exact shuffle masks to now include undef values. Differential Revision: http://reviews.llvm.org/D21495 llvm-svn: 273119	2016-06-19 18:03:52 +00:00
Simon Pilgrim	9a09652a3a	[X86][AVX] Added test case for PR28136 llvm-svn: 273098	2016-06-18 22:59:08 +00:00
Simon Pilgrim	cd6d4352bc	[X86][SSSE3] Added examples of target shuffle combining failing to match undefs in shuffle masks llvm-svn: 273097	2016-06-18 21:18:21 +00:00
Simon Pilgrim	ab009e9f41	[X86][XOP] Added fast-isel tests matching tools/clang/test/CodeGen/xop-builtins.c llvm-svn: 273096	2016-06-18 21:07:31 +00:00
Simon Pilgrim	b201678763	[X86][TBM] Added fast-isel tests matching tools/clang/test/CodeGen/tbm-builtins.c llvm-svn: 273087	2016-06-18 17:20:52 +00:00
Simon Pilgrim	f4b2af1b9f	[X86][SSE4A] Autoupgrade and remove MOVNTSD/MOVNTSS intrinsics Required better annotation of the instruction defs upon removal of the builtin intrinsic pattern. llvm-svn: 273077	2016-06-18 02:38:26 +00:00
Marcin Koscielnicki	fd4b6b9e51	[SelectionDAG] Don't treat library calls specially if marked with nobuiltin. To be used by D19781. Differential Revision: http://reviews.llvm.org/D19801 llvm-svn: 273039	2016-06-17 20:24:07 +00:00
Michael Kuperstein	18d6d3d95e	[X86] Add missing AVX512 anyext patterns. Add AVX512 anyext patterns for i16 and i64, modeled on the existing i8 and i32 patterns. llvm-svn: 273038	2016-06-17 20:21:17 +00:00
Rafael Espindola	9f86baebe0	Change RelaxELFRelocations for llc. As a developer tool it makes sense for it to use the new relocations. llvm-svn: 273019	2016-06-17 17:43:41 +00:00
Simon Pilgrim	6a35e5ab97	[X86][SSE4A] Remove the GCCBuiltins from the movntsd/movntss intrinsic defs so we can emit native IR from clang. Clang-side sibling commit to follow. llvm-svn: 273002	2016-06-17 14:27:38 +00:00
Sanjay Patel	0e9afea3c8	[x86] autoupgrade and remove AVX2 integer min/max intrinsics This will (hopefully very temporarily) break clang. The clang side of this should be the next commit. llvm-svn: 272932	2016-06-16 18:44:20 +00:00
Rafael Espindola	5a07687a8e	dos2unix this test. NFC. llvm-svn: 272928	2016-06-16 18:21:11 +00:00
Sanjay Patel	d09a21682f	remove old FileCheck lines that are no longer used llvm-svn: 272921	2016-06-16 17:04:16 +00:00
Sanjay Patel	f664f3a578	[DAG] Remove redundant FMUL in Newton-Raphson SQRT code When calculating a square root using Newton-Raphson with two constants, a naive implementation is to use five multiplications (four muls to calculate reciprocal square root and another one to calculate the square root itself). However, after some reassociation and CSE the same result can be obtained with only four multiplications. Unfortunately, there's no reliable way to do such a reassociation in the back-end. So, the patch modifies NR code itself so that it directly builds optimal code for SQRT and doesn't rely on any further reassociation. Patch by Nikolai Bozhenov! Differential Revision: http://reviews.llvm.org/D21127 llvm-svn: 272920	2016-06-16 16:58:54 +00:00
Sanjay Patel	51ab757941	[x86] autoupgrade and remove SSE2/SSE41 integer min/max intrinsics Follow-up to: http://reviews.llvm.org/rL272806 http://reviews.llvm.org/rL272807 llvm-svn: 272907	2016-06-16 15:48:30 +00:00
Sanjay Patel	74b40bdb53	[x86, SSE] update packed FP compare tests for direct translation from builtin to IR The clang side of this was r272840: http://reviews.llvm.org/rL272840 A follow-up step would be to auto-upgrade and remove these LLVM intrinsics completely. Differential Revision: http://reviews.llvm.org/D21269 llvm-svn: 272841	2016-06-15 21:22:15 +00:00
Sanjay Patel	0b526676ab	[x86] delete unnecessary function declarations Missed this in r272806, r272807. llvm-svn: 272834	2016-06-15 20:51:47 +00:00
Sanjay Patel	1a4569df54	[x86] add folds for x86 vector compare nodes (PR27924) Ideally, we can get rid of most x86 LLVM intrinsics by transforming them to IR (and some of that happened with http://reviews.llvm.org/rL272807), but it doesn't cost much to have some simple folds in the backend too while we're working on that and as a backstop. This fixes: https://llvm.org/bugs/show_bug.cgi?id=27924 Differential Revision: http://reviews.llvm.org/D21356 llvm-svn: 272828	2016-06-15 20:26:58 +00:00
Kevin B. Smith	acbda9ef30	[X86]: Updated r272801 to promote 16 bit compares with immediate operand to 32 bits. This is in response to a comment by Eli Friedman. llvm-svn: 272814	2016-06-15 18:18:05 +00:00
Sanjay Patel	a6c6f09967	[x86, SSE] remove the GCCBuiltins from the integer min/max intrinsics This allows us to emit native IR in Clang (next commit). Also, update the intrinsic tests to show that codegen already knows how to handle the IR that Clang will soon produce. llvm-svn: 272806	2016-06-15 17:17:27 +00:00
Kevin B. Smith	54566a0e9a	[X86]: Quit promoting 8 and 16 bit compares to 32 bit. Differential Revision: http://reviews.llvm.org/D21144 llvm-svn: 272801	2016-06-15 16:37:46 +00:00
Kevin B. Smith	c3c82cdbd0	[X86]: Improve Liveness checking for X86FixupBWInsts.cpp Differential Revision: http://reviews.llvm.org/D21085 llvm-svn: 272797	2016-06-15 16:03:06 +00:00
Igor Breger	64cfd3a442	[AVX512] Fix BLENDM lowering patterns. Operands should be swapped to match SELECT behavior. Use BLENDM instead of masked move instruction. Differential Revision: http://reviews.llvm.org/D21001 llvm-svn: 272763	2016-06-15 07:30:38 +00:00
Sanjoy Das	0272be206a	Don't force SP-relative addressing for statepoints Summary: ... when the offset is not statically known. Prioritize addresses relative to the stack pointer in the stackmap, but fallback gracefully to other modes of addressing if the offset to the stack pointer is not a known constant. Patch by Oscar Blumberg! Reviewers: sanjoy Subscribers: llvm-commits, majnemer, rnk, sanjoy, thanm Differential Revision: http://reviews.llvm.org/D21259 llvm-svn: 272756	2016-06-15 05:35:14 +00:00
David Majnemer	cbf614a93b	Remove the ScalarReplAggregates pass Nearly all the changes to this pass have been done while maintaining and updating other parts of LLVM. LLVM has had another pass, SROA, which has superseded ScalarReplAggregates for quite some time. Differential Revision: http://reviews.llvm.org/D21316 llvm-svn: 272737	2016-06-15 00:19:09 +00:00
Xinliang David Li	8052238ac0	Fix a test case to match its intention llvm-svn: 272733	2016-06-14 23:05:46 +00:00
Dehao Chen	9f2bdfb40f	Set machine block placement hot prob threshold for both static and runtime profile. Summary: With runtime profile, we have more confidence in branch probability, thus during basic block layout, we set a lower hot prob threshold so that blocks can be laid out optimally. Reviewers: djasper, davidxl Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D20991 llvm-svn: 272729	2016-06-14 22:27:17 +00:00
Sanjay Patel	4c3cb8b6c0	[x86] add current codegen tests for PR27924 llvm-svn: 272714	2016-06-14 21:25:46 +00:00
Peter Collingbourne	96efdd6107	IR: Introduce local_unnamed_addr attribute. If a local_unnamed_addr attribute is attached to a global, the address is known to be insignificant within the module. It is distinct from the existing unnamed_addr attribute in that it only describes a local property of the module rather than a global property of the symbol. This attribute is intended to be used by the code generator and LTO to allow the linker to decide whether the global needs to be in the symbol table. It is possible to exclude a global from the symbol table if three things are true: - This attribute is present on every instance of the global (which means that the normal rule that the global must have a unique address can be broken without being observable by the program by performing comparisons against the global's address) - The global has linkonce_odr linkage (which means that each linkage unit must have its own copy of the global if it requires one, and the copy in each linkage unit must be the same) - It is a constant or a function (which means that the program cannot observe that the unique-address rule has been broken by writing to the global) Although this attribute could in principle be computed from the module contents, LTO clients (i.e. linkers) will normally need to be able to compute this property as part of symbol resolution, and it would be inefficient to materialize every module just to compute it. See: http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20160509/356401.html http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20160516/356738.html for earlier discussion. Part of the fix for PR27553. Differential Revision: http://reviews.llvm.org/D20348 llvm-svn: 272709	2016-06-14 21:01:22 +00:00
Wei Mi	b799a625f9	[X86] Reduce the width of multiplication when its operands are extended from i8 or i16 For <N x i32> type mul, pmuludq will be used for targets without SSE41, which often introduces many extra pack and unpack instructions in vectorized loop body because pmuludq generates <N/2 x i64> type value. However when the operands of <N x i32> mul are extended from smaller size values like i8 and i16, the type of mul may be shrunk to use pmullw + pmulhw/pmulhuw instead of pmuludq, which generates better code. For targets with SSE41, pmulld is supported so no shrinking is needed. Differential Revision: http://reviews.llvm.org/D20931 llvm-svn: 272694	2016-06-14 18:53:20 +00:00
Nirav Dave	f8d00d5cac	Fix BSS global handling in AsmPrinter Change EmitGlobalVariable to check final assembler section is in BSS before using .lcomm/.comm directive. This prevents globals from being put into .bss erroneously when -data-sections is used. This fixes PR26570. Reviewers: echristo, rafael Subscribers: llvm-commits, mehdi_amini Differential Revision: http://reviews.llvm.org/D21146 llvm-svn: 272674	2016-06-14 15:09:30 +00:00
Simon Pilgrim	cf1165b86e	[X86][SSE4A] Added patterns for nontemporal stores of scalar float/doubles using MOVNTSD/MOVNTSS llvm-svn: 272651	2016-06-14 09:43:38 +00:00
Igor Breger	484bace21b	re-generate the tests using the update_llc_test_checks.py script llvm-svn: 272643	2016-06-14 07:05:10 +00:00
Craig Topper	99e30e6a66	[AVX512] Use MOVZX32 instead of MOVZ16 for loading single v8/v4/v2/v1 masks when KMOVB is not available. This has better behavior with respect to partial register stalls since it won't need to preserve the upper 16-bits of the GPR. llvm-svn: 272626	2016-06-14 03:13:00 +00:00
Craig Topper	ddab395397	[AVX512] Add patterns for zero-extending a mask that use the def of KMOVW/KMOVB without going through an EXTRACT_SUBREG and a MOVZX. llvm-svn: 272625	2016-06-14 03:12:54 +00:00
Craig Topper	cbe54a4bd9	[AVX512] Add tests for zero extending masks that show an unnecessary movzx instruction. A followup patch will remove that instruction, but adding the tests first to make the change more obvious. llvm-svn: 272624	2016-06-14 03:12:48 +00:00
Sanjoy Das	98ac278b86	Move previously added test case to the right location In rL272580 I accidentally added a test case to test/CodeGen when test/Transforms/DeadStoreElimination/ is a better place for it. llvm-svn: 272581	2016-06-13 20:12:07 +00:00
Sanjoy Das	d0bdf3e02b	Fix AAResults::callCapturesBefore for operand bundles Summary: AAResults::callCapturesBefore would previously ignore operand bundles. It was possible for a later instruction to miss its memory dependency on a call site that would only access the pointer through a bundle. Patch by Oscar Blumberg! Reviewers: sanjoy Differential Revision: http://reviews.llvm.org/D21286 llvm-svn: 272580	2016-06-13 19:55:04 +00:00
Simon Pilgrim	582b9ce36e	[X86][SSE] Added extract to scalar nontemporal store tests llvm-svn: 272577	2016-06-13 19:08:28 +00:00
David Majnemer	248190ba69	[X86] Remove llvm.x86.bit.scan.{forward,reverse}.32 The need for these intrinsics has been obviated by r272564 which reimplements their functionality using generic IR. llvm-svn: 272566	2016-06-13 17:33:13 +00:00
Simon Pilgrim	377bc2ea43	[X86][SSE4A] Renamed tests to correspond with the instruction being tested llvm-svn: 272542	2016-06-13 10:14:42 +00:00
Craig Topper	13cf7cac07	[AVX512] Remove masked pshufd, pshuflw, and pshufhw intrinsics and autoupgrade them to selects and shufflevector. llvm-svn: 272527	2016-06-13 02:36:48 +00:00
Sanjay Patel	977530a8c9	[x86, SSE] change patterns for CMPP to float types to allow matching with SSE1 (PR28044) This patch is intended to solve: https://llvm.org/bugs/show_bug.cgi?id=28044 By changing the definition of X86ISD::CMPP to use float types, we allow it to be created and pass legalization for an SSE1-only target where v4i32 is not legal. The motivational trail for this change includes: https://llvm.org/bugs/show_bug.cgi?id=28001 and eventually makes this trigger: http://reviews.llvm.org/D21190 Ie, after this step, we should be free to have Clang generate FP compare IR instead of x86 intrinsics for SSE C packed compare intrinsics. (We can auto-upgrade and remove the LLVM sse.cmp intrinsics as a follow-up step.) Once we're generating vector IR instead of x86 intrinsics, a big pile of generic optimizations can trigger. Differential Revision: http://reviews.llvm.org/D21235 llvm-svn: 272511	2016-06-12 15:03:25 +00:00
Craig Topper	1067986c5b	[X86] Remove sse2 pshufd/pshuflw/pshufhw intrinsics and upgrade them to shufflevector. llvm-svn: 272510	2016-06-12 14:11:32 +00:00
Simon Pilgrim	9d8bed1796	[X86][BMI] Added fast-isel tests for BMI1 intrinsics A lot of the codegen is pretty awful for these as they are mostly implemented as generic bit twiddling ops llvm-svn: 272508	2016-06-12 09:56:05 +00:00
Craig Topper	b7713e413b	[X86] Move tests for llvm.x86.avx.vpermil.* intrinsics to a -upgrade test since they are autoupgraded to shufflevector. llvm-svn: 272494	2016-06-12 01:41:06 +00:00
Simon Pilgrim	2b7c02a04f	[X86] Updated test checks script to generalise LCPI symbol refs The script now replaces '.LCPI888_8' style asm symbols with the {{\.LCPI.*}} re pattern - this helps stop hardcoded symbols in 32-bit x86 tests changing with every edit of the file Refreshed some tests to demonstrate the new check llvm-svn: 272488	2016-06-11 20:39:21 +00:00
Simon Pilgrim	5b9bade8dd	[X86][SSSE3] Added PSHUFB LUT implementation of BITREVERSE PSHUFB can speed up BITREVERSE of byte vectors by performing LUT on the low/high nibbles separately and ORing the results. Wider integer vector types are already BSWAP'd beforehand so also make use of this approach. llvm-svn: 272477	2016-06-11 15:44:13 +00:00
Craig Topper	46f49fb407	[AVX512] Re-generate v8i64 shuffle test now that we use pshufd for some cases. llvm-svn: 272474	2016-06-11 13:57:08 +00:00
Craig Topper	504fba5c8a	[AVX512] Lower v8i64 and v16i32 to pshufd when possible. llvm-svn: 272473	2016-06-11 13:43:21 +00:00
Simon Pilgrim	6800a45790	[X86][SSE] Added PSLLDQ/PSRLDQ as a target shuffle type Ensure that PALIGNR/PSLLDQ/PSRLDQ are byte vectors so that they can be correctly decoded for target shuffle combining llvm-svn: 272471	2016-06-11 13:38:28 +00:00
Simon Pilgrim	8dd73e3ffa	[X86][AVX2] Added PSLLDQ/PSRLDQ shuffle combining tests llvm-svn: 272469	2016-06-11 13:18:21 +00:00
Craig Topper	40abd1cc61	[AVX512] Add support for lowering v32i16 shuffles with repeated lanes. This allows us to create 512-bit PSHUFLW/PSHUFHW. llvm-svn: 272450	2016-06-11 03:27:42 +00:00
Sanjay Patel	b114fd65fc	[x86] enable bitcasted fabs/fneg transforms The vector cases don't change because we already have folds in X86ISelLowering to look through and remove bitcasts. llvm-svn: 272427	2016-06-10 20:33:50 +00:00
Mehdi Amini	cbd68ecf04	Move CodeGen test from Generic to X86 specific directory llvm-svn: 272416	2016-06-10 19:14:01 +00:00
Sanjay Patel	d558bdadd2	[x86] add test for PR28044 llvm-svn: 272411	2016-06-10 18:05:55 +00:00
Mehdi Amini	bbacddfe92	Interprocedural Register Allocation (IPRA) Analysis Add an option to enable the analysis of MachineFunction register usage to extract the list of clobbered registers. When enabled, the CodeGen order is changed to be bottom up on the Call Graph. The analysis is split in two parts, RegUsageInfoCollector is the MachineFunction Pass that runs post-RA and collects the list of clobbered registers to produce a register mask. An immutable pass, RegisterUsageInfo, stores the RegMask produced by RegUsageInfoCollector, and keeps them available. A future transformation pass will use this information to update every call site after instruction selection. Patch by Vivek Pandya <vivekvpandya@gmail.com> Differential Revision: http://reviews.llvm.org/D20769 llvm-svn: 272403	2016-06-10 16:19:46 +00:00
Sanjay Patel	27f06ae7a5	[x86] fix test attributes and autogenerate checks llvm-svn: 272398	2016-06-10 15:30:52 +00:00
Sanjay Patel	cccccd9ab5	[x86] add missing tests for fcmp ueq/one Somehow, the codegen logic for these sequences has gone completely untested until now (note the 2 compare instructions generated per test). There's also an Intel AVX optimization opportunity exposed in these cases and the existing tests. Intel's (but not AMD's) AVX spec shows that extra FP predicates were added, so a single comparison should always be sufficient, and operand commutation should never be necessary. llvm-svn: 272397	2016-06-10 15:17:54 +00:00
Sanjay Patel	330a359fb3	[x86] regenerate checks llvm-svn: 272396	2016-06-10 14:48:50 +00:00
Simon Pilgrim	2fa2690bca	[X86][SSE] Added target shuffle combine tests for byte shift/rotates (PSLLDQ/PSRLDQ/PALIGNR) llvm-svn: 272392	2016-06-10 13:03:22 +00:00
Simon Pilgrim	34263ad995	[X86][AVX512] Added VPSLLDQ/VPSRLDQ memory fold tests Memory operand is new for AVX512 (SSE/AVX2 didn't support it). Also dropped the 'mask' from the tests (VPSLLDQ/VPSRLDQ don't support masked operations). Regenerated VPALIGNR test now that the shuffle comments work llvm-svn: 272383	2016-06-10 09:56:20 +00:00
Craig Topper	200d237e57	[AVX512] Add shuffle comment printing for masked VPERMPD/VPERMQ. llvm-svn: 272371	2016-06-10 05:12:40 +00:00
Craig Topper	89c1761474	[AVX512] Fix shuffle comment printing to handle the masked versions of some shuffles. Previously we were printing the mask operands as the register names. llvm-svn: 272367	2016-06-10 04:48:05 +00:00
Quentin Colombet	3198649199	[LiveRangeEdit] Add a test case for r272314. The test case is not great especially because it is still cumbersome to run the regalloc pass with run-pass. (We miss a bunch of initializers to be properly implemented.) Related to llvm.org/PR27983 llvm-svn: 272360	2016-06-10 01:57:48 +00:00
Simon Pilgrim	643734c565	[X86][AVX512] Added avx512 VPSLLDQ/VPSRLDQ instruction comments llvm-svn: 272319	2016-06-09 22:03:15 +00:00
Simon Pilgrim	f718682eb9	[X86][AVX512] Dropped avx512 VPSLLDQ/VPSRLDQ intrinsics Auto-upgrade to generic shuffles like sse/avx2 implementations now that we can lower to VPSLLDQ/VPSRLDQ llvm-svn: 272308	2016-06-09 21:09:03 +00:00
Simon Pilgrim	47c76e201a	[X86][AVX512] Fixed issue with v16i32 shuffles lowering to VPALIGNR llvm-svn: 272307	2016-06-09 20:53:12 +00:00
Simon Pilgrim	0ab9d3026a	[X86][AVX512] Added support for lowering 512-bit vector shuffles to bit/byte shifts 512-bit VPSLLDQ/VPSRLDQ can only be used for avx512bw targets so lowerVectorShuffleAsShift had to be adjusted to include the subtarget llvm-svn: 272300	2016-06-09 20:13:58 +00:00
Davide Italiano	1a7e32cc48	Also fix a typo. Need more coffee today. llvm-svn: 272278	2016-06-09 17:06:01 +00:00
Davide Italiano	f326b30a15	Improve r272262, check that __stack_chk_guard is used. Thanks to Rafael for the suggestion. llvm-svn: 272277	2016-06-09 17:04:38 +00:00
Davide Italiano	24f1f62dca	Move stackguard test to X86/ directory as it's not generic. llvm-svn: 272264	2016-06-09 15:16:58 +00:00
Igor Breger	f635367e2b	[AVX512] Remove masked_move/blendm intrinsic from back-end. This is complement patch to D21060. Differential Revision: http://reviews.llvm.org/D21174 llvm-svn: 272257	2016-06-09 11:46:55 +00:00
Craig Topper	6f7288dc44	[AVX512] Fix shuffle decode printing for several instructions with write masks. There are still more bugs here with UNPCK and PALIGN for sure. But these were the easiest ones to fix. llvm-svn: 272252	2016-06-09 07:49:08 +00:00
Craig Topper	8537c11ff3	[X86] Fix a test I failed to re-generate in r272249. llvm-svn: 272250	2016-06-09 07:10:34 +00:00
Craig Topper	7a2993093e	[X86] Bring consistent naming to the SSE/AVX and AVX512 PALIGNR instructions. Then add shuffle decode printing for the EVEX forms which is made easier by having the naming structure more similar to other instructions. llvm-svn: 272249	2016-06-09 07:06:38 +00:00
Dehao Chen	769219b11a	Revive http://reviews.llvm.org/D12778 to handle forward-hot-prob and backward-hot-prob consistently. Summary: Consider a diamond CFG in which A branches to B and C, and both B and C branch to D. Suppose A->B and A->C have probabilities 81% and 19%. In block-placement, A->B is called a hot edge and the final placement should be ABDC. However, the current implementation outputs ABCD. This is because when choosing the next block of B, it checks if Freq(C->D) > Freq(B->D) * 20%, which is true (if Freq(A) = 100, then Freq(B->D) = 81, Freq(C->D) = 19, and 19 > 81 * 20% = 16.2). Actually, we should use 25% instead of 20% as the probability here, so that we have 19 < 81 * 25% = 20.25, and the desired ABDC layout will be generated. Reviewers: djasper, davidxl Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D20989 llvm-svn: 272203	2016-06-08 21:30:12 +00:00
Simon Pilgrim	536434e80f	[X86][SSE4A] Regenerated SSE4A intrinsics tests There are no VEX encoded versions of SSE4A instructions, make sure that AVX targets give the same output llvm-svn: 272060	2016-06-07 21:15:45 +00:00
Etienne Bergeron	22bfa83208	[stack-protection] Add support for MSVC buffer security check Summary: This patch is adding support for the MSVC buffer security check implementation The buffer security check is turned on with the '/GS' compiler switch. * https://msdn.microsoft.com/en-us/library/8dbf701c.aspx * To be added to clang here: http://reviews.llvm.org/D20347 Some overview of buffer security check feature and implementation: * https://msdn.microsoft.com/en-us/library/aa290051(VS.71).aspx * http://www.ksyash.com/2011/01/buffer-overflow-protection-3/ * http://blog.osom.info/2012/02/understanding-vs-c-compilers-buffer.html For the following example: ``` int example(int offset, int index) { char buffer[10]; memset(buffer, 0xCC, index); return buffer[index]; } ``` The MSVC compiler is adding these instructions to perform stack integrity check: ``` push ebp mov ebp,esp sub esp,50h [1] mov eax,dword ptr [__security_cookie (01068024h)] [2] xor eax,ebp [3] mov dword ptr [ebp-4],eax push ebx push esi push edi mov eax,dword ptr [index] push eax push 0CCh lea ecx,[buffer] push ecx call _memset (010610B9h) add esp,0Ch mov eax,dword ptr [index] movsx eax,byte ptr buffer[eax] pop edi pop esi pop ebx [4] mov ecx,dword ptr [ebp-4] [5] xor ecx,ebp [6] call @__security_check_cookie@4 (01061276h) mov esp,ebp pop ebp ret ``` The instrumentation above is: * [1] is loading the global security canary, * [3] is storing the local computed ([2]) canary to the guard slot, * [4] is loading the guard slot and ([5]) re-compute the global canary, * [6] is validating the resulting canary with the '__security_check_cookie' and performs error handling. Overview of the current stack-protection implementation: * lib/CodeGen/StackProtector.cpp * There is a default stack-protection implementation applied on intermediate representation. * The target can overload 'getIRStackGuard' method if it has a standard location for the stack protector cookie. 
* An intrinsic 'Intrinsic::stackprotector' is added to the prologue. It will be expanded by the instruction selection pass (DAG or Fast). * Basic Blocks are added to every instrumented function to receive the code for handling stack guard validation and error handling. * Guard manipulation and comparison are added directly to the intermediate representation. * lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp * lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp * There is an implementation that adds instrumentation during instruction selection (for better handling of sibling calls). * see long comment above 'class StackProtectorDescriptor' declaration. * The target needs to override 'getSDagStackGuard' to activate SDAG stack protection generation. (note: getIRStackGuard MUST be nullptr). * 'getSDagStackGuard' returns the appropriate stack guard (security cookie) * The code is generated by 'SelectionDAGBuilder.cpp' and 'SelectionDAGISel.cpp'. * include/llvm/Target/TargetLowering.h * Contains function to retrieve the default Guard 'Value'; should be overridden by each target to select which implementation is used and provide Guard 'Value'. * lib/Target/X86/X86ISelLowering.cpp * Contains the x86 specialisation; Guard 'Value' used by the SelectionDAG algorithm. Function-based Instrumentation: * The MSVC doesn't inline the stack guard comparison in every function. Instead, a call to '__security_check_cookie' is added to the epilogue before every return instruction. * To support function-based instrumentation, this patch is * adding a function to get the function-based check (llvm 'Value', see include/llvm/Target/TargetLowering.h), * If provided, the stack protection instrumentation won't be inlined and a call to that function will be added to the prologue. 
* modifying (SelectionDAGISel.cpp) to avoid producing basic blocks used for inline instrumentation, * generating the function-based instrumentation during the ISEL pass (SelectionDAGBuilder.cpp), * if FastISEL (not SelectionDAG), using the fallback which relies on the same function-based implementation over intermediate representation (StackProtector.cpp). Modifications * adding support for MSVC (lib/Target/X86/X86ISelLowering.cpp) * adding support for function-based instrumentation (lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp, .h) Results * IR generated instrumentation: ``` clang-cl /GS test.cc /Od /c -mllvm -print-isel-input ``` ``` * Final LLVM Code input to ISel * ; Function Attrs: nounwind sspstrong define i32 @"\01?example@@YAHHH@Z"(i32 %offset, i32 %index) #0 { entry: %StackGuardSlot = alloca i8* <<<-- Allocated guard slot %0 = call i8* @llvm.stackguard() <<<-- Loading Stack Guard value call void @llvm.stackprotector(i8* %0, i8** %StackGuardSlot) <<<-- Prologue intrinsic call (store to Guard slot) %index.addr = alloca i32, align 4 %offset.addr = alloca i32, align 4 %buffer = alloca [10 x i8], align 1 store i32 %index, i32* %index.addr, align 4 store i32 %offset, i32* %offset.addr, align 4 %arraydecay = getelementptr inbounds [10 x i8], [10 x i8]* %buffer, i32 0, i32 0 %1 = load i32, i32* %index.addr, align 4 call void @llvm.memset.p0i8.i32(i8* %arraydecay, i8 -52, i32 %1, i32 1, i1 false) %2 = load i32, i32* %index.addr, align 4 %arrayidx = getelementptr inbounds [10 x i8], [10 x i8]* %buffer, i32 0, i32 %2 %3 = load i8, i8* %arrayidx, align 1 %conv = sext i8 %3 to i32 %4 = load volatile i8, i8* %StackGuardSlot <<<-- Loading Guard slot call void @__security_check_cookie(i8* %4) <<<-- Epilogue function-based check ret i32 %conv } ``` * SelectionDAG generated instrumentation: ``` clang-cl /GS test.cc /O1 /c /FA ``` ``` "?example@@YAHHH@Z": # @"\01?example@@YAHHH@Z" # BB#0: # %entry pushl %esi subl $16, %esp movl ___security_cookie, %eax <<<-- Loading Stack Guard 
value movl 28(%esp), %esi movl %eax, 12(%esp) <<<-- Store to Guard slot leal 2(%esp), %eax pushl %esi pushl $204 pushl %eax calll _memset addl $12, %esp movsbl 2(%esp,%esi), %esi movl 12(%esp), %ecx <<<-- Loading Guard slot calll @__security_check_cookie@4 <<<-- Epilogue function-based check movl %esi, %eax addl $16, %esp popl %esi retl ``` Reviewers: kcc, pcc, eugenis, rnk Subscribers: majnemer, llvm-commits, hans, thakis, rnk Differential Revision: http://reviews.llvm.org/D20346 llvm-svn: 272053	2016-06-07 20:15:35 +00:00
Simon Pilgrim	15c6ab5fac	[X86][AVX512] Added 512-bit integer vector non-temporal load tests llvm-svn: 272016	2016-06-07 15:12:47 +00:00
Simon Pilgrim	9a89623b57	[X86][SSE] Add general lowering of nontemporal vector loads Currently the only way to use the (V)MOVNTDQA nontemporal vector load instructions is through the int_x86_sse41_movntdqa style builtins. This patch adds support for lowering nontemporal loads from general IR, allowing us to remove the movntdqa builtins in a future patch. We currently still fold nontemporal loads into suitable instructions, we should probably look at removing this (and nontemporal stores as well) or at least make the target's folding implementation aware that it's dealing with a nontemporal memory transaction. There is also an issue that VMOVNTDQA only acts on 128-bit vectors on pre-AVX2 hardware - so currently a normal ymm load is still used on AVX1 targets. Differential Review: http://reviews.llvm.org/D20965 llvm-svn: 272010	2016-06-07 13:34:24 +00:00
Igor Breger	61e628591f	[AVX512] Fix load opcode for fast isel. Differential Revision: http://reviews.llvm.org/D21067 llvm-svn: 272006	2016-06-07 13:08:45 +00:00
Simon Pilgrim	ca1da1bf07	[X86][SSE] Improved blend+zero target shuffle combining to use combined shuffle mask directly We currently only combine to blend+zero if the target value type has 8 elements or less, but this was missing a lot of cases where the combined mask had been widened. This change makes it so we use the combined mask to determine the blend value type, allowing us to catch more widened cases. llvm-svn: 272003	2016-06-07 12:20:14 +00:00
Igor Breger	edafb0595e	[KNL] Fix UMULO lowering. Differential Revision: http://reviews.llvm.org/D21013 llvm-svn: 271891	2016-06-06 12:24:52 +00:00
Craig Topper	33350cc406	[AVX512] Remove masked palignr intrinsics and auto-upgrade them to native IR of vector shuffle and select. llvm-svn: 271872	2016-06-06 06:12:54 +00:00
Craig Topper	143446d5c1	[AVX512] Add PALIGNR shuffle lowering for v32i16 and v16i32. llvm-svn: 271870	2016-06-06 05:39:10 +00:00
Craig Topper	ccad6d57c1	[AVX512] Update tests to show shuffle decoding for vpshuflw/vpshufhw. llvm-svn: 271869	2016-06-06 05:39:07 +00:00
Simon Pilgrim	64c6de4525	[X86][XOP] Added VPERMIL2PD/VPERMIL2PS raw mask decoding for target shuffle combines llvm-svn: 271834	2016-06-05 15:21:30 +00:00
Simon Pilgrim	478295dadd	[X86][XOP] Added VPERMIL2PD/VPERMIL2PS as a target shuffle type llvm-svn: 271831	2016-06-05 15:01:45 +00:00
Craig Topper	8eeda57a40	[AVX512] Add support for lowering PALIGNR for v64i8. Could do this for other types too, but this is what's needed to replace the intrinsic with native IR in clang. llvm-svn: 271828	2016-06-05 06:29:12 +00:00
Craig Topper	5a315d4613	[AVX512] Split command lines and regenerate a test to prepare for a future commit. llvm-svn: 271827	2016-06-05 06:29:08 +00:00
Craig Topper	9f51c9ef15	[AVX512] Fix PANDN combining for v4i32/v8i32 when VLX is enabled. v4i32/v8i32 ANDs aren't promoted to v2i64/v4i64 when VLX is enabled. llvm-svn: 271826	2016-06-05 05:35:11 +00:00
Simon Pilgrim	2ead861d07	[X86][XOP] Added VPERMIL2PD/VPERMIL2PS shuffle mask comment decoding llvm-svn: 271809	2016-06-04 21:44:28 +00:00
Saleem Abdulrasool	1fcdc23a6e	X86: enable TLS on Windows itanium Windows itanium is nearly identical to windows-msvc (MS ABI for C, itanium for C++). Enable the TLS support for the target similar to the MSVC model. llvm-svn: 271797	2016-06-04 18:27:22 +00:00
Simon Pilgrim	fd2eda4f64	[X86][AVX2] Fix v16i16 SHL lowering (PR27730) The AVX2 v16i16 shift lowering works by unpacking to 2 x v8i32, performing the shift and then truncating the result. The unpacking is used to place the values in the upper 16-bits so that we can correctly sign-extend for SRA shifts. Unfortunately we weren't ensuring that the lower 16-bits were zero to ensure that SHL correctly shifts in zero bits. llvm-svn: 271796	2016-06-04 16:45:33 +00:00
Simon Pilgrim	ff35eecd90	[X86][AVX512] Fixed 512-bit vector nontemporal load alignment llvm-svn: 271673	2016-06-03 14:12:43 +00:00
Simon Pilgrim	f92d175a78	[X86][AVX512] Added 512-bit vector nontemporal load tests llvm-svn: 271668	2016-06-03 13:42:49 +00:00
Simon Pilgrim	a6022c9a63	[X86][SSE] Added nontemporal load tests These currently all lower to regular loads, generic nontemporal load support will be added in a future patch llvm-svn: 271659	2016-06-03 11:00:55 +00:00
Simon Pilgrim	960ca812ed	[X86] Added nontemporal scalar store tests llvm-svn: 271656	2016-06-03 10:30:54 +00:00
Simon Pilgrim	02284541b2	[X86][SSE] Regenerated nontemporal vector store tests and added extra target types llvm-svn: 271654	2016-06-03 10:24:24 +00:00
Simon Pilgrim	38b4661b1b	[X86] Regenerated nontemporal store tests and added tests for all 128-bit vector types llvm-svn: 271651	2016-06-03 10:15:36 +00:00
Simon Pilgrim	205f65f62f	[X86][AVX2] Relaxed alignment on nontemporal store tests llvm-svn: 271646	2016-06-03 10:06:59 +00:00
Simon Pilgrim	8ea8940677	[X86][AVX2] Regenerated nontemporal store tests and added tests for all 256-bit vector types llvm-svn: 271645	2016-06-03 09:56:24 +00:00
Simon Pilgrim	e85506b6e0	[X86][XOP] Support for VPERMIL2PD/VPERMIL2PS 2-input shuffle instructions This patch begins adding support for lowering to the XOP VPERMIL2PD/VPERMIL2PS shuffle instructions - adding the X86ISD::VPERMIL2 opcode and cleaning up the usage. The internal llvm intrinsics were assuming the shuffle mask operand was the same type as the float/double input operands (I guess to simplify the intrinsic definitions in X86InstrXOP.td to a single value type). These needed changing to integer types (matching the clang builtin and the AMD intrinsics definitions), an auto upgrade path is added to convert old calls. Mask decoding/target shuffle support will be added in future patches. Differential Revision: http://reviews.llvm.org/D20049 llvm-svn: 271633	2016-06-03 08:06:03 +00:00
Craig Topper	e7ae106147	[AVX512] Ensure EVEX vpshufd, vpshuflw, and vpshufhw have isel priority over the VEX encoded ones. llvm-svn: 271629	2016-06-03 05:31:04 +00:00
Craig Topper	01f53b1773	[AVX512] Fix shuffle comment printing for EVEX encoded PSHUFD, PSHUFHW, and PSHUFLW. llvm-svn: 271628	2016-06-03 05:31:00 +00:00
Simon Pilgrim	ab95b2fe26	[X86][SSE] Added SSE41/AVX2 non-temporal tests Useful for when we add MOVNTDQA support llvm-svn: 271552	2016-06-02 18:01:21 +00:00
Dimitry Andric	6a482a73d6	Only attempt to detect AVG if SSE2 is available Summary: In PR29973 Sanjay Patel reported an assertion failure when a certain loop was optimized, for a target without SSE2 support. It turned out this was because of the AVG pattern detection introduced in rL253952. Prevent the assertion failure by bailing out early in `detectAVGPattern()`, if the target does not support SSE2. Also add a minimized test case. Reviewers: congh, eli.friedman, spatel Subscribers: emaste, llvm-commits Differential Revision: http://reviews.llvm.org/D20905 llvm-svn: 271548	2016-06-02 17:30:49 +00:00
Sanjay Patel	f509d85a6d	[DAG] use getBitcast() to reduce code Although this was intended to be NFC, the test case wiggle shows a change in code scheduling/RA caused by a difference in the SDLoc() generation. Depending on how you look at it, this is the (dis)advantage of exact checking in regression tests. llvm-svn: 271526	2016-06-02 16:01:15 +00:00
Simon Pilgrim	ebdc397c86	[X86][SSE] Added non-temporal load tests for vector types These currently lower to regular loads instead of MOVNTDQA llvm-svn: 271516	2016-06-02 13:51:50 +00:00
Simon Pilgrim	0afd5a4d80	[X86][SSE] Replace (V)CVTTPS2DQ and VCVTTPD2DQ truncating (round to zero) f32/f64 to i32 with generic IR (llvm) This patch removes the llvm intrinsics (V)CVTTPS2DQ and VCVTTPD2DQ truncation (round to zero) conversions and auto-upgrades to FP_TO_SINT calls instead. Note: I looked at updating CVTTPD2DQ as well but this still requires a lot more work to correctly lower. Differential Revision: http://reviews.llvm.org/D20860 llvm-svn: 271510	2016-06-02 10:55:21 +00:00
Craig Topper	ca9c0801e1	[X86] Add AVX 256-bit load and stores to fast isel. I'm not sure why this was missing for so long. This also exposed that we were picking floating point 256-bit VMOVNTPS for some integer types in normal isel for AVX1 even though VMOVNTDQ is available. In practice it doesn't matter due to the execution dependency fix pass, but it required extra isel patterns. Fixing that in a follow up commit. llvm-svn: 271481	2016-06-02 04:19:45 +00:00
Craig Topper	f10fbfa738	[AVX512] Remove masked load intrinsics. Clang now emits generic masked load intrinsics instead. The intrinsics will be autoupgraded to the same generic masked loads. llvm-svn: 271478	2016-06-02 04:19:36 +00:00
Sanjay Patel	b4a4357ecb	[x86, AVX2] regenerate checks llvm-svn: 271434	2016-06-01 21:32:56 +00:00
Michael Kuperstein	738ae45ce8	[DAG] Improve legalization of INSERT_SUBVECTOR When the index is known to be constant 0, insert directly into the low half, instead of spilling, performing the insert in-memory, and reloading. Differential Revision: http://reviews.llvm.org/D20763 llvm-svn: 271428	2016-06-01 20:49:35 +00:00
Than McIntosh	4ef761aa35	Better fix for PR27903. Summary: Re-enable lifetime-start-on-first-use for stack coloring, but explicitly disable it for slots with more than one start or end lifetime marker. Bug: 27903 Reviewers: wmi, tejohnson, qcolombet, gbiv Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D20739 llvm-svn: 271412	2016-06-01 17:55:10 +00:00
Simon Pilgrim	1cd61b82bd	[X86][SSE] Added non-temporal store tests for all 512-bit vector types llvm-svn: 271393	2016-06-01 13:58:00 +00:00

7894 Commits