llvm-project

Commit Graph

Author	SHA1	Message	Date
Elena Demikhovsky	9d0e7c33d3	X86 CodeGen: Optimized pattern for truncate with unsigned saturation. DAG patterns optimization: truncate + unsigned saturation supported by VPMOVUS* instructions in AVX-512, and by VPACKUS* instructions on SSE* targets. Differential Revision: https://reviews.llvm.org/D28216 llvm-svn: 291670	2017-01-11 12:59:32 +00:00
Simon Pilgrim	5a81fefad3	[X86][AVX512BW] Vectorize v64i8 vector shifts Differential Revision: https://reviews.llvm.org/D28447 llvm-svn: 291665	2017-01-11 10:36:51 +00:00
Elad Cohen	0c2601073e	[X86] Fix PR30926 - Add patterns for (v)cvtsi2s{s,d} and (v)cvtsd2s{s,d} The code emitted by Clang's intrinsics for (v)cvtsi2ss, (v)cvtsi2sd, (v)cvtsd2ss and (v)cvtss2sd is lowered to a code sequence that includes redundant (v)movss/(v)movsd instructions. This patch adds patterns for optimizing these sequences. Differential revision: https://reviews.llvm.org/D28455 llvm-svn: 291660	2017-01-11 09:11:48 +00:00
Mohammed Agabaria	2c96c43388	[X86] updating TTI costs for arithmetic instructions on X86\SLM arch. updated instructions: pmulld, pmullw, pmulhw, mulsd, mulps, mulpd, divss, divps, divsd, divpd, addpd and subpd. special optimization case which replaces pmulld with a pmullw\pmulhw\pshuf sequence when the real operand bitwidth is <= 16. Differential Revision: https://reviews.llvm.org/D28104 llvm-svn: 291657	2017-01-11 08:23:37 +00:00
Hans Wennborg	6573976f57	Re-commit r289955: [X86] Fold (setcc (cmp (atomic_load_add x, -C) C), COND) to (setcc (LADD x, -C), COND) (PR31367) This was reverted because it would miscompile code where the cmp had multiple uses. That was due to a deficiency in the existing code, which was fixed in r291630 (see the PR for details). This re-commit includes an extra test for the kind of code that got miscompiled: @test_sub_1_setcc_jcc. llvm-svn: 291640	2017-01-11 01:36:57 +00:00
Hans Wennborg	12de693747	[X86] Don't run combineSetCCAtomicArith() when the cmp has multiple uses We would miscompile the following: void g(int); int f(volatile long long *p) { bool b = __atomic_fetch_add(p, 1, __ATOMIC_SEQ_CST) < 0; g(b ? 12 : 34); return b ? 56 : 78; } into pushq %rax lock incq (%rdi) movl $12, %eax movl $34, %edi cmovlel %eax, %edi callq g(int) testq %rax, %rax <---- Bad. movl $56, %ecx movl $78, %eax cmovsl %ecx, %eax popq %rcx retq because the code failed to take into account that the cmp has multiple uses, replaced one of them, and left the other one comparing garbage. llvm-svn: 291630	2017-01-11 00:49:54 +00:00
Michael Zuckerman	bcd03e7f3b	[X86][AVX512]Improving shuffle lowering by using AVX-512 EXPAND* instructions This patch fixes PR31351: https://llvm.org/bugs/show_bug.cgi?id=31351 1. This patch adds a new type of shuffle lowering 2. We can use the expand instruction when the shuffle pattern is as follows: { 0*a[0]0*a[1]...0*a[n] , n >=0 where a[] elements in a ascending order}. Reviewers: 1. igorb 2. guyblank 3. craig.topper 4. RKSimon Differential Revision: https://reviews.llvm.org/D28352 llvm-svn: 291584	2017-01-10 18:57:17 +00:00
Craig Topper	d55b83128b	AMD family 17h (znver1) enablement Summary: This patch enables the following 1. AMD family 17h architecture using "znver1" tune flag (-march, -mcpu). 2. ISAs that are enabled for "znver1" architecture. 3. Checks ADX isa from cpuid to identify "znver1" flag when -march=native is used. 4. ISAs FMA4, XOP are disabled as they are dropped from amdfam17. 5. For the time being, it uses the btver2 scheduler model. 6. Test file is updated to check this flag. This item is linked to clang review item https://reviews.llvm.org/D28018 Patch by Ganesh Gopalasubramanian Reviewers: RKSimon, craig.topper Subscribers: vprasad, RKSimon, ashutosh.nema, llvm-commits Differential Revision: https://reviews.llvm.org/D28017 llvm-svn: 291543	2017-01-10 06:01:16 +00:00
Craig Topper	2ed461e5c4	[X86] When lowering uniform shifts, use X86ISD::VZEXT instead of using a ZERO_EXTEND_VECTOR_INREG. If we emit the ZERO_EXTEND_VECTOR_INREG too late it doesn't get lowered properly and makes it through to isel and fails. Fixes PR31593. llvm-svn: 291535	2017-01-10 04:12:24 +00:00
Michael Kuperstein	1559e8863e	Revert r291092 because it introduces a crash. See PR31589 for details. llvm-svn: 291478	2017-01-09 21:04:46 +00:00
Vyacheslav Klochkov	d497d36083	X86-specific path: Implemented the fusing of MUL+ADDSUB to FMADDSUB. Differential Revision: https://reviews.llvm.org/D28087 llvm-svn: 291473	2017-01-09 20:26:17 +00:00
Simon Pilgrim	0f23b2ba1a	[X86][AVX512] Enable v16i8/v32i8 vector shifts to use an extend+shift+truncate pattern. Use the existing AVX2 v8i16 vector shift lowering for v16i8 (extending to v16i32) on AVX512 targets and v32i8 (extending to v32i16) on AVX512BW targets. Cost model updates to follow. llvm-svn: 291451	2017-01-09 17:20:03 +00:00
Simon Pilgrim	d990cd371b	[X86][AVX512DQ] Enable v16i16 vector shifts to use an extend+shift+truncate pattern. Use the existing AVX2 v8i16 vector shift lowering for v16i16 on AVX512 targets (AVX512BW will have already lowered with vpsravw). Cost model updates to follow. llvm-svn: 291445	2017-01-09 15:15:45 +00:00
Craig Topper	96ab6fd2eb	[AVX-512] Change another pattern that was using BLENDM to use masked moves. A future patch will convert it back to BLENDM if it's beneficial to register allocation. llvm-svn: 291419	2017-01-09 04:19:34 +00:00
Craig Topper	6393afce97	[AVX-512] Add patterns to use a zero masked VPTERNLOG instruction for vselects of all ones and all zeros. Previously we emitted a VPTERNLOG and a separate masked move. llvm-svn: 291415	2017-01-09 02:44:34 +00:00
Craig Topper	f51ba1e3da	[AVX-512] If avx512dq is available use vpmovm2d/vpmovm2q instead of vselect of zeroes/ones when handling sign extends of i1 without VLX. llvm-svn: 291402	2017-01-08 21:32:30 +00:00
Simon Pilgrim	e6d948b857	Strip trailing whitespace. llvm-svn: 291395	2017-01-08 18:37:42 +00:00
Simon Pilgrim	b2a80950fe	Fix line endings and strip trailing whitespace. llvm-svn: 291393	2017-01-08 16:45:39 +00:00
Sanjay Patel	bf51c8a975	[x86] fix usage of stale operands when lowering select I noticed this problem as part of the ongoing attempt to canonicalize min/max ops in IR. The debug output shows nodes like this: t4: i32 = xor t2, Constant:i32<-1> t21: i8 = setcc t4, Constant:i32<0>, setlt:ch t14: i32 = select t21, t4, Constant:i32<-1> And because the select is holding onto the t4 (xor) node while EmitTest creates a new x86-specific xor node, the lowering results in: t4: i32 = xor t2, Constant:i32<-1> t25: i32,i32 = X86ISD::XOR t2, Constant:i32<-1> t28: i32,glue = X86ISD::CMOV Constant:i32<-1>, t4, Constant:i8<15>, t25:1 Differential Revision: https://reviews.llvm.org/D28374 llvm-svn: 291392	2017-01-08 15:53:40 +00:00
Simon Pilgrim	9c58950eeb	[CostModel][X86] Fixed vXi8 uniform shift costs. The 'fast' costs should only work for shifts by uniform constants (uniform non-constant are lowered using the slow default implementation). Logical shifts were not taking into account that we must mask the psrlw result, so the costs needed to be doubled. Added missing AVX2/AVX512BW costs as well. llvm-svn: 291391	2017-01-08 14:14:36 +00:00
Simon Pilgrim	1fa5487c05	[CostModel][X86] Moved legal uniform shift costs earlier. XOP was prematurely matching, doubling the cost of ashr/lshr uniform shifts. llvm-svn: 291390	2017-01-08 13:12:03 +00:00
Craig Topper	5c46c7526e	[AVX-512] Remove redundant patterns that select unaligned moves with zero masking for patterns that already use the aligned form. NFC llvm-svn: 291383	2017-01-08 05:46:21 +00:00
Simon Pilgrim	9681c407b4	[CostModel][X86] Update SSE41/AVX1 vXi32 SHL costs SSE41 provides pmulld which allows the simpler pslld/paddd/cvttps2dq/pmulld pattern than SSE2's use of pmuludq. llvm-svn: 291372	2017-01-07 22:27:43 +00:00
Craig Topper	a74e3088df	[AVX-512] Remove patterns from the other VBLENDM instructions. They are all redundant with masked move instructions. We should probably teach the two address instruction pass to turn masked moves into BLENDM when it's beneficial to the register allocator. llvm-svn: 291371	2017-01-07 22:20:34 +00:00
Craig Topper	81f20aa336	[AVX-512] Remove patterns from masked broadcast versions of BLENDM instructions. All but (v2f64 broadcast f64) are handled with VBROADCAST instructions. The v2f64 version can be handled with VMOVDDUP. We may want to consider converting to BLENDM instructions in the two address instruction pass if it's beneficial to register allocation. llvm-svn: 291369	2017-01-07 22:20:26 +00:00
Craig Topper	da84ff3ed4	[AVX-512] Add masked forms of the alternate MOVDDUP patterns. I'm not too sure how to get isel to select even all of the unmasked forms, but at least we have a consistent set now. llvm-svn: 291368	2017-01-07 22:20:23 +00:00
Simon Pilgrim	a470296367	[CostModel][X86] Fix AVX2 v16i16 shift 'splat' costs. llvm-svn: 291366	2017-01-07 22:08:09 +00:00
Simon Pilgrim	82e3e05fe2	[CostModel][X86] Match 256-bit vector shift 'splat' costs for AVX2 and above We were matching against general vector shift costs before the uniform splat costs llvm-svn: 291365	2017-01-07 21:47:10 +00:00
Simon Pilgrim	e70644dab7	[CostModel][X86] Generalized cost calculation of SHL by constant -> MUL conversion. llvm-svn: 291364	2017-01-07 21:33:00 +00:00
Simon Pilgrim	725997154d	[CostModel][X86] Merge separate AVX1 cost LUTs. NFCI. llvm-svn: 291355	2017-01-07 18:19:25 +00:00
Simon Pilgrim	a4109d6433	[CostModel][AVX512BW] Add v32i16 vector shift costs for avx512bw targets. llvm-svn: 291354	2017-01-07 17:54:10 +00:00
Simon Pilgrim	df7de7a87e	[CostModel][X86] Added missing AVX2 arithmetic costs. Allows us to correctly fall through to the lower AVX1 costs if look up failed. llvm-svn: 291353	2017-01-07 17:27:39 +00:00
Simon Pilgrim	100eae1ee0	[CostModel][X86] Reordered AVX1 arithmetic cost LUT into descending target order. NFCI. llvm-svn: 291352	2017-01-07 17:03:51 +00:00
Simon Pilgrim	a1b8e2c725	[X86][AVX512] Use lowerShuffleAsRepeatedMaskAndLanePermute for non-VBMI v64i8 shuffles (PR31470) llvm-svn: 291347	2017-01-07 15:37:50 +00:00
Craig Topper	42b848a683	[X86] Disable load unfolding for 128-bit MOVDDUP instructions since the load size is smaller than the register size so unfolding would increase the load size. llvm-svn: 291338	2017-01-07 06:56:54 +00:00
Simon Pilgrim	3128d6b520	[X86][SSE] Pass float domain flag to shuffle combine match functions. NFCI. Early step towards ignoring domain above a certain shuffle depth. llvm-svn: 291248	2017-01-06 17:34:30 +00:00
Simon Pilgrim	bd3c6824d4	[X86][SSE] Simplify float domain requirement in unary shuffle matching. The AVX1-only limit is never actually required in matchUnaryVectorShuffle llvm-svn: 291244	2017-01-06 17:00:59 +00:00
Simon Pilgrim	a08d7b9913	Remove trailing whitespace. NFCI. llvm-svn: 291240	2017-01-06 15:31:52 +00:00
Simon Pilgrim	9b8c7caf4e	[X86] Add X86Subtarget argument. NFCI. All callers of getTargetVShiftNode have access to X86Subtarget already so pass it along instead of re-extracting it. llvm-svn: 291239	2017-01-06 15:29:17 +00:00
Simon Pilgrim	d8333372bc	[CostModel][X86] Fix 512-bit SDIV/UDIV 'big' costs. Set the costs on the lowest target that supports the type. llvm-svn: 291229	2017-01-06 11:12:53 +00:00
Craig Topper	e86fb932ea	[AVX-512] Add EXTRACT_SUBVECTOR support to combineBitcastForMaskedOp. llvm-svn: 291214	2017-01-06 05:18:48 +00:00
Simon Pilgrim	aa186c632d	[CostModel][X86] Tidyup arithmetic costs code. NFCI. Remove unnecessary braces, remove one use variables and keep LUTs to similar naming convention. llvm-svn: 291187	2017-01-05 22:48:02 +00:00
Simon Pilgrim	4c050c2190	[CostModel][X86] Move vXi32 MUL costs into existing tables. NFCI. llvm-svn: 291165	2017-01-05 19:42:43 +00:00
Simon Pilgrim	6f72eba606	Remove trailing whitespace. NFCI. llvm-svn: 291163	2017-01-05 19:24:25 +00:00
Simon Pilgrim	5b06e4d319	[CostModel][X86] Reordered SSE42 arithmetic cost LUT into descending order. NFCI. llvm-svn: 291162	2017-01-05 19:19:39 +00:00
Simon Pilgrim	a8bf97569a	[CostModel][X86] Move vXi64 MUL costs into existing tables. NFCI. Removes need for yet another LUT. llvm-svn: 291158	2017-01-05 19:01:50 +00:00
Simon Pilgrim	430d34fc14	[CostModel][X86] Strip unused 256-bit vector shift costs. NFCI. Remove SSE2 256-bit entries - AVX targets will have used the SSE42 costs instead. llvm-svn: 291152	2017-01-05 18:36:48 +00:00
Simon Pilgrim	b01e844241	[CostModel][X86] Include the cost of 256-bit upper subvector extract/insertion in AVX1 v4i64 MUL Matches other MUL/ADD/SUB 256-bit case on AVX1 llvm-svn: 291149	2017-01-05 18:20:25 +00:00
Simon Pilgrim	f74700aa8c	[CostModel][X86] Merged SK_PermuteSingleSrc/SK_PermuteTwoSrc into common shuffle cost LUTs. NFCI. llvm-svn: 291146	2017-01-05 17:56:19 +00:00
Sanjay Patel	dea5a7bd53	less braces; NFC llvm-svn: 291126	2017-01-05 16:47:32 +00:00
Simon Pilgrim	bca02f9e20	[CostModel][X86] Add support for broadcast shuffle costs Currently only for broadcasts with input and output of the same width. Differential Revision: https://reviews.llvm.org/D27811 llvm-svn: 291122	2017-01-05 15:56:08 +00:00
Zvi Rackover	4b7d724d62	[X86] Optimize vector shifts with variable but uniform shift amounts Summary: For instructions such as PSLLW/PSLLD/PSLLQ a variable shift amount may be passed in an XMM register. The lower 64-bits of the register are evaluated to determine the shift amount. This patch improves the construction of the vector containing the shift amount. Reviewers: craig.topper, delena, RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D28353 llvm-svn: 291120	2017-01-05 15:11:43 +00:00
Simon Pilgrim	a62395a4bd	[CostModel][X86] Pulled out common type legalization code llvm-svn: 291109	2017-01-05 14:33:32 +00:00
Mohammed Agabaria	23599ba794	Currently isLikelyComplexAddressComputation tries to figure out if the given stride seems to be 'complex' and needs some extra cost for address computation handling. This code seems to be target dependent and may not be the same for all targets. Passed the decision whether the given stride is complex or not to the target by sending stride information via SCEV to getAddressComputationCost instead of 'IsComplex'. Specifically, on X86 targets we don't see any significant address computation cost for strided access in general. Differential Revision: https://reviews.llvm.org/D27518 llvm-svn: 291106	2017-01-05 14:03:41 +00:00
Mohammed Agabaria	189e2d29ba	[Test Commit] fixing some format issue in X86TTI to match clang-format output. llvm-svn: 291095	2017-01-05 09:51:02 +00:00
Elena Demikhovsky	143cbc425b	AVX-512: Optimized pattern for truncate with unsigned saturation. DAG patterns optimization: truncate + unsigned saturation supported by VPMOVUS* instructions in AVX-512. Differential revision: https://reviews.llvm.org/D28216 llvm-svn: 291092	2017-01-05 08:21:09 +00:00
Eric Christopher	568c113ac0	Remove dead and unused variable NumSentinelElements. Fixes PR31529. llvm-svn: 290998	2017-01-04 20:05:18 +00:00
Simon Pilgrim	bb895f3e9c	[CostModel][X86] Updated vXi8 and vXi16 Reverse/Alternate shuffle costs Actual codegen is much better than the extract+insert patterns that were assumed. llvm-svn: 290962	2017-01-04 14:01:33 +00:00
Simon Pilgrim	939b8cd708	[X86] Merged Reverse/Alternate shuffle cost tables. NFCI. As discussed on D27811, merged the shuffle cost LUTs and use the shuffle kind to perform the lookup instead of the ISD opcode. llvm-svn: 290956	2017-01-04 12:08:41 +00:00
Florian Hahn	5815f6c53c	[framelowering] Skip dbg values when getting next/previous instruction. Summary: In mergeSPUpdates, debug values need to be ignored when getting the previous element, otherwise debug data could have an impact on codegen. In eliminateCallFramePseudoInstr, debug values after the erased element could have an impact on codegen and should be skipped. Closes PR31319 (https://llvm.org/bugs/show_bug.cgi?id=31319) Reviewers: aprantl, MatzeB, mkuper Subscribers: gbedwell, llvm-commits Differential Revision: https://reviews.llvm.org/D27688 llvm-svn: 290955	2017-01-04 12:08:35 +00:00
Ayman Musa	02f9533823	[X86][AVX512] Passing the appropriate memory operand class to INT_{U}COMIS{S|D} instructions Replacing the memory operand in the intrinsic versions of the comis/ucomis instructions from f128mem to ssmem/sdmem accordingly. Differential Revision: https://reviews.llvm.org/D28138 llvm-svn: 290948	2017-01-04 08:21:54 +00:00
Simon Pilgrim	c76ea4b638	[X86] Attempt to pre-truncate arithmetic operations if useful In some cases it's more efficient to combine TRUNC( BINOP( X, Y ) ) --> BINOP( TRUNC( X ), TRUNC( Y ) ) if the binop is legal for the truncated types. This is true for vector integer multiplication (especially vXi64), as well as ADD/AND/XOR/OR in cases where we only need to truncate one of the inputs at runtime (e.g. a duplicated input or a one-use constant we can fold). Further work could be done here - scalar cases (especially i64) could often benefit (if we avoid partial registers etc.), other opcodes, and better analysis of when truncating the inputs reduces costs. I have considered implementing this for all targets within the DAGCombiner but wasn't sure we could devise a suitable cost model system that would give us the range we need. Differential Revision: https://reviews.llvm.org/D28219 llvm-svn: 290947	2017-01-04 08:05:42 +00:00
Craig Topper	d0aa53b9ae	[AVX-512] Add support for detecting 512-bit shuffles that contain a 128-bit subvector insertion from the lowest subvector of one of the sources. These are best handled with a vinsert32x4 or vinsert64x2 instruction. llvm-svn: 290946	2017-01-04 07:32:03 +00:00
Craig Topper	83115a809f	[AVX-512] Simplify code for creating 512-bit SHUF128 operations. We don't need two loops and we can safely assume and hardcode the size of the widened mask. llvm-svn: 290942	2017-01-04 07:31:51 +00:00
Craig Topper	48d232d3e7	[X86] Move 128-bit shuffle mask widening check into lowerV2X128VectorShuffle to reduce code duplication. Use the now available widened mask to simplify some code inside lowerV2X128VectorShuffle. llvm-svn: 290872	2017-01-03 07:36:41 +00:00
Craig Topper	785e58fdc9	[AVX-512] Simplify the code added in r290870 to recognize 256-bit subvector inserts and avoid calling isShuffleEquivalent on a widened mask. llvm-svn: 290871	2017-01-03 07:36:39 +00:00
Craig Topper	9496e3f916	[AVX-512] Teach shuffle lowering to use vinsert instructions for shuffles corresponding to 256-bit subvector inserts. llvm-svn: 290870	2017-01-03 07:00:40 +00:00
Craig Topper	fa875a1d3d	[AVX-512] Teach EVEX to VEX conversion pass to handle VINSERT and VEXTRACT instructions. llvm-svn: 290869	2017-01-03 05:46:18 +00:00
Craig Topper	be9ef55152	[X86] Remove trailing whitespace and an unnecessary line wrap. NFC llvm-svn: 290867	2017-01-03 05:46:06 +00:00
Craig Topper	06bae884bd	[X86] Fix header comment. NFC llvm-svn: 290866	2017-01-03 05:46:05 +00:00
Craig Topper	c849172105	[AVX-512] Add support for pushing bitcasts through INSERT_SUBVEC in order to select a masked operation. llvm-svn: 290865	2017-01-03 05:46:02 +00:00
Craig Topper	0cda8bbf74	[AVX-512] Remove vinsert intrinsics and autoupgrade to native shufflevectors. There are some codegen problems here that I'll try to fix in future commits. llvm-svn: 290864	2017-01-03 05:45:57 +00:00
Craig Topper	4d47c6ae57	[AVX-512] Remove vextract intrinsics and autoupgrade to native shufflevectors. This unfortunately generates some really terrible code without VLX support due to v2i1 and v4i1 not being legal. Hopefully we can improve that in future patches. llvm-svn: 290863	2017-01-03 05:45:46 +00:00
Dean Michael Berris	f7e7b938ea	[XRay] Merge instrumentation point table emission code into AsmPrinter. Summary: No need to have this per-architecture. While there, unify 32-bit ARM's behaviour with what changed elsewhere and start function names lowercase as per the coding standards. Individual entry emission code goes to the entry's own class. Fully tested on amd64, cross-builds on both ARMs and PowerPC. Reviewers: dberris Subscribers: aemerson, llvm-commits Differential Revision: https://reviews.llvm.org/D28209 llvm-svn: 290858	2017-01-03 04:30:21 +00:00
Elena Demikhovsky	d96200d60a	Fixed shuffle-reverse cost on AVX-512. (This change was approved in https://reviews.llvm.org/D28118, but Simon asked to submit it separately). llvm-svn: 290812	2017-01-02 11:44:10 +00:00
Elena Demikhovsky	21706cbd24	AVX-512 Loop Vectorizer: Cost calculation for interleave load/store patterns. X86 target does not provide any target specific cost calculation for interleave patterns. It uses the common target-independent calculation, which gives very high numbers. As a result, the scalar version is chosen in many cases. The situation on AVX-512 is even worse, since we have 3-src shuffles that significantly reduce the cost. In this patch I calculate the cost on AVX-512. This will allow comparing the interleave pattern with gather/scatter and choosing the better solution (PR31426). * Shuffle-broadcast cost will be changed in Simon's upcoming patch. Differential Revision: https://reviews.llvm.org/D28118 llvm-svn: 290810	2017-01-02 10:37:52 +00:00
Reid Kleckner	cd46c1df80	Revert "[COFF] Use 32-bit jump table entries in .rdata for Win64" This reverts commit r290694. It broke sanitizer tests on Win64. I'll probably bring this back, but the jump tables will just live in .text like they do for MSVC. llvm-svn: 290714	2016-12-29 17:07:10 +00:00
Reid Kleckner	c9e0a153cf	[COFF] Use 32-bit jump table entries in .rdata for Win64 Summary: We were already using 32-bit jump table entries, but this was a consequence of the default PIC model on Win64, and not an intentional design decision. This patch ensures that we always use 32-bit label difference jump table entries on Win64 regardless of the PIC model. This is a good idea because it saves executable size and object file size. Moving the jump tables to .rdata cleans up the disassembled object code and reduces the available ROP targets, but it requires adding one more RIP-relative lea to the code. COFF doesn't have relocations to express the difference between two arbitrary symbols, so we can't use the jump table label in the label difference like we do elsewhere. Fixes PR31488 Reviewers: majnemer, compnerd Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D28141 llvm-svn: 290694	2016-12-29 00:12:39 +00:00
Gadi Haber	19c4fc5e62	This is a large patch for X86 AVX-512 of an optimization for reducing code size by encoding EVEX AVX-512 instructions using the shorter VEX encoding when possible. There are cases of AVX-512 instructions that have two possible encodings. This is the case with instructions that use vector registers with low indexes of 0 - 15 and do not use the zmm registers or the mask k registers. The EVEX encoding prefix requires 4 bytes whereas the VEX prefix can take only up to 3 bytes. Consequently, using the VEX encoding for these instructions results in a code size reduction of ~2 bytes even though it is compiled with the AVX-512 features enabled. Reviewers: Craig Topper, Zvi Rackover, Elena Demikhovsky Differential Revision: https://reviews.llvm.org/D27901 llvm-svn: 290663	2016-12-28 10:12:48 +00:00
Craig Topper	e77e901130	[AVX-512] Add all forms of VPALIGNR, VALIGND, and VALIGNQ to the load folding tables. llvm-svn: 290591	2016-12-27 06:51:09 +00:00
Craig Topper	2da265b7bf	[AVX-512] Remove masked pmuldq and pmuludq intrinsics and autoupgrade them to unmasked intrinsics plus a select. llvm-svn: 290583	2016-12-27 05:30:14 +00:00
Craig Topper	89b3e0223f	[AVX-512] Add 512-bit unmasked intrinsics for pmuldq and pmuludq so we can add them to InstCombine with the 128 and 256 bit versions. The 128 and 256 bit masked intrinsics are currently unused by clang. The sse and avx2 unmasked intrinsics are used instead. The new 512-bit intrinsic will be used to do the same. Then all masked versions will be removed and autoupgraded. llvm-svn: 290573	2016-12-27 03:46:05 +00:00
Craig Topper	83f2145c18	[AVX-512] Add isel patterns to turn native masked scalar add/sub/mul/div into masked instructions. llvm-svn: 290564	2016-12-27 01:56:24 +00:00
Craig Topper	5ef13ba18b	[AVX-512] Fix some patterns to use extended register classes. llvm-svn: 290536	2016-12-26 07:26:07 +00:00
Craig Topper	f56d985f77	[AVX-512] Don't assume that the rounding mode argument to intrinsics is a constant. While clang will guarantee this, nothing in the backend will. A non-constant value will now result in an isel error instead of just asserting or crashing due to a bad cast during lowering. llvm-svn: 290532	2016-12-26 01:40:17 +00:00
Michael Zuckerman	86602e85dd	revert commit 290516 llvm-svn: 290517	2016-12-25 12:45:18 +00:00
Michael Zuckerman	45aa420640	Commit try added new empty line llvm-svn: 290516	2016-12-25 12:01:34 +00:00
Florian Hahn	898127fe36	Revert r290423 because it broke the sanitizer-x86_64-linux-autoconf buildbot. llvm-svn: 290425	2016-12-23 12:26:11 +00:00
Florian Hahn	1d6b1a7b79	[framelowering] Skip dbg values when getting next/previous instruction. Summary: In mergeSPUpdates, debug values need to be ignored when getting the previous element, otherwise debug data could have an impact on codegen. In eliminateCallFramePseudoInstr, debug values after the erased element could have an impact on codegen and should be skipped. Closes PR31319 (https://llvm.org/bugs/show_bug.cgi?id=31319) Reviewers: mkuper, MatzeB, aprantl Subscribers: gbedwell, llvm-commits Differential Revision: https://reviews.llvm.org/D27688 llvm-svn: 290423	2016-12-23 11:35:00 +00:00
Wei Mi	f3f01aba48	Change the interface of TLI.isMultiStoresCheaperThanBitsMerge. This is for splitMergedValStore in DAG Combine to share the target query interface with similar logic in CodeGenPrepare. Differential Revision: https://reviews.llvm.org/D24707 llvm-svn: 290363	2016-12-22 19:38:22 +00:00
Ayman Musa	9ff608cdc6	[X86][AVX2] Passing the appropriate memory operand class to VPMADDWD instruction. Replacing the memory operand in the ymm version of VPMADDWD from i128mem to i256mem. Differential Revision: https://reviews.llvm.org/D28024 llvm-svn: 290333	2016-12-22 08:42:46 +00:00
Simon Pilgrim	081abbb164	[X86][SSE] Improve lowering of vXi64 multiplies As mentioned on PR30845, we were performing our vXi64 multiplication as: AloBlo = pmuludq(a, b); AloBhi = pmuludq(a, psrlqi(b, 32)); AhiBlo = pmuludq(psrlqi(a, 32), b); return AloBlo + psllqi(AloBhi, 32)+ psllqi(AhiBlo, 32); when we could avoid one of the upper shifts with: AloBlo = pmuludq(a, b); AloBhi = pmuludq(a, psrlqi(b, 32)); AhiBlo = pmuludq(psrlqi(a, 32), b); return AloBlo + psllqi(AloBhi + AhiBlo, 32); This matches the lowering on gcc/icc. Differential Revision: https://reviews.llvm.org/D27756 llvm-svn: 290267	2016-12-21 20:00:10 +00:00
Michael Zuckerman	85e12d2851	revert first commit . removing empty line in X86.h llvm-svn: 290255	2016-12-21 12:48:01 +00:00
Michael Zuckerman	58838cf29d	First commit adding new line to X86.h llvm-svn: 290254	2016-12-21 12:44:47 +00:00
Elena Demikhovsky	7c7bf1b432	Added a template for building target specific memory node in DAG. I added an API for creating a target-specific memory node in DAG. Today, all memory nodes are common for all targets and their constructors are located in SelectionDAG.cpp. There are some cases in X86 where we need to create a special node - truncation-with-saturation store, float-to-half-store. In the current patch I added truncation-with-saturation nodes and I'm using them for intrinsics. In the future I plan to implement DAG lowering for truncation-with-saturation pattern. Differential Revision: https://reviews.llvm.org/D27899 llvm-svn: 290250	2016-12-21 10:43:36 +00:00
Oren Ben Simhon	cb692157b7	[X86] Vectorcall Calling Convention - Adding CodeGen Complete Support Fixing a warning. llvm-svn: 290248	2016-12-21 09:47:31 +00:00
Oren Ben Simhon	de2eea7298	[X86] Vectorcall Calling Convention - Adding CodeGen Complete Support Fixing build issues. llvm-svn: 290244	2016-12-21 08:59:42 +00:00
Oren Ben Simhon	3b95157090	[X86] Vectorcall Calling Convention - Adding CodeGen Complete Support The vectorcall calling convention specifies that arguments to functions are to be passed in registers, when possible. vectorcall uses more registers for arguments than fastcall or the default x64 calling convention use. The vectorcall calling convention is only supported in native code on x86 and x64 processors that include Streaming SIMD Extensions 2 (SSE2) and above. The current implementation does not handle Homogeneous Vector Aggregates (HVAs) correctly and this review attempts to fix it. This submission also includes additional lit tests to better cover HVA corner cases. Differential Revision: https://reviews.llvm.org/D27392 llvm-svn: 290240	2016-12-21 08:31:45 +00:00
Simon Pilgrim	688114d888	[X86][SSE] Ensure we're only combining shuffles with legal mask types. I haven't managed to get this to fail yet but it's technically possible for the AND -> shuffle decomposition to result in illegal types. llvm-svn: 290183	2016-12-20 17:09:52 +00:00
Michael LeMay	20565e2aa1	[TargetInstrInfo] replace redundant expression in getMemOpBaseRegImmOfs Summary: The expression for computing the return value of getMemOpBaseRegImmOfs has only one possible value. The other value would result in a return earlier in the function. This patch replaces the expression with its only possible value. Reviewers: sanjoy Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D27437 llvm-svn: 290133	2016-12-19 21:02:41 +00:00
Craig Topper	1fd4196337	[X86] When recognizing vector loads or VZEXT_LOAD in selectScalarSSELoad make sure we pass the load's user rather than load itself to the second operand of IsLegalToFold. llvm-svn: 290089	2016-12-19 08:35:56 +00:00
Craig Topper	375aa90291	[X86] Remove all of the patterns that use X86ISD:FAND/FXOR/FOR/FANDN except for the ones needed for SSE1. Anything SSE2 or above uses the integer ISD opcode. This removes 11721 bytes from the DAG isel table or 2.2% llvm-svn: 290073	2016-12-19 00:42:28 +00:00
Daniel Jasper	373f9a6a0c	Revert r289955 and r289962. This is causing lots of ASAN failures for us. Not sure whether it causes an ASAN false positive or whether it actually leads to incorrect code or whether it even exposes bad code. Hans, I'll get you instructions to reproduce this. llvm-svn: 290066	2016-12-18 14:36:38 +00:00
Michael Zuckerman	4b88a770ef	[X86] [AVX512] Minor fix in encoding of scalar EVEX instructions. NFC. Commit on behalf of Gadi Haber Removed EVEX_V512 prefix from scalar EVEX instructions since HW ignores L'L bits anyway (LIG). 4 instructions are modified. The changed encodings are validated with XED. Reviewers: delena, igorb Differential revision: https://reviews.llvm.org/D27802 llvm-svn: 290065	2016-12-18 14:29:00 +00:00
Simon Pilgrim	e940daf532	[X86][SSE] Add support for combining target shuffles to SHUFPS. As discussed on D27692, the next step will be to allow cross-domain shuffles once the combined shuffle depth passes a certain point. llvm-svn: 290064	2016-12-18 14:26:02 +00:00
Craig Topper	7029db0eaa	[X86][SSE][AVX-512] Convert FAND/FOR/FXOR/FANDN nodes to integer operations if they are available. This will allow a bunch of patterns to be removed. These nodes are only emitted for lowering FABS/FNEG/FNABS/FCOPYSIGN. Ideally we just wouldn't create these nodes if SSE2 or higher is available, but it was simple to just convert them in DAG combine. For SSE2, AVX, and AVX512 with DQI this is no functional change as the execution domain fixing pass ensures the right domain is selected regardless of the ISD opcode. For AVX-512 without DQI we end up using integer instructions since the floating point versions aren't available. But we were already doing that for any logical operations in code that didn't come from FABS/FNEG/FNABS/FCOPYSIGN so this seems no worse. And we get the benefit of being able to fold broadcasts now. llvm-svn: 290060	2016-12-18 07:54:23 +00:00
Craig Topper	add9cc697a	[AVX-512] Use EVEX encoded XOR instruction for zeroing scalar registers when DQI and VLX instructions are available. This can give the register allocator more registers to use. llvm-svn: 290057	2016-12-18 06:23:14 +00:00
Craig Topper	2baef8f466	[AVX-512] Make sure VLX is also enabled before using EVEX encoded logic ops for scalars. I missed this in r290049. llvm-svn: 290055	2016-12-18 04:17:00 +00:00
Craig Topper	d3295c6a3a	[AVX-512] Use EVEX encoded logic operations for scalar types when they are available. This gives the register allocator more registers to work with. llvm-svn: 290049	2016-12-17 19:26:00 +00:00
Hans Wennborg	ef57755427	Fix -Wself-assign from r289955 llvm-svn: 289962	2016-12-16 17:16:46 +00:00
Hans Wennborg	35f21cba13	[X86] Fold (setcc (cmp (atomic_load_add x, -C) C), COND) to (setcc (LADD x, -C), COND) (PR31367) atomic_load_add returns the value before addition, but sets EFLAGS based on the result of the addition. That means it's setting the flags based on effectively subtracting C from the value at x, which is also what the outer cmp does. This targets a pattern that occurs frequently with reference counting pointers: void decrement(long volatile *ptr) { if (_InterlockedDecrement(ptr) == 0) release(); } Clang would previously compile it (for 32-bit at -Os) as: 00000000 <?decrement@@YAXPCJ@Z>: 0: 8b 44 24 04 mov 0x4(%esp),%eax 4: 31 c9 xor %ecx,%ecx 6: 49 dec %ecx 7: f0 0f c1 08 lock xadd %ecx,(%eax) b: 83 f9 01 cmp $0x1,%ecx e: 0f 84 00 00 00 00 je 14 <?decrement@@YAXPCJ@Z+0x14> 14: c3 ret and with this patch it becomes: 00000000 <?decrement@@YAXPCJ@Z>: 0: 8b 44 24 04 mov 0x4(%esp),%eax 4: f0 ff 08 lock decl (%eax) 7: 0f 84 00 00 00 00 je d <?decrement@@YAXPCJ@Z+0xd> d: c3 ret (Equivalent variants with _InterlockedExchangeAdd, std::atomic<>'s fetch_add or pre-decrement operator generate the same code.) Differential Revision: https://reviews.llvm.org/D27781 llvm-svn: 289955	2016-12-16 16:34:59 +00:00
Simon Pilgrim	4b73c3de50	[X86][AVX] Call lowerVectorShuffleWithSHUFPS directly instead of calling DAG.getVectorShuffle (PR27885) We've already done the hard work of ensuring the mask is safe for 'SHUFPS'. llvm-svn: 289950	2016-12-16 15:23:32 +00:00
Simon Pilgrim	9519bd9232	[X86][AVX512] use a single shufps for 512-bit vectors when it can save instructions This is the 512-bit counterpart to the 128-bit transform checked in here: https://reviews.llvm.org/rL289837 This patch is based on the draft by @sroland (Roland Scheidegger) that is attached to PR27885: https://llvm.org/bugs/show_bug.cgi?id=27885 llvm-svn: 289946	2016-12-16 14:30:04 +00:00
Simon Pilgrim	f159a3414f	[X86][SSE] Combine shuffles to MOVSS/MOVSD whatever the domain. We already do the same thing in shuffle lowering; but don't do it if we have SSE41 (PBLEND) instead. llvm-svn: 289937	2016-12-16 11:48:51 +00:00
Ahmed Bougacha	5228603387	[GlobalISel] Drop workaround for Legalizer member/class sharing a name. NFC. MachineLegalizer used to be the name of both the class and the member, causing GCC errors. r276522 fixed that by renaming the member to just 'Legalizer'. The 'class' workaround isn't necessary anymore; drop it. llvm-svn: 289848	2016-12-15 18:45:30 +00:00
Sanjay Patel	a97358bc8e	[x86] use a single shufps for 256-bit vectors when it can save instructions This is the 256-bit counterpart to the 128-bit transform checked in here: https://reviews.llvm.org/rL289837 This patch is based on the draft by @sroland (Roland Scheidegger) that is attached to PR27885: https://llvm.org/bugs/show_bug.cgi?id=27885 llvm-svn: 289846	2016-12-15 18:43:46 +00:00
Sanjay Patel	a0d8a278a7	[x86] use a single shufps when it can save instructions This is a tiny patch with a big pile of test changes. This partially fixes PR27885: https://llvm.org/bugs/show_bug.cgi?id=27885 My motivating case looks like this: - vpshufd {{.#+}} xmm1 = xmm1[0,1,0,2] - vpshufd {{.#+}} xmm0 = xmm0[0,2,2,3] - vpblendw {{.#+}} xmm0 = xmm0[0,1,2,3],xmm1[4,5,6,7] + vshufps {{.#+}} xmm0 = xmm0[0,2],xmm1[0,2] And this happens several times in the diffs. For chips with domain-crossing penalties, the instruction count and size reduction should usually overcome any potential domain-crossing penalty due to using an FP op in a sequence of int ops. For chips such as recent Intel big cores and Atom, there is no domain-crossing penalty for shufps, so using shufps is a pure win. So the test case diffs all appear to be improvements except one test in vector-shuffle-combining.ll where we miss an opportunity to use a shift to generate zero elements and one test in combine-sra.ll where multiple uses prevent the expected shuffle combining. Differential Revision: https://reviews.llvm.org/D27692 llvm-svn: 289837	2016-12-15 18:03:38 +00:00
Simon Pilgrim	7522f54feb	[X86][SSE] Fix domains for scalar store instructions As discussed on D27692 llvm-svn: 289834	2016-12-15 17:09:24 +00:00
Simon Pilgrim	ba46422694	[X86][AVX512] Moved instruction domain lookups to the right table. NFCI. Avoid duplicating instructions in the int32/int64 domains. llvm-svn: 289830	2016-12-15 16:38:51 +00:00
Simon Pilgrim	d7518896ff	[X86][SSE] Fix domains for VZEXT_LOAD type instructions Add the missing domain equivalences for movss, movsd, movd and movq zero extending loading instructions. Differential Revision: https://reviews.llvm.org/D27684 llvm-svn: 289825	2016-12-15 16:05:29 +00:00
Simon Pilgrim	2f7f0e7a48	[CostModel][X86] Updated reverse shuffle costs llvm-svn: 289819	2016-12-15 14:24:07 +00:00
Michael Zuckerman	1ce2a23a1e	Fix bug 30945 - [AVX512] Failure to flip vector comparison to remove not mask instruction. Adds a new optimization opportunity by adding a new X86ISelLowering pattern. The test case is shown in https://llvm.org/bugs/show_bug.cgi?id=30945. Test explanation: Select takes three arguments: mask, op and op2. In this case, the mask is the result of an ICMP. That ICMP compares (for equality) the zero initializer vector with the result of the first ICMP. In general, the result of "cmp eq, op1, zero initializer" is "not(op1)" where op1 is a mask. By swapping the two value arguments of the Select instruction, we get the same result without needing the middle step ("cmp eq, op1, zero initializer"). Missed optimization opportunity: vpcmpled %zmm0, %zmm1, %k0 knotw %k0, %k1 can be combined to vpcmpgtd %zmm0, %zmm2, %k1 Reviewers: 1. delena 2. igorb Committed after check all Differential Revision: https://reviews.llvm.org/D27160 llvm-svn: 289653	2016-12-14 14:57:10 +00:00
Stephan Bergmann	17c7f70362	Replace APFloatBase static fltSemantics data members with getter functions At least the plugin used by the LibreOffice build (<https://wiki.documentfoundation.org/Development/Clang_plugins>) indirectly uses those members (through inline functions in LLVM/Clang include files in turn using them), but they are not exported by utils/extract_symbols.py on Windows, and accessing data across DLL/EXE boundaries on Windows is generally problematic. Differential Revision: https://reviews.llvm.org/D26671 llvm-svn: 289647	2016-12-14 11:57:17 +00:00
Philip Reames	1f1bbac8da	[peephole] Enhance folding logic to work for STATEPOINTs The general idea here is to get enough of the existing restrictions out of the way that the already existing folding logic in foldMemoryOperand can kick in for STATEPOINTs and fold references to immutable stack slots. The key changes are: support for folding multiple operands at once which reference the same load; support for folding multiple loads into a single instruction; and walking all the operands of the instruction for variadic instructions (this is a bug fix!). Once this lands, I'll post another patch which refactors the TII interface here. There's nothing actually x86 specific about the x86 code used here. Differential Revision: https://reviews.llvm.org/D24103 llvm-svn: 289510	2016-12-13 01:38:41 +00:00
Sanjay Patel	62104ee6d9	[x86] fix formatting; NFC llvm-svn: 289476	2016-12-12 22:31:01 +00:00
Simon Pilgrim	4cbe1834e4	Update inline argument comment. NFCI. combineX86ShufflesRecursively 'HasPSHUFB' flag has been the more generic 'HasVariableMask' flag for some time. llvm-svn: 289430	2016-12-12 13:43:15 +00:00
Simon Pilgrim	5ebd2b542b	[X86][SSE] Add support for combining SSE VSHLI/VSRLI uniform constant shifts. Fixes some missed constant folding opportunities and allows us to combine shuffles that end with a logical bit shift. llvm-svn: 289429	2016-12-12 13:33:58 +00:00
Simon Pilgrim	369cd349b9	[X86][SSE] Lower suitably sign-extended mul vXi64 using PMULDQ PMULDQ returns the 64-bit result of the signed multiplication of the lower 32-bits of vXi64 vector inputs, we can lower with this if the sign bits stretch that far. Differential Revision: https://reviews.llvm.org/D27657 llvm-svn: 289426	2016-12-12 10:49:15 +00:00
Craig Topper	36ecce9bed	[X86] Teach selectScalarSSELoad to accept full 128-bit vector loads and the X86ISD::VZEXT_LOAD opcode. Disable peephole on some of the tests that no longer require it to properly fold scalar intrinsics. llvm-svn: 289424	2016-12-12 07:57:24 +00:00
Craig Topper	f2c6f7abf3	[X86] Change CMPSS/CMPSD intrinsic instructions to use sse_load_f32/f64 as its memory pattern instead of full vector load. These intrinsics only load a single element. We should use sse_load_f32/f64 to give more options of what loads it can match. Currently these instructions are often only getting their load folded thanks to the load folding in the peephole pass. I plan to add more types of loads to sse_load_f32/f64 so we can match without the peephole. llvm-svn: 289423	2016-12-12 07:57:21 +00:00
Craig Topper	081c0e2864	[X86] Remove some intrinsic instructions from hasPartialRegUpdate Summary: These intrinsic instructions are all selected from intrinsics that have well defined behavior for where the upper bits come from. It's not the same place as the lower bits. As you can see we were suppressing load folding for these instructions in some cases. In none of the cases was the separate load helping avoid a partial dependency on the destination register. So we should just go ahead and allow the load to be folded. Only foldMemoryOperand was suppressing folding for these. They all have patterns for folding sse_load_f32/f64 that aren't gated with OptForSize, but sse_load_f32/f64 doesn't allow 128-bit vector loads. It only allows scalar_to_vector and vzmovl of scalar loads to match. There's no reason we can't allow a 128-bit vector load to be narrowed so I would like to fix sse_load_f32/f64 to allow that. And if I do that it changes some of these same test cases to fold the load too. Reviewers: spatel, zvi, RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D27611 llvm-svn: 289419	2016-12-12 05:07:17 +00:00
Simon Pilgrim	831435cb14	[X86][SSE] Add support for combining target shuffles to SHUFPD. llvm-svn: 289407	2016-12-11 21:26:25 +00:00
Ayman Musa	7ec4ed55d3	[X86][AVX512] Add missing patterns for broadcast fallback in case load node has multiple uses (for v4i64 and v4f64). When the load node which the broadcast instruction broadcasts has multiple uses, it cannot be folded. A fallback pattern is added to catch these cases and provide another solution. Differential Revision: https://reviews.llvm.org/D27661 llvm-svn: 289404	2016-12-11 20:11:17 +00:00
Oren Ben Simhon	9683ecbff6	[X86] Regcall - Adding support for mask types Regcall calling convention passes mask type arguments in x86 GPR registers. The review includes the changes required in order to support v32i1, v16i1 and v8i1. Differential Revision: https://reviews.llvm.org/D27148 llvm-svn: 289383	2016-12-11 14:10:52 +00:00
Craig Topper	e7166ce237	[X86] Fix a comment to say 'an FMA' instead of 'a FMA'. NFC llvm-svn: 289352	2016-12-11 01:28:08 +00:00
Craig Topper	1f1b441267	[X86] Remove masking from 512-bit VPERMIL intrinsics in preparation for being able to constant fold them in InstCombineCalls like we do for 128/256-bit. llvm-svn: 289350	2016-12-11 01:26:44 +00:00
Craig Topper	edab02b50b	[X86] Remove masking from 512-bit PSHUFB intrinsics in preparation for being able to constant fold it in InstCombineCalls like we do for 128/256-bit. llvm-svn: 289344	2016-12-10 23:09:43 +00:00
Simon Pilgrim	a03e350e69	[X86][SSE] Ensure UNPCK inputs are a consistent value type in LowerHorizontalByteSum llvm-svn: 289341	2016-12-10 21:16:45 +00:00
Craig Topper	abe7c5b5e9	[AVX-512] Remove 128/256 masked vpermil intrinsics and autoupgrade to a select around the unmasked avx1 intrinsics. llvm-svn: 289340	2016-12-10 21:15:52 +00:00
Simon Pilgrim	fb58550d73	[X86][SSE] Move ZeroVector creation into the shuffle pattern case where it's actually used. Also fix the ZeroVector's type - I've no idea how this hasn't caused problems. llvm-svn: 289336	2016-12-10 19:49:55 +00:00
Craig Topper	18b57da491	[AVX-512] Add support for lowering (v2i64 (fp_to_sint (v2f32))) to vcvttps2uqq when AVX512DQ and AVX512VL are available. llvm-svn: 289335	2016-12-10 19:35:39 +00:00
Craig Topper	8e288e0b68	[X86] Clarify indentation. NFC llvm-svn: 289334	2016-12-10 19:35:36 +00:00
Craig Topper	85f0e57c33	[X86] Combine LowerFP_TO_SINT and LowerFP_TO_UINT. They only differ by a single boolean flag passed to a helper function. Just check the opcode and create the flag. llvm-svn: 289333	2016-12-10 19:35:33 +00:00
Craig Topper	a39b650d72	[X86] Use X86ISD::CVTTP2SI and X86ISD::CVTTP2UI for lowering 128-bit cvttps2qq and cvttps2uqq intrinsics since there is a mismatch between number of input and output elements. Ideally ISD::FP_TO_SINT and ISD::FP_TO_UINT would only be used for cases with the same number of input and output elements. Similar things have already been done for other convert intrinsics. llvm-svn: 289316	2016-12-10 06:02:48 +00:00
Simon Pilgrim	017b7a71d8	[SelectionDAG] Add knownbits support for EXTRACT_VECTOR_ELT opcodes (REAPPLIED) Reapplied with fix for PR31323 - X86 SSE2 vXi16 multiplies for illegal types were creating CONCAT_VECTORS nodes with vector inputs that might not total the number of elements in the result type. llvm-svn: 289232	2016-12-09 17:53:11 +00:00
Craig Topper	38b1b5d44f	[X86] Modify patterns from memory form of RCP/RSQRT/SQRT intrinsics to only allow (scalar_to_vector (loadf32/load64)) instead of anything that sse_load_f32/f64 can match. sse_load_f32/f64 can also match loads that are zero extended to vectors. We shouldn't match that because we wouldn't be able to get the instruction to zero the upper bits like the intrinsic semantics would require for such a case. There is a test case that does depend on this behavior. llvm-svn: 289193	2016-12-09 07:57:21 +00:00
Craig Topper	a55b483bb5	[AVX-512] Correctly preserve the passthru semantics of the FMA scalar intrinsics Summary: Scalar intrinsics have specific semantics about which input's upper bits are passed through to the output. The same input is also supposed to be the input we use for the lower element when the mask bit is 0 in a masked operation. We aren't currently keeping these semantics with instruction selection. This patch corrects this by introducing new scalar FMA ISD nodes that indicate whether operand 1 (one of the multiply inputs) or operand 3 (the addition/subtraction input) should pass thru its upper bits. We use this information to select 213/132 form for the operand 1 version and the 231 form for the operand 3 version. We also use this information to suppress combining FNEG operations on the passthru input since semantically the passthru bits aren't negated. This is stronger than the earlier check added for a user being SELECTS so we can remove that. This fixes PR30913. Reviewers: delena, zvi, v_klochkov Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D27144 llvm-svn: 289190	2016-12-09 06:42:28 +00:00
Craig Topper	c4f2b0996d	[X86] Add masked versions of VPERMT2* and VPERMI2* to load folding tables. llvm-svn: 289186	2016-12-09 05:20:11 +00:00
Craig Topper	2aeb456425	[AVX-512] Add vpermilps/pd to load folding tables. llvm-svn: 289173	2016-12-09 02:18:11 +00:00
Peter Collingbourne	235c275b20	IR, X86: Understand !absolute_symbol metadata on global variables. Summary: Attaching !absolute_symbol to a global variable does two things: 1) Marks it as an absolute symbol reference. 2) Specifies the value range of that symbol's address. Teach the X86 backend to allow absolute symbols to appear in place of immediates by extending the relocImm and mov64imm32 matchers. Start using relocImm in more places where it is legal. As previously proposed on llvm-dev: http://lists.llvm.org/pipermail/llvm-dev/2016-October/105800.html Differential Revision: https://reviews.llvm.org/D25878 llvm-svn: 289087	2016-12-08 19:01:00 +00:00
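
The reference-counting pattern targeted by r289955 (commit 35f21cba13 in the list above) can be sketched in portable C++ as follows. This is a minimal illustration and not code from the commit: the `RefCounted` struct and the `decrement` helper are hypothetical names, and `std::atomic::fetch_sub` stands in for `_InterlockedDecrement`. `fetch_sub(1)` returns the value held before the subtraction, so comparing that result against 1 is equivalent to asking whether the new value is 0, which is the comparison the commit message describes folding into the flags set by the locked decrement.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical reference-counted object, used only for illustration.
struct RefCounted {
    std::atomic<long> refs{1};
    bool released = false;
};

// Drops one reference; returns true if this call released the object.
bool decrement(RefCounted &obj) {
    // fetch_sub returns the value held *before* the subtraction.
    long previous = obj.refs.fetch_sub(1, std::memory_order_acq_rel);
    assert(previous > 0 && "reference count underflow");
    // previous == 1 means the new count is 0, i.e. the last reference is gone.
    if (previous == 1) {
        obj.released = true;
        return true;
    }
    return false;
}
```

Whether a particular compiler emits a single locked decrement followed by a conditional jump for this function depends on the compiler version, target and optimization level, so the assembly quoted in the commit message should be read as one observed result rather than a guarantee.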


14408 Commits