llvm-project

Commit Graph

Author	SHA1	Message	Date
Craig Topper	0857a1da0c	[X86] Add xrstors/xsavec/xsaves/clflushopt/clwb/pcommit instructions llvm-svn: 228283	2015-02-05 08:51:06 +00:00
Chandler Carruth	4d31f58c88	[x86] Give movss and movsd execution domains in the x86 backend. This associates movss and movsd with the packed single and packed double execution domains (resp.). While this is largely cosmetic, as we now don't have weird ping-pong-ing between single and double precision, it is also useful because it avoids the domain fixing algorithm from seeing domain breaks that don't actually exist. It will also be much more important if we have an execution domain default other than packed single, as that would cause us to mix movss and movsd with integer vector code on a regular basis, a very bad mixture. llvm-svn: 228135	2015-02-04 10:58:53 +00:00
Chandler Carruth	f4a1c33c7c	[x86] Add missing patterns for andps, orps, xorps, and andnps. Specifically, the existing patterns were scalar-only. These cover the packed vector bitwise operations when specifically requested with pseudo instructions. This is particularly important in SSE1 where we can't actually emit a logical operation on a v2i64 as that isn't a legal type. This will be tested in subsequent patches which form the floating point and patterns in more places. llvm-svn: 228123	2015-02-04 09:06:01 +00:00
Sanjay Patel	b7d5628784	Merge consecutive 16-byte loads into one 32-byte load (PR22329) This patch detects consecutive vector loads using the existing EltsFromConsecutiveLoads() logic. This fixes: http://llvm.org/bugs/show_bug.cgi?id=22329 This patch effectively reverts the tablegen additions of D6492 / http://reviews.llvm.org/rL224344 ...which in hindsight were a horrible hack. The test cases that were added with that patch are simply modified to load from varying offsets of a base pointer. These loads did not match the existing tablegen patterns. A happy side effect of doing this optimization earlier is that we can now fold the load into a math op where possible; this is shown in some of the updated checks in the test file. Differential Revision: http://reviews.llvm.org/D7303 llvm-svn: 228006	2015-02-03 18:54:00 +00:00
Simon Pilgrim	0629ba1ad9	[X86][SSE] Float comparisons can sometimes be safely commuted For ordered, unordered, equal and not-equal tests, packed float and double comparison instructions can be safely commuted without affecting the results. This patch checks the comparison mode of the (v)cmpps + (v)cmppd instructions and commutes the result if it can. Differential Revision: http://reviews.llvm.org/D7178 llvm-svn: 227145	2015-01-26 22:29:24 +00:00
Simon Pilgrim	9b7c00352d	[X86][PCLMUL] Enable commutation for PCLMUL instructions Patch to allow (v)pclmulqdq to be commuted - swaps the src registers and inverts the immediate (low/high) src mask. Differential Revision: http://reviews.llvm.org/D7180 llvm-svn: 227141	2015-01-26 22:00:18 +00:00
Sanjay Patel	805bc02c2b	Model sqrtsd as a binary operation with one source operand tied to the destination (PR14221) This patch fixes the following miscompile: define void @sqrtsd(<2 x double> %a) nounwind uwtable ssp { %0 = tail call <2 x double> @llvm.x86.sse2.sqrt.sd(<2 x double> %a) nounwind %a0 = extractelement <2 x double> %0, i32 0 %conv = fptrunc double %a0 to float %a1 = extractelement <2 x double> %0, i32 1 %conv3 = fptrunc double %a1 to float tail call void @callee2(float %conv, float %conv3) nounwind ret void } Current codegen: sqrtsd %xmm0, %xmm1 ## high element of %xmm1 is undef here xorps %xmm0, %xmm0 cvtsd2ss %xmm1, %xmm0 shufpd $1, %xmm1, %xmm1 cvtsd2ss %xmm1, %xmm1 ## operating on undef value jmp _callee This is a continuation of http://llvm.org/viewvc/llvm-project?view=revision&revision=224624 ( http://reviews.llvm.org/D6330 ) which was itself a continuation of r167064 ( http://llvm.org/viewvc/llvm-project?view=revision&revision=167064 ). All of these patches are partial fixes for PR14221 ( http://llvm.org/bugs/show_bug.cgi?id=14221 ); this should be the final patch needed to resolve that bug. Differential Revision: http://reviews.llvm.org/D6885 llvm-svn: 227111	2015-01-26 18:42:16 +00:00
Craig Topper	ca8e179bc2	[X86] Give scalar VRNDSCALE instructions priority in AVX512 mode. llvm-svn: 227039	2015-01-25 08:49:22 +00:00
Craig Topper	e3155c96ee	Remove tab characters. NFC llvm-svn: 227036	2015-01-25 08:45:32 +00:00
Craig Topper	53a846764c	[X86] Replace i32i8imm on SSE/AVX instructions with i32u8imm which will make the assembler bounds check them. It will also make them print as unsigned. llvm-svn: 227032	2015-01-25 02:21:16 +00:00
Craig Topper	fc946a0e6f	[X86] Use u8imm in several places that used i32i8imm that don't require an i32 type. llvm-svn: 227031	2015-01-25 02:21:13 +00:00
Craig Topper	e7f6cf437c	Remove tab characters. NFC. llvm-svn: 227030	2015-01-25 02:21:11 +00:00
Simon Pilgrim	b16b09b154	[X86][SSE] Added support for SSE3 lane duplication shuffle instructions This patch adds shuffle matching for the SSE3 MOVDDUP, MOVSLDUP and MOVSHDUP instructions. The big use of these being that they avoid many single source shuffles from needing to use (pre-AVX) dual source instructions such as SHUFPD/SHUFPS: causing extra moves and preventing load folds. Adding these instructions uncovered an issue in XFormVExtractWithShuffleIntoLoad which crashed on single operand shuffle instructions (now fixed). It also involved fixing getTargetShuffleMask to correctly identify theses instructions as unary shuffles. Also adds a missing tablegen pattern for MOVDDUP. Differential Revision: http://reviews.llvm.org/D7042 llvm-svn: 226716	2015-01-21 22:44:35 +00:00
Ahmed Bougacha	8f09e9f7c5	[X86] Declare SSE4.1/AVX2 vector extloads covered by PMOV[SZ]X legal. Now that we can fully specify extload legality, we can declare them legal for the PMOVSX/PMOVZX instructions. This for instance enables a DAGCombine to fire on code such as (and (<zextload-equivalent> ...), <redundant mask>) to turn it into: (zextload ...) as seen in the testcase changes. There is one regression, in widen_load-2.ll: we're no longer able to do store-to-load forwarding with illegal extload memory types. This will be addressed separately. Differential Revision: http://reviews.llvm.org/D6533 llvm-svn: 226676	2015-01-21 17:07:06 +00:00
Craig Topper	620b50cc23	[X86] Convert all the i8imm used by SSE and AVX instructions to u8imm. This makes the assembler check their size and removes a hack from the disassembler to avoid sign extending the immediate. llvm-svn: 226645	2015-01-21 08:15:54 +00:00
Craig Topper	f38dea1cfa	[x86] Add assembly parser bounds checking to the immediate value for cmpss/cmpsd/cmpps/cmppd. llvm-svn: 226642	2015-01-21 06:07:53 +00:00
Craig Topper	9f4d485610	[x86] Add some mayLoad/hasSideEffects flags. Remove one that was already covered by a pattern. llvm-svn: 226562	2015-01-20 12:15:30 +00:00
Simon Pilgrim	b5869f6c7c	[X86][SSE] Minor fix to VPBLENDW AVX2 commutation. D6015 / rL221313 enabled commutation for SSE immediate blend instructions, but due to a typo the AVX2 VPBLENDW ymm instructions weren't flagged as commutative along with the others in the tables, but were still being commuted in code and tested for. llvm-svn: 225612	2015-01-11 22:08:01 +00:00
Craig Topper	acc73445b7	[x86] Prevent llvm.x86.cmp.ps/pd/ss/sd from being selected with bad immediates. The frontend now checks this when the builtin is used. This will allow the instruction printer to not have to deal with invalid immediates on these instructions. llvm-svn: 224885	2014-12-27 18:10:56 +00:00
Elena Demikhovsky	fb73ca516b	Masked load and store codegen - fixed 128-bit vectors The codegen failed on 128-bit types on AVX2. I added patterns and in td files and tests. llvm-svn: 224647	2014-12-19 23:27:57 +00:00
Sanjay Patel	1da5f1645b	Model sqrtss as a binary operation with one source operand tied to the destination (PR14221) This is a continuation of r167064 ( http://llvm.org/viewvc/llvm-project?view=revision&revision=167064 ). That patch started to fix PR14221 ( http://llvm.org/bugs/show_bug.cgi?id=14221 ), but it was not completed. Differential Revision: http://reviews.llvm.org/D6330 llvm-svn: 224624	2014-12-19 22:16:28 +00:00
Robert Khasanov	79fb7292d7	[AVX512] Enable FP arithmetic lowering for AVX512VL subsets. Added RegOp2MemOpTable4 to transform 4th operand from register to memory in merge-masked versions of instructions. Added lowering tests. llvm-svn: 224516	2014-12-18 12:28:22 +00:00
Craig Topper	9480732be2	[X86] Don't use PS prefix on LDMXCSR/STMXCSR. Near as I can tell prefixes are ignored on these instructions except for a comment in the Intel docs about 0xf3. Binutils disassembler seems to ignore prefixes on these instructions. Our disassembler still doesn't distinguish PS and "no prefix" well enough for this to make a functional change, but it helps with experiments I'm doing on a potential new disassembler table builder. llvm-svn: 224496	2014-12-18 05:02:10 +00:00
Robert Khasanov	d04cd2fbfe	[AVX512] Enable integer arithmetic lowering for AVX512BW/VL subsets. Added lowering tests. llvm-svn: 224349	2014-12-16 18:24:07 +00:00
Sanjay Patel	e46d54f0bf	combine consecutive subvector 16-byte loads into one 32-byte load This is a fix for PR21709 ( http://llvm.org/bugs/show_bug.cgi?id=21709 ). When we have 2 consecutive 16-byte loads that are merged into one 32-byte vector, we can use a single 32-byte load instead. But we don't do this for SandyBridge / IvyBridge because they have slower 32-byte memops. We also don't bother using 32-byte integer loads on a machine that only has AVX1 (btver2) because those operands would have to be split in half anyway since there is no support for 32-byte integer math ops. Differential Revision: http://reviews.llvm.org/D6492 llvm-svn: 224344	2014-12-16 16:30:01 +00:00
Robert Khasanov	37c3ad6c20	[AVX512] Enabling bit logic lowering Added lowering tests. llvm-svn: 224132	2014-12-12 17:02:18 +00:00
Robert Khasanov	e82a3630b7	[AVX512] Enabling MIN/MAX lowering. Added lowering tests. llvm-svn: 224127	2014-12-12 15:10:43 +00:00
Ahmed Bougacha	79c797443b	[X86] Add a temporary testcase for PR21876/r223996. llvm-svn: 224074	2014-12-11 23:07:52 +00:00
Ahmed Bougacha	611a3ef0bc	[X86] Add back AVX2 VR256 PMOVX patterns. We can't reach those from zext, but other parts of the backend (the shuffle lowering) generate 256-bit VZEXT nodes. Fixes PR21876. llvm-svn: 223996	2014-12-11 04:32:17 +00:00
Sanjay Patel	e20437f9af	Match new shuffle codegen for MOVHPD patterns Add patterns to match SSE (shufpd) and AVX (vpermilpd) shuffle codegen when storing the high element of a v2f64. The existing patterns were only checking for an unpckh type of shuffle. http://llvm.org/bugs/show_bug.cgi?id=21791 Differential Revision: http://reviews.llvm.org/D6586 llvm-svn: 223929	2014-12-10 16:58:54 +00:00
Ahmed Bougacha	8b54286d1c	[X86] Refactor PMOV[SZ]Xrm to add missing AVX2 patterns. Most patterns will go away once the extload legalization changes land. Differential Revision: http://reviews.llvm.org/D6125 llvm-svn: 223567	2014-12-06 01:31:07 +00:00
Elena Demikhovsky	f1de34b84d	Masked Load / Store Intrinsics - the CodeGen part. I'm recommiting the codegen part of the patch. The vectorizer part will be send to review again. Masked Vector Load and Store Intrinsics. Introduced new target-independent intrinsics in order to support masked vector loads and stores. The loop vectorizer optimizes loops containing conditional memory accesses by generating these intrinsics for existing targets AVX2 and AVX-512. The vectorizer asks the target about availability of masked vector loads and stores. Added SDNodes for masked operations and lowering patterns for X86 code generator. Examples: <16 x i32> @llvm.masked.load.v16i32(i8* %addr, <16 x i32> %passthru, i32 4 /* align /, <16 x i1> %mask) declare void @llvm.masked.store.v8f64(i8 %addr, <8 x double> %value, i32 4, <8 x i1> %mask) Scalarizer for other targets (not AVX2/AVX-512) will be done in a separate patch. http://reviews.llvm.org/D6191 llvm-svn: 223348	2014-12-04 09:40:44 +00:00
Michael Liao	5bf9578ce4	[X86] Clean up whitespace as well as minor coding style llvm-svn: 223339	2014-12-04 05:20:33 +00:00
Simon Pilgrim	6b988ad8f2	[X86][SSE] Keep 4i32 vector insertions in integer domain on SSE4.1 targets 4i32 shuffles for single insertions into zero vectors lowers to X86vzmovl which was using (v)blendps - causing domain switch stalls. This patch fixes this by using (v)pblendw instead. The updated tests on test/CodeGen/X86/sse41.ll still contain a domain stall due to the use of insertps - I'm looking at fixing this in a future patch. Differential Revision: http://reviews.llvm.org/D6458 llvm-svn: 223165	2014-12-02 22:31:23 +00:00
Duncan P. N. Exon Smith	9bc81fbe92	Revert "Masked Vector Load and Store Intrinsics." This reverts commit r222632 (and follow-up r222636), which caused a host of LNT failures on an internal bot. I'll respond to the commit on the list with a reproduction of one of the failures. Conflicts: lib/Target/X86/X86TargetTransformInfo.cpp llvm-svn: 222936	2014-11-28 21:29:14 +00:00
Craig Topper	c50d64b07b	Replace neverHasSideEffects=1 with hasSideEffects=0 in all .td files. llvm-svn: 222801	2014-11-26 00:46:26 +00:00
Elena Demikhovsky	9e5089a938	Masked Vector Load and Store Intrinsics. Introduced new target-independent intrinsics in order to support masked vector loads and stores. The loop vectorizer optimizes loops containing conditional memory accesses by generating these intrinsics for existing targets AVX2 and AVX-512. The vectorizer asks the target about availability of masked vector loads and stores. Added SDNodes for masked operations and lowering patterns for X86 code generator. Examples: <16 x i32> @llvm.masked.load.v16i32(i8* %addr, <16 x i32> %passthru, i32 4 /* align /, <16 x i1> %mask) declare void @llvm.masked.store.v8f64(i8 %addr, <8 x double> %value, i32 4, <8 x i1> %mask) Scalarizer for other targets (not AVX2/AVX-512) will be done in a separate patch. http://reviews.llvm.org/D6191 llvm-svn: 222632	2014-11-23 08:07:43 +00:00
Simon Pilgrim	c9a0779309	[X86][SSE] Enable commutation for SSE immediate blend instructions Patch to allow (v)blendps, (v)blendpd, (v)pblendw and vpblendd instructions to be commuted - swaps the src registers and inverts the blend mask. This is primarily to improve memory folding (see new tests), but it also improves the quality of shuffles (see modified tests). Differential Revision: http://reviews.llvm.org/D6015 llvm-svn: 221313	2014-11-04 23:25:08 +00:00
Andrea Di Biagio	c48cb86f05	[X86] Fix missed selection of non-temporal store of zero vector. When the input to a store instruction was a zero vector, the backend always selected a normal vector store regardless of the non-temporal hint. This is fixed by this patch. This fixes PR19370. llvm-svn: 220054	2014-10-17 17:27:06 +00:00
Benjamin Kramer	4ba642a2f7	X86: Drop the isConvertibleTo3Addr bit from shufps/shufpd now that we don't convert them anymore. llvm-svn: 219112	2014-10-06 09:56:40 +00:00
Chandler Carruth	f3e880697a	[x86] Add a really preposterous number of patterns for matching all of the various ways in which blends can be used to do vector element insertion for lowering with the scalar math instruction forms that effectively re-blend with the high elements after performing the operation. This then allows me to bail on the element insertion lowering path when we have SSE4.1 and are going to be doing a normal blend, which in turn restores the last of the blends lost from the new vector shuffle lowering when I got it to prioritize insertion in other cases (for example when we don't have a blend instruction). Without the patterns, using blends here would have regressed sse-scalar-fp-arith.ll completely with the new vector shuffle lowering. For completeness, I've added RUN-lines with the new lowering here. This is somewhat superfluous as I'm about to flip the default, but hey, it shows that this actually significantly changed behavior. The patterns I've added are just ridiculously repetative. Suggestions on making them better very much welcome. In particular, handling the commuted form of the v2f64 patterns is somewhat obnoxious. llvm-svn: 219033	2014-10-03 22:43:17 +00:00
Chandler Carruth	0adda1e4d4	[x86] Adjust the patterns for lowering X86vzmovl nodes which don't perform a load to use blendps rather than movss when it is available. For non-loads, blendps is much faster. It can execute on two ports in Sandy Bridge and Ivy Bridge, and three ports on Haswell. This fixes one of the "regressions" from aggressively taking the "insertion" path in the new vector shuffle lowering. This does highlight one problem with blendps -- it isn't commuted as heavily as it should be. That's future work though. llvm-svn: 219022	2014-10-03 21:38:49 +00:00
Chandler Carruth	1964078936	[x86] Teach the new vector shuffle lowering to aggressively form MOVSS and MOVSD nodes for single element vector inserts. This is particularly important because a number of patterns in the backend detect these patterns and leverage them to simplify things. It also fixes quite a few of the insertion bad code examples. However, it regresses a specific area: when available, blendps and blendpd are dramatically faster than movss and movsd respectively. But it doesn't really work to form the blend logic first because the blends aren't as crazy efficient when the data is coming from memory anyways, and thus will have a movss or movsd regardless. Also, doing that would block a bunch of the patterns that this is designed to hit. So my plan is to go into the patterns for lowering MOVSS and MOVSD and lower them via blends when available. However that's a pretty invasive restructuring so it will need to be a follow-up patch. I have already gone into the patterns to lower MOVSS and MOVSD from memory using MOVLPD, etc. Without that, several of the test cases I already have regress. llvm-svn: 218985	2014-10-03 13:11:13 +00:00
Chandler Carruth	75e182b414	[x86] Teach the new vector shuffle lowering to widen floating point elements as well as integer elements in order to form simpler shuffle patterns. This is the primary reason why we were failing to match some of the 2-and-2 floating point shuffles such as PR21140. Even after fixing this we need to support some extra patterns in the backend in order to match the resulting X86ISD::UNPCKL nodes into the correct instructions. This commit should fix PR21140 and includes more comprehensive testing of insertion patterns in v4 shuffles. Not all of the added tests are beautiful. For example, we don't have clever instructions to insert-via-load in the integer domain. There are also some places where we aren't sufficiently cunning with our use of movq and movd, but that's future work. llvm-svn: 218911	2014-10-02 21:37:14 +00:00
Chandler Carruth	b9d3fa1e65	[x86] Teach the new vector shuffle lowering about VBROADCAST and VPBROADCAST. This has the somewhat expected pervasive impact. I don't know why I forgot about this. Everything seems good with lots of significant improvements in the tests. llvm-svn: 218724	2014-10-01 00:41:21 +00:00
Andrea Di Biagio	196e873cdc	[X86][SchedModel] SSE reciprocal square root instruction latencies. The SSE rsqrt instruction (a fast reciprocal square root estimate) was grouped in the same scheduling IIC_SSE_SQRT* class as the accurate (but very slow) SSE sqrt instruction. For code which uses rsqrt (possibly with newton-raphson iterations) this poor scheduling was affecting performances. This patch splits off the rsqrt instruction from the sqrt instruction scheduling classes and creates new IIC_SSE_RSQER* classes with latency values based on Agner's table. Differential Revision: http://reviews.llvm.org/D5370 Patch by Simon Pilgrim. llvm-svn: 218517	2014-09-26 12:56:44 +00:00
Robert Khasanov	6d62c0202b	[AVX512] Added load/store from BW/VL subsets to Register2Memory opcode tables. Added lowering tests for these instructions. llvm-svn: 218508	2014-09-26 09:48:50 +00:00
Akira Hatanaka	8cc48bd159	[X86,AVX] Add an isel pattern for X86VBroadcast. This fixes PR21050 and rdar://problem/18434607. llvm-svn: 218431	2014-09-25 00:26:15 +00:00
Chandler Carruth	6d5916a2d7	[x86] Teach the AVX1 path of the new vector shuffle lowering one more trick that I missed. VPERMILPS has a non-immediate memory operand mode that allows it to do asymetric shuffles in the two 128-bit lanes. Use this rather than two shuffles and a blend. However, it turns out the variable shuffle path to VPERMILPS (and VPERMILPD, although that one offers no functional differenc from the immediate operand other than variability) wasn't even plumbed through codegen. Do such plumbing so that we can reasonably emit a variable-masked VPERMILP instruction. Also plumb basic comment parsing and printing through so that the tests are reasonable. There are still a few tests which don't show the shuffle pattern. These are tests with undef lanes. I'll teach the shuffle decoding and printing to handle undef mask entries in a follow-up. I've looked at the masks and they seem reasonable. llvm-svn: 218300	2014-09-23 10:08:29 +00:00
Chandler Carruth	ed5dfff865	[x86] Rename X86ISD::VPERMILP to X86ISD::VPERMILPI (and the same for the td pattern). Currently we only model the immediate operand variation of VPERMILPS and VPERMILPD, we should make that clear in the pseudos used. Will be adding support for the variable mask variant in my next commit. llvm-svn: 218282	2014-09-22 22:29:42 +00:00
Sanjay Patel	7939d7229d	Use broadcasts to optimize overall size when loading constant splat vectors (x86-64 with AVX or AVX2). We generate broadcast instructions on CPUs with AVX2 to load some constant splat vectors. This patch should preserve all existing behavior with regular optimization levels, but also use splats whenever possible when optimizing for size on any CPU with AVX or AVX2. The tradeoff is up to 5 extra instruction bytes for the broadcast instruction to save at least 8 bytes (up to 31 bytes) of constant pool data. Differential Revision: http://reviews.llvm.org/D5347 llvm-svn: 218263	2014-09-22 18:54:01 +00:00
Chandler Carruth	74acb46d26	[x86] Remove the last vestiges of the BLENDI-based ADDSUB pattern matching. This design just fundamentally didn't work because ADDSUB is available prior to any legal lowerings of BLENDI nodes. Instead, we have a dedicated ADDSUB synthetic ISD node which is pattern matched trivially into the instructions. These nodes are then recognized by both the existing and a trivial new lowering combine in the backend. Removing these patterns required adding 2 missing shuffle masks to the DAG combine, without which tests would have failed. Added the masks and a helpful assert as well to catch if anything ever goes wrong here. llvm-svn: 217851	2014-09-16 00:39:08 +00:00
Chandler Carruth	204ad4c613	[x86] Start fixing our emission of ADDSUBPS and ADDSUBPD instructions by introducing a synthetic X86 ISD node representing this generic operation. The relevant patterns for mapping these nodes into the concrete instructions are also added, and a gnarly bit of C++ code in the target-specific DAG combiner is replaced with simple code emitting this primitive. The next step is to generically combine blends of adds and subs into this node so that we can drop the reliance on an SSE4.1 ISD node (BLENDI) when matching an SSE3 feature (ADDSUB). llvm-svn: 217819	2014-09-15 20:09:47 +00:00
Chandler Carruth	373b2b1728	[x86] Fix a pretty horrible bug and inconsistency in the x86 asm parsing (and latent bug in the instruction definitions). This is effectively a revert of r136287 which tried to address a specific and narrow case of immediate operands failing to be accepted by x86 instructions with a pretty heavy hammer: it introduced a new kind of operand that behaved differently. All of that is removed with this commit, but the test cases are both preserved and enhanced. The core problem that r136287 and this commit are trying to handle is that gas accepts both of the following instructions: insertps $192, %xmm0, %xmm1 insertps $-64, %xmm0, %xmm1 These will encode to the same byte sequence, with the immediate occupying an 8-bit entry. The first form was fixed by r136287 but that broke the prior handling of the second form! =[ Ironically, we would still emit the second form in some cases and then be unable to re-assemble the output. The reason why the first instruction failed to be handled is because prior to r136287 the operands ere marked 'i32i8imm' which forces them to be sign-extenable. Clearly, that won't work for 192 in a single byte. However, making thim zero-extended or "unsigned" doesn't really address the core issue either because it breaks negative immediates. The correct fix is to make these operands 'i8imm' reflecting that they can be either signed or unsigned but must be 8-bit immediates. This patch backs out r136287 and then changes those places as well as some others to use 'i8imm' rather than one of the extended variants. Naturally, this broke something else. The custom DAG nodes had to be updated to have a much more accurate type constraint of an i8 node, and a bunch of Pat immediates needed to be specified as i8 values. The fallout didn't end there though. We also then ceased to be able to match the instruction-specific intrinsics to the instructions so modified. Digging, this is because they too used i32 rather than i8 in their signature. So I've also switched those intrinsics to i8 arguments in line with the instructions. In order to make the intrinsic adjustments of course, I also had to add auto upgrading for the intrinsics. I suspect that the intrinsic argument types may have led everything down this rabbit hole. Pretty happy with the result. llvm-svn: 217310	2014-09-06 10:00:01 +00:00
Robert Khasanov	ed8829703f	[SKX] Extended non-temporal load/store instructions for AVX512VL subsets. Added avx512_movnt_vl multiclass for handling 256/128-bit forms of instruction. Added encoding and lowering tests. Reviewed by Elena Demikhovsky <elena.demikhovsky@intel.com> llvm-svn: 215536	2014-08-13 10:46:00 +00:00
Quentin Colombet	0233d49574	[X86][SchedModel] Fixed missing/wrong scheduling model found by code inspection. Source: Agner Fog's Instruction tables. Related to <rdar://problem/15607571> llvm-svn: 215045	2014-08-07 00:20:44 +00:00
Quentin Colombet	33ea1681ce	[X86][SchedModel] Fixed some wrong scheduling model found by code inspection. Source: Agner Fog's Instruction tables. Related to <rdar://problem/15607571> llvm-svn: 214940	2014-08-06 00:22:39 +00:00
Tim Northover	fd7e424935	CodeGen: extend f16 conversions to permit types > float. This makes the two intrinsics @llvm.convert.from.f16 and @llvm.convert.to.f16 accept types other than simple "float". This is only strictly needed for the truncate operation, since otherwise double rounding occurs and there's no way to represent the strict IEEE conversion. However, for symmetry we allow larger types in the extend too. During legalization, we can expand an "fp16_to_double" operation into two extends for convenience, but abort when the truncate isn't legal. A new libcall is probably needed here. Even after this commit, various target tweaks are needed to actually use the extended intrinsics. I've put these into separate commits for clarity, so there are no actual tests of f64 conversion here. llvm-svn: 213248	2014-07-17 10:51:23 +00:00
Andrea Di Biagio	a37a2fc81f	[X86] Add ISel patterns to select 'f32_to_f16' and 'f16_to_f32' dag nodes. This patch adds tablegen patterns to select F16C float-to-half-float conversion instructions from 'f32_to_f16' and 'f16_to_f32' dag nodes. If the target doesn't have F16C, then 'f32_to_f16' and 'f16_to_f32' are expanded into library calls. llvm-svn: 212293	2014-07-03 21:51:06 +00:00
Andrea Di Biagio	07cdffc324	[X86] Always prefer to lower a VECTOR_SHUFFLE into a BLENDI instead of SHUFP (or VPERM2X128). This patch teaches method 'LowerVECTOR_SHUFFLE' to give higher precedence to the check for 'isBlendMask'; the idea is that, when possible, we should firstly check if a shuffle performs a blend, and in case, try to lower it into a BLENDI instead of selecting a SHUFP or (worse) a VPERM2X128. In general: - AVX VBLENDPS/D always have better latency and throughput than VPERM2F128; - BLENDPS/D instructions tend to always have better 'reciprocal throughput' than the equivalent SHUFPS/D; - Both BLENDPS/D and SHUFPS/D are often decoded into the same number of m-ops; however, a m-op obtained from a BLENDPS/D can be scheduled to more than one execution port. This patch: - Moves the check for 'isBlendMask' immediately before the check for 'isSHUFPMask' within method 'LowerVECTOR_SHUFFLE'; - Updates existing tests for sse/avx shuffle/blend instructions to verify that we select (v)blendps/d when possible (instead of (v)shufps/d or vperm2f128). llvm-svn: 211720	2014-06-25 17:41:58 +00:00
Andrea Di Biagio	6d9b9e125d	[X86] Add target combine rule to select ADDSUB instructions from a build_vector This patch teaches the backend how to combine a build_vector that implements an 'addsub' between packed float vectors into a sequence of vector add and vector sub followed by a VSELECT. The new VSELECT is expected to be lowered into a BLENDI. At ISel stage, the sequence 'vector add + vector sub + BLENDI' is pattern-matched against ISel patterns added at r211427 to select 'addsub' instructions. Added three more ISel patterns for ADDSUB. Added test sse3-avx-addsub-2.ll to verify that we correctly emit 'addsub' instructions. llvm-svn: 211679	2014-06-25 10:02:21 +00:00
Andrea Di Biagio	e5015d8aba	[X86] Add ISel patterns to select SSE3/AVX ADDSUB instructions. This patch adds ISel patterns to select SSE3/AVX ADDSUB instructions from a sequence of "vadd + vsub + blend". Example: /// typedef float float4 __attribute__((ext_vector_type(4))); float4 foo(float4 A, float4 B) { float4 X = A - B; float4 Y = A + B; return (float4){X[0], Y[1], X[2], Y[3]}; } /// Before this patch, (with flag -mcpu=corei7) llc produced the following assembly sequence: movaps %xmm0, %xmm2 addps %xmm1, %xmm2 subps %xmm1, %xmm0 blendps $10, %xmm2, %xmm0 With this patch, we now get a single addsubps %xmm1, %xmm0 llvm-svn: 211427	2014-06-21 01:31:15 +00:00
Chandler Carruth	8366cebeb5	[x86] Make the x86 PACKSSWB, PACKSSDW, PACKUSWB, and PACKUSDW instructions available as synthetic SDNodes PACKSS and PACKUS that will select to the correct instruction variants based on the return type. This allows us to use these rather important instructions when lowering vector shuffles. Also moves the relevant instruction definitions to be split out from the fully generic multiclasses to allow them to match these new SDNodes in the same way that the UNPCK instructions do. No functionality should actually be changed here. llvm-svn: 211332	2014-06-20 01:05:28 +00:00
Adam Nemet	35b80eaef1	[X86] Remove AVX1 vbroadcast intrinsics The corresponding CFE patch replaces these intrinsics with vector initializers in avxintrin.h. This patch removes the LLVM intrinsics from the backend. We now stop lowering at X86ISD::VBROADCAST custom node rather than lowering that further to the intrinsics. The patch only changes VBROADCASTS* and leaves VBROADCAST[FI]128 to continue to use intrinsics. As explained in the CFE patch, the reason is that we currently don't generate as good code for them without the intrinsics. CodeGen/X86/avx-vbroadcast.ll already provides coverage for this change. It checks that for a series of insertelements we generate the appropriate vbroadcast instruction. Also verified that there was no assembly change in the test-suite before and after this patch. llvm-svn: 209864	2014-05-29 23:35:36 +00:00
Filipe Cabecinhas	dc92102766	Added more insertps optimizations Summary: When inserting an element that's coming from a vector load or a broadcast of a vector (or scalar) load, combine the load into the insertps instruction. Added PerformINSERTPSCombine for the case where we need to fix the load (load of a vector + insertps with a non-zero CountS). Added patterns for the broadcasts. Also added tests for SSE4.1, AVX, and AVX2. Reviewers: delena, nadav, craig.topper Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D3581 llvm-svn: 209156	2014-05-19 19:45:57 +00:00
Tim Northover	60091cfeb9	TableGen: use correct MIOperand when printing aliases Previously, TableGen assumed that every aliased operand consumed precisely 1 MachineInstr slot (this was reasonable because until a couple of days ago, nothing more complicated was eligible for printing). This allows a couple more ARM64 aliases to print so we can remove the special code. On the X86 side, I've gone for explicit AT&T size specifiers as the default, so turned off a few of the aliases that would have just started printing. llvm-svn: 208880	2014-05-15 13:36:01 +00:00
Tim Northover	d8d65a69cf	TableGen/ARM64: print aliases even if they have syntax variants. To get at least one use of the change (and some actual tests) in with its commit, I've enabled the AArch64 & ARM64 NEON mov aliases. llvm-svn: 208867	2014-05-15 11:16:32 +00:00
Benjamin Kramer	6d2dff61f9	X86: Lower SMUL_LOHI of v4i32 to pmuldq when SSE4.1 is available. llvm-svn: 207318	2014-04-26 14:12:19 +00:00
Benjamin Kramer	c9827ab103	X86: Add patterns for MULHU/MULHS of v8i16 and v16i16. This gets us pretty code for divs of i16 vectors. Turn the existing intrinsics into the corresponding nodes. llvm-svn: 207317	2014-04-26 13:01:03 +00:00
Quentin Colombet	04f7b74c39	[X86] Fix missing/wrong scheduling model found by code inspection. llvm-svn: 207014	2014-04-23 19:30:26 +00:00
Filipe Cabecinhas	20352216fb	Rename X86insrtps to the proper instruction name. Summary: The INSERTPS pattern fragment was called insrtps (mising 'e'), which would make it harder to grep for the patterns related to this instruction. Renaming it to use the proper instruction name. Reviewers: nadav CC: llvm-commits Differential Revision: http://reviews.llvm.org/D3443 llvm-svn: 206779	2014-04-21 20:07:29 +00:00
Benjamin Kramer	e6c821ef4c	X86: Pattern match scalar loads + vcvtph2ps into just vcvtph2ps. vcvtph2ps only reads the lower 64 bits of the address passed to the intrinsic. llvm-svn: 206579	2014-04-18 10:45:33 +00:00
Jim Grosbach	e4fef71981	Add support for load folding of avx1 logical instructions AVX supports logical operations using an operand from memory. Unfortunately because integer operations were not added until AVX2 the AVX1 logical operation's types were preventing the isel from folding the loads. In a limited number of cases the peephole optimizer would fold the loads, but most were missed. This patch adds explicit patterns with appropriate casts in order for these loads to be folded. The included test cases run on reduced examples and disable the peephole optimizer to ensure the folds are being pattern matched. Patch by Louis Gerbarg <lgg@apple.com> rdar://16355124 llvm-svn: 205938	2014-04-09 23:39:25 +00:00
Quentin Colombet	9c816f39ad	Revert r205599, the commit was not intended to have so many changes llvm-svn: 205600	2014-04-04 02:02:49 +00:00
Quentin Colombet	7ee4e79dec	[RegAllocGreedy][Last Chance Recoloring] Emit diagnostics when last chance recoloring cut-offs are hit. This is related to PR18747. Patch by MAYUR PANDEY <mayur.p@samsung.com> llvm-svn: 205599	2014-04-04 01:58:57 +00:00
Cameron McInally	45dc489403	Fix AVX2 Gather execution domains. llvm-svn: 204713	2014-03-25 12:36:38 +00:00
Quentin Colombet	2d5c156b96	[X86][ISelDAG] Add missing fallback patterns for avx2 broadcast instructions. Those patterns are used when the load cannot be folded into the related broadcast during the select phase. This happens when the load gets additional uses that were not anticipated during the previous lowering phases (constant vector to constant load, then constant load reused) or when selection DAG is not able to prove that folding the load will not create a cycle in the DAG. <rdar://problem/16074331> llvm-svn: 204631	2014-03-24 17:54:19 +00:00
Quentin Colombet	ca49851833	[X86][SchedModel] Add missing scheduling model for SSE related instructions. The patch defines new or refines existing generic scheduling classes to match the behavior of the SSE instructions. It also maps those scheduling classes on the related SSE instructions. <rdar://problem/15607571> llvm-svn: 202065	2014-02-24 19:33:51 +00:00
Craig Topper	e2347df24d	[x86] Switch PAUSE instruction to use XS prefix instead of HasREPPrefix. Remove HasREPPrefix support from disassembler table generator since its now only used by CodeGenOnly instructions. llvm-svn: 201767	2014-02-20 07:59:43 +00:00
Craig Topper	6872fd3ad9	Add a bunch of OpSize32 tags to 64-bit mode only instructions to match their 32-bit mode counterparts for cases where there is also a OpSize16 instruction. llvm-svn: 201550	2014-02-18 08:18:29 +00:00
Craig Topper	5ccb61781f	Add an x86 prefix encoding for instructions that would decode to a different instruction with 0xf2/f3/66 were in front of them, but don't themselves have a prefix. For now this doesn't change any bbehavior, but plan to use it to fix some bugs in the disassembler. llvm-svn: 201538	2014-02-18 00:21:49 +00:00
Craig Topper	a0869dceea	Recommit r201059 and r201060 with hopefully a fix for its original failure. Original commits messages: Add MRMXr/MRMXm form to X86 for use by instructions which treat the 'reg' field of modrm byte as a don't care value. Will allow for simplification of disassembler code. Simplify a bunch of code by removing the need for the x86 disassembler table builder to know about extended opcodes. The modrm forms are sufficient to convey the information. llvm-svn: 201065	2014-02-10 06:55:41 +00:00
Bob Wilson	ebdae7c2ff	Revert r201059 and r201060. r201059 appears to cause a crash in a bootstrapped build of clang. Craig isn't available to look at it right now, so I'm reverting it while he investigates. llvm-svn: 201064	2014-02-10 05:28:30 +00:00
Craig Topper	0d88de8c56	Add MRMXr/MRMXm form to X86 for use by instructions which treat the 'reg' field of modrm byte as a don't care value. Will allow for simplification of disassembler code. llvm-svn: 201059	2014-02-10 00:50:34 +00:00
Jim Grosbach	e9008de652	X86: Resolve a long standing FIXME and properly isel pextr[bw]. Generalize the AArch64 .td nodes for AssertZext and AssertSext. Use them to match the relevant pextr store instructions. The test widen_load-2.ll requires a slight change because with the stores gone, the remaining instructions are scheduled in a different order. Add test cases for SSE4 and AVX variants. Resolves rdar://13414672. Patch by Adam Nemet <anemet@apple.com>. llvm-svn: 200957	2014-02-07 00:16:33 +00:00
Tim Northover	546b57b011	X86: deduplicate V[SZ]EXT_MOVL and V[SZ]EXT nodes I believe VZEXT_MOVL means "zero all vector elements except the first" (and should have identical input & output types) whereas VZEXT means "zero extend each element of a vector (discarding higher elements if necessary)". For example: (v4i32 (vzext (v16i8 ...))) should zero extend the low 4 bytes of the incoming vector to 32-bits, discarding higher bytes. However, somewhere in the past, these two concepts had become confused, even leading to a nonsensical VSEXT_MOVL. This re-merges the nodes where appropriate (all VSEXT_MOVL -> VSEXT, VZEXT_MOVL -> VZEXT when it's an actual extension). rdar://problem/15981990 llvm-svn: 200918	2014-02-06 09:54:51 +00:00
Craig Topper	fa6298a162	Merge x86 HasOpSizePrefix/HasOpSize16Prefix into a 2-bit OpSize field with 0 meaning no 0x66 prefix in any mode. Rename Opsize16->OpSize32 and OpSize->OpSize16. The classes now refer to their operand size rather than the mode in which they need a 0x66 prefix. Hopefully can merge REX_W into this as OpSize64. llvm-svn: 200626	2014-02-02 09:25:09 +00:00
Craig Topper	348cbdacda	Remove duplicate patterns llvm-svn: 200461	2014-01-30 07:19:10 +00:00
Craig Topper	c45da1619c	Remove some AddedComplexity tags that were forcing priority for AVX over SSE. Use predicates instead. llvm-svn: 200458	2014-01-30 06:26:25 +00:00
Craig Topper	f124c6a5ef	Add OpSize16 flags to 32-bit CRC32 instructions so they can be encoded correctly in 16-bit mode. llvm-svn: 199478	2014-01-17 08:01:20 +00:00
Craig Topper	ae11aed9d7	Separate the concept of 16-bit/32-bit operand size controlled by 0x66 prefix and the current mode from the concept of SSE instructions using 0x66 prefix as part of their encoding without being affected by the mode. This should allow SSE instructions to be encoded correctly in 16-bit mode which r198586 probably broke. llvm-svn: 199193	2014-01-14 07:41:20 +00:00
Craig Topper	7894e812bb	Add the other form of movq xmm,xmm for the disassembler. llvm-svn: 198551	2014-01-05 07:16:04 +00:00
Craig Topper	d9e1669d1c	Use patterns to remove some duplicate instructions. llvm-svn: 198550	2014-01-05 06:55:48 +00:00
Craig Topper	0550ce7ac1	Mark x86 _alt instructions as AsmParserOnly so they will be omitted from disassembler without string matches. llvm-svn: 198545	2014-01-05 04:55:55 +00:00
Craig Topper	3484fc2161	Add a new x86 specific instruction flag to force some isCodeGenOnly instructions to go through to the disassembler tables without resorting to string matches. Apply flag to all _REV instructions. llvm-svn: 198543	2014-01-05 04:17:28 +00:00
Craig Topper	9dd48c8ed4	Mark all x86 Int_ and _Int patterns as isCodeGenOnly so the disassembler table builder doesn't need to string match them to exclude them. llvm-svn: 198323	2014-01-02 17:28:14 +00:00
Eric Christopher	c0a5aaeab0	[x86] Rename In32BitMode predicate to Not64BitMode That's what it actually means, and with 16-bit support it's going to be a little more relevant since in a few corner cases we may actually want to distinguish between 16-bit and 32-bit mode (for example the bare 'push' aliases to pushw/pushl etc.) Patch by David Woodhouse llvm-svn: 197768	2013-12-20 02:04:49 +00:00
Elena Demikhovsky	47fc44e52e	AVX-512: Added legal type MVT::i1 and VK1 register for it. Added scalar compare VCMPSS, VCMPSD. Implemented LowerSELECT for scalar FP operations. I replaced FSETCCss, FSETCCsd with one node type FSETCCs. Node extract_vector_elt(v16i1/v8i1, idx) returns an element of type i1. llvm-svn: 197384	2013-12-16 13:52:35 +00:00
Andrea Di Biagio	9b5c3dcf01	Added new X86 patterns to select SSE scalar fp arithmetic instructions from a vector packed single/double fp operation followed by a vector insert. The effect is that the backend coverts the packed fp instruction followed by a vectro insert into a SSE or AVX scalar fp instruction. For example, given the following code: __m128 foo(__m128 A, __m128 B) { __m128 C = A + B; return (__m128) {c[0], a[1], a[2], a[3]}; } previously we generated: addps %xmm0, %xmm1 movss %xmm1, %xmm0 we now generate: addss %xmm1, %xmm0 llvm-svn: 197145	2013-12-12 11:50:47 +00:00
Andrea Di Biagio	f7c33c8162	Ensure that the backend no longer emits unnecessary vector insert instructions immediately after SSE scalar fp instructions like addss or mulss. Added patterns to select SSE scalar fp arithmetic instructions from a scalar fp operation followed by a blend. For example, given the following code: __m128 foo(__m128 A, __m128 B) { A[0] += B[0]; return A; } previously we generated: addss %xmm0, %xmm1 movss %xmm1, %xmm0 now we generate: addss %xmm1, %xmm0 llvm-svn: 196925	2013-12-10 15:22:48 +00:00

1 2 3 4 5 ...

1220 Commits