llvm-project

Commit Graph

Author	SHA1	Message	Date
Benjamin Kramer	f4575db2fd	X86: Emit test instead of constant shift + compare if the shift result is unused. This allows us to compile return (mask & 0x8 ? a : b); into testb $8, %dil cmovnel %edx, %esi instead of andl $8, %edi shrl $3, %edi cmovnel %edx, %esi which we formed previously because dag combiner canonicalizes setcc of and into shift. llvm-svn: 207088	2014-04-24 08:15:31 +00:00
Elena Demikhovsky	acc5c9e83e	AVX-512: store and truncstore for i1 values llvm-svn: 206897	2014-04-22 14:13:10 +00:00
Lang Hames	3067ab2344	[X86] Use tablegen instead of DAG combines to match BZHI instructions, as suggested by Ben Kramer in review of r206738. Thanks again Ben! llvm-svn: 206879	2014-04-22 10:41:56 +00:00
Lang Hames	f6f42cac3f	[X86] Don't use BZHI for short masks (>=32 bits). Thanks to Ben Kramer for the review. llvm-svn: 206869	2014-04-22 07:40:34 +00:00
Chandler Carruth	84e68b2994	[Modules] Fix potential ODR violations by sinking the DEBUG_TYPE definition below all of the header #include lines, lib/Target/... edition. llvm-svn: 206842	2014-04-22 02:41:26 +00:00
Lang Hames	5aa6ee80b6	[X86] ISEL (and X, <constant mask>) to BZHI when BMI2 is available. Generating BZHI in the variable mask case, i.e. (and X, (sub (shl 1, N), 1)), was already supported, but we were missing the constant-mask case. This patch fixes that. <rdar://problem/15480077> llvm-svn: 206738	2014-04-21 08:18:53 +00:00
Adam Nemet	ee7a3e38c9	[X86] Improve buildFromShuffleMostly for AVX For a 256-bit BUILD_VECTOR consisting mostly of shuffles of 256-bit vectors, both the BUILD_VECTOR and its operands may need to be legalized in multiple steps. Consider: (v8f32 (BUILD_VECTOR (extract_vector_elt (v8f32 %vreg0,) Constant<1>), (extract_vector_elt %vreg0, Constant<2>), (extract_vector_elt %vreg0, Constant<3>), (extract_vector_elt %vreg0, Constant<4>), (extract_vector_elt %vreg0, Constant<5>), (extract_vector_elt %vreg0, Constant<6>), (extract_vector_elt %vreg0, Constant<7>), %vreg1)) a. We can't build a 256-bit vector efficiently so, we need to split it into two 128-bit vecs and combine them with VINSERTX128. b. Operands like (extract_vector_elt (v8f32 %vreg0), Constant<7>) needs to be split into a VEXTRACTX128 and a further extract_vector_elt from the resulting 128-bit vector. c. The extract_vector_elt from b. is lowered into a shuffle to the first element and a movss. Depending on the order in which we legalize the BUILD_VECTOR and its operands[1], buildFromShuffleMostly may be faced with: (v4f32 (BUILD_VECTOR (extract_vector_elt (vector_shuffle<1,u,u,u> (extract_subvector %vreg0, Constant<4>), undef), Constant<0>), (extract_vector_elt (vector_shuffle<2,u,u,u> (extract_subvector %vreg0, Constant<4>), undef), Constant<0>), (extract_vector_elt (vector_shuffle<3,u,u,u> (extract_subvector %vreg0, Constant<4>), undef), Constant<0>), %vreg1)) In order to figure out the underlying vector and their identity we need to see through the shuffles. [1] Note that the order in which operations and their operands are legalized is only guaranteed in the first iteration of LegalizeDAG. Fixes <rdar://problem/16296956> llvm-svn: 206634	2014-04-18 19:44:16 +00:00
Andrea Di Biagio	aac2eac4c2	[X86] Improve the lowering of packed shifts by constant build_vector. This patch teaches the backend how to efficiently lower logical and arithmetic packed shifts on both SSE and AVX/AVX2 machines. When possible, instead of scalarizing a vector shift, the backend should try to expand the shift into a sequence of two packed shifts by immedate count followed by a MOVSS/MOVSD. Example (v4i32 (srl A, (build_vector < X, Y, Y, Y>))) Can be rewritten as: (v4i32 (MOVSS (srl A, <Y,Y,Y,Y>), (srl A, <X,X,X,X>))) [with X and Y ConstantInt] The advantage is that the two new shifts from the example would be lowered into X86ISD::VSRLI nodes. This is always cheaper than scalarizing the vector into four scalar shifts plus four pairs of vector insert/extract. llvm-svn: 206316	2014-04-15 19:30:48 +00:00
Nick Lewycky	aad475b324	Break PseudoSourceValue out of the Value hierarchy. It is now the root of its own tree containing FixedStackPseudoSourceValue (which you can use isa/dyn_cast on) and MipsCallEntry (which you can't). Anything that needs to use either a PseudoSourceValue* and Value* is strongly encouraged to use a MachinePointerInfo instead. llvm-svn: 206255	2014-04-15 07:22:52 +00:00
David Blaikie	9027abae53	Change argument order and add explanatory comment to r206130 Changes requested in code review by Eric Christopher of r206130. llvm-svn: 206219	2014-04-14 22:23:06 +00:00
David Blaikie	269e0fb2e4	Fix instruction debug info location during legalization I found this from a particular GDB test suite case of inlining (something similar is provided as a test case) but came across a few other related cases (other callers of the same functions, and one other instance of the same coding mistake in a separate function). I'm not sure what the best way to test this is (let alone to cover the other cases I discovered), so hopefully this sufficies - open to ideas. llvm-svn: 206130	2014-04-13 06:39:55 +00:00
Reid Kleckner	9c6582129a	Move the segmented stack switch to a function attribute This removes the -segmented-stacks command line flag in favor of a per-function "split-stack" attribute. Patch by Luqman Aden and Alex Crichton! llvm-svn: 205997	2014-04-10 22:58:43 +00:00
Elena Demikhovsky	cf0b9bafc3	AVX-512: insert element to mask vector; store i1 data Implemented INSERT_VECTOR_ELT operation for v16i1 and v8i1 vectors; Implemented "store" for i1 type llvm-svn: 205850	2014-04-09 12:37:50 +00:00
Elena Demikhovsky	3dcfbdfa54	AVX-512: Added fp_to_uint and uint_to_fp patterns. llvm-svn: 205754	2014-04-08 07:24:02 +00:00
Matt Arsenault	cf6f688a40	Add DAG parameter to ComputeNumSignBitsForTargetNode This way, you can check the number of sign bits in the operands. The depth parameter it already has is pretty useless without this. llvm-svn: 205649	2014-04-04 20:13:13 +00:00
Craig Topper	840beec2d0	Make consistent use of MCPhysReg instead of uint16_t throughout the tree. llvm-svn: 205610	2014-04-04 05:16:06 +00:00
Yaron Keren	2895496852	Added isTargetWindowsMSVC(), renamed isTargetMingw() to isTargetWindowsGNU() and isTargetCygwin() to isTargetWindowsCygwin() to be consistent with the four Windows environments in Triple.h. Suggestion by Saleem Abdulrasool! llvm-svn: 205393	2014-04-02 04:27:51 +00:00
Yaron Keren	48d68d439a	If isKnownWindowsMSVCEnvironment then getOS == Triple::Win32 and Environment == Triple::MSVC so it will never be MinGW or Cygwin. llvm-svn: 205349	2014-04-01 18:52:55 +00:00
Yaron Keren	136fe7db46	isTargetWindows() renamed to isTargetKnownWindowsMSVC() to reflect its current functionality. Based on Takumi NAKAMURA suggestion. llvm-svn: 205338	2014-04-01 18:15:34 +00:00
Aaron Ballman	8bf5a548ea	Attempting to fix r205124, which had failed asserts when built with MSVC. Suggestion from Yaron Keren. llvm-svn: 205313	2014-04-01 13:56:35 +00:00
Alexey Volkov	1328b28dc6	[x86] Do not convert to cmp32 for Atom arch by Sergey Okunev Differential Revision: http://llvm-reviews.chandlerc.com/D2824 llvm-svn: 205288	2014-04-01 08:13:07 +00:00
Rafael Espindola	24a669d225	Prevent alias from pointing to weak aliases. This adds back r204781. Original message: Aliases are just another name for a position in a file. As such, the regular symbol resolutions are not applied. For example, given define void @my_func() { ret void } @my_alias = alias weak void ()* @my_func @my_alias2 = alias void ()* @my_alias We produce without this patch: .weak my_alias my_alias = my_func .globl my_alias2 my_alias2 = my_alias That is, in the resulting ELF file my_alias, my_func and my_alias are just 3 names pointing to offset 0 of .text. That is not the semantics of IR linking. For example, linking in a @my_alias = alias void ()* @other_func would require the strong my_alias to override the weak one and my_alias2 would end up pointing to other_func. There is no way to represent that with aliases being just another name, so the best solution seems to be to just disallow it, converting a miscompile into an error. llvm-svn: 204934	2014-03-27 15:26:56 +00:00
Rafael Espindola	65481d7b97	Revert "Prevent alias from pointing to weak aliases." This reverts commit r204781. I will follow up to with msan folks to see what is what they were trying to do with aliases to weak aliases. llvm-svn: 204784	2014-03-26 06:14:40 +00:00
Rafael Espindola	3b712a84a9	Prevent alias from pointing to weak aliases. Aliases are just another name for a position in a file. As such, the regular symbol resolutions are not applied. For example, given define void @my_func() { ret void } @my_alias = alias weak void ()* @my_func @my_alias2 = alias void ()* @my_alias We produce without this patch: .weak my_alias my_alias = my_func .globl my_alias2 my_alias2 = my_alias That is, in the resulting ELF file my_alias, my_func and my_alias are just 3 names pointing to offset 0 of .text. That is not the semantics of IR linking. For example, linking in a @my_alias = alias void ()* @other_func would require the strong my_alias to override the weak one and my_alias2 would end up pointing to other_func. There is no way to represent that with aliases being just another name, so the best solution seems to be to just disallow it, converting a miscompile into an error. llvm-svn: 204781	2014-03-26 04:48:47 +00:00
Adam Nemet	4beef4c90d	[X86] Generate VPSHUFB for in-place v16i16 shuffles This used to resort to splitting the 256-bit operation into two 128-bit shuffles and then recombining the results. Fixes <rdar://problem/16167303> llvm-svn: 204735	2014-03-25 17:47:06 +00:00
Adam Nemet	ac6d6383a3	[X86] Factor out new helper getPSHUFB I found three implementations of this. This splits it out into a new function and uses it from the three places. My plan is to add a fourth use when lowering a vector_shuffle:v16i16. Compared the assembly output of test/CodeGen/X86 before and after. The only change is due to how the first PSHUFB was generated in LowerVECTOR_SHUFFLEv8i16. If the shuffle mask specified undef (i.e. -1), the old implementation would write -1 * 2 and -1 * 2 + 1 (254 and 255) in the control mask. Now we write 0x80. These are of course interchangeable since bit 7 decides if a constant zero is written in the result byte. The other instances of this code use 0x80 consistently. Related to <rdar://problem/16167303> llvm-svn: 204734	2014-03-25 17:47:03 +00:00
Adam Nemet	b47372f555	[X86] Fix non-determinism in LowerVectorAllZeroTest This can be observed with the old testcase of CodeGen/X86/pr12312.ll: 47c47 < vorps %ymm0, %ymm1, %ymm0 --- > vorps %ymm1, %ymm0, %ymm0 97c97 < vorps %ymm1, %ymm0, %ymm0 --- > vorps %ymm0, %ymm1, %ymm0 The vector VecIns is populated with all the values from VecInMap. This is done while iterating VecInMap. VecInMap uses a hash of pointer values so the resulting order can vary depending on the memory layout. The fix is to populate the vector VecIns earlier as VecInMap is populated. This is done in DAG traversal order. Fixes <rdar://problem/16398806> llvm-svn: 204623	2014-03-24 16:52:08 +00:00
Craig Topper	c6d4efa1e5	Prune includes in X86 target. llvm-svn: 204216	2014-03-19 06:53:25 +00:00
Adam Nemet	8a130a5f86	[X86] Fix unused variable warning with NDEBUG from r204058 llvm-svn: 204063	2014-03-17 17:32:53 +00:00
Adam Nemet	24381f1cb7	[VectorLegalizer/X86] Don't unvectorize fp_to_uint for v8f32->v8i16 Rather than LegalizeAction::Expand, this needs LegalizeAction::Promote to get promoted to fp_to_sint v8f32->v8i32. This is a legal operation on AVX. For that to work properly, we also need to teach the legalizer about the specific promotion required here. The default vector promotion uses bitcasting to a vector type of the same total size. We want to promote the vector element type, effectively widening the operation and then truncating the result. This is analogous to the current logic of how int_to_fp is promoted. The change also factors out some code from the int_to_fp promotion code to ValueType::widenIntegerVectorElementType. This is now shared between int_to_fp and fp_to_int. There is no longer need for the custom lowering of fp_to_sint f32->v8i16 in X86. It can now go through the new target-independent fp_to_*int promotion logic. I also checked that no other target uses Promote for these ops yet, so there shouldn't be any unexpected change in behavior. Fixes <rdar://problem/16202247> llvm-svn: 204058	2014-03-17 17:06:14 +00:00
Arnaud A. de Grandmaison	75c9e6dedf	Remove some dead assignements found by scan-build llvm-svn: 204013	2014-03-15 22:13:15 +00:00
Owen Anderson	16c6bf49b7	Phase 2 of the great MachineRegisterInfo cleanup. This time, we're changing operator* on the by-operand iterators to return a MachineOperand& rather than a MachineInstr&. At this point they almost behave like normal iterators! Again, this requires making some existing loops more verbose, but should pave the way for the big range-based for-loop cleanups in the future. llvm-svn: 203865	2014-03-13 23:12:04 +00:00
Hans Wennborg	6c37f8b985	X86: Don't generate 64-bit movd after cmpneqsd in 32-bit mode (PR19059) This fixes the bug where we would bitcast the 64-bit floating point result of cmpneqsd to a 64-bit integer even on 32-bit targets. Differential Revision: http://llvm-reviews.chandlerc.com/D3009 llvm-svn: 203581	2014-03-11 15:49:24 +00:00
Tim Northover	e94a518a22	IR: add a second ordering operand to cmpxhg for failure The syntax for "cmpxchg" should now look something like: cmpxchg i32* %addr, i32 42, i32 3 acquire monotonic where the second ordering argument gives the required semantics in the case that no exchange takes place. It should be no stronger than the first ordering constraint and cannot be either "release" or "acq_rel" (since no store will have taken place). rdar://problem/15996804 llvm-svn: 203559	2014-03-11 10:48:52 +00:00
Jim Grosbach	c94d993adf	X86: Enable ISel of 16-bit MOVBE instructions. When the MOVBE instructions are available, use them for 16-bit endian swapping as well as for 32 and 64 bit. The patterns were already present on the instructions, but weren't being matched because the operation was unconditionally marked to 'Expand.' Change that to be conditional on whether the MOVBE instructions are available. Use 'rolw' to implement the in-register version (32 and 64 bit have the dedicated 'bswap' instruction for that). Patch by Louis Gerbarg <lgg@apple.com>. rdar://15479984 llvm-svn: 203524	2014-03-11 00:44:14 +00:00
Cameron McInally	791ae9927c	Lower AVX v4i64->v4i32 truncate to one shuffle. llvm-svn: 202996	2014-03-05 19:41:16 +00:00
Chandler Carruth	219b89b987	[Modules] Move CallSite into the IR library where it belogs. It is abstracting between a CallInst and an InvokeInst, both of which are IR concepts. llvm-svn: 202816	2014-03-04 11:01:28 +00:00
Benjamin Kramer	b6d0bd48bd	[C++11] Replace llvm::next and llvm::prior with std::next and std::prev. Remove the old functions. llvm-svn: 202636	2014-03-02 12:27:27 +00:00
Elena Demikhovsky	9737e3886b	AVX-512: Fixed extract_vector_elt for v8i1 vector llvm-svn: 202624	2014-03-02 09:19:44 +00:00
Quentin Colombet	85c9e16291	Lower unsigned vsetcc to psubus in certain cases The current approach to lower a vsetult is to flip the sign bit of the operands, swap the operands and then use a (signed) pcmpgt. psubus (unsigned saturating subtract) can be used to emulate a vsetult more efficiently: + case ISD::SETULT: { + // If the comparison is against a constant we can turn this into a + // setule. With psubus, setule does not require a swap. This is + // beneficial because the constant in the register is no longer + // destructed as the destination so it can be hoisted out of a loop. I also enable lowering via psubus in a few other cases where it's clearly beneficial: setule and setuge if minu/maxu cannot be used. rdar://problem/14338765 Patch by Adam Nemet <anemet@apple.com>. llvm-svn: 202301	2014-02-26 21:39:12 +00:00
Tim Northover	aeb8e06d4c	X86 CodeGenPrep: sink shufflevectors before shifts On x86, shifting a vector by a scalar is significantly cheaper than shifting a vector by another fully general vector. Unfortunately, because SelectionDAG operates on just one basic block at a time, the shufflevector instruction that reveals whether the right-hand side of a shift is really a scalar is often not visible to CodeGen when it's needed. This adds another handler to CodeGenPrepare, to sink any useful shufflevector instructions down to the basic block where they're used, predicated on a target hook (since on other architectures, doing so will often just introduce extra real work). rdar://problem/16063505 llvm-svn: 201655	2014-02-19 10:02:43 +00:00
Tim Northover	f06df5866f	X86: use vpsllvd (& friends) for 16-bit shifts on Haswell llvm-svn: 201558	2014-02-18 11:15:32 +00:00
Elena Demikhovsky	1fad075974	AVX-512: simpyfied BUILD_VECTOR for masks; fixed cmp/test sequence llvm-svn: 201487	2014-02-16 11:34:23 +00:00
Andrea Di Biagio	386d566395	[X86] Teach the backend how to lower vector shift left into multiply rather than scalarizing it. Instead of expanding a packed shift into a sequence of scalar shifts, the backend now tries (when possible) to convert the vector shift into a vector multiply. Before this change, a shift of a MVT::v8i16 vector by a build_vector of constants was always scalarized into a long sequence of "vector extracts + scalar shifts + vector insert". With this change, if there is SSE2 support, we emit a single vector multiply. This change also affects SSE4.1, AVX, AVX2 shifts: - A shift of a MVT::v4i32 vector by a build_vector of non uniform constants is now lowered when possible into a single SSE4.1 vector multiply. - Packed v16i16 shift left by constant build_vector are now expanded when possible into a single AVX2 vpmullw. This change also improves the lowering of AVX512f vector shifts. Added test CodeGen/X86/vec_shift6.ll with some code examples that are affected by this change. llvm-svn: 201271	2014-02-12 23:42:28 +00:00
Elena Demikhovsky	1f32c313f1	AVX: fixed a bug in LowerVECTOR_SHUFFLE llvm-svn: 201140	2014-02-11 10:21:53 +00:00
Elena Demikhovsky	2aafc22ed9	AVX-512: Optimized BUILD_VECTOR pattern; fixed encoding of VEXTRACTPS instruction. llvm-svn: 201134	2014-02-11 07:25:59 +00:00
Elena Demikhovsky	9f423d6f25	AVX-512: Fixed extract_vector_elt for v16i1 and v8i1 vectors. llvm-svn: 201066	2014-02-10 07:02:39 +00:00
Tim Northover	546b57b011	X86: deduplicate V[SZ]EXT_MOVL and V[SZ]EXT nodes I believe VZEXT_MOVL means "zero all vector elements except the first" (and should have identical input & output types) whereas VZEXT means "zero extend each element of a vector (discarding higher elements if necessary)". For example: (v4i32 (vzext (v16i8 ...))) should zero extend the low 4 bytes of the incoming vector to 32-bits, discarding higher bytes. However, somewhere in the past, these two concepts had become confused, even leading to a nonsensical VSEXT_MOVL. This re-merges the nodes where appropriate (all VSEXT_MOVL -> VSEXT, VZEXT_MOVL -> VZEXT when it's an actual extension). rdar://problem/15981990 llvm-svn: 200918	2014-02-06 09:54:51 +00:00
Matt Arsenault	25793a3f22	Add address space argument to allowsUnalignedMemoryAccess. On R600, some address spaces have more strict alignment requirements than others. llvm-svn: 200887	2014-02-05 23:15:53 +00:00
Elena Demikhovsky	0b79be8ab2	AVX-512: optimized icmp -> sext -> icmp pattern llvm-svn: 200849	2014-02-05 16:17:36 +00:00
Craig Topper	7ee163842f	Move matching for x86 BMI BLSI/BLSMSK/BLSR instructions to isel patterns instead of DAG combine. This weakens the ability to fold loads with them because we aren't able to match patterns that load the same thing twice. But maybe we should fix that if we care. The peephole optimizer will be able to fold some loads in its absense. llvm-svn: 200824	2014-02-05 07:09:40 +00:00
Reid Kleckner	f5b76518c9	Implement inalloca codegen for x86 with the new inalloca design Calls with inalloca are lowered by skipping all stores for arguments passed in memory and the initial stack adjustment to allocate argument memory. Now the frontend is responsible for the memory layout, and the backend doesn't have to do any work. As a result these changes are pretty minimal. Reviewers: echristo Differential Revision: http://llvm-reviews.chandlerc.com/D2637 llvm-svn: 200596	2014-01-31 23:50:57 +00:00
Reid Kleckner	c5d9e159eb	x86: Rename NumBytesForCalleeToPush to ...Pop for accuracy If we have a callee cleanup convention, the callee is going to pop the arguments off the stack, not push them on. llvm-svn: 200566	2014-01-31 19:07:18 +00:00
Andrea Di Biagio	2ea61f17ad	[X86] Add extra rules for combining vselect dag nodes into movsd. This improves the fix committed at revision 199683 adding the following new target specific combine rules: 1) fold (v4i32: vselect <0,0,-1,-1>, A, B) -> (v4i32 (bitcast (movsd (v2i64 (bitcast A)), (v2i64 (bitcast B))) )) 2) fold (v4f32: vselect <0,0,-1,-1>, A, B) -> (v4f32 (bitcast (movsd (v2f64 (bitcast A)), (v2f64 (bitcast B))) )) 3) fold (v4i32: vselect <-1,-1,0,0>, A, B) -> (v4i32 (bitcast (movsd (v2i64 (bitcast B)), (v2i64 (bitcast A))) )) 4) fold (v4f32: vselect <-1,-1,0,0>, A, B) -> (v4f32 (bitcast (movsd (v2i64 (bitcast B)), (v2i64 (bitcast A))) )) llvm-svn: 200324	2014-01-28 18:14:21 +00:00
Juergen Ributzka	659ce00d60	[TLI] Add a new hook to TargetLowering to query the target if a load of a constant should be converted to simply the constant itself. Before this patch we used getIntImmCost from TargetTransformInfo to determine if a load of a constant should be converted to just a constant, but the threshold for this was set to an arbitrary value. This value works well for the two targets (X86 and ARM) that implement this target-hook, but it isn't target-independent at all. Now targets have the possibility to decide directly if this optimization should be performed. The default value is set to false to preserve the current behavior. The target hook has been moved to TargetLowering, which removed the last use and need of TargetTransformInfo in SelectionDAG. llvm-svn: 200271	2014-01-28 01:20:14 +00:00
Juergen Ributzka	e758ddcd16	[X86] Prevent the creation of redundant ops for sadd and ssub with overflow. This commit teaches the X86 backend to create the same X86 instructions when it lowers an sadd/ssub with overflow intrinsic and a conditional branch that uses that overflow result. This allows SelectionDAG to recognize and remove one of the redundant operations. This fixes <rdar://problem/15874016> and <rdar://problem/15661073>. Reviewed by Nadav llvm-svn: 199976	2014-01-24 06:47:57 +00:00
Lang Hames	b1ce33379a	Add a few missing cases from r199933. Testcase coming shortly. llvm-svn: 199938	2014-01-23 21:27:27 +00:00
Lang Hames	23de211c5d	Replace vfmaddxx213 instructions with their 231-type equivalents in accumulator loops. Writing back to the accumulator (231-type) allows the coalescer to eliminate an extra copy. llvm-svn: 199933	2014-01-23 20:23:36 +00:00
Elena Demikhovsky	a5d38a39a0	AVX-512: added VPERM2D VPERM2Q VPERM2PS VPERM2PD instructions, they give better sequences than VPERMI llvm-svn: 199893	2014-01-23 14:27:26 +00:00
Andrea Di Biagio	450d1661be	[X86] Teach how to combine a vselect into a movss/movsd Add target specific rules for combining vselect dag nodes into movss/movsd when possible. If the vector type of the vselect dag node in input is either MVT::v4i13 or MVT::v4f32, then try to fold according to rules: 1) fold (vselect (build_vector (0, -1, -1, -1)), A, B) -> (movss A, B) 2) fold (vselect (build_vector (-1, 0, 0, 0)), A, B) -> (movss B, A) If the vector type of the vselect dag node in input is either MVT::v2i64 or MVT::v2f64 (and we have SSE2), then try to fold according to rules: 3) fold (vselect (build_vector (0, -1)), A, B) -> (movsd A, B) 4) fold (vselect (build_vector (-1, 0)), A, B) -> (movsd B, A) llvm-svn: 199683	2014-01-20 19:35:22 +00:00
Michael Gottesman	8347c34e67	Move the retrieval of VT after all of the early exits from PerformOrCombine that do not use VT. NFC. llvm-svn: 199612	2014-01-19 21:06:00 +00:00
Elena Demikhovsky	d1487261a0	AVX-512: fixed a compare pattern llvm-svn: 199366	2014-01-16 08:45:54 +00:00
David Majnemer	dee105772c	WinCOFF: Transform IR expressions featuring __ImageBase into image relative relocations MSVC on x64 requires that we create image relative symbol references to refer to RTTI data. Seeing as how there is no way to explicitly make reference to a given relocation type in LLVM IR, pattern match expressions of the form &foo - &__ImageBase. Differential Revision: http://llvm-reviews.chandlerc.com/D2523 llvm-svn: 199312	2014-01-15 09:16:42 +00:00
Elena Demikhovsky	79b75d9048	Fixed identation. llvm-svn: 199301	2014-01-15 07:18:11 +00:00
Lang Hames	06234ec147	Add FPExt option to CCValAssign::LocInfo. When generating calling-convention promotion code, Tablegen will now select FPExt for floating point promotions (previously it had returned AExt, which is not valid for floating point types). Any out-of-tree targets that were relying on AExt being returned for FP promotions will need to update their code check for FPExt instead. llvm-svn: 199252	2014-01-14 19:56:36 +00:00
Nico Rieck	7157bb765e	Decouple dllexport/dllimport from linkage Representing dllexport/dllimport as distinct linkage types prevents using these attributes on templates and inline functions. Instead of introducing further mixed linkage types to include linkonce and weak ODR, the old import/export linkage types are replaced with a new separate visibility-like specifier: define available_externally dllimport void @f() {} @Var = dllexport global i32 1, align 4 Linkage for dllexported globals and functions is now equal to their linkage without dllexport. Imported globals and functions must be either declarations with external linkage, or definitions with AvailableExternallyLinkage. llvm-svn: 199218	2014-01-14 15:22:47 +00:00
Elena Demikhovsky	767fc967b4	AVX-512: optimized scalar compare patterns removed AVX512SI format, since it is similar to AVX512BI. llvm-svn: 199217	2014-01-14 15:10:08 +00:00
Andrea Di Biagio	5448a3c771	[X86] Fix assertion failure caused by a wrong folding of vector shifts by immediate count. This fixes a regression intruced by r198113. Revision r198113 introduced an algorithm that tries to fold a vector shift by immediate count into a build_vector if the input vector is a known vector of constants. However the algorithm only worked under the assumption that the input vector type and the shift type are exactly the same. This patch disables the folding of vector shift by immediate count if the input vector type and the shift value type are not the same. llvm-svn: 199213	2014-01-14 13:17:12 +00:00
Nico Rieck	9d2e0df049	Revert "Decouple dllexport/dllimport from linkage" Revert this for now until I fix an issue in Clang with it. This reverts commit r199204. llvm-svn: 199207	2014-01-14 12:38:32 +00:00
Nico Rieck	e43aaf7967	Decouple dllexport/dllimport from linkage Representing dllexport/dllimport as distinct linkage types prevents using these attributes on templates and inline functions. Instead of introducing further mixed linkage types to include linkonce and weak ODR, the old import/export linkage types are replaced with a new separate visibility-like specifier: define available_externally dllimport void @f() {} @Var = dllexport global i32 1, align 4 Linkage for dllexported globals and functions is now equal to their linkage without dllexport. Imported globals and functions must be either declarations with external linkage, or definitions with AvailableExternallyLinkage. llvm-svn: 199204	2014-01-14 11:55:03 +00:00
Elena Demikhovsky	172a27c750	AVX-512: Added more intrinsics for pmin/pmax, pabs, blend, pmuldq. llvm-svn: 198745	2014-01-08 10:54:22 +00:00
Bill Wendling	13199b17f8	Remove unnecessary #includes. llvm-svn: 198585	2014-01-06 06:00:00 +00:00
Bill Wendling	908bf814e7	Refactor function that checks that __builtin_returnaddress's argument is constant. This moves the check up into the parent class so that all targets can use it without having to copy (and keep in sync) the same error message. llvm-svn: 198579	2014-01-06 00:43:20 +00:00
Elena Demikhovsky	f404e054a1	AVX-512: changed property name from "neverHasSideEffects=1" to "hasSideEffects=0", added this property to VMOVSS/VMOVSD; Optimized a truncate pattern. llvm-svn: 198562	2014-01-05 14:21:07 +00:00
Elena Demikhovsky	52e4a0e109	AVX-512: Added more intrinsics for convert and min/max. Removed vzeroupper from AVX-512 mode - our optimization gude does not recommend to insert vzeroupper at all. llvm-svn: 198557	2014-01-05 10:46:09 +00:00
Bill Wendling	df7dd28dc8	Emit an error message if the value passed to __builtin_returnaddress isn't a constant __builtin_returnaddress requires that the value passed into is be a constant. However, at -O0 even a constant expression may not be converted to a constant. Emit an error message intead of crashing. llvm-svn: 198531	2014-01-05 01:47:20 +00:00
Craig Topper	a448bd868f	Make more of the x86 lowering helper functions static. llvm-svn: 198146	2013-12-29 01:48:38 +00:00
Craig Topper	059e8e0da1	Switch from EVT to MVT in more of the x86 instruction lowering code. llvm-svn: 198144	2013-12-29 01:10:06 +00:00
Craig Topper	bf096926c9	Use getSimpleValueType in a few spots where the type should be simple. llvm-svn: 198117	2013-12-28 18:35:48 +00:00
Craig Topper	e829fe42af	Minor indentation fix to match other switch statements. Change llvm_unreachable text to match similar places. llvm-svn: 198116	2013-12-28 17:37:32 +00:00
Andrea Di Biagio	eaceba0ed0	[X86] Teach the backend how to fold target specific dag node for packed vector shift by immedate count (VSHLI/VSRLI/VSRAI) into a build_vector when the vector in input to the shift is a build_vector of all constants or UNDEFs. Target specific nodes for packed shifts by immediate count are in general introduced by function 'getTargetVShiftByConstNode' (in X86ISelLowering.cpp) when lowering shift operations, SSE/AVX immediate shift intrinsics and (only in very few cases) SIGN_EXTEND_INREG dag nodes. This patch adds extra rules for simplifying vector shifts inside function 'getTargetVShiftByConstNode'. Added file test/CodeGen/X86/vec_shift5.ll to verify that packed shifts by immediate are correctly folded into a build_vector when the input vector to the shift dag node is a vector of constants or undefs. llvm-svn: 198113	2013-12-28 11:11:52 +00:00
Elena Demikhovsky	b64d7e8586	AVX-512: Result type of scalar SETCC is MVT::i1 for AVX-512. llvm-svn: 198008	2013-12-25 10:06:40 +00:00
Elena Demikhovsky	64c9548d66	AVX-512: fixed some patterns for MVT::i1 llvm-svn: 197981	2013-12-24 14:24:07 +00:00
Elena Demikhovsky	fe24a30e38	AVX512: SETCC returns i1 for AVX-512 and i8 for all others llvm-svn: 197876	2013-12-22 10:13:18 +00:00
Duncan P. N. Exon Smith	ab5dbebc11	Assert that the last operand is actually EFLAGS This is another follow-up to r197503, after a post-commit review by Andy. <rdar://problem/15627766> llvm-svn: 197520	2013-12-17 20:28:21 +00:00
Duncan P. N. Exon Smith	512601d77f	Revert "Revert "Mark vastart_save_xmm_regs as changing EFLAGS"" This reverts commit r197481, recommiting r197469 with an extra fix. The vastart_save_xmm_regs pseudo-instruction expands to a test and a branch, so it modifies EFLAGS. Mark it so, or else the scheduler might place it in the middle of another test+branch. This fixes a bug exposed by r192750, which changed the initial scheduler to source-order as part of enabling the MI Scheduler for X86. This re-commit changes the VASTART_SAVE_XMM_REGS custom inserter not to try to save %flags, and adds a test that catches the bad behavior of r197469. <rdar://problem/15627766> llvm-svn: 197503	2013-12-17 15:54:45 +00:00
Stepan Dyatkovskiy	7f7c2710e0	Fix for PR18045: http://llvm.org/bugs/show_bug.cgi?id=18045 Short issue description: For X86 machines with sse < sse4.1 we got failures for some particular load/store vector sequences: $ clang-trunk -m32 -O2 test-case.c fatal error: error in backend: Cannot select: 0x4200920: v4i32,ch = load 0x41d6ab0, 0x4205850, 0x41dcb10<LD16[getelementptr inbounds ([4 x i32]* @e, i32 0, i32 0)](align=4)> [ORD=82] [ID=58] 0x4205850: i32 = X86ISD::Wrapper 0x41d5490 [ORD=26] [ID=43] 0x41d5490: i32 = TargetGlobalAddress<[4 x i32]* @e> 0 [ORD=26] [ID=23] 0x41dcb10: i32 = undef [ID=2] The reason is that EltsFromConsecutiveLoads could emit such load instruction both before and after legalize stage. Though this instruction is not legal for machines with SSSE3 and lower. The fix: In EltsFromConsecutiveLoads, if we have passed legalize stage, we check whether nodes it emits are legal. P.S.: If you get failure in time from 12:00 and till 22:00 (UTC-8), perhaps I'll slow with response, so you better reject this commit. Thanks! llvm-svn: 197492	2013-12-17 12:07:33 +00:00
Elena Demikhovsky	c5f6726a24	AVX-512: Added implementation of CONCAT_VECTORS for v8i1 vectors (by Alexey Bader). Added implementation of "truncate" from integer type (i64/i32/i16/i8) to i1. llvm-svn: 197482	2013-12-17 08:33:15 +00:00
Elena Demikhovsky	47fc44e52e	AVX-512: Added legal type MVT::i1 and VK1 register for it. Added scalar compare VCMPSS, VCMPSD. Implemented LowerSELECT for scalar FP operations. I replaced FSETCCss, FSETCCsd with one node type FSETCCs. Node extract_vector_elt(v16i1/v8i1, idx) returns an element of type i1. llvm-svn: 197384	2013-12-16 13:52:35 +00:00
Benjamin Kramer	e723bb10b0	X86: When lowering shl_parts, don't emit shift amounts larger than the bit width. While it's safe for the X86-specific shift nodes, dag combining will kill generic nodes. Insert an AND to make it safe, isel will nuke it as x86's shift instructions have an implicit AND. Fixes PR16108, which contains a contraption to hit this case in between constant folders. llvm-svn: 197228	2013-12-13 13:40:24 +00:00
Rafael Espindola	32cb5ac904	Switch to the new MingW ABI. GCC 4.7 changed the MingW ABI. On the LLVM side it means that sret functions don't pop the stack. llvm-svn: 197163	2013-12-12 16:06:58 +00:00
Tim Northover	9653eb5759	Make Triple's isOSBinFormatXXX functions partition triple-space. Most users would be surprised if "isCOFF" and "isMachO" were simultaneously true, unless they'd put the compiler in a box with a gun attached to a photon detector. This makes sure precisely one of the three formats is true for any triple and simplifies some target logic based on that. llvm-svn: 196934	2013-12-10 16:57:43 +00:00
Elena Demikhovsky	e382c3fdcd	AVX-512: changed intrinsics for mask operations llvm-svn: 196918	2013-12-10 13:53:10 +00:00
Cameron McInally	c5f420e129	Suppress '(x < y) ? a : 0 -> (x < y) & a' transform on X86 architectures with dedicated mask registers. Patch by Aleksey Bader. llvm-svn: 196386	2013-12-04 14:52:33 +00:00
Lang Hames	39609996d9	Refactor a lot of patchpoint/stackmap related code to simplify and make it target independent. Most of the x86 specific stackmap/patchpoint handling was necessitated by the use of the native address-mode format for frame index operands. PEI has now been modified to treat stackmap/patchpoint similarly to DEBUG_INFO, allowing us to use a simple, platform independent register/offset pair for frame indexes on stackmap/patchpoints. Notes: - Folding is now platform independent and automatically supported. - Emiting patchpoints with direct memory references now just involves calling the TargetLoweringBase::emitPatchPoint utility method from the target's XXXTargetLowering::EmitInstrWithCustomInserter method. (See X86TargetLowering for an example). - No more ugly platform-specific operand parsers. This patch shouldn't change the generated output for X86. llvm-svn: 195944	2013-11-29 03:07:54 +00:00
Michael Liao	d617a3015d	Fix PR18054 - Fix bug in (vsext (vzext x)) -> (vsext x) in SIGN_EXTEND_IN_REG lowering where we need to check whether x is a vector type (in-reg type) of i8, i16 or i32; otherwise, that optimization is not valid. llvm-svn: 195779	2013-11-26 20:31:31 +00:00
Andrew Trick	391dbadb51	StackMap: Implement support for DirectMemRefOp. A Direct stack map location records the address of frame index. This address is itself the value that the runtime requested. This differs from IndirectMemRefOp locations, which refer to a stack locations from which the requested values must be loaded. Direct locations can directly communicate the address if an alloca, while IndirectMemRefOp handle register spills. For example: entry: %a = alloca i64... llvm.experimental.stackmap(i32 <ID>, i32 <shadowBytes>, i64* %a) Since both the alloca and stackmap intrinsic are in the entry block, and the intrinsic takes the address of the alloca, the runtime can assume that LLVM will not substitute alloca with any intervening value. This must be verified by the runtime by checking that the stack map's location is a Direct location type. The runtime can then determine the alloca's relative location on the stack immediately after compilation, or at any time thereafter. This differs from Register and Indirect locations, because the runtime can only read the values in those locations when execution reaches the instruction address of the stack map. llvm-svn: 195712	2013-11-26 02:03:25 +00:00
Andrew Trick	d3ab37cfeb	whitespace llvm-svn: 195711	2013-11-26 02:03:20 +00:00
Jim Grosbach	860934a924	X86: Perform integer comparisons at i32 or larger. Utilizing the 8 and 16 bit comparison instructions, even when an input can be folded into the comparison instruction itself, is typically not worth it. There are too many partial register stalls as a result, leading to significant slowdowns. By always performing comparisons on at least 32-bit registers, performance of the calculation chain leading to the comparison improves. Continue to use the smaller comparisons when minimizing size, as that allows better folding of loads into the comparison instructions. rdar://15386341 llvm-svn: 195496	2013-11-22 19:57:47 +00:00
Michael Liao	02160d580b	Fix PR18014 - When simplifying the mask generation for BLEND, check whether that mask is also consumed by other non-BLEND insns. If true, skip that simplification. llvm-svn: 195476	2013-11-22 17:56:57 +00:00
Rafael Espindola	5a8e985ad3	Don't produce tail calls when the caller is x86_thiscallcc. The callee will not pop the stack for us. llvm-svn: 195467	2013-11-22 15:18:28 +00:00
Kostya Serebryany	4007009815	Revert r195318 as it causes miscompilation (PR18029) llvm-svn: 195439	2013-11-22 10:30:39 +00:00
Ekaterina Romanova	d5fa55470c	SHLD/SHRD are VectorPath (microcode) instructions known to have poor latency on certain architectures. While generating SHLD/SHRD instructions is acceptable when optimizing for size, optimizing for speed on these platforms should be implemented using alternative sequences of instructions composed of add, adc, shr, shl, or and lea which are directPath instructions. These alternative instructions not only have a lower latency but they also increase the decode bandwidth by allowing simultaneous decoding of a third directPath instruction. AMD's processors family K7, K8, K10, K12, K15 and K16 are known to have SHLD/SHRD instructions with very poor latency. Optimization guides for these processors recommend using an alternative sequence of instructions. For these AMD's processors, I disabled folding (or (x << c) \| (y >> (64 - c))) when we are not optimizing for size. It might be beneficial to disable this folding for some of the Intel's processors. However, since I couldn't find specific recommendations regarding using SHLD/SHRD instructions on Intel's processors, I haven't disabled this peephole for Intel. llvm-svn: 195383	2013-11-21 23:21:26 +00:00
Bill Wendling	07787f8747	The basic problem is that some mainstream programs cannot deal with the way clang optimizes tail calls, as in this example: int foo(void); int bar(void) { return foo(); } where the call is transformed to: calll .L0$pb .L0$pb: popl %eax .Ltmp0: addl $_GLOBAL_OFFSET_TABLE_+(.Ltmp0-.L0$pb), %eax movl foo@GOT(%eax), %eax popl %ebp jmpl *%eax # TAILCALL However, the GOT references must all be resolved at dlopen() time, and so this approach cannot be used with lazy dynamic linking (e.g. using RTLD_LAZY), which usually populates the PLT with stubs that perform the actual resolving. This patch changes X86TargetLowering::LowerCall() to skip tail call optimization, if the called function is a global or external symbol. Patch by Dimitry Andric! PR15086 llvm-svn: 195318	2013-11-21 07:04:30 +00:00
NAKAMURA Takumi	3dedf827f8	X86ISelLowering.cpp: Mark a variable VT as LLVM_ATTRIBUTE_UNUSED. [-Wunused-variable] llvm-svn: 195238	2013-11-20 10:55:22 +00:00
NAKAMURA Takumi	f2392ebb94	Whitespace. llvm-svn: 195237	2013-11-20 10:55:15 +00:00
Elena Demikhovsky	a5967af97d	Fixed compilation error. llvm-svn: 195230	2013-11-20 09:23:22 +00:00
Elena Demikhovsky	e1f9bf054f	AVX-512: Concat 4 128-bit vectors in one 512-bit vector. llvm-svn: 195229	2013-11-20 09:10:40 +00:00
Cameron McInally	ad41f1f693	Add AVX512 unmasked FMA intrinsics and support. llvm-svn: 194824	2013-11-15 17:01:14 +00:00
Matt Arsenault	b03bd4d96b	Add addrspacecast instruction. Patch by Michele Scandale! llvm-svn: 194760	2013-11-15 01:34:59 +00:00
Elena Demikhovsky	0a74b7da35	AVX-512: Handled extractelement from mask vector; Added VMOSHDUP/VMOVSLDUP shuffle instructions. llvm-svn: 194691	2013-11-14 11:29:27 +00:00
Juergen Ributzka	34c652d34d	SelectionDAG: Teach the legalizer to split SETCC if VSELECT needs splitting too. This patch reapplies r193676 with an additional fix for the Hexagon backend. The SystemZ backend has already been fixed by r194148. The Type Legalizer recognizes that VSELECT needs to be split, because the type is to wide for the given target. The same does not always apply to SETCC, because less space is required to encode the result of a comparison. As a result VSELECT is split and SETCC is unrolled into scalar comparisons. This commit fixes the issue by checking for VSELECT-SETCC patterns in the DAG Combiner. If a matching pattern is found, then the result mask of SETCC is promoted to the expected vector mask type for the given target. Now the type legalizer will split both VSELECT and SETCC. This allows the following X86 DAG Combine code to sucessfully detect the MIN/MAX pattern. This fixes PR16695, PR17002, and <rdar://problem/14594431>. Reviewed by Nadav llvm-svn: 194542	2013-11-13 01:57:54 +00:00
Juergen Ributzka	87ed906b2e	[Stackmap] Materialize the jump address within the patchpoint noop slide. This patch moves the jump address materialization inside the noop slide. This enables patching of the materialization itself or its complete removal. This patch also adds the ability to define scratch registers that can be used safely by the code called from the patchpoint intrinsic. At least one scratch register is required, because that one is used for the materialization of the jump address. This patch depends on D2009. Differential Revision: http://llvm-reviews.chandlerc.com/D2074 Reviewed by Andy llvm-svn: 194306	2013-11-09 01:51:33 +00:00
Juergen Ributzka	9969d3e6e8	[Stackmap] Add AnyReg calling convention support for patchpoint intrinsic. The idea of the AnyReg Calling Convention is to provide the call arguments in registers, but not to force them to be placed in a paticular order into a specified set of registers. Instead it is up tp the register allocator to assign any register as it sees fit. The same applies to the return value (if applicable). Differential Revision: http://llvm-reviews.chandlerc.com/D2009 Reviewed by Andy llvm-svn: 194293	2013-11-08 23:28:16 +00:00
Eric Christopher	542c8d934d	Check for both styles of clobbers, those produced by dragonegg and those produced by clang for the inline asm bswap conversion. Modified from a patch by Chris Smowton. llvm-svn: 194016	2013-11-04 21:41:21 +00:00
Elena Demikhovsky	496656900e	AVX-512: Implemented CMOV for 512-bit vectors llvm-svn: 193747	2013-10-31 13:15:32 +00:00
Juergen Ributzka	3bd686d493	Revert "SelectionDAG: Teach the legalizer to split SETCC if VSELECT needs splitting too." Now Hexagon and SystemZ are not happy with it :-( llvm-svn: 193677	2013-10-30 06:36:19 +00:00
Juergen Ributzka	6ad05d6b95	SelectionDAG: Teach the legalizer to split SETCC if VSELECT needs splitting too. The Type Legalizer recognizes that VSELECT needs to be split, because the type is to wide for the given target. The same does not always apply to SETCC, because less space is required to encode the result of a comparison. As a result VSELECT is split and SETCC is unrolled into scalar comparisons. This commit fixes the issue by checking for VSELECT-SETCC patterns in the DAG Combiner. If a matching pattern is found, then the result mask of SETCC is promoted to the expected vector mask type for the given target. This mask has usually the same size as the VSELECT return type (except for Intel KNL). Now the type legalizer will split both VSELECT and SETCC. This allows the following X86 DAG Combine code to sucessfully detect the MIN/MAX pattern. This fixes PR16695, PR17002, and <rdar://problem/14594431>. Reviewed by Nadav llvm-svn: 193676	2013-10-30 05:48:18 +00:00
Elena Demikhovsky	199c823555	AVX-512: PMIN/PMAX intrinsics and patterns Patch by Cameron McInally <cameron.mcinally@nyu.edu> llvm-svn: 193497	2013-10-27 08:18:37 +00:00
Nadav Rotem	d369d4bdf9	Optimize concat_vectors(X, undef) -> scalar_to_vector(X). This optimization is not SSE specific so I am moving it to DAGco. The new scalar_to_vector dag node exposed a missing pattern in the AArch64 target that I needed to add. llvm-svn: 193393	2013-10-25 06:41:18 +00:00
Yaron Keren	79bb266346	(this is a corrected patch) Calling _chkstk is required on ELF as well as COFF on Windows. Without _chkstk, functions requiring large stack crash in initialization code. Previous code tested for COFF format but not Mach-O and this patch modifies the code to test for Windows OS (both Windows target and MingW target) but not Mach-O object format: Looks like macho environment was used to build some EFI code. Credits to Andrew MacPherson. llvm-svn: 193289	2013-10-23 23:37:01 +00:00
Rafael Espindola	bca3ab0905	Revert "Calling _chkstk is required on ELF as well as COFF on Windows. Without _chkstk functions requiring large stack crash in initialization code. Previous code tested for COFF format but not Mach-O and this patch modifies the code to test for Windows." This reverts commit r193263. It is causing CodeGen/X86/mingw-alloca.ll to fail. llvm-svn: 193275	2013-10-23 21:45:09 +00:00
Benjamin Kramer	0ccab2d66c	X86: Custom lower sext v16i8 to v16i16, and the corresponding truncate. Also update the cost model. llvm-svn: 193270	2013-10-23 21:06:07 +00:00
Yaron Keren	03ac82edf5	Calling _chkstk is required on ELF as well as COFF on Windows. Without _chkstk functions requiring large stack crash in initialization code. Previous code tested for COFF format but not Mach-O and this patch modifies the code to test for Windows. Credits to Andrew MacPherson. llvm-svn: 193263	2013-10-23 19:40:07 +00:00
Benjamin Kramer	da8446b833	X86: Custom lower zext v16i8 to v16i16. On sandy bridge (PR17654) we now get vpxor %xmm1, %xmm1, %xmm1 vpunpckhbw %xmm1, %xmm0, %xmm2 vpunpcklbw %xmm1, %xmm0, %xmm0 vinsertf128 $1, %xmm2, %ymm0, %ymm0 On haswell it's a simple vpmovzxbw %xmm0, %ymm0 There is a maze of duplicated and dead transforms and patterns in this area. Remove the dead custom lowering of zext v8i16 to v8i32, that's already handled by LowerAVXExtend. llvm-svn: 193262	2013-10-23 19:19:04 +00:00
Jim Grosbach	005e5b55a7	X86: Make concat_vectors combine a bit more conservative. Per Nadav's review comments for r192866. llvm-svn: 193252	2013-10-23 17:37:40 +00:00
Lang Hames	2783993fca	X86 vector element shift-by-immediate instructions take i8 immediates. Make the instruction defenitions and ISEL reflect this. Prior to this patch these instructions took an i32i8imm, and the high bits were dropped during encoding. This led to incorrect behavior for shifts by immediates higher than 255. This patch fixes that issue by detecting large immediate shifts and returning constant zero (for logical shifts) or capping the shift amount at an encodable value (for arithmetic shifts). Fixes <rdar://problem/14968098> llvm-svn: 193096	2013-10-21 17:51:24 +00:00
Elena Demikhovsky	665c90e184	AVX-512: MUL operation lowering for v8i64 llvm-svn: 193083	2013-10-21 13:27:34 +00:00
Jim Grosbach	c044c65470	x86: Move bitcasts outside concat_vector. Consider the following: typedef unsigned short ushort4U __attribute__((ext_vector_type(4), aligned(2))); typedef unsigned short ushort4 __attribute__((ext_vector_type(4))); typedef unsigned short ushort8 __attribute__((ext_vector_type(8))); typedef int int4 __attribute__((ext_vector_type(4))); int4 __bbase_cvt_int(ushort4 v) { ushort8 a; a.lo = v; return _mm_cvtepu16_epi32(a); } This generates the, not unreasonable, IR: define <4 x i32> @foo0(double %v.coerce) nounwind ssp { %tmp = bitcast double %v.coerce to <4 x i16> %tmp1 = shufflevector <4 x i16> %tmp, <4 x i16> undef, <8 x i32> <i32 %0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef> %tmp2 = tail call <4 x i32> @llvm.x86.sse41.pmovzxwd(<8 x i16> %tmp1) ret <4 x i32> %tmp2 } The problem is when type legalization gets hold of the v4i16. It legalizes that by spilling to the stack, then doing a zero-extending load. Things go even more silly from there, ending up with something like: _foo0: movsd %xmm0, -8(%rsp) <== Spill to the stack. movq -8(%rsp), %xmm0 <== Reload it right back out. pmovzxwd %xmm0, %xmm1 <== Here's what we actually asked for. pblendw $1, %xmm1, %xmm0 <== We don't need this at all pmovzxwd %xmm0, %xmm0 <== We already did this ret The v8i8 to v8i16 zext intrinsic gives even worse results, with two table lookups via pshufb instructions(!!). To avoid all that, we can move the bitcasting until after we've formed the wider (legal) vector type. Then our normal codegen flows along nicely and we get the expected: _foo0: pmovzxwd %xmm0, %xmm0 ret rdar://15245794 llvm-svn: 192866	2013-10-17 02:58:06 +00:00
Michael Liao	ad71659def	Fix PR17546 - Type of index used in extract_vector_elt or insert_vector_elt supposes to be TLI.getVectorIdxTy() which is pointer type on most targets. It'd better to truncate (or zero-extend in case it's changed later) it to mask element type to guarantee they are matching instead of asserting that. llvm-svn: 192722	2013-10-15 17:51:58 +00:00
Michael Liao	8ba068211d	Fix PR16807 - Lower signed division by constant powers-of-2 to target-independent DAG operators instead of target-dependent ones to support them better on targets where vector types are legal but shift operators on that types are illegal. E.g., on AVX, PSRAW is only available on <8 x i16> though <16 x i16> is a legal type. llvm-svn: 192721	2013-10-15 17:51:02 +00:00
Eric Christopher	755711e510	Reformat this routine slightly. llvm-svn: 192630	2013-10-14 21:52:23 +00:00
Elena Demikhovsky	82a46ebe0a	Fixed a bug in dynamic allocation memory on stack. The alignment of allocated space was wrong, see Bugzila 17345. Done by Zvi Rackover <zvi.rackover@intel.com>. llvm-svn: 192573	2013-10-14 07:26:51 +00:00
Benjamin Kramer	7b5e159450	X86: Fix type check. Just because an integer type is illegal doesn't mean it's i64. Fixes PR17495, where an i24 triggered this code. It's intended to optimize i64 loads on 32 bit x86. llvm-svn: 192123	2013-10-07 19:11:35 +00:00
Elena Demikhovsky	2e408aefe0	AVX-512: added scalar convert instructions and intrinsics. Fixed load folding in VPERM2I instruction. llvm-svn: 192063	2013-10-06 13:11:09 +00:00
Elena Demikhovsky	462a2d235b	AVX-512: fixed shuffle lowering in case of BLEND and added VSHUFPS patterns. llvm-svn: 192055	2013-10-06 06:11:18 +00:00
Craig Topper	b01cd1aa74	Add patterns for selecting TBM instructions from logical operations. Patch from Yunzhong Gao. llvm-svn: 191871	2013-10-03 04:16:45 +00:00
Rafael Espindola	44fee4e0eb	Remove several unused variables. Patch by Alp Toker. llvm-svn: 191757	2013-10-01 13:32:03 +00:00
Robert Wilhelm	f0cfb83bb4	Fix spelling intruction -> instruction. llvm-svn: 191610	2013-09-28 11:46:15 +00:00
Juergen Ributzka	f043a65327	Revert "SelectionDAG: Teach the legalizer to split SETCC if VSELECT needs splitting too." This reverts commit r191130. llvm-svn: 191138	2013-09-21 15:09:46 +00:00
Juergen Ributzka	c2551eb4ff	Fix the buildbot llvm-svn: 191133	2013-09-21 05:15:01 +00:00
Juergen Ributzka	ab930591c7	[X86] Emulate AVX 256bit MIN/MAX support by splitting the vector. In AVX 256bit vectors are valid vectors and therefore the Type Legalizer doesn't split the VSELECT and SETCC nodes. AVX only supports MIN/MAX on 128bit vectors and this fix enables vector splitting for this special case in the X86 DAG Combiner. This fix is related to PR16695, PR17002, and <rdar://problem/14594431>. llvm-svn: 191131	2013-09-21 04:55:22 +00:00
Juergen Ributzka	e9a80fc912	SelectionDAG: Teach the legalizer to split SETCC if VSELECT needs splitting too. The Type Legalizer recognizes that VSELECT needs to be split, because the type is to wide for the given target. The same does not always apply to SETCC, because less space is required to encode the result of a comparison. As a result VSELECT is split and SETCC is unrolled into scalar comparisons. This commit fixes the issue by checking for VSELECT-SETCC patterns in the DAG Combiner. If a matching pattern is found, then the result mask of SETCC is promoted to the expected vector mask for the given target. This mask has usually te same size as the VSELECT return type (except for Intel KNL). Now the type legalizer will split both VSELECT and SETCC. This allows the following X86 DAG Combine code to sucessfully detect the MIN/MAX pattern. This fixes PR16695, PR17002, and <rdar://problem/14594431>. llvm-svn: 191130	2013-09-21 04:55:18 +00:00
Elena Demikhovsky	8952974e29	AVX-512: implemented extractelement with variable index. Added parsing of mask register and "zeroing" semantic, like {%k1} {z}. llvm-svn: 190595	2013-09-12 08:55:00 +00:00
Juergen Ributzka	53d0b492f5	[X86] Perform VSELECT DAG combines also before DAG type legalization. If the DAG already has only legal types, then the second round of DAG combines is skipped. In this case VSELECT+SETCC patterns that match a more efficient instruction (e.g. min/max) are never recognized. This fix allows VSELECT+SETCC combines if the types are already legal before DAG type legalization. Reviewer: Nadav llvm-svn: 190105	2013-09-05 23:02:56 +00:00
Craig Topper	b25f0f5538	Create BEXTR instructions for (and ((sra or srl) x, imm), (2**size - 1)). Fixes PR17028. llvm-svn: 189742	2013-09-02 07:53:17 +00:00
Elena Demikhovsky	4def4b088f	AVX-512: Added GATHER and SCATTER instructions. llvm-svn: 189729	2013-09-01 14:24:41 +00:00
Craig Topper	f78c19c3bb	Fixup BZHI selection to remove an unneeded zero extension. llvm-svn: 189656	2013-08-30 07:16:16 +00:00
Craig Topper	0bccad2d43	Teach X86 backend to create BMI2 BZHI instructions from (and X, (add (shl 1, Y), -1)). Fixes PR17038. llvm-svn: 189653	2013-08-30 06:52:21 +00:00
Elena Demikhovsky	980c6b08b1	AVX-512: added extend and truncate instructions. llvm-svn: 189580	2013-08-29 11:56:53 +00:00
Elena Demikhovsky	12f24673e0	AVX-512: Added FMA instructions. llvm-svn: 189326	2013-08-27 08:39:25 +00:00
Elena Demikhovsky	0a2b6290f1	AVX-512: Added shuffle instructions - VPSHUFD, VPERMILPS, VMOVDDUP, VMOVLHPS, VMOVHLPS, VSHUFPS, VALIGN single and double forms. llvm-svn: 189215	2013-08-26 12:45:35 +00:00
Elena Demikhovsky	f8f478b19d	AVX-512: added UNPACK instructions and tests for all-zero/all-ones vectors llvm-svn: 189189	2013-08-25 12:54:30 +00:00
Elena Demikhovsky	33d447a2d6	AVX-512: Added SHIFT instructions. llvm-svn: 188899	2013-08-21 09:36:02 +00:00
Elena Demikhovsky	1490c5eb5b	AVX-512: added arithmetic and logical operations. ADD, SUB, MUL integer and FP types. OR, AND, XOR. Added embeded broadcast form for these instructions. llvm-svn: 188673	2013-08-19 13:26:14 +00:00
Elena Demikhovsky	3ce8dbbac2	AVX-512: Added VMOVD, VMOVQ, VMOVSS, VMOVSD instructions. llvm-svn: 188637	2013-08-18 13:08:57 +00:00
Craig Topper	e6861c9ce5	Make more of the lowering helpers static. Also use MVT instead of EVT in a couple places. llvm-svn: 188629	2013-08-18 08:53:01 +00:00
Craig Topper	8dbc7e9d35	Revert r188449 as it turns out we're just missing the instructions that need the v16i32/v16f32 matching. llvm-svn: 188454	2013-08-15 08:38:25 +00:00
Craig Topper	2ffd06528d	Don't let isPermImmMask handle v16i32 since VPERMI doesn't match on that type. Remove 128-bit vector handling from isPermImmMask too, it's covered by isPSHUFDMask. llvm-svn: 188449	2013-08-15 07:30:51 +00:00
Craig Topper	6f4dd2dacf	Use MVT in place of EVT in more X86 operation lowering functions. llvm-svn: 188445	2013-08-15 05:33:45 +00:00
Craig Topper	5671010cbb	Replace getValueType().getSimpleVT() with getSimpleValueType(). Also remove one weird cast from MVT->EVT just to call getSimpleVT(). llvm-svn: 188441	2013-08-15 02:33:50 +00:00
Craig Topper	d03748cf5e	Make more helper methods into static functions. llvm-svn: 188366	2013-08-14 07:53:41 +00:00
Craig Topper	7b7b159574	Remove tab characters. llvm-svn: 188365	2013-08-14 07:35:18 +00:00
Craig Topper	d905fded68	Make some helper methods static. llvm-svn: 188364	2013-08-14 07:34:43 +00:00
Craig Topper	60769e050d	Use MVT in more lowering code. llvm-svn: 188363	2013-08-14 07:04:42 +00:00
Craig Topper	52b00359b1	Replace EVT with MVT in isVectorShift. Keeps compiler from generating unneeded checks and handling for extended types. llvm-svn: 188362	2013-08-14 06:21:10 +00:00
Craig Topper	67476d7485	Replace EVT with MVT in many of the shuffle lowering functions. Keeps compiler from generating unneeded checks and handling for extended types. llvm-svn: 188361	2013-08-14 05:58:39 +00:00
Evgeniy Stepanov	7dee697faa	Fix compiler warnings. ../lib/Target/X86/X86ISelLowering.cpp:9715:7: error: unused variable 'OpVT' [-Werror,-Wunused-variable] EVT OpVT = Op0.getValueType(); ^ ../lib/Target/X86/X86ISelLowering.cpp:9763:14: error: unused variable 'NumElems' [-Werror,-Wunused-variable] unsigned NumElems = VT.getVectorNumElements(); llvm-svn: 188269	2013-08-13 14:04:20 +00:00
Elena Demikhovsky	60b1f289f2	AVX-512: Added CMP and BLEND instructions. Lowering for SETCC. llvm-svn: 188265	2013-08-13 13:24:07 +00:00
Elena Demikhovsky	5fed3b95db	AVX-512: Added more tests for BROADCAST llvm-svn: 188148	2013-08-11 12:29:16 +00:00
Elena Demikhovsky	cf5b1458e6	AVX-512: Added VPERM* instructons and MOV* zmm-to-zmm instructions. Added a test for shuffles using VPERM. llvm-svn: 188147	2013-08-11 07:55:09 +00:00
Elena Demikhovsky	45c54ad8dc	AVX-512 set: Added BROADCAST instructions with lowering logic and a test. llvm-svn: 187884	2013-08-07 12:34:55 +00:00
Craig Topper	c5b0ad27ab	Simplify code. No functional change intended. llvm-svn: 187870	2013-08-07 08:16:07 +00:00
Tim Northover	a4415854db	Refactor isInTailCallPosition handling This change came about primarily because of two issues in the existing code. Niether of: define i64 @test1(i64 %val) { %in = trunc i64 %val to i32 tail call i32 @ret32(i32 returned %in) ret i64 %val } define i64 @test2(i64 %val) { tail call i32 @ret32(i32 returned undef) ret i32 42 } should be tail calls, and the function sameNoopInput is responsible. The main problem is that it is completely symmetric in the "tail call" and "ret" value, but in reality different things are allowed on each side. For these cases: 1. Any truncation should lead to a larger value being generated by "tail call" than needed by "ret". 2. Undef should only be allowed as a source for ret, not as a result of the call. Along the way I noticed that a mismatch between what this function treats as a valid truncation and what the backends see can lead to invalid calls as well (see x86-32 test case). This patch refactors the code so that instead of being based primarily on values which it recurses into when necessary, it starts by inspecting the type and considers each fundamental slot that the backend will see in turn. For example, given a pathological function that returned {{}, {{}, i32, {}}, i32} we would consider each "real" i32 in turn, and ask if it passes through unchanged. This is much closer to what the backend sees as a result of ComputeValueVTs. Aside from the bug fixes, this eliminates the recursion that's going on and, I believe, makes the bulk of the code significantly easier to understand. The trade-off is the nasty iterators needed to find the real types inside a returned value. llvm-svn: 187787	2013-08-06 09:12:35 +00:00
Craig Topper	cf969eadaf	Simplify vector lane handling math a bit. No functional change intended. llvm-svn: 187783	2013-08-06 07:23:12 +00:00
Craig Topper	7418ff460c	Simplify math a little bit. llvm-svn: 187781	2013-08-06 06:54:25 +00:00
Craig Topper	9bc00b65b6	Replace EVT with MVT in isHorizontalBinOp as it is only called with legal types. llvm-svn: 187779	2013-08-06 06:05:05 +00:00
Craig Topper	47d7c5c8fe	Simplify code slightly. No functional change. llvm-svn: 187771	2013-08-06 04:12:40 +00:00
Aaron Ballman	5b4634576e	Silencing an MSVC11 type conversion warning. llvm-svn: 187727	2013-08-05 13:47:03 +00:00
Elena Demikhovsky	40864b690b	AVX-512 set: added mask operations, lowering BUILD_VECTOR for i1 vector types. Added intrinsics and tests. llvm-svn: 187717	2013-08-05 08:52:21 +00:00
Benjamin Kramer	5bc180c14f	X86: Turn fp selects into mask operations. double test(double a, double b, double c, double d) { return a<b ? c : d; } before: _test: ucomisd %xmm0, %xmm1 ja LBB0_2 movaps %xmm3, %xmm2 LBB0_2: movaps %xmm2, %xmm0 after: _test: cmpltsd %xmm1, %xmm0 andpd %xmm0, %xmm2 andnpd %xmm3, %xmm0 orpd %xmm2, %xmm0 Small speedup on Benchmarks/SmallPT llvm-svn: 187706	2013-08-04 12:05:16 +00:00
Tim Northover	ecc018c7b7	X86: correct tail return address calculation Due to the weird and wondeful usual arithmetic conversions, some calculations involving negative values were getting performed in uint32_t and then promoted to int64_t, which is really not a good idea. Patch by Katsuhiro Ueno. llvm-svn: 187703	2013-08-04 09:35:57 +00:00
Elena Demikhovsky	b1266b5447	EVEX and compressed displacement encoding for AVX512 llvm-svn: 187576	2013-08-01 13:34:06 +00:00
Elena Demikhovsky	b0a75431ad	Fixed assertion in Extract128BitVector() llvm-svn: 187493	2013-07-31 12:03:08 +00:00
Elena Demikhovsky	67b05fc0b3	Added INSERT and EXTRACT intructions from AVX-512 ISA. All insertf/extractf functions replaced with insert/extract since we have insertf and inserti forms. Added lowering for INSERT_VECTOR_ELT / EXTRACT_VECTOR_ELT for 512-bit vectors. Added lowering for EXTRACT/INSERT subvector for 512-bit vectors. Added a test. llvm-svn: 187491	2013-07-31 11:35:14 +00:00
Nico Rieck	06d17c80cc	Proper va_arg/va_copy lowering on win64 Win64 uses CharPtrBuiltinVaList instead of X86_64ABIBuiltinVaList like other 64-bit targets. llvm-svn: 187355	2013-07-29 13:07:06 +00:00
Justin Holewinski	d3f2035a3c	Add a target legalize hook for SplitVectorOperand (again) CustomLowerNode was not being called during SplitVectorOperand, meaning custom legalization could not be used by targets. This also adds a test case for NVPTX that depends on this custom legalization. Differential Revision: http://llvm-reviews.chandlerc.com/D1195 Attempt to fix the buildbots by making the X86 test I just added platform independent llvm-svn: 187202	2013-07-26 13:28:29 +00:00
Rafael Espindola	1d812728cc	Revert "Add a target legalize hook for SplitVectorOperand" This reverts commit 187198. It broke the bots. The soft float test probably needs a -triple because of name differences. On the hard float test I am getting a "roundss $1, %xmm0, %xmm0", instead of "vroundss $1, %xmm0, %xmm0, %xmm0". llvm-svn: 187201	2013-07-26 13:18:16 +00:00
Justin Holewinski	f848a24e50	Add a target legalize hook for SplitVectorOperand CustomLowerNode was not being called during SplitVectorOperand, meaning custom legalization could not be used by targets. This also adds a test case for NVPTX that depends on this custom legalization. Differential Revision: http://llvm-reviews.chandlerc.com/D1195 llvm-svn: 187198	2013-07-26 12:46:39 +00:00
Elena Demikhovsky	8cfb43f73b	I'm starting to commit KNL backend. I'll push patches one-by-one. This patch includes support for the extended register set XMM16-31, YMM16-31, ZMM0-31. The full ISA you can see here: http://software.intel.com/en-us/intel-isa-extensions llvm-svn: 187030	2013-07-24 11:02:47 +00:00
Juergen Ributzka	3d527d80b8	[X86] Use min/max to optimze unsigend vector comparison on X86 Use PMIN/PMAX for UGE/ULE vector comparions to reduce the number of required instructions. This trick also works for UGT/ULT, but there is no advantage in doing so. It wouldn't reduce the number of instructions and it would actually reduce performance. Reviewer: Ben radar:5972691 llvm-svn: 186432	2013-07-16 18:20:45 +00:00
Craig Topper	202fbc2c9b	Add 'static' keyword to some const arrays for consistency. llvm-svn: 186308	2013-07-15 06:54:12 +00:00
Craig Topper	b94011fd28	Use SmallVectorImpl& instead of SmallVector to avoid repeating small vector size. llvm-svn: 186274	2013-07-14 04:42:23 +00:00
Stephen Lin	fda967fdea	X86: fold SSE2/AVX2 logical shift by immediate amount into zero vector when possible Patch by Andrea Di Biagio llvm-svn: 186165	2013-07-12 15:31:36 +00:00
Charles Davis	e8f297ca94	Target/X86: Add explicit Win64 and System V/x86-64 calling conventions. Summary: This patch adds explicit calling convention types for the Win64 and System V/x86-64 ABIs. This allows code to override the default, and use the Win64 convention on a target that wants to use SysV (and vice-versa). This is needed to implement the `ms_abi` and `sysv_abi` GNU attributes. Reviewers: CC: llvm-svn: 186144	2013-07-12 06:02:35 +00:00
Stephen Lin	73de7bf5de	AArch64/PowerPC/SystemZ/X86: This patch fixes the interface, usage, and all in-tree implementations of TargetLoweringBase::isFMAFasterThanMulAndAdd in order to resolve the following issues with fmuladd (i.e. optional FMA) intrinsics: 1. On X86(-64) targets, ISD::FMA nodes are formed when lowering fmuladd intrinsics even if the subtarget does not support FMA instructions, leading to laughably bad code generation in some situations. 2. On AArch64 targets, ISD::FMA nodes are formed for operations on fp128, resulting in a call to a software fp128 FMA implementation. 3. On PowerPC targets, FMAs are not generated from fmuladd intrinsics on types like v2f32, v8f32, v4f64, etc., even though they promote, split, scalarize, etc. to types that support hardware FMAs. The function has also been slightly renamed for consistency and to force a merge/build conflict for any out-of-tree target implementing it. To resolve, see comments and fixed in-tree examples. llvm-svn: 185956	2013-07-09 18:16:56 +00:00
Nico Rieck	51969be724	Reuse %rax after calling __chkstk on win64 Reapply this as I reverted the wrong commit. llvm-svn: 185807	2013-07-08 11:20:11 +00:00
Nico Rieck	4801303ce1	Revert "Proper va_arg/va_copy lowering on win64" This reverts commit 2b52880592a525cfe04d8f9008a35da8c2ea94c3. Needs review. llvm-svn: 185806	2013-07-08 11:19:44 +00:00
Nico Rieck	43b51056d6	Revert "Reuse %rax after calling __chkstk on win64" This reverts commit 01f8d579f7672872324208ac5bc4ac311e81b22e. llvm-svn: 185781	2013-07-08 01:30:57 +00:00
Nico Rieck	7adf6111a8	Reuse %rax after calling __chkstk on win64 llvm-svn: 185778	2013-07-07 16:48:39 +00:00

... 2 3 4 5 6 ...

2689 Commits