llvm-project

Commit Graph

Author	SHA1	Message	Date
Eric Christopher	11e5983658	Move the MMX subtarget feature out of the SSE set of features and into its own variable. This is needed so that we can explicitly turn off MMX without turning off SSE and also so that we can diagnose feature set incompatibilities that involve MMX without SSE. Rationale: // sse3 __m128d test_mm_addsub_pd(__m128d A, __m128d B) { return _mm_addsub_pd(A, B); } // mmx void shift(__m64 a, __m64 b, int c) { _mm_slli_pi16(a, c); _mm_slli_pi32(a, c); _mm_slli_si64(a, c); _mm_srli_pi16(a, c); _mm_srli_pi32(a, c); _mm_srli_si64(a, c); _mm_srai_pi16(a, c); _mm_srai_pi32(a, c); } clang -msse3 -mno-mmx file.c -c For this code we should be able to explicitly turn off MMX without affecting the compilation of the SSE3 function and then diagnose and error on compiling the MMX function. This matches the existing gcc behavior and follows the spirit of the SSE/MMX separation in llvm where we can (and do) turn off MMX code generation except in the presence of intrinsics. Updated a couple of tests, but primarily tested with a couple of tests for turning on only mmx and only sse. This is paired with a patch to clang to take advantage of this behavior. llvm-svn: 249731	2015-10-08 20:10:06 +00:00
Reid Kleckner	b2244cb8f0	[WinEH] Relax assertion in the presence of stack realignment The code is correct as is, but we should test it. llvm-svn: 249715	2015-10-08 18:41:52 +00:00
Frederic Riss	263b772bda	[X86] Disable X86CallFrameOptimization on Darwin in presence of EH We emit 1 compact unwind encoding per function, and this can’t represent the varying stack pointer that will be generated by X86CallFrameOptimization. Disable the optimization on Darwin. (It might be possible to split the function into multiple ranges and emit 1 compact unwind info per range. The compact unwind emission code isn’t ready for that and this kind of info certainly isn’t tested/used anywhere. It might be worth exploring this path if we want to get the space savings at some point though) llvm-svn: 249694	2015-10-08 15:45:08 +00:00
Igor Breger	defab3c1ef	AVX512: vpextrb/w/d/q and vpinsrb/w/d/q implementation. This instructions doesn't have intrincis. Added tests for lowering and encoding. Differential Revision: http://reviews.llvm.org/D12317 llvm-svn: 249688	2015-10-08 12:55:01 +00:00
Michael Kuperstein	04e79329d0	[X86] Fix wrong treatment of multi-lane blends in BUILD_VECTORtoBlendMask() This fixes two separate bugs: 1) The mask for the high lane was not set correctly. That fixes PR24532. 2) The transformation should bail out if it believes it involves more than 2 lanes, as it does not currently do anything sensible in this case. Differential Revision: http://reviews.llvm.org/D13505 llvm-svn: 249669	2015-10-08 08:13:02 +00:00
Michael Kuperstein	2b3c16ca17	Do not assert on first non-prologue instruction being a CFI directive. llvm-svn: 249668	2015-10-08 07:48:49 +00:00
Reid Kleckner	94fe836afa	[WinEH] Add missing test case for llvm.eh.exceptioncode llvm-svn: 249638	2015-10-07 23:55:06 +00:00
Reid Kleckner	97797419e6	[WinEH] Fix 32-bit funclet epilogues in the presence of dynamic allocas In particular, passing non-trivially copyable objects by value on win32 uses a dynamic alloca (inalloca). We would clobber ESP in the epilogue and end up returning to outer space. llvm-svn: 249637	2015-10-07 23:55:01 +00:00
David Majnemer	6af5f82c20	[WinEH] Refer to filter funclets using their symbol-table symbol The relocation for the filter funclet will be against a symbol table entry for a function instead of the section, making it easier to understand what is going on. llvm-svn: 249621	2015-10-07 21:34:00 +00:00
Reid Kleckner	70bf6bb5e6	[WinEH] Undo the effect of r249578 for 32-bit The __CxxFrameHandler3 tables for 32-bit are supposed to hold stack offsets relative to EBP, not ESP. I blindly updated the win-catchpad.ll test case, and immediately noticed that 32-bit catching stopped working. While I'm at it, move the frame index to frame offset WinEH table logic out of PEI. PEI shouldn't have to know about WinEHFuncInfo. I realized we can calculate frame index offsets just fine from the table printer. llvm-svn: 249618	2015-10-07 21:13:15 +00:00
David Majnemer	c289c9ff55	[WinEH] Remove unreachable blocks before preparation We remove unreachable blocks because it is pointless to consider them for coloring. However, we still had stale pointers to these blocks in some data structures after we removed them from the function. Instead, remove the unreachable blocks before attempting to do anything with the function. This fixes PR25099. llvm-svn: 249617	2015-10-07 21:08:25 +00:00
Kevin B. Smith	99e8c0fffb	[X86]Update test to use FileCheck. Updates this test to use FileCheck and a single llc invocation rather than 3 llc invocations and grep. llvm-svn: 249583	2015-10-07 18:21:41 +00:00
Reid Kleckner	33bd2d99d8	[WinEH] Fix two minor issues in __CxxFrameHandler3 tables There was an off-by-one bug in ip2state tables which manifested when one call immediately preceded the try-range of the next. The return address of the previous call would appear to be within the try range of the next scope, resulting in extra destructors or catches running. We also computed the wrong offset for catch parameter stack objects. The offset should be from RSP, not from RBP. llvm-svn: 249578	2015-10-07 17:49:32 +00:00
Michael Kuperstein	259f1508f0	[X86] Emit .cfi_escape GNU_ARGS_SIZE when adjusting the stack before calls When outgoing function arguments are passed using push instructions, and EH is enabled, we may need to indicate to the stack unwinder that the stack pointer was adjusted before the call. This should fix the exception handling issues in PR24792. Differential Revision: http://reviews.llvm.org/D13132 llvm-svn: 249522	2015-10-07 07:01:31 +00:00
Eric Christopher	ab2802c58f	Update test to use FileCheck and clean up run lines to match the expected behavior. llvm-svn: 249498	2015-10-07 01:21:49 +00:00
David Majnemer	7735a6d07a	[WinEH] Create a separate MBB for funclet prologues Our current emission strategy is to emit the funclet prologue in the CatchPad's normal destination. This is problematic because intra-funclet control flow to the normal destination is not erroneous and results in us reevaluating the prologue if said control flow is taken. Instead, use the CatchPad's location for the funclet prologue. This correctly models our desire to have unwind edges evaluate the prologue but edges to the normal destination result in typical control flow. Differential Revision: http://reviews.llvm.org/D13424 llvm-svn: 249483	2015-10-06 23:31:59 +00:00
Craig Topper	2c4068f409	[TwoAddressInstructionPass] When looking for a 3 addr conversion after commuting, make sure regB has been updated to take into account the commute. llvm-svn: 249378	2015-10-06 05:39:59 +00:00
Craig Topper	79dd1bf094	[X86] Teach constant hoisting that ANDs with 64-bit immediates in the range 0x80000000-0xffffffff can be handled cheaply and don't need to be hoisted. Most importantly, this keeps constant hoisting from preventing instruction selections ability to turn an AND with 0xffffffff into a move into a 32-bit subregister. llvm-svn: 249370	2015-10-06 02:50:24 +00:00
Simon Pilgrim	bb01c6fda2	[X86][SSE4A] Added shuffle decode tests for 'special case' SSE4A EXTRQI/INSERTQI ops. llvm-svn: 249263	2015-10-04 10:12:53 +00:00
Igor Breger	78741a1b1e	AVX512: Implemented encoding and intrinsics for VPERMILPS/PD instructions. Added tests for intrinsics and encoding. Differential Revision: http://reviews.llvm.org/D12690 llvm-svn: 249261	2015-10-04 07:20:41 +00:00
David Majnemer	161935520d	[WinEH] Permit branch folding in the face of funclets Track which basic blocks belong to which funclets. Permit branch folding to fire but only if it can prove that doing so will not cause code in one funclet to be reused in another. llvm-svn: 249257	2015-10-04 02:22:52 +00:00
Simon Pilgrim	dde63374c5	[DAGCombiner] Generalize FADD constant combines to work with vectors Updated the FADD combines to work with vectors as well as scalars. Differential Revision: http://reviews.llvm.org/D13416 llvm-svn: 249251	2015-10-03 22:06:06 +00:00
Sanjay Patel	004ea240ad	add test cases that demonstrate bad behavior These are based on PR25016 and likely caused by a bug in MachineCombiner's definition of improvesCriticalPathLen(). llvm-svn: 249249	2015-10-03 20:52:55 +00:00
Simon Pilgrim	93ea954e6d	[X86][SSE] Add FADD combine tests. llvm-svn: 249240	2015-10-03 18:17:43 +00:00
Richard Trieu	e0129e474d	Call the correct overload. Call the correct overload so a string literal does not get converted to a bool. Also fix the test case to match the names given. llvm-svn: 249183	2015-10-02 20:52:14 +00:00
Andrea Di Biagio	77f62652c1	Reapply r249121 : "[FastISel][x86] Teach how to select SSE2/AVX bitcasts between 128/256-bit vector types." This patch teaches FastIsel the following two things: 1) On SSE2, no instructions are needed for bitcasts between 128-bit vector types; 2) On AVX, no instructions are needed for bitcasts between 256-bit vector types. Example: %1 = bitcast <4 x i31> %V to <2 x i64> Before (-fast-isel -fast-isel-abort=1): FastIsel miss: %1 = bitcast <4 x i31> %V to <2 x i64> Now we don't fall back to SelectionDAG and we correctly fold that computation propagating the register associated to %V. Originally reviewed here: http://reviews.llvm.org/D13347 llvm-svn: 249147	2015-10-02 16:08:05 +00:00
Andrea Di Biagio	45874e67a1	Revert: [FastISel][x86] Teach how to select SSE2/AVX bitcasts between 128/256-bit vector types. r249121 caused a Clang test failure (avx2-buitins.c). Revert r249121 while I keep investigating on the reason why that test failed. llvm-svn: 249124	2015-10-02 13:06:19 +00:00
Andrea Di Biagio	cb33456122	[FastISel][x86] Teach how to select SSE2/AVX bitcasts between 128/256-bit vector types. This patch teaches FastIsel the following two things: 1) On SSE2, no instructions are needed for bitcasts between 128-bit vector types; 2) On AVX, no instructions are needed for bitcasts between 256-bit vector types. Example: %1 = bitcast <4 x i31> %V to <2 x i64> Before (-fast-isel -fast-isel-abort=1): FastIsel miss: %1 = bitcast <4 x i31> %V to <2 x i64> Now we don't fall back to SelectionDAG and we correctly fold that computation propagating the register associated to %V. Differential Revision: http://reviews.llvm.org/D13347 llvm-svn: 249121	2015-10-02 12:45:37 +00:00
Reid Kleckner	fc64fae6e3	[WinEH] Emit __C_specific_handler tables for the new IR We emit denormalized tables, where every range of invokes in the same state gets a complete list of EH action entries. This is significantly simpler than trying to infer the correct nested scoping structure from the MI. Fortunately, for SEH, the nesting structure is really just a size optimization. With this, some basic __try / __except examples work. llvm-svn: 249078	2015-10-01 21:38:24 +00:00
David Majnemer	4600c06434	[WinEH] Stop BranchFolding from merging across funclets BranchFolding would merge two funclets together, this is not OK. Disable this and strengthen the assertion in FuncletLayout. llvm-svn: 249069	2015-10-01 21:04:13 +00:00
David Majnemer	f828a0ccc7	[WinEH] Make FuncletLayout more robust against catchret Catchret transfers control from a catch funclet to an earlier funclet. However, it is not completely clear which funclet the catchret target is part of. Make this clear by stapling the catchret target's funclet membership onto the CATCHRET SDAG node. llvm-svn: 249052	2015-10-01 18:44:59 +00:00
NAKAMURA Takumi	1ed20db720	Revert r248959, "[WinEH] Emit int3 after noreturn calls on Win64" It broke; LLVM :: CodeGen__Generic__2009-11-16-BadKillsCrash.ll llvm-svn: 249032	2015-10-01 17:00:56 +00:00
Ahmed Bougacha	23a0d1a1d6	[X86] Don't custom-lower vNi32 uint_to_fp when unsafe-fp-math. The custom code produces incorrect results if later reassociated. Since r221657, on x86, vNi32 uitofp is lowered using an optimized sequence: movdqa LCPI0_0(%rip), %xmm1 ## xmm1 = [65535, ...] pand %xmm0, %xmm1 por LCPI0_1(%rip), %xmm1 ## [0x4b000000, ...] psrld $16, %xmm0 por LCPI0_2(%rip), %xmm0 ## [0x53000000, ...] addps LCPI0_3(%rip), %xmm0 ## [float -5.497642e+11, ...] addps %xmm1, %xmm0 Since r240361, the machine combiner opportunistically reassociates 2-instruction sequences (with -ffast-math). In the new code sequence, the ADDPS' are eligible. In isolation, for simple examples (without reassociable users), this makes no performance difference (the goal being to enable reassociation of longer chains). In the trivial example (just one uitofp), the reassociation doesn't happen, because (I think) it would require the emission of a separate movaps for a constantpool load (instead of folding it into addps). However, when we have multiple uitofp sequences, and the constantpool loads are CSE'd earlier, the machine combiner can do the reassociation. When the ADDPS' are reassociated, the resulting sequence isn't correct anymore, as we'd be adding large (239) constants with comparatively smaller values (~223). Given that two of the three inputs are powers of 2 larger than 216, and that ulp(239) == 2(39-24) == 215, the reassociated chain will produce 0 for any input in [0, 214[. In my testing, it also produces wrong results for 99.5% of [0, 232[. Avoid this by disabling the new lowering when -ffast-math. It does mean that we'll get slower code than without it, but at least we won't get egregiously incorrect code. One might argue that, considering -ffast-math is all but meaningless, uitofp producing wrong results isn't a compiler bug. But it really is. Fixes PR24512. ...though this is really more of a workaround. Ideally, we'd have some sort of Machine FMF, but that's a problem that's not worth tackling until we do more with machine IR. llvm-svn: 248965	2015-10-01 00:11:07 +00:00
Reid Kleckner	6dec87a8a0	[WinEH] Emit int3 after noreturn calls on Win64 The Win64 unwinder disassembles forwards from each PC to try to determine if this PC is in an epilogue. If so, it skips calling the EH personality function for that frame. Typically, this means you cannot catch an exception in the same frame that you threw it, because 'throw' calls a noreturn runtime function. Previously we avoided this problem with the TrapUnreachable TargetOption, but that's a much bigger hammer than we need. All we need is a 1 byte non-epilogue instruction right after the call. Instead, what we got was an unconditional branch to a shared block containing the ud2, potentially 7 bytes instead of 1. So, this reverts r206684, which added TrapUnreachable, and replaces it with something better. The new code pattern matches for invoke/call followed by unreachable and inserts an int3 into the DAG. To be 100% watertight, we would need to insert SEH_Epilogue instructions into all basic blocks ending in a call with no terminators or successors, but in practice this is unlikely to come up. llvm-svn: 248959	2015-09-30 23:09:23 +00:00
Sanjay Patel	a114a10bbe	[x86] enable machine combiner reassociations for 256-bit vector logical integer insts llvm-svn: 248955	2015-09-30 22:25:55 +00:00
Simon Pilgrim	3d11c994f7	[X86][XOP] Added support for the lowering of 128-bit vector shifts to XOP shift instructions The XOP shifts just have logical/arithmetic versions and the left/right shifts are controlled by whether the value is positive/negative. Because of this I've added new X86ISD nodes instead of trying to force them to use the existing shift nodes. Additionally Excavator cores (bdver4) support XOP and AVX2 - meaning that it should use the AVX2 shifts when it can and fall back to XOP in other cases. Differential Revision: http://reviews.llvm.org/D8690 llvm-svn: 248878	2015-09-30 08:17:50 +00:00
Evgeniy Stepanov	d3f544f271	[safestack] Fix a stupid mix-up in the direct-tls code path. llvm-svn: 248863	2015-09-30 00:01:47 +00:00
Reid Kleckner	a13dfd539b	[WinEH] Setup RBP correctly in Win64 funclet prologues Previously local variable captures just didn't work in 64-bit. Now we can access local variables more or less correctly. llvm-svn: 248857	2015-09-29 23:32:01 +00:00
David Majnemer	91b0ab9172	[WinEH] Ensure that funclets obey the x64 ABI The x64 ABI requires that epilogues do not contain code other than stack adjustments and some limited control flow. However, we'd insert code to initialize the return address after stack adjustments. Instead, insert EAX/RAX with the current value before we create the stack adjustments in the epilogue. llvm-svn: 248839	2015-09-29 22:33:36 +00:00
Maksim Panchenko	cce239c45d	HHVM calling conventions. HHVM calling convention, hhvmcc, is used by HHVM JIT for functions in translated cache. We currently support LLVM back end to generate code for X86-64 and may support other architectures in the future. In HHVM calling convention any GP register could be used to pass and return values, with the exception of R12 which is reserved for thread-local area and is callee-saved. Other than R12, we always pass RBX and RBP as args, which are our virtual machine's stack pointer and frame pointer respectively. When we enter translation cache via hhvmcc function, we expect the stack to be aligned at 16 bytes, i.e. skewed by 8 bytes as opposed to standard ABI alignment. This affects stack object alignment and stack adjustments for function calls. One extra calling convention, hhvm_ccc, is used to call C++ helpers from HHVM's translation cache. It is almost identical to standard C calling convention with an exception of first argument which is passed in RBP (before we use RDI, RSI, etc.) Differential Revision: http://reviews.llvm.org/D12681 llvm-svn: 248832	2015-09-29 22:09:16 +00:00
David Majnemer	a80c151286	[WinEH] Teach AsmPrinter about funclets Summary: Funclets have been turned into functions by the time they hit the object file. Make sure that they have decent names for the symbol table and CFI directives explaining how to reason about their prologues. Differential Revision: http://reviews.llvm.org/D13261 llvm-svn: 248824	2015-09-29 20:12:33 +00:00
Jeroen Ketema	740f9d79ca	Arguments spilled on the stack before a function call may have alignment requirements, for example in the case of vectors. These requirements are exploited by the code generator by using move instructions that have similar alignment requirements, e.g., movaps on x86. Although the code generator properly aligns the arguments with respect to the displacement of the stack pointer it computes, the displacement itself may cause misalignment. For example if we have %3 = load <16 x float>, <16 x float>* %1, align 64 call void @bar(<16 x float> %3, i32 0) the x86 back-end emits: movaps 32(%ecx), %xmm2 movaps (%ecx), %xmm0 movaps 16(%ecx), %xmm1 movaps 48(%ecx), %xmm3 subl $20, %esp <-- if %esp was 16-byte aligned before this instruction, it no longer will be afterwards movaps %xmm3, (%esp) <-- movaps requires 16-byte alignment, while %esp is not aligned as such. movl $0, 16(%esp) calll __bar To solve this, we need to make sure that the computed value with which the stack pointer is changed is a multiple af the maximal alignment seen during its computation. With this change we get proper alignment: subl $32, %esp movaps %xmm3, (%esp) Differential Revision: http://reviews.llvm.org/D12337 llvm-svn: 248786	2015-09-29 10:12:57 +00:00
Reid Kleckner	c71d6275ca	[WinEH] Fix ip2state table emission with funclets Previously we were hijacking the old LandingPadInfo data structures to communicate our state numbers. Now we don't need that anymore. llvm-svn: 248763	2015-09-28 23:56:30 +00:00
Matthias Braun	a3b701f828	SelectionDAGDumper: Print simple operands inline. Print simple operands inline instead of their pointer/value number. Simple operands are SDNodes without predecessors like Constant(FP), Register, UNDEF. This unifies the behaviour with dumpr() which was already doing this. Previously: t0: ch = EntryToken t1: i64 = Register %vreg0 t2: i64,ch = CopyFromReg t0, t1 t3: i64 = Constant<1> t4: i64 = add t2, t3 t5: i64 = Constant<2> t6: i64 = add t2, t5 t10: i64 = undef t11: i8,ch = load t0, t2, t10<LD1[%tmp81]> t12: i8,ch = load t0, t4, t10<LD1[%tmp10]> t13: i8,ch = load t0, t6, t10<LD1[%tmp12]> Now: t0: ch = EntryToken t2: i64,ch = CopyFromReg t0, Register:i64 %vreg0 t4: i64 = add t2, Constant:i64<1> t6: i64 = add t2, Constant:i64<2> t11: i8,ch = load<LD1[%tmp81]> t0, t2, undef:i64 t12: i8,ch = load<LD1[%tmp10]> t0, t4, undef:i64 t13: i8,ch = load<LD1[%tmp12]> t0, t6, undef:i64 Differential Revision: http://reviews.llvm.org/D12567 llvm-svn: 248628	2015-09-25 22:27:02 +00:00
Sanjay Patel	bbbf9a1a34	merge vector stores into wider vector stores and fix AArch64 misaligned access TLI hook (PR21711) This is a redo of D7208 ( r227242 - http://llvm.org/viewvc/llvm-project?view=revision&revision=227242 ). The patch was reverted because an AArch64 target could infinite loop after the change in DAGCombiner to merge vector stores. That happened because AArch64's allowsMisalignedMemoryAccesses() wasn't telling the truth. It reported all unaligned memory accesses as fast, but then split some 128-bit unaligned accesses up in performSTORECombine() because they are slow. This patch attempts to fix the problem in AArch's allowsMisalignedMemoryAccesses() while preserving existing (perhaps questionable) lowering behavior. The x86 test shows that store merging is working as intended for a target with fast 32-byte unaligned stores. Differential Revision: http://reviews.llvm.org/D12635 llvm-svn: 248622	2015-09-25 21:49:48 +00:00
Simon Pilgrim	68d0050c6a	[X86][SSE2] Fix zero/any extension shuffles that don't start from the first element Fix for D12561 - we weren't correctly ensuring that the base element for extension was moved to start on a boundary suitable for UNPCKL/H llvm-svn: 248536	2015-09-24 21:02:17 +00:00
Mohammad Shahid	d0203cbf1c	Regression Test: Deletes redundant/invalid test. Removes absdiff_expand.ll regression test file which is invalid. Diffrential Revision: http://reviews.llvm.org/D11678 llvm-svn: 248493	2015-09-24 14:37:25 +00:00
Mohammad Shahid	13f1dfdf2e	Codegen: Fix llvm.absdiff semantic. Fixes the overflow case of llvm.absdiff intrinsic also updats the tests and LangRef.rst accordingly. Differential Revision: http://reviews.llvm.org/D11678 llvm-svn: 248483	2015-09-24 10:35:03 +00:00
Matt Arsenault	c721df0478	Use new TokenFactor chain when merging stores If the stores are storing values from loads which partially alias the stores, we could end up placing the merged loads and stores on the same chain which has the potential to break. Each store may have a different chain dependency on only some of the original loads. Create a new TokenFactor to capture all of the required dependencies of the stores rather than assuming all stores can use the same chain. The testcase is a situation where this happens, although it does not have an observable change from this. The DAG nodes just happened to not be reordered before despite this missing chain dependency. This is based on an off-list report for an out of tree target which regressed due to r246307 and I haven't managed to find a case where the nodes do end up reordered with an in tree target. llvm-svn: 248468	2015-09-24 07:22:38 +00:00
Sanjay Patel	1a6534661b	[x86] replace integer 'xor' ops with packed SSE FP 'xor' ops when operating on FP scalars Turn this: movd %xmm0, %eax movd %xmm1, %ecx xorl %eax, %ecx movd %ecx, %xmm0 into this: xorps %xmm1, %xmm0 This is related to, but does not solve: https://llvm.org/bugs/show_bug.cgi?id=22428 This is an extension of: http://reviews.llvm.org/rL248395 llvm-svn: 248415	2015-09-23 18:33:42 +00:00
Sanjay Patel	aba37553c4	[x86] replace integer 'or' ops with packed SSE FP 'or' ops when operating on FP scalars Turn this: movd %xmm0, %eax movd %xmm1, %ecx orl %eax, %ecx movd %ecx, %xmm0 into this: orps %xmm1, %xmm0 This is related to, but does not solve: https://llvm.org/bugs/show_bug.cgi?id=22428 This is an extension of: http://reviews.llvm.org/rL248395 llvm-svn: 248409	2015-09-23 18:19:07 +00:00
Sanjay Patel	df2495f331	[x86] replace integer 'and' ops with packed SSE FP 'and' ops when operating on FP scalars Turn this: movd %xmm0, %eax movd %xmm1, %ecx andl %eax, %ecx movd %ecx, %xmm0 into this: andps %xmm1, %xmm0 This is related to, but does not solve: https://llvm.org/bugs/show_bug.cgi?id=22428 Differential Revision: http://reviews.llvm.org/D13065 llvm-svn: 248395	2015-09-23 17:00:06 +00:00
Simon Pilgrim	9cb018b6b6	[X86][SSE] Replace 128-bit SSE41 PMOVSX intrinsics with native IR This patches removes the x86.sse41.pmovsx* intrinsics, provides a suitable upgrade path and updates relevant tests to sign extend a subvector instead. LLVM counterpart to D12835 Differential Revision: http://reviews.llvm.org/D13002 llvm-svn: 248368	2015-09-23 08:48:33 +00:00
Cong Hou	b54a72ef78	Add a test case for the fix of profile update issue when lowering switch statement. llvm-svn: 248356	2015-09-23 00:34:56 +00:00
Stephen Canon	8216d88511	Don't raise inexact when lowering ceil, floor, round, trunc. The C standard has historically not specified whether or not these functions should raise the inexact flag. Traditionally on Darwin, these functions did raise inexact, and the llvm lowerings followed that conventions. n1778 (C bindings for IEEE-754 (2008)) clarifies that these functions should not set inexact. This patch brings the lowerings for arm64 and x86 in line with the newly specified behavior. This also lets us fold some logic into TD patterns, which is nice. Differential Revision: http://reviews.llvm.org/D12969 llvm-svn: 248266	2015-09-22 11:43:17 +00:00
Simon Pilgrim	1cad0cd3ce	[X86][SSE] Match zero/any extension shuffles that don't start from the first element This patch generalizes the lowering of shuffles as zero extensions to allow extensions that don't start from the first element. It now recognises extensions starting anywhere in the lower 128-bits or at the start of any higher 128-bit lane. The motivation was to reduce the number of high cost pshufb calls, but it also improves the SSE2 case as well. Differential Revision: http://reviews.llvm.org/D12561 llvm-svn: 248250	2015-09-22 08:16:08 +00:00
Simon Pilgrim	4003ed2da3	[DAGCombiner] Improve FMA support for interpolation patterns This patch adds support for combining patterns such as (FMUL(FADD(1.0, x), y)) and (FMUL(FSUB(x, 1.0), y)) to their FMA equivalents. This is useful in particular for linear interpolation cases such as (FADD(FMUL(x, t), FMUL(y, FSUB(1.0, t)))) Differential Revision: http://reviews.llvm.org/D13003 llvm-svn: 248210	2015-09-21 20:32:48 +00:00
Matt Arsenault	b774834429	DAGCombiner: Replace store of FP constant after attemping store merges If storing multiple FP constants, some subset of the stores would be replaced with integers due to visit order, so MergeConsecutiveStores would only partially merge these. llvm-svn: 248169	2015-09-21 15:59:46 +00:00
Sanjay Patel	bab5d6c636	add test file ahead of any functional changes for PR22428 llvm-svn: 248123	2015-09-20 15:58:00 +00:00
Simon Pilgrim	c6a553241c	[X86][SSE] Intrinsics builtins test refresh. NFCI llvm-svn: 248122	2015-09-20 15:41:35 +00:00
Igor Breger	b7e1f9d680	AVX512: Implemented encoding and intrinsics for vcmpss/sd. Added tests for intrinsics and encoding. Differential Revision: http://reviews.llvm.org/D12593 llvm-svn: 248121	2015-09-20 15:15:10 +00:00
Asaf Badouh	2744d21fb8	[X86][AVX512] extend support in Scalar conversion add scalar FP to Int conversion with truncation intrinsics add scalar conversion FP32 from/to FP64 intrinsics add rounding mode and SAE mode encoding for these intrinsics Differential Revision: http://reviews.llvm.org/D12665 llvm-svn: 248117	2015-09-20 14:31:19 +00:00
Igor Breger	4c4cd789c9	AVX512: vsqrtss/sd encoding and intrinsics implementation. Added tests for intrinsics and encoding. Differential Revision: http://reviews.llvm.org/D12102 llvm-svn: 248116	2015-09-20 09:13:41 +00:00
Asaf Badouh	572bbceecc	[X86][AVX512DQ] Add fpclass instruction Differential Revision: http://reviews.llvm.org/D12931 llvm-svn: 248115	2015-09-20 08:46:07 +00:00
Michael Kuperstein	58e86bc893	[X86] Fix sitofp and uitofp instruction matching failures with long double and avx512 The operation action for i32 and i64 cannot be set to legal, as long double needs custom lowering. Patch by: mitch.l.bodart@intel.com Differential Revision: http://reviews.llvm.org/D12372 llvm-svn: 248114	2015-09-20 08:12:17 +00:00
Igor Breger	1d55f20bee	AVX512: Implemented intrinsics for vshuff32x4, vshuff64x2, vshufi64x2, vshufi32x4 Added tests for intrinsics. Differential Revision: http://reviews.llvm.org/D12525 llvm-svn: 248113	2015-09-20 07:18:53 +00:00
Igor Breger	0ede3cbb5c	AVX512: Implement instructions encoding, lowering and intrinsics vinserti64x4, vinserti64x2, vinserti32x8, vinserti32x4, vinsertf64x4, vinsertf64x2, vinsertf32x8, vinsertf32x4 Added tests for encoding, lowering and intrinsics. Differential Revision: http://reviews.llvm.org/D11893 llvm-svn: 248111	2015-09-20 06:52:42 +00:00
Simon Pilgrim	27f81776ad	[X86][AVX2] Use general sext IR for vpmovsx stack folding tests llvm-svn: 248093	2015-09-19 17:04:18 +00:00
Simon Pilgrim	d0448ee59f	[X86][SSE] Vectorize CTTZ + CTTZ_ZERO_UNDEF Now that we have fast vector CTPOP implementations we can use this to speed up vector CTTZ using the pattern (cttz(x) = ctpop((x & -x) - 1)) Additionally, for AVX512CD that provides lzcnt instructions we can use the pattern (cttz_undef(x) = (width - 1) - ctlz(x & -x)) Differential Revision: http://reviews.llvm.org/D12663 llvm-svn: 248091	2015-09-19 13:22:57 +00:00
Quentin Colombet	b4c6886215	[ShrinkWrap] Refactor the handling of infinite loop in the analysis. - Strenghten the logic to be sure we hoist the restore point out of the current loop. (The fixes a bug with infinite loop, added as part of the patch.) - Walk over the exit blocks of the current loop to conver to the desired restore point in one iteration of the update loop. llvm-svn: 247958	2015-09-17 23:21:34 +00:00
David Majnemer	163b7f121c	[WinEH] Fix tests broken by funclet-layout llvm-svn: 247944	2015-09-17 21:11:12 +00:00
David Majnemer	978902309a	[WinEH] Add a funclet layout pass Windows EH funclets need to be contiguous. The FuncletLayout pass will ensure that the funclets are together and begin with a funclet entry MBB. Differential Revision: http://reviews.llvm.org/D12943 llvm-svn: 247937	2015-09-17 20:45:18 +00:00
Reid Kleckner	5b8a46e771	[WinEH] Make funclet return instrs pseudo instrs This makes catchret look more like a branch, and less like a weird use of BlockAddress. It also lets us get away from llvm.x86.seh.restoreframe, which relies on the old parentfpoffset label arithmetic. llvm-svn: 247936	2015-09-17 20:43:47 +00:00
Reid Kleckner	b78585b3c2	Fix the test case I just committed llvm-svn: 247905	2015-09-17 17:21:45 +00:00
Reid Kleckner	ed17079b52	[WinEH] Add and use hasEHPadSuccessor instead of getLandingPadSuccessor getLandingPadSuccessor assumes that each invoke can have at most one EH pad successor, but WinEH invokes can have more than one. Two out of three callers of getLandingPadSuccessor don't use the returned landingpad, so we can make them use this simple predicate instead. Eventually we'll have to circle back and fix SplitKit.cpp so that register allocation works. Baby steps. llvm-svn: 247904	2015-09-17 17:19:40 +00:00
Elena Demikhovsky	702a6adfaa	AVX-512: shufflevector for i1 vectors <2 x i1> .. <64 x i1> AVX-512 does not provide an instruction that shuffles mask register. So I do the following way: mask-2-simd , shuffle simd , simd-2-mask Differential Revision: http://reviews.llvm.org/D12727 llvm-svn: 247876	2015-09-17 06:53:12 +00:00
Reid Kleckner	813f1b65bc	[WinEH] Rip out the landingpad-based C++ EH state numbering code It never really worked, and the new code is working better every day. llvm-svn: 247860	2015-09-16 22:14:46 +00:00
Reid Kleckner	b005d281c3	[WinEH] Pull Adjectives and CatchObj out of the catchpad arg list Clang now passes the adjectives as an argument to catchpad. Getting the CatchObj working is simply a matter of threading another static alloca through codegen, first as an alloca, then as a frame index, and finally as a frame offset. llvm-svn: 247844	2015-09-16 20:16:27 +00:00
Sanjay Patel	a260701bbb	propagate fast-math-flags on DAG nodes After D10403, we had FMF in the DAG but disabled by default. Nick reported no crashing errors after some stress testing, so I enabled them at r243687. However, Escha soon notified us of a bug not covered by any in-tree regression tests: if we don't propagate the flags, we may fail to CSE DAG nodes because differing FMF causes them to not match. There is one test case in this patch to prove that point. This patch hopes to fix or leave a 'TODO' for all of the in-tree places where we create nodes that are FMF-capable. I did this by putting an assert in SelectionDAG.getNode() to find any FMF-capable node that was being created without FMF ( D11807 ). I then ran all regression tests and test-suite and confirmed that everything passes. This patch exposes remaining work to get DAG FMF to be fully functional: (1) add the flags to non-binary nodes such as FCMP, FMA and FNEG; (2) add the flags to intrinsics; (3) use the flags as conditions for transforms rather than the current global settings. Differential Revision: http://reviews.llvm.org/D12095 llvm-svn: 247815	2015-09-16 16:31:21 +00:00
Michael Kuperstein	d926465342	[X86] Do not generate 64-bit pops of 32-bit GPRs. When trying emit a stack adjustments using pops, frame lowering selects an arbitrary free GPR. It should always select one from an appropriate class... This fixes PR24649. Patch by: amjad.aboud@intel.com Differential Revision: http://reviews.llvm.org/D12609 llvm-svn: 247785	2015-09-16 11:27:20 +00:00
Mehdi Amini	d178f4fc89	Make the default triple optional by allowing an empty string When building LLVM as a (potentially dynamic) library that can be linked against by multiple compilers, the default triple is not really meaningful. We allow to explicitely set it to an empty string when configuring LLVM. In this case, said "target independent" tests in the test suite that are using the default triple are disabled by matching the newly available feature "default_triple". Reviewers: probinson, echristo Differential Revision: http://reviews.llvm.org/D12660 From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 247775	2015-09-16 05:34:32 +00:00
Quentin Colombet	7b13976254	[ShrinkWrapping] Add a test case for r247710. llvm-svn: 247713	2015-09-15 18:51:43 +00:00
Simon Pilgrim	f8f86ab176	[X86][MMX] Added shuffle decodes for MMX/3DNow! shuffles. Added shuffle decodes for MMX PUNPCK + PSHUFW shuffles. Added shuffle decodes for 3DNow! PSWAPD shuffles. llvm-svn: 247526	2015-09-13 11:28:45 +00:00
Elena Demikhovsky	8671fcbbd6	AVX-512: Fixed a bug in OR/XOR operations for 512-bit FP values on KNL. KNL does not have VXORPS, VORPS for 512-bit values. I use integer VPXOR, VPOR that actually do the same. X86ISD::FXOR/FOR are generated as a result of FSUB combining. Differential Revision: http://reviews.llvm.org/D12753 llvm-svn: 247523	2015-09-13 08:15:15 +00:00
Sanjay Patel	8b960d22ad	[x86] enable machine combiner reassociations for 128-bit vector logical integer insts (2nd try) The changes in: test/CodeGen/X86/machine-cp.ll are just due to scheduling differences after some logic instructions were reassociated. llvm-svn: 247516	2015-09-12 19:47:50 +00:00
Simon Pilgrim	42c834bad5	[X86] Added i1 vector sextload tests llvm-svn: 247509	2015-09-12 15:36:41 +00:00
Simon Pilgrim	779bcf3e3d	[X86][FMA] Refreshed fma tests llvm-svn: 247508	2015-09-12 15:33:05 +00:00
Sanjay Patel	99f7370a79	revert r247506; need to verify changes in existing tests llvm-svn: 247507	2015-09-12 15:27:31 +00:00
Sanjay Patel	08755c7dbc	[x86] enable machine combiner reassociations for 128-bit vector logical integer insts llvm-svn: 247506	2015-09-12 14:58:04 +00:00
Simon Pilgrim	8ee50e9cff	[X86][SSE] Use general sext IR for (v)pmovsx stack folding tests llvm-svn: 247502	2015-09-12 11:45:24 +00:00
Akira Hatanaka	bc497c93f5	Use function attribute "stackrealign" to decide whether stack realignment should be forced. With this commit, we can now force stack realignment when doing LTO and do so on a per-function basis. Also, add a new cl::opt option "stackrealign" to CommandFlags.h which is used to force stack realignment via llc's command line. Out-of-tree projects currently using -force-align-stack to force stack realignment should make changes to attach the attribute to the functions in the IR. Differential Revision: http://reviews.llvm.org/D11814 llvm-svn: 247450	2015-09-11 18:54:38 +00:00
David Majnemer	0e70598a5b	[X86] Make sure startproc/endproc are paired We used different conditions to determine if we should emit startproc vs endproc. Use the same condition to ensure that they will always be paired. This fixes PR24374. llvm-svn: 247435	2015-09-11 17:34:34 +00:00
David Blaikie	2f40830dde	[opaque pointer type] Add textual IR support for explicit type parameter for global aliases update.py: import fileinput import sys import re alias_match_prefix = r"(.(?:=\|:\|^)\s(?:external \|)(?:(?:private\|internal\|linkonce\|linkonce_odr\|weak\|weak_odr\|common\|appending\|extern_weak\|available_externally) )?(?:default \|hidden \|protected )?(?:dllimport \|dllexport )?(?:unnamed_addr \|)(?:thread_local(?:$[a-z]$)? )?alias" plain = re.compile(alias_match_prefix + r" (.?))(\| addrspace$\d+$ )\($\| (?:%\|@\|null\|undef\|blockaddress\|addrspacecast\|\[\[[a-zA-Z]\|\{\{).$)") cast = re.compile(alias_match_prefix + r") ((?:bitcast\|inttoptr\|addrspacecast)\s$. to (.?)(\| addrspace\(\d+$ )\\)\s(?:;.)?$)") gep = re.compile(alias_match_prefix + r") ((?:getelementptr)\s(?:inbounds)?\s$(?P<type>.), (?P=type)(?:\saddrspace\(\d+$\s)?\* .\)\s(?:;.)?$)") def conv(line): m = re.match(cast, line) if m: return m.group(1) + " " + m.group(3) + ", " + m.group(2) m = re.match(gep, line) if m: return m.group(1) + " " + m.group(3) + ", " + m.group(2) m = re.match(plain, line) if m: return m.group(1) + ", " + m.group(2) + m.group(3) + "" + m.group(4) + "\n" return line for line in sys.stdin: sys.stdout.write(conv(line)) apply.sh: for name in "$@" do python3 `dirname "$0"`/update.py < "$name" > "$name.tmp" && mv "$name.tmp" "$name" rm -f "$name.tmp" done The actual commands: From llvm/src: find test/ -name .ll \| xargs ./apply.sh From llvm/src/tools/clang: find test/ -name .mm -o -name .m -o -name .cpp -o -name .c \| xargs -I '{}' ../../apply.sh "{}" From llvm/src/tools/polly: find test/ -name .ll \| xargs ./apply.sh llvm-svn: 247378	2015-09-11 03:22:04 +00:00
Reid Kleckner	da6dcc5d92	[WinEH] Push and pop EBP for 32-bit funclets The Win32 EH runtime caller does not preserve EBP, even though it does preserve the CSRs (EBX, ESI, EDI) for us. The result was that each finally funclet call would leave the frame pointer off by 12 bytes. llvm-svn: 247348	2015-09-10 22:00:02 +00:00
Reid Kleckner	7bb20bd69e	Fix SEH state numbering algorithm to handle cleanupendpads WinEHPrepare's new coloring algorithm really expects to see cleanupendpads now, so Clang will start emitting them soon. llvm-svn: 247341	2015-09-10 21:46:36 +00:00
David Blaikie	a7970f3cf8	Tidy up some alias syntax to make explicit pointer type migration easier llvm-svn: 247312	2015-09-10 18:03:45 +00:00
Igor Breger	7f69a99c54	AVX512: Implemented encoding and intrinsics for vextracti64x4 ,vextracti64x2, vextracti32x8, vextracti32x4, vextractf64x4, vextractf64x2, vextractf32x8, vextractf32x4 Added tests for intrinsics and encoding. Differential Revision: http://reviews.llvm.org/D11802 llvm-svn: 247276	2015-09-10 12:54:54 +00:00
Reid Kleckner	7878391208	[WinEH] Add codegen support for cleanuppad and cleanupret All of the complexity is in cleanupret, and it mostly follows the same codepaths as catchret, except it doesn't take a return value in RAX. This small example now compiles and executes successfully on win32: extern "C" int printf(const char *, ...) noexcept; struct Dtor { ~Dtor() { printf("~Dtor\n"); } }; void has_cleanup() { Dtor o; throw 42; } int main() { try { has_cleanup(); } catch (int) { printf("caught it\n"); } } Don't try to put the cleanup in the same function as the catch, or Bad Things will happen. llvm-svn: 247219	2015-09-10 00:25:23 +00:00
Reid Kleckner	94b704c469	[SEH] Emit 32-bit SEH tables for the new EH IR The 32-bit tables don't actually contain PC range data, so emitting them is incredibly simple. The 64-bit tables, on the other hand, use the same table for state numbering as well as label ranges. This makes things more difficult, so it will be implemented later. llvm-svn: 247192	2015-09-09 21:10:03 +00:00
Renato Golin	db7ea86bf4	Revert "AVX512: Implemented encoding and intrinsics for vextracti64x4 ,vextracti64x2, vextracti32x8, vextracti32x4, vextractf64x4, vextractf64x2, vextractf32x8, vextractf32x4 Added tests for intrinsics and encoding." This reverts commit r247149, as it was breaking numerous buildbots of varied architectures. llvm-svn: 247177	2015-09-09 19:44:40 +00:00
Igor Breger	ac29a82921	AVX512: Implemented encoding and intrinsics for vextracti64x4 ,vextracti64x2, vextracti32x8, vextracti32x4, vextractf64x4, vextractf64x2, vextractf32x8, vextractf32x4 Added tests for intrinsics and encoding. Differential Revision: http://reviews.llvm.org/D11802 llvm-svn: 247149	2015-09-09 14:35:09 +00:00
Reid Kleckner	51189f0a1d	[WinEH] Avoid creating MBBs for LLVM BBs that cannot contain code Typically these are catchpads, which hold data used to decide whether to catch the exception or continue unwinding. We also shouldn't create MBBs for catchendpads, cleanupendpads, or terminatepads, since no real code can live in them. This fixes a problem where MI passes (like the register allocator) would try to put code into catchpad blocks, which are not executed by the runtime. In the new world, blocks ending in invokes now have many possible successors. llvm-svn: 247102	2015-09-08 23:28:38 +00:00
Reid Kleckner	df1295173f	[WinEH] Emit prologues and epilogues for funclets Summary: 32-bit funclets have short prologues that allocate enough stack for the largest call in the whole function. The runtime saves CSRs for the funclet. It doesn't restore CSRs after we finally transfer control back to the parent funciton via a CATCHRET, but that's a separate issue. 32-bit funclets also have to adjust the incoming EBP value, which is what llvm.x86.seh.recoverframe does in the old model. 64-bit funclets need to spill CSRs as normal. For simplicity, this just spills the same set of CSRs as the parent function, rather than trying to compute different CSR sets for the parent function and each funclet. 64-bit funclets also allocate enough stack space for the largest outgoing call frame, like 32-bit. Reviewers: majnemer Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D12546 llvm-svn: 247092	2015-09-08 22:44:41 +00:00
Derek Schuff	45c832c5d8	Fix comments and RUN line in x86-64 stdarg test leftover from last commit From http://reviews.llvm.org/D12346 llvm-svn: 247070	2015-09-08 20:58:41 +00:00
Derek Schuff	eef533f422	x32. Fixes a bug in how struct va_list is initialized in x32 Summary: This patch modifies X86TargetLowering::LowerVASTART so that struct va_list is initialized with 32 bit pointers in x32. It also includes tests that call @llvm.va_start() for x32. Patch by João Porto Subscribers: llvm-commits, hjl.tools Differential Revision: http://reviews.llvm.org/D12346 llvm-svn: 247069	2015-09-08 20:51:31 +00:00
Derek Schuff	ee4e947e23	x32. Fixes a bug in i8mem_NOREX declaration. The old implementation assumed LP64 which is broken for x32. Specifically, the MOVE8rm_NOREX and MOVE8mr_NOREX, when selected, would cause a 'Cannot emit physreg copy instruction' error message to be reported. This patch also enable the h-register*ll tests for x32. Differential Revision: http://reviews.llvm.org/D12336 Patch by João Porto llvm-svn: 247058	2015-09-08 19:47:15 +00:00
Igor Breger	a54a1a84dd	AVX512: kunpck encoding implementation Added tests for encoding. Differential Revision: http://reviews.llvm.org/D12061 llvm-svn: 247010	2015-09-08 13:10:00 +00:00
Elena Demikhovsky	e88038f235	AVX-512: Lowering for 512-bit vector shuffles. Vector types: <8 x 64>, <16 x 32>, <32 x 16> float and integer. Differential Revision: http://reviews.llvm.org/D10683 llvm-svn: 246981	2015-09-08 06:38:21 +00:00
Simon Pilgrim	15fc134865	[X86][MMX] Added missing stack folding tests for MMX/SSSE3 instructions llvm-svn: 246949	2015-09-06 17:50:15 +00:00
Simon Pilgrim	b38c09d7ff	[X86][AVX512] Added 512-bit vector shift tests. Only works for avx512f (dq) targets so far - need to add avx512bw tests once char/short shifts are supported. llvm-svn: 246943	2015-09-06 13:36:32 +00:00
Simon Pilgrim	a1ceba8ab4	[X86] Updated vector lzcnt tests. Added missing vec512 tests. llvm-svn: 246927	2015-09-05 11:56:30 +00:00
Simon Pilgrim	5cce4381ab	[X86] Updated vector tzcnt tests. Added vec512 tests. llvm-svn: 246922	2015-09-05 10:19:07 +00:00
Simon Pilgrim	ba8ae5a814	[X86] Updated vector popcnt tests. Added vec512 tests. llvm-svn: 246921	2015-09-05 09:59:59 +00:00
Simon Pilgrim	0fccef6ac2	[X86][AVX] Test tidyup + regeneration. NFCI. llvm-svn: 246863	2015-09-04 19:47:56 +00:00
Sanjay Patel	c9ae9d72f8	[x86] enable machine combiner reassociations for scalar 'xor' insts llvm-svn: 246781	2015-09-03 16:36:16 +00:00
Sanjay Patel	ce74db9d8d	check for fastness before merging in DAGCombiner::MergeConsecutiveStores() Use and check the 'IsFast' optional parameter to TLI.allowsMemoryAccess() any time we have a merged access candidate. Without this patch, we were generating unaligned 16-byte (SSE) memops for x86 targets where those accesses are slow. This change was mentioned in: http://reviews.llvm.org/D10662 and http://reviews.llvm.org/D10905 and will help solve PR21711. Differential Revision: http://reviews.llvm.org/D12573 llvm-svn: 246771	2015-09-03 15:03:19 +00:00
Igor Breger	0dcd8bcf24	AVX512: Implemented encoding and intrinsics for vplzcntq, vplzcntd, vpconflictq, vpconflictd Added tests for intrinsics and encoding. Differential Revision: http://reviews.llvm.org/D11931 llvm-svn: 246750	2015-09-03 09:05:31 +00:00
Ahmed Bougacha	b03ea02479	[X86] Require 32-byte alignment for 32-byte VMOVNTs. We used to accept (and even test, and generate) 16-byte alignment for 32-byte nontemporal stores, but they require 32-byte alignment, per SDM. Found by inspection. Instead of hardcoding 16 in the patfrag, check for natural alignment. Also fix the autoupgrade and the various tests. Also, use explicit -mattr instead of -mcpu: I stared at the output several minutes wondering why I get 2x movntps for the unaligned case (which is the ideal output, but needs some work: see FIXME), until I remembered corei7-avx implies +slow-unaligned-mem-32. llvm-svn: 246733	2015-09-02 23:25:39 +00:00
Ahmed Bougacha	b09c543538	[X86] Cleanup nontemporal tests a little. NFC. Also: add a missing test for movntiq. llvm-svn: 246730	2015-09-02 22:47:09 +00:00
Sanjay Patel	fff7c6dc73	use "unpredictable" metadata in SelectionDAG when splitting compares This patch uses the metadata defined in D12341 to avoid creating an unpredictable branch. Differential Revision: http://reviews.llvm.org/D12343 llvm-svn: 246691	2015-09-02 19:17:25 +00:00
Sanjay Patel	fbcd189f8a	[x86] fix allowsMisalignedMemoryAccesses() for 8-byte and smaller accesses This is a continuation of the fix from: http://reviews.llvm.org/D10662 and discussion in: http://reviews.llvm.org/D12154 Here, we distinguish slow unaligned SSE (128-bit) accesses from slow unaligned scalar (64-bit and under) accesses. Other lowering (eg, getOptimalMemOpType) assumes that unaligned scalar accesses are always ok, so this changes allowsMisalignedMemoryAccesses() to match that behavior. Differential Revision: http://reviews.llvm.org/D12543 llvm-svn: 246658	2015-09-02 15:42:49 +00:00
Asaf Badouh	d2c3599c5f	[X86][AVX512VLBW] add support in byte shift and SAD add byte shift left/right add SAD - compute sum of absolute differences Differential Revision: http://reviews.llvm.org/D12479 llvm-svn: 246654	2015-09-02 14:21:54 +00:00
Igor Breger	1e58e8adf6	AVX512: Implemented encoding and intrinsics for VGETMANTPD/S , VGETMANTSD/S instructions Added tests for intrinsics and encoding. Differential Revision: http://reviews.llvm.org/D11593 llvm-svn: 246642	2015-09-02 11:18:55 +00:00
Igor Breger	a6297c701e	AVX512: Implemented encoding and intrinsics for vshufps/d. Added tests for intrinsics and encoding. Differential Revision: http://reviews.llvm.org/D11709 llvm-svn: 246640	2015-09-02 10:50:58 +00:00
Elena Demikhovsky	9f83c7346f	AVX-512: store <4 x i1> and <2 x i1> values in memory Enabled DAG pattern lowering for SKX with DQI predicate. Differential Revision: http://reviews.llvm.org/D12550 llvm-svn: 246625	2015-09-02 09:20:58 +00:00
Elena Demikhovsky	1b9d6914d3	Optimization for Gather/Scatter with uniform base Vector 'getelementptr' with scalar base is an opportunity for gather/scatter intrinsic to generate a better sequence. While looking for uniform base, we want to use the scalar base pointer of GEP, if exists. Differential Revision: http://reviews.llvm.org/D11121 llvm-svn: 246622	2015-09-02 08:39:13 +00:00
Vedant Kumar	b5c2fd7257	[CodeGen] Fix FREM on 32-bit MSVC on x86 Patch by Dylan McKay! Differential Revision: http://reviews.llvm.org/D12099 llvm-svn: 246615	2015-09-02 01:31:58 +00:00
Igor Breger	f6f1bb6ddc	AVX512: Implemented intrinsics for valign. Differential Revision: http://reviews.llvm.org/D12526 llvm-svn: 246551	2015-09-01 15:27:18 +00:00
Sanjay Patel	c413558842	use CHECK-LABEL for more precision llvm-svn: 246547	2015-09-01 14:35:05 +00:00
Cong Hou	511298b919	Distribute the weight on the edge from switch to default statement to edges generated in lowering switch. Currently, when edge weights are assigned to edges that are created when lowering switch statement, the weight on the edge to default statement (let's call it "default weight" here) is not considered. We need to distribute this weight properly. However, without value profiling, we have no idea how to distribute it. In this patch, I applied the heuristic that this weight is evenly distributed to successors. For example, given a switch statement with cases 1,2,3,5,10,11,20, and every edge from switch to each successor has weight 10. If there is a binary search tree built to test if n < 10, then its two out-edges will have weight 4x10+10/2 = 45 and 3x10 + 10/2 = 35 respectively (currently they are 40 and 30 without considering the default weight). Each distribution (which is 5 here) will be stored in each SwitchWorkListItem for further distribution. There are some exceptions: For a jump table header which doesn't have any edge to default statement, we don't distribute the default weight to it. For a bit test header which covers a contiguous range and hence has no edges to default statement, we don't distribute the default weight to it. When the branch checks a single value or a contiguous range with no edge to default statement, we don't distribute the default weight to it. In other cases, the default weight is evenly distributed to successors. Differential Revision: http://reviews.llvm.org/D12418 llvm-svn: 246522	2015-09-01 01:42:16 +00:00
Sanjay Patel	989364c101	remove unnecessary/conflicting target info llvm-svn: 246514	2015-09-01 00:27:36 +00:00
Sanjay Patel	e554d59eba	fixed test to specify triple rather than arch and CPU llvm-svn: 246513	2015-09-01 00:25:23 +00:00
Sanjay Patel	d9a5c225d1	[x86] enable machine combiner reassociations for scalar 'or' insts llvm-svn: 246481	2015-08-31 20:27:03 +00:00
Reid Kleckner	e00faf8ce1	[EH] Handle non-Function personalities like unknown personalities Also delete and simplify a lot of MachineModuleInfo code that used to be needed to handle personalities on landingpads. Now that the personality is on the LLVM Function, we no longer need to track it this way on MMI. Certainly it should not live on LandingPadInfo. llvm-svn: 246478	2015-08-31 20:02:16 +00:00
Igor Breger	f3ded811b2	AVX512: Implemented encoding and intrinsics for vdbpsadbw Added tests for intrinsics and encoding. Differential Revision: http://reviews.llvm.org/D12491 llvm-svn: 246436	2015-08-31 13:09:30 +00:00
Igor Breger	2ae0fe3ac3	AVX512: Implemented encoding and intrinsics for vpalignr Added tests for intrinsics and encoding. Differential Revision: http://reviews.llvm.org/D12270 llvm-svn: 246428	2015-08-31 11:14:02 +00:00
Duncan P. N. Exon Smith	814b8e91c7	DI: Require subprogram definitions to be distinct As a follow-up to r246098, require `DISubprogram` definitions (`isDefinition: true`) to be 'distinct'. Specifically, add an assembler check, a verifier check, and bitcode upgrading logic to combat testcase bitrot after the `DIBuilder` change. While working on the testcases, I realized that test/Linker/subprogram-linkonce-weak-odr.ll isn't relevant anymore. Its purpose was to check for a corner case in PR22792 where two subprogram definitions match exactly and share the same metadata node. The new verifier check, requiring that subprogram definitions are 'distinct', precludes that possibility. I updated almost all the IR with the following script: git grep -l -E -e '= !DISubprogram$.* isDefinition: true' \| grep -v test/Bitcode \| xargs sed -i '' -e 's/= \(!DISubprogram(.*, isDefinition: true$/= distinct \1/' Likely some variant of would work for out-of-tree testcases. llvm-svn: 246327	2015-08-28 20:26:49 +00:00
David Majnemer	9c51053dd5	Test case for r246304. llvm-svn: 246306	2015-08-28 17:19:54 +00:00
Sanjay Patel	7c912898a5	[x86] enable machine combiner reassociations for scalar 'and' insts llvm-svn: 246300	2015-08-28 14:09:48 +00:00
Reid Kleckner	0e2882345d	[WinEH] Add some support for code generating catchpad We can now run 32-bit programs with empty catch bodies. The next step is to change PEI so that we get funclet prologues and epilogues. llvm-svn: 246235	2015-08-27 23:27:47 +00:00
Ahmed Bougacha	87166905c8	[CodeGen] Check FoldConstantArithmetic result before using it. Fixes PR24602: r245689 introduced an unguarded use of SelectionDAG::FoldConstantArithmetic, which returns 0 when it fails because of opaque (hoisted) constants. llvm-svn: 246217	2015-08-27 21:46:04 +00:00
Cong Hou	08cb4fc688	Fixed a bug that edge weights are not assigned correctly when lowering switch statement. This is a one-line-change patch that moves the update to UnhandledWeights to the correct position: it should be updated for all clusters instead of just range clusters. Differential Revision: http://reviews.llvm.org/D12391 llvm-svn: 246129	2015-08-27 00:37:40 +00:00
Cong Hou	03127700d5	Assign weights to edges to jump table / bit test header when lowering switch statement. Currently, when lowering switch statement and a new basic block is built for jump table / bit test header, the edge to this new block is not assigned with a correct weight. This patch collects the edge weight from all its successors and assign this sum of weights to the edge (and also the other fall-through edge). Test cases are adjusted accordingly. Differential Revision: http://reviews.llvm.org/D12166#fae6eca7 llvm-svn: 246104	2015-08-26 23:15:32 +00:00
Matthias Braun	4e7ded834f	SelectionDAGBuilder: Fix SPDescriptor not resetting GuardReg This was causing problems when some functions use a GuardReg and some don't as can happen when mixing SelectionDAG and FastISel generated functions. llvm-svn: 246075	2015-08-26 20:46:52 +00:00
Matthias Braun	4816b18d86	FastISel: Avoid adding a successor block twice for degenerate IR. This fixes http://llvm.org/PR24581 Differential Revision: http://reviews.llvm.org/D12350 llvm-svn: 246074	2015-08-26 20:46:49 +00:00
Charles Davis	119525914c	Make variable argument intrinsics behave correctly in a Win64 CC function. Summary: This change makes the variable argument intrinsics, `llvm.va_start` and `llvm.va_copy`, and the `va_arg` instruction behave as they do on Windows inside a `CallingConv::X86_64_Win64` function. It's needed for a Clang patch I have to add support for GCC's `__builtin_ms_va_list` constructs. Reviewers: nadav, asl, eugenis CC: llvm-commits Differential Revision: http://llvm-reviews.chandlerc.com/D1622 llvm-svn: 245990	2015-08-25 23:27:41 +00:00
Cong Hou	cd59591396	Remove the final bit test during lowering switch statement if all cases in bit test cover a contiguous range. When lowering switch statement, if bit tests are used then LLVM will always generates a jump to the default statement in the last bit test. However, this is not necessary when all cases in bit tests cover a contiguous range. This is because when generating the bit tests header MBB, there is a range check that guarantees cases in bit tests won't go outside of [low, high], where low and high are minimum and maximum case values in the bit tests. This patch checks if this is the case and then doesn't emit jump to default statement and hence saves a bit test and a branch. Differential Revision: http://reviews.llvm.org/D12249 llvm-svn: 245976	2015-08-25 21:34:38 +00:00
Sanjay Patel	deb8f826a5	make fast unaligned memory accesses implicit with SSE4.2 or SSE4a This is a follow-on from the discussion in http://reviews.llvm.org/D12154. This change allows memset/memcpy to use SSE or AVX memory accesses for any chip that has generally fast unaligned memory ops. A motivating use case for this change is a clang invocation that doesn't explicitly set the CPU, but does target a feature that we know only exists on a CPU that supports fast unaligned memops. For example: $ clang -O1 foo.c -mavx This resolves a difference in lowering noted in PR24449: https://llvm.org/bugs/show_bug.cgi?id=24449 Before this patch, we used different store types depending on whether the example can be lowered as a memset or not. Differential Revision: http://reviews.llvm.org/D12288 llvm-svn: 245950	2015-08-25 16:29:21 +00:00
Michael Kuperstein	6e3fee07f7	[X86] Remove references to _ftol2 As of r245924, _ftol2 is no longer used for fptoui on MS platforms. Remove the dead code associated with it. llvm-svn: 245925	2015-08-25 07:58:33 +00:00
Michael Kuperstein	8515893be8	[X86] Fix fptoui conversions This fixes two issues in x86 fptoui lowering. 1) Makes conversions from f80 go through the right path on AVX-512. 2) Implements an inline sequence for fptoui i64 instead of a library call. This improves performance by 6X on SSE3+ and 3X otherwise. Incidentally, it also removes the use of ftol2 for fptoui, which was wrong to begin with, as ftol2 converts to a signed i64, producing wrong results for values >= 2^63. Patch by: mitch.l.bodart@intel.com Differential Revision: http://reviews.llvm.org/D11316 llvm-svn: 245924	2015-08-25 07:42:09 +00:00
Steve King	5cdbd20cc3	Pass function attributes instead of boolean in isIntDivCheap(). llvm-svn: 245921	2015-08-25 02:31:21 +00:00
Simon Pilgrim	342f60384e	[X86][SSE] Added tests for zero-extension vector shuffles that don't extend starting from the 0'th lane. llvm-svn: 245878	2015-08-24 21:28:13 +00:00
Sanjay Patel	453b4df973	remove FIXME; fixed by r245733 llvm-svn: 245819	2015-08-23 20:43:25 +00:00
Simon Pilgrim	2a7049abe0	[DAGCombiner] Fold CONCAT_VECTORS of bitcasted EXTRACT_SUBVECTOR Minor generalization of D12125 - peek through any bitcast to the original vector that we're extracting from. llvm-svn: 245814	2015-08-23 15:22:14 +00:00
Sanjay Patel	f0bc07f7a5	[x86] enable machine combiner reassociations for 256-bit vector min/max llvm-svn: 245735	2015-08-21 21:04:21 +00:00
Sanjay Patel	dddad10241	remove 'FeatureSlowUAMem' from AMD CPUs based on 10H micro-arch or later See discussion in D12154 ( http://reviews.llvm.org/D12154 ), AMD Software Optimization Guides for 10H/12H/15H/16H, and Agner Fog's experimental data. llvm-svn: 245733	2015-08-21 20:39:17 +00:00
Sanjay Patel	cf942fa905	[x86] enable machine combiner reassociations for 128-bit vector min/max llvm-svn: 245715	2015-08-21 18:06:49 +00:00
Sanjay Patel	c9f4aa6b8c	save some testing time; get rid of the non-SSE chips in this test It doesn't matter what slow/fast unaligned attribute the old chips have - they can't use anything more than 4-byte stores. llvm-svn: 245709	2015-08-21 17:16:51 +00:00
Sanjay Patel	44d7bef0fa	add a test case to check the fast-unaligned-mem attribute per CPU This will confirm that the patch in D12154 is actually NFC. It will also confirm that the proposed changes for the AMD chips are behaving as expected. llvm-svn: 245704	2015-08-21 16:08:26 +00:00
Ahmed Bougacha	0cdc7719f0	[X86] Look for scalar through one bitcast when lowering to VBROADCAST. Fixes PR23464: one way to use the broadcast intrinsics is: _mm256_broadcastw_epi16(_mm_cvtsi32_si128((int)src)); We don't currently fold this, but now that we use native IR for the intrinsics (r245605), we can look through one bitcast to find the broadcast scalar. Differential Revision: http://reviews.llvm.org/D10557 llvm-svn: 245613	2015-08-20 21:02:39 +00:00
Ahmed Bougacha	69a17acb74	[X86] Add some broadcast-from-memory tests. llvm-svn: 245612	2015-08-20 20:59:41 +00:00
Ahmed Bougacha	1a498705e4	[X86] Replace avx2 broadcast intrinsics with native IR. Since r245605, the clang headers don't use these anymore. r245165 updated some of the tests already; update the others, add an autoupgrade, remove the intrinsics, and cleanup the definitions. Differential Revision: http://reviews.llvm.org/D10555 llvm-svn: 245606	2015-08-20 20:36:19 +00:00
David Majnemer	cfc1df553e	[X86] Fix the (shl (and (setcc_c), c1), c2) -> (and setcc_c, (c1 << c2)) fold We didn't check for the necessary preconditions before folding a mask/shift into a single mask. This fixes PR24516. llvm-svn: 245544	2015-08-20 09:00:56 +00:00
Sanjay Patel	9e5927fdc3	[x86] enable machine combiner reassociations for scalar double-precision min/max llvm-svn: 245506	2015-08-19 21:27:27 +00:00
Sanjay Patel	4e3ee1e548	[x86] enable machine combiner reassociations for scalar single-precision maximums llvm-svn: 245504	2015-08-19 21:18:46 +00:00
Simon Pilgrim	35f528262f	[DAGCombiner] Added SMAX/SMIN/UMAX/UMIN constant folding We still need to add constant folding of vector comparisons to fold the tests for targets that don't support the respective min/max nodes I needed to update 2011-12-06-AVXVectorExtractCombine to load a vector instead of using a constant vector to prevent it folding Differential Revision: http://reviews.llvm.org/D12118 llvm-svn: 245503	2015-08-19 21:11:58 +00:00
David Majnemer	f25fe64716	[X86] Emit more efficient >= comparisons against 0 We don't do a great job with >= 0 comparisons against zero when the result is used as an i8. Given something like: void f(long long LL, bool B) { B = LL >= 0; } We used to generate: shrq $63, %rdi xorb $1, %dil movb %dil, (%rsi) Now we generate: testq %rdi, %rdi setns (%rsi) Differential Revision: http://reviews.llvm.org/D12136 llvm-svn: 245498	2015-08-19 20:51:40 +00:00
Simon Pilgrim	989cbbd2f5	[DAGCombiner] Fold CONCAT_VECTORS of EXTRACT_SUBVECTOR (or undef) to VECTOR_SHUFFLE. Check to see if this is a CONCAT_VECTORS of a bunch of EXTRACT_SUBVECTOR operations. If so, and if the EXTRACT_SUBVECTOR vector inputs come from at most two distinct vectors the same size as the result, attempt to turn this into a legal shuffle. Differential Revision: http://reviews.llvm.org/D12125 llvm-svn: 245490	2015-08-19 20:09:50 +00:00
Bruno Cardoso Lopes	27fd06922b	[PeepholeOptimizer] Look through PHIs to find additional register sources Reintroduce r245442. Remove an overly conservative assertion introduced in r245442. We could replace the assertion to use `shareSameRegisterFile` instead, but in that point in `insertPHI` we already lost the original Def subreg to check against. So drop the assertion completely. Original commit message: - Teaches the ValueTracker in the PeepholeOptimizer to look through PHI instructions. - Add findNextSourceAndRewritePHI method to lookup into multiple sources returnted by the ValueTracker and rewrite PHIs with new sources. With these changes we can find more register sources and rewrite more copies to allow coaslescing of bitcast instructions. Hence, we eliminate unnecessary VR64 <-> GR64 copies in x86, but it could be extended to other archs by marking "isBitcast" on target specific instructions. The x86 example follows: A: psllq %mm1, %mm0 movd %mm0, %r9 jmp C B: por %mm1, %mm0 movd %mm0, %r9 jmp C C: movd %r9, %mm0 pshufw $238, %mm0, %mm0 Becomes: A: psllq %mm1, %mm0 jmp C B: por %mm1, %mm0 jmp C C: pshufw $238, %mm0, %mm0 Differential Revision: http://reviews.llvm.org/D11197 rdar://problem/20404526 llvm-svn: 245479	2015-08-19 18:53:36 +00:00
Derek Schuff	55817ee604	x32. Fixes a bug in x32 exception handling. This patch updates the X86 lowering so that the Exception Pointer and Selector are 64-bit wide only if Subtarget.isTarget64BitLP64. Patch by João Porto Reviewers: dschuff, rnk Differential Revision: http://reviews.llvm.org/D12111 llvm-svn: 245454	2015-08-19 16:28:21 +00:00
JF Bastien	5ab87edbb4	x32. Fixes jmp %reg in x32 x32 has 32-bit pointers; x86-64 can't jmp %r32. This patch addresses this issue by explicitly zero-extending brind's target to 64-bits. Author: jpp Reviewers: jfb, dschuff, pavel.v.chupin Subscribers: llvm-commits Differential revision: http://reviews.llvm.org/D12112 llvm-svn: 245452	2015-08-19 16:17:08 +00:00
Bruno Cardoso Lopes	61009142b8	Revert "[PeepholeOptimizer] Look through PHIs to find additional register sources" Revert r245442 while investigating a fix. An assertion hit in http://lab.llvm.org:8080/green/job/clang-stage1-configure-RA_build/11380 llvm-svn: 245446	2015-08-19 15:10:32 +00:00
Bruno Cardoso Lopes	0a1c126684	[PeepholeOptimizer] Look through PHIs to find additional register sources Reapply r243486. - Teaches the ValueTracker in the PeepholeOptimizer to look through PHI instructions. - Add findNextSourceAndRewritePHI method to lookup into multiple sources returnted by the ValueTracker and rewrite PHIs with new sources. With these changes we can find more register sources and rewrite more copies to allow coaslescing of bitcast instructions. Hence, we eliminate unnecessary VR64 <-> GR64 copies in x86, but it could be extended to other archs by marking "isBitcast" on target specific instructions. The x86 example follows: A: psllq %mm1, %mm0 movd %mm0, %r9 jmp C B: por %mm1, %mm0 movd %mm0, %r9 jmp C C: movd %r9, %mm0 pshufw $238, %mm0, %mm0 Becomes: A: psllq %mm1, %mm0 jmp C B: por %mm1, %mm0 jmp C C: pshufw $238, %mm0, %mm0 Differential Revision: http://reviews.llvm.org/D11197 rdar://problem/20404526 llvm-svn: 245442	2015-08-19 14:34:41 +00:00
Tobias Grosser	85508e804b	Revert "[X86] Widen the 'AND' mask if doing so shrinks the encoding size" This reverts commit 245169 which miscompiles MultiSource/Applications/siod from LNT. llvm-svn: 245432	2015-08-19 11:35:10 +00:00
Michael Kuperstein	9fe42604aa	[X86] Do not lower scalar sdiv/udiv to a shifts + mul sequence when optimizing for minsize There are some cases where the mul sequence is smaller, but for the most part, using a div is preferable. This does not apply to vectors, since x86 doesn't have vector idiv, and a vector mul/shifts sequence ought to be smaller than a scalarized division. Differential Revision: http://reviews.llvm.org/D12082 llvm-svn: 245431	2015-08-19 11:21:43 +00:00
Simon Pilgrim	4bce6d6b73	[X86] Refreshed sign extension tests. llvm-svn: 245358	2015-08-18 21:21:35 +00:00
Simon Pilgrim	ce30ae62e2	[X86][AVX] Added shuffle concatenation tests llvm-svn: 245351	2015-08-18 20:51:15 +00:00
Simon Pilgrim	edaba3b7c3	Updated constants to give more useful min/max constant folding tests llvm-svn: 245348	2015-08-18 20:46:48 +00:00
Simon Pilgrim	9f4374d361	Fixed max/min typo in test names llvm-svn: 245278	2015-08-18 09:02:51 +00:00
Simon Pilgrim	19ffd57f45	[X86][SSE} Added constant SMAX/SMIN/UMAX/UMIN tests Constant folding patch to follow soon llvm-svn: 245276	2015-08-18 08:52:43 +00:00
Simon Pilgrim	08d823afe4	[X86][SSE] Added extra vector truncation tests. Including cases for PR14866 llvm-svn: 245274	2015-08-18 08:37:09 +00:00
David Majnemer	1a59e49f3c	[X86] Widen the 'AND' mask if doing so shrinks the encoding size We can set additional bits in a mask given that we know the other operand of an AND already has some bits set to zero. This can be more efficient if doing so allows us to use an instruction which implicitly sign extends the immediate. This fixes PR24085. Differential Revision: http://reviews.llvm.org/D11289 llvm-svn: 245169	2015-08-16 04:52:11 +00:00
Sanjay Patel	40d4eb40f6	[x86] enable machine combiner reassociations for scalar single-precision minimums llvm-svn: 245166	2015-08-15 17:01:54 +00:00
Simon Pilgrim	d65ace84c7	Updated broadcast stack folding test to avoid use of broadcast intrinsics. llvm-svn: 245165	2015-08-15 16:54:18 +00:00
Sanjay Patel	3b7e3677e3	fix typos; NFC llvm-svn: 245164	2015-08-15 16:53:08 +00:00
Sanjay Patel	9f6c7dddd2	add test case to show current codegen llvm-svn: 245163	2015-08-15 16:49:50 +00:00
Simon Pilgrim	0750c84623	[DAGCombiner] Attempt to mask vectors before zero extension instead of after. For cases where we TRUNCATE and then ZERO_EXTEND to a larger size (often from vector legalization), see if we can mask the source data and then ZERO_EXTEND (instead of after a ANY_EXTEND). This can help avoid having to generate a larger mask, and possibly applying it to several sub-vectors. (zext (truncate x)) -> (zext (and(x, m)) Includes a minor patch to SystemZ to better recognise 8/16-bit zero extension patterns from RISBG bit-extraction code. This is the first of a number of minor patches to help improve the conversion of byte masks to clear mask shuffles. Differential Revision: http://reviews.llvm.org/D11764 llvm-svn: 245160	2015-08-15 13:27:30 +00:00
Sanjay Patel	7332e0455f	make current codegen visible in the checks, so we can decide if it's right llvm-svn: 245120	2015-08-14 23:03:01 +00:00
Pat Gavlin	b399095c3f	Add a target environment for CoreCLR. Although targeting CoreCLR is similar to targeting MSVC, there are certain important differences that the backend must be aware of (e.g. differences in stack probes, EH, and library calls). Differential Revision: http://reviews.llvm.org/D11012 llvm-svn: 245115	2015-08-14 22:41:43 +00:00
Sanjay Patel	dd175bc6c4	make current codegen visible in the checks, so we can decide if it's right llvm-svn: 245108	2015-08-14 22:10:59 +00:00
Sanjay Patel	ed502905f7	[x86] fix allowsMisalignedMemoryAccess() implementation This patch fixes the x86 implementation of allowsMisalignedMemoryAccess() to correctly return the 'Fast' output parameter for 32-byte accesses. To test that, an existing load merging optimization is changed to use the TLI hook. This exposes a shortcoming in the current logic and results in the regression test update. Changing other direct users of the isUnalignedMem32Slow() x86 CPU attribute would be a follow-on patch. Without the fix in allowsMisalignedMemoryAccesses(), we will infinite loop when targeting SandyBridge because LowerINSERT_SUBVECTOR() creates 32-byte loads from two 16-byte loads while PerformLOADCombine() splits them back into 16-byte loads. Differential Revision: http://reviews.llvm.org/D10662 llvm-svn: 245075	2015-08-14 17:53:40 +00:00
Rafael Espindola	dbaf0498a9	Revert "Centralize the information about which object format we are using." This reverts commit r245047. It was failing on the darwin bots. The problem was that when running ./bin/llc -march=msp430 llc gets to if (TheTriple.getTriple().empty()) TheTriple.setTriple(sys::getDefaultTargetTriple()); Which means that we go with an arch of msp430 but a triple of x86_64-apple-darwin14.4.0 which fails badly. That code has to be updated to select a triple based on the value of march, but that is not a trivial fix. llvm-svn: 245062	2015-08-14 15:48:41 +00:00
Rafael Espindola	90eb70c8a7	Centralize the information about which object format we are using. Other than some places that were handling unknown as ELF, this should have no change. The test updates are because we were detecting arm-coff or x86_64-win64-coff as ELF targets before. It is not clear if the enum should live on the Triple. At least now it lives in a single location and should be easier to move somewhere else. llvm-svn: 245047	2015-08-14 13:31:17 +00:00
Simon Pilgrim	1fdc177ccf	Renamed min tests (typo) llvm-svn: 245038	2015-08-14 11:03:31 +00:00
Alex Lorenz	5022f6bb81	MIR Serialization: Change MIR syntax - use custom syntax for MBBs. This commit modifies the way the machine basic blocks are serialized - now the machine basic blocks are serialized using a custom syntax instead of relying on YAML primitives. Instead of using YAML mappings to represent the individual machine basic blocks in a machine function's body, the new syntax uses a single YAML block scalar which contains all of the machine basic blocks and instructions for that function. This is an example of a function's body that uses the old syntax: body: - id: 0 name: entry instructions: - '%eax = MOV32r0 implicit-def %eflags' - 'RETQ %eax' ... The same body is now written like this: body: \| bb.0.entry: %eax = MOV32r0 implicit-def %eflags RETQ %eax ... This syntax change is motivated by the fact that the bundled machine instructions didn't map that well to the old syntax which was using a single YAML sequence to store all of the machine instructions in a block. The bundled machine instructions internally use flags like BundledPred and BundledSucc to determine the bundles, and serializing them as MI flags using the old syntax would have had a negative impact on the readability and the ease of editing for MIR files. The new syntax allows me to serialize the bundled machine instructions using a block construct without relying on the internal flags, for example: BUNDLE implicit-def dead %itstate, implicit-def %s1 ... { t2IT 1, 24, implicit-def %itstate %s1 = VMOVS killed %s0, 1, killed %cpsr, implicit killed %itstate } This commit also converts the MIR testcases to the new syntax. I developed a script that can convert from the old syntax to the new one. I will post the script on the llvm-commits mailing list in the thread for this commit. llvm-svn: 244982	2015-08-13 23:10:16 +00:00
Simon Pilgrim	829091e310	[X86][SSE] Tests for SMAX/SMIN/UMAX/UMIN vector instructions llvm-svn: 244944	2015-08-13 20:31:03 +00:00
Simon Pilgrim	a5737a44da	Cleaned up test. NFCI. llvm-svn: 244765	2015-08-12 17:00:50 +00:00
Michael Kuperstein	fe0d9bb6eb	[X86] Disable mul -> shl + lea combine when compiling for minsize Differential Revision: http://reviews.llvm.org/D11904 llvm-svn: 244740	2015-08-12 11:27:26 +00:00
Michael Kuperstein	bc7f99a3ab	[X86] Allow x86 call frame optimization to fold more loads into pushes This abstracts away the test for "when can we fold across a MachineInstruction" into the the MI interface, and changes call-frame optimization use the same test the peephole optimizer users. Differential Revision: http://reviews.llvm.org/D11945 llvm-svn: 244729	2015-08-12 10:14:58 +00:00
Simon Pilgrim	8c049d5c03	[InstCombine] Move SSE/AVX vector blend folding to instcombiner As discussed in D11886, this patch moves the SSE/AVX vector blend folding to instcombiner from PerformINTRINSIC_WO_CHAINCombine (which allows us to remove this completely). InstCombiner already had partial support for this, I just had to add support for zero (ConstantAggregateZero) masks and also the case where both selection inputs were the same (allowing us to ignore the mask). I also moved all the relevant combine tests into InstCombine/blend_x86.ll Differential Revision: http://reviews.llvm.org/D11934 llvm-svn: 244723	2015-08-12 08:08:56 +00:00
Sanjay Patel	260b6d36f4	[x86] enable machine combiner reassociations for 256-bit vector FP mul/add llvm-svn: 244705	2015-08-12 00:29:10 +00:00
Sanjay Patel	2c6a01570d	[x86] enable machine combiner reassociations for 128-bit vector single/double multiplies llvm-svn: 244657	2015-08-11 20:19:23 +00:00
Sanjay Patel	82d91ddb4f	fix minsize detection: minsize attribute implies optimizing for size Also, add a test for optsize because this was not part of any existing regression test. llvm-svn: 244651	2015-08-11 19:39:36 +00:00
Sanjay Patel	070df89928	fix minsize detection: minsize attribute implies optimizing for size llvm-svn: 244631	2015-08-11 17:04:31 +00:00
Sanjay Patel	caddda56aa	add missing tests for powi expansion with size optimizations The minsize test will be fixed in the next commit. llvm-svn: 244630	2015-08-11 16:58:49 +00:00
Sanjay Patel	c3e8349a3e	fixed to use FileCheck llvm-svn: 244627	2015-08-11 16:51:31 +00:00
Sanjay Patel	605b6adf31	fixed to test attribute, rather than CPU llvm-svn: 244625	2015-08-11 16:43:18 +00:00
Michael Kuperstein	243c073a2e	[X86] Allow merging of immediates within a basic block for code size savings First step in preventing immediates that occur more than once within a single basic block from being pulled into their users, in order to prevent unnecessary large instruction encoding .Currently enabled only when optimizing for size. Patch by: zia.ansari@intel.com Differential Revision: http://reviews.llvm.org/D11363 llvm-svn: 244601	2015-08-11 14:10:58 +00:00
Michael Kuperstein	7337ee23d8	[X86] When optimizing for minsize, use POP for small post-call stack clean-up When optimizing for size, replace "addl $4, %esp" and "addl $8, %esp" following a call by one or two pops, respectively. We don't try to do it in general, but only when the stack adjustment immediately follows a call - which is the most common case. That allows taking a short-cut when trying to find a free register to pop into, instead of a full-blown liveness check. If the adjustment immediately follows a call, then every register the call clobbers but doesn't define should be dead at that point, and can be used. Differential Revision: http://reviews.llvm.org/D11749 llvm-svn: 244578	2015-08-11 08:48:48 +00:00
Michael Kuperstein	82814f63c0	Allow PeepholeOptimizer to fold a few more cases The condition for clearing the folding candidate list was clamped together with the "uninteresting instruction" condition. This is too conservative, e.g. we don't need to clear the list when encountering an IMPLICIT_DEF. Differential Revision: http://reviews.llvm.org/D11591 llvm-svn: 244577	2015-08-11 08:19:43 +00:00
Sanjay Patel	d967a878fa	fix minsize detection: minsize attribute implies optimizing for size llvm-svn: 244528	2015-08-10 23:07:26 +00:00
Alex Lorenz	e5101e2016	MachineVerifier: Handle the optional def operand in a PATCHPOINT instruction. The PATCHPOINT instructions have a single optional defined register operand, but the machine verifier can't verify the optional defined register operands. This commit makes sure that the machine verifier won't report an error when a PATCHPOINT instruction doesn't have its optional defined register operand. This change will allow us to enable the machine verifier for the code generation tests for the patchpoint intrinsics. Reviewers: Juergen Ributzka llvm-svn: 244513	2015-08-10 21:47:36 +00:00
Alex Lorenz	2f43dd5a12	StackMap: FastISel: Add an appropriate number of immediate operands to the frame setup instruction. This commit ensures that the stack map lowering code in FastISel adds an appropriate number of immediate operands to the frame setup instruction. The previous code added just one immediate operand, which was fine for a target like AArch64, but on X86 the ADJCALLSTACKDOWN64 instruction needs two explicit operands. This caused the machine verifier to report an error when the old code added just one. Reviewers: Juergen Ributzka Differential Revision: http://reviews.llvm.org/D11853 llvm-svn: 244508	2015-08-10 21:27:03 +00:00
JF Bastien	fa9746dc8d	x86: Emit LAHF/SAHF instead of PUSHF/POPF NaCl's sandbox doesn't allow PUSHF/POPF out of security concerns (priviledged emulators have forgotten to mask system bits in the past, and EFLAGS's DF bit is a constant source of hilarity). Commit r220529 fixed PR20376 by saving cmpxchg's flags result using EFLAGS, this commit now generated LAHF/SAHF instead, for all of x86 (not just NaCl) because it leads to an overall performance gain over PUSHF/POPF. As with the previous patch this code generation is pretty bad because it occurs very later, after register allocation, and in many cases it rematerializes flags which were already available (e.g. already in a register through SETE). Fortunately it's somewhat rare that this code needs to fire. I did [[ https://github.com/jfbastien/benchmark-x86-flags \| a bit of benchmarking ]], the results on an Intel Haswell E5-2690 CPU at 2.9GHz are: \| Time per call (ms) \| Runtime (ms) \| Benchmark \| \| 0.000012514 \| 6257 \| sete.i386 \| \| 0.000012810 \| 6405 \| sete.i386-fast \| \| 0.000010456 \| 5228 \| sete.x86-64 \| \| 0.000010496 \| 5248 \| sete.x86-64-fast \| \| 0.000012906 \| 6453 \| lahf-sahf.i386 \| \| 0.000013236 \| 6618 \| lahf-sahf.i386-fast \| \| 0.000010580 \| 5290 \| lahf-sahf.x86-64 \| \| 0.000010304 \| 5152 \| lahf-sahf.x86-64-fast \| \| 0.000028056 \| 14028 \| pushf-popf.i386 \| \| 0.000027160 \| 13580 \| pushf-popf.i386-fast \| \| 0.000023810 \| 11905 \| pushf-popf.x86-64 \| \| 0.000026468 \| 13234 \| pushf-popf.x86-64-fast \| Clearly `PUSHF`/`POPF` are suboptimal. It doesn't really seems to be worth teaching LLVM about individual flags, at least not for this purpose. Reviewers: rnk, jvoung, t.p.northover Subscribers: llvm-commits Differential revision: http://reviews.llvm.org/D6629 llvm-svn: 244503	2015-08-10 20:59:36 +00:00
Sanjay Patel	d09391c8cd	fix minsize detection: minsize attribute implies optimizing for size llvm-svn: 244499	2015-08-10 20:45:44 +00:00
Sanjay Patel	178f8cba51	[x86, SSE]]add missing tests for load folding with partial register update The minsize case is wrong; that will be fixed in the next commit. llvm-svn: 244498	2015-08-10 20:34:34 +00:00
Simon Pilgrim	a3a72b41de	[InstCombine] Move SSE2/AVX2 arithmetic vector shift folding to instcombiner As discussed in D11760, this patch moves the (V)PSRA(WD) arithmetic shift-by-constant folding to InstCombine to match the logical shift implementations. Differential Revision: http://reviews.llvm.org/D11886 llvm-svn: 244495	2015-08-10 20:21:15 +00:00
Jonathan Roelofs	f45295c366	Fix a few more cases of 'CHECK[^:]*$'. NFCI llvm-svn: 244491	2015-08-10 19:56:39 +00:00
Jonathan Roelofs	49e46ce8e2	Fix a bunch of trivial cases of 'CHECK[^:]*$' in the tests. NFCI I looked into adding a warning / error for this to FileCheck, but there doesn't seem to be a good way to avoid it triggering on the instances of it in RUN lines. llvm-svn: 244481	2015-08-10 19:01:27 +00:00
Sanjay Patel	10294b59de	fix minsize detection: minsize attribute implies optimizing for size llvm-svn: 244464	2015-08-10 17:15:17 +00:00
Sanjay Patel	0f12d71b49	fix minsize detection: minsize attribute implies optimizing for size llvm-svn: 244463	2015-08-10 17:00:44 +00:00
Sanjay Patel	68b0325a9e	fix minsize detection: minsize attribute implies optimizing for size llvm-svn: 244460	2015-08-10 16:47:47 +00:00
Sanjay Patel	9a9003d94c	fix minsize detection: minsize attribute implies optimizing for size llvm-svn: 244458	2015-08-10 16:43:20 +00:00
Robert Lougher	11a44b78a3	Trace copies when checking for rematerializability in spill weight calculation PR24139 contains an analysis of poor register allocation. One of the findings was that when calculating the spill weight, a rematerializable interval once split is no longer rematerializable. This is because the isRematerializable check in CalcSpillWeights.cpp does not follow the copies introduced by live range splitting (after splitting, the live interval register definition is a copy which is not rematerializable). Reviewers: qcolombet Differential Revision: http://reviews.llvm.org/D11686 llvm-svn: 244439	2015-08-10 11:59:44 +00:00
Sanjay Patel	e0178262d4	[x86] enable machine combiner reassociations for 128-bit vector single/double adds llvm-svn: 244403	2015-08-08 19:08:20 +00:00
Sanjay Patel	0f51d14957	add a missing regression test for a DAGCombiner FDIV optimization There's no test for this transform in any backend. Discovered while debugging fast-math-flag propagation in the DAG (r244053). llvm-svn: 244373	2015-08-07 23:19:41 +00:00
Sanjay Patel	8d768b5b3a	redo r244360 (tighten checks...) after specifying triple llvm-svn: 244361	2015-08-07 21:42:24 +00:00
Sanjay Patel	0e1d731631	tighten checks using update_llc_test_checks.py llvm-svn: 244360	2015-08-07 21:38:53 +00:00
Kit Barton	a7bf96ab5c	Fix possible infinite loop in shrink wrapping when searching for save/restore points. There is an infinite loop that can occur in Shrink Wrapping while searching for the Save/Restore points. Part of this search checks whether the save/restore points are located in different loop nests and if so, uses the (post) dominator trees to find the immediate (post) dominator blocks. However, if the current block does not have any immediate (post) dominators then this search will result in an infinite loop. This can occur in code containing an infinite loop. The modification checks whether the immediate (post) dominator is different from the current save/restore block. If it is not, then the search terminates and the current location is not considered as a valid save/restore point for shrink wrapping. Phabricator: http://reviews.llvm.org/D11607 llvm-svn: 244247	2015-08-06 19:01:57 +00:00
Michael Kuperstein	868dc65444	[X86] Improve EmitLoweredSelect for contiguous CMOV pseudo instructions. This change improves EmitLoweredSelect() so that multiple contiguous CMOV pseudo instructions with the same (or exactly opposite) conditions get lowered using a single new basic-block. This eliminates unnecessary extra basic-blocks (and CFG merge points) when contiguous CMOVs are being lowered. Patch by: kevin.b.smith@intel.com Differential Revision: http://reviews.llvm.org/D11428 llvm-svn: 244202	2015-08-06 08:45:34 +00:00
Reid Kleckner	10cac7a190	Fix Windows test failure with triple instead of using the native OS llvm-svn: 244159	2015-08-05 22:27:08 +00:00
JF Bastien	8662083770	x86 atomic: optimize a.store(reg op a.load(acquire), release) Summary: PR24191 finds that the expected memory-register operations aren't generated when relaxed { load ; modify ; store } is used. This is similar to PR17281 which was addressed in D4796, but only for memory-immediate operations (and for memory orderings up to acquire and release). This patch also handles some floating-point operations. Reviewers: reames, kcc, dvyukov, nadav, morisset, chandlerc, t.p.northover, pete Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D11382 llvm-svn: 244128	2015-08-05 21:04:59 +00:00
JF Bastien	7c4218f49c	Revert "Fix MO's analyzePhysReg, it was confusing sub- and super-registers. Problem pointed out by Michael Hordijk." I mistakenly committed the patch for D6629, and was trying to commit another. Reverting until it gets proper signoff. llvm-svn: 244121	2015-08-05 20:53:56 +00:00
JF Bastien	ce5256f5c5	Fix MO's analyzePhysReg, it was confusing sub- and super-registers. Problem pointed out by Michael Hordijk. llvm-svn: 244120	2015-08-05 20:49:46 +00:00
Sanjay Patel	75ced2782b	[x86] machine combiner reassociation: mark EFLAGS operand as 'dead' In the commentary for D11660, I wasn't sure if it was alright to create new integer machine instructions without also creating the implicit EFLAGS operand. From what I can see, the implicit operand is always created by the MachineInstrBuilder based on the instruction type, so we don't have to do that explicitly. However, in reviewing the debug output, I noticed that the operand was not marked as 'dead'. The machine combiner should do that to preserve future optimization opportunities that may be checking for that dead EFLAGS operand themselves. Differential Revision: http://reviews.llvm.org/D11696 llvm-svn: 243990	2015-08-04 15:21:56 +00:00
Duncan P. N. Exon Smith	55ca964e94	DI: Disallow uniquable DICompileUnits Since r241097, `DIBuilder` has only created distinct `DICompileUnit`s. The backend is liable to start relying on that (if it hasn't already), so make uniquable `DICompileUnit`s illegal and automatically upgrade old bitcode. This is a nice cleanup, since we can remove an unnecessary `DenseSet` (and the associated uniquing info) from `LLVMContextImpl`. Almost all the testcases were updated with this script: git grep -e '= !DICompileUnit' -l -- test \| grep -v test/Bitcode \| xargs sed -i '' -e 's,= !DICompileUnit,= distinct !DICompileUnit,' I imagine something similar should work for out-of-tree testcases. llvm-svn: 243885	2015-08-03 17:26:41 +00:00
Simon Pilgrim	87368fdd30	[X86][SSE] Refreshed sse2 vector shift tests llvm-svn: 243850	2015-08-02 15:23:53 +00:00
Simon Pilgrim	503a2594c3	[DAGCombiner] Convert constant AND masks to shuffle clear masks down to the byte level The XformToShuffleWithZero method currently checks AND masks at the per-lane level for all-one and all-zero constants and attempts to convert them to legal shuffle clear masks. This patch generalises XformToShuffleWithZero, splitting and checking the sub-lanes of the constants down to the byte level to see if any legal shuffle clear masks are possible. This allows a lot of masks (often from legalization or truncation) to be folded into existing shuffle patterns and removes a lot of constant mask loading. There are a few examples of poor shuffle lowering that are exposed by this patch that will be cleaned up in future patches (e.g. merging shuffles that are separated by bitcasts, x86 legalized v8i8 zero extension uses PMOVZX+AND+AND instead of AND+PMOVZX, etc.) Differential Revision: http://reviews.llvm.org/D11518 llvm-svn: 243831	2015-08-01 10:01:46 +00:00
Duncan P. N. Exon Smith	ed013cd221	DI: Remove DW_TAG_arg_variable and DW_TAG_auto_variable Remove the fake `DW_TAG_auto_variable` and `DW_TAG_arg_variable` tags, using `DW_TAG_variable` in their place Stop exposing the `tag:` field at all in the assembly format for `DILocalVariable`. Most of the testcase updates were generated by the following sed script: find test/ -name ".ll" -o -name ".mir" \| xargs grep -l 'DILocalVariable' \| xargs sed -i '' \ -e 's/tag: DW_TAG_arg_variable, //' \ -e 's/tag: DW_TAG_auto_variable, //' There were only a handful of tests in `test/Assembly` that I needed to update by hand. (Note: a follow-up could change `DILocalVariable::DILocalVariable()` to set the tag to `DW_TAG_formal_parameter` instead of `DW_TAG_variable` (as appropriate), instead of having that logic magically in the backend in `DbgVariable`. I've added a FIXME to that effect.) llvm-svn: 243774	2015-07-31 18:58:39 +00:00
Sanjay Patel	9ff4626028	[x86] reassociate integer multiplies using machine combiner pass Add i16, i32, i64 imul machine instructions to the list of reassociation candidates. A new bit of logic is needed to handle integer instructions: they have an implicit EFLAGS operand, so we have to make sure it's dead in order to do any reassociation with integer ops. Differential Revision: http://reviews.llvm.org/D11660 llvm-svn: 243756	2015-07-31 16:21:55 +00:00
Sanjay Patel	1166f2ff9f	fix memcpy/memset/memmove lowering when optimizing for size Fixing MinSize attribute handling was discussed in D11363. This is a prerequisite patch to doing that. The handling of OptSize when lowering mem* functions was broken on Darwin because it wants to ignore -Os for these cases, but the existing logic also made it ignore -Oz (MinSize). The Linux change demonstrates a widespread problem. The backend doesn't usually recognize the MinSize attribute by itself; it assumes that if the MinSize attribute exists, then the OptSize attribute must also exist. Fixing this more generally will be a follow-on patch or two. Differential Revision: http://reviews.llvm.org/D11568 llvm-svn: 243693	2015-07-30 21:41:50 +00:00
Nick Lewycky	c3890d2969	Fix typo "fuction" noticed in comments in AssumptionCache.h, and also all the other files that have the same typo. All comments, no functionality change! (Merely a "fuctionality" change.) Bonus change to remove emacs major mode marker from SystemZMachineFunctionInfo.cpp because emacs already knows it's C++ from the extension. Also fix typo "appeary" in AMDGPUMCAsmInfo.h. llvm-svn: 243585	2015-07-29 22:32:47 +00:00
Simon Pilgrim	ba10f76705	[X86][SSE] Keep 32-bit target i64 vector shifts on SSE unit. This patch improves the 32-bit target i64 constant matching to detect the shuffle vector splats that are introduced by i64 vector shift vectorization (D8416). Differential Revision: http://reviews.llvm.org/D11327 llvm-svn: 243577	2015-07-29 21:44:27 +00:00
Simon Pilgrim	86478c6909	[X86][SSE] Vectorize i64 ASHR operations This patch vectorizes the v2i64/v4i64 ASHR shift operations - the last remaining integer vector shifts that are still being transferred to/from the scalar unit to be completed. Differential Revision: http://reviews.llvm.org/D11439 llvm-svn: 243569	2015-07-29 20:31:45 +00:00
Bruno Cardoso Lopes	38c0250679	Revert "[PeepholeOptimizer] Look through PHIs to find additional register sources" Reported to Broke some internal tests: PR24303 This reverts commit r243486. llvm-svn: 243540	2015-07-29 17:46:47 +00:00
Sanjay Patel	133e68b45c	ignore duplicate divisor uses when transforming into reciprocal multiplies (PR24141) PR24141: https://llvm.org/bugs/show_bug.cgi?id=24141 contains a test case where we have duplicate entries in a node's uses() list. After r241826, we use CombineTo() to delete dead nodes when combining the uses into reciprocal multiplies, but this fails if we encounter the just-deleted node again in the list. The solution in this patch is to not add duplicate entries to the list of users that we will subsequently iterate over. For the test case, this avoids triggering the combine divisors logic entirely because there really is only one user of the divisor. Differential Revision: http://reviews.llvm.org/D11345 llvm-svn: 243500	2015-07-28 23:28:22 +00:00
Bruno Cardoso Lopes	3c235763e5	[PeepholeOptimizer] Look through PHIs to find additional register sources Reapply 243271 with more fixes; although we are not handling multiple sources with coalescable copies, we were not properly skipping this case. - Teaches the ValueTracker in the PeepholeOptimizer to look through PHI instructions. - Add findNextSourceAndRewritePHI method to lookup into multiple sources returnted by the ValueTracker and rewrite PHIs with new sources. With these changes we can find more register sources and rewrite more copies to allow coaslescing of bitcast instructions. Hence, we eliminate unnecessary VR64 <-> GR64 copies in x86, but it could be extended to other archs by marking "isBitcast" on target specific instructions. The x86 example follows: A: psllq %mm1, %mm0 movd %mm0, %r9 jmp C B: por %mm1, %mm0 movd %mm0, %r9 jmp C C: movd %r9, %mm0 pshufw $238, %mm0, %mm0 Becomes: A: psllq %mm1, %mm0 jmp C B: por %mm1, %mm0 jmp C C: pshufw $238, %mm0, %mm0 Differential Revision: http://reviews.llvm.org/D11197 rdar://problem/20404526 llvm-svn: 243486	2015-07-28 21:45:50 +00:00
Alex Lorenz	305e8f6312	Add a test case for r242191 ([MMX] Use the appropriate instructions for GR64 <-> VR64 copies). This commit adds a MIR test case for the commit r242191, which was committed without one. This test case verifies that the ExpandPostRA pass expands the GR64 <-> VR64 copies into the appropriate MMX_MOV instructions. llvm-svn: 243457	2015-07-28 17:52:59 +00:00
Chih-Hung Hsieh	9843f406ec	Move unit tests to target specific directories. Differential Revision: http://reviews.llvm.org/D10522 llvm-svn: 243454	2015-07-28 17:32:49 +00:00
Sanjay Patel	94a7433cde	add tests to show broken current behavior of minsize attribute llvm-svn: 243451	2015-07-28 17:18:25 +00:00

... 3 4 5 6 7 ...

6704 Commits