llvm-project

Commit Graph

Author	SHA1	Message	Date
Michael Liao	a880186030	Add missing i8 max/min/umax/umin support - Fix PR5145 and turn on test 8-bit atomic ops llvm-svn: 164358	2012-09-21 03:18:52 +00:00
Benjamin Kramer	8554206652	Fix broken check lines. llvm-svn: 164317	2012-09-20 19:54:13 +00:00
Michael Liao	83bc2119dc	Specify CPU to prevent failure on ATOM due to different code scheduling llvm-svn: 164283	2012-09-20 03:34:04 +00:00
Michael Liao	3237662b65	Re-work X86 code generation of atomic ops with spin-loop - Rewrite/merge pseudo-atomic instruction emitters to address the following issue: * Reduce one unnecessary load in spin-loop. Previously the spin-loop looked like thisMBB: newMBB: ld t1 = [bitinstr.addr] op t2 = t1, [bitinstr.val] not t3 = t2 (if Invert) mov EAX = t1 lcs dest = [bitinstr.addr], t3 [EAX is implicit] bz newMBB fallthrough -->nextMBB the 'ld' at the beginning of newMBB should be lifted out of the loop as lcs (or CMPXCHG on x86) will load the current memory value into EAX. This loop is refined as: thisMBB: EAX = LOAD [MI.addr] mainMBB: t1 = OP [MI.val], EAX LCMPXCHG [MI.addr], t1, [EAX is implicitly used & defined] JNE mainMBB sinkMBB: * Remove immopc as, so far, all pseudo-atomic instructions have all-register form only, there is no immediate operand. * Remove unnecessary attributes/modifiers in pseudo-atomic instruction td * Fix issues in PR13458 - Add comprehensive tests on atomic ops on various data types. NOTE: Some of them are turned off due to missing functionality. - Revise tests due to the new spin-loop generated. llvm-svn: 164281	2012-09-20 03:06:15 +00:00
Michael Liao	8372539543	Unify the logic in SelectAtomicLoadAdd and SelectAtomicLoadArith - Merge the processing of LOAD_ADD with other atomic load-arith operations - Separate the logic getting target constant for atomic-load-op and add an optimization for atomic-load-add on i16 with negative value - Optimize a minor case for atomic-fetch-add i16 with negative operand. Test case is revised. llvm-svn: 164243	2012-09-19 19:36:58 +00:00
Jan Wen Voung	4ce1d7b4f1	Add some cases to x86 OptimizeCompare to handle DEC and INC, too. While we are setting the earlier def to true, also make it live. llvm-svn: 164056	2012-09-17 22:04:23 +00:00
Michael Liao	b503b323f3	Fix PR13859 - Preserve the original NOutVT during casting from vector to integer by extracting vector elements. llvm-svn: 164042	2012-09-17 18:05:20 +00:00
Nadav Rotem	ae6809b19a	Fix the testcase to work on all platforms. llvm-svn: 163997	2012-09-16 07:58:47 +00:00
Nadav Rotem	37521aa89c	The PMOVZXWD family of functions had patterns that extend narrow vector types to wide vector types. It had patterns for zext-loading and extending. This commit adds patterns for loading a wide type, performing a bitcast, and extending. This is an odd pattern, but it is commonly used when writing code with intrinsics. rdar://11897677 llvm-svn: 163995	2012-09-16 07:39:07 +00:00
Benjamin Kramer	ece434252c	X86: Emitting x87 fsin/fcos for sinf/cosf is not safe without unsafe fp math. This was only an issue if sse is disabled. llvm-svn: 163967	2012-09-15 12:44:27 +00:00
Eric Christopher	b83dba2b84	Fix both the test for zero and what we do if we have a zero for umulo legalization. Fixes PR13839 llvm-svn: 163856	2012-09-13 23:24:02 +00:00
Michael Liao	137f8aedea	Add wider vector/integer support for PR12312 - Enhance the fix to PR12312 to support wider integer, such as 256-bit integer. If more than 1 fully evaluated vectors are found, POR them first followed by the final PTEST. llvm-svn: 163832	2012-09-13 20:24:54 +00:00
Michael Liao	460fc46e0f	Enhance type legalization on bitcast from vector to integer - Find a legal vector type before casting and extracting element from it. - As the new vector type may have more than 2 elements, build the final hi/lo pair by BFS pairing them from bottom to top. llvm-svn: 163830	2012-09-13 19:58:21 +00:00
Jakob Stoklund Olesen	32a56fa3ba	Fix test case to avoid PIC magic. llvm-svn: 163827	2012-09-13 19:47:45 +00:00
Jakob Stoklund Olesen	3cf3ffce24	Fix the TCRETURNmi64 bug differently. Add a PatFrag to match X86tcret using 6 fixed registers or less. This avoids folding loads into TCRETURNmi64 using 7 or more volatile registers. <rdar://problem/12282281> llvm-svn: 163819	2012-09-13 18:31:27 +00:00
Jakob Stoklund Olesen	78b9f8fc67	Revert r163761 "Don't fold indexed loads into TCRETURNmi64." The patch caused "Wrong topological sorting" assertions. llvm-svn: 163810	2012-09-13 16:52:17 +00:00
Nadav Rotem	24a822a5cb	Fix a dagcombine optimization. The optimization attempts to optimize a bitcast of fneg to integers by xoring the high-bit. This fails if the source operand is a vector because we need to negate each of the elements in the vector. Fix rdar://12281066 PR13813. llvm-svn: 163802	2012-09-13 14:54:28 +00:00
Nadav Rotem	4e9ad06617	Stack Coloring: We have code that checks that all of the uses of allocas are within the lifetime zone. Sometimes legitimate usages of allocas are hoisted outside of the lifetime zone. For example, GEPs may calculate the address of a member of an allocated struct. This commit makes sure that we only check (abort regions or assert) for instructions that read and write memory using stack frames directly. Notice that by allowing legitimate usages outside the lifetime zone we also stop checking for instructions which use derivatives of allocas. We will catch fewer bugs in user code and in the compiler itself. llvm-svn: 163791	2012-09-13 12:38:37 +00:00
Jakob Stoklund Olesen	bfacef45eb	Don't fold indexed loads into TCRETURNmi64. We don't have enough GR64_TC registers when calling a varargs function with 6 arguments. Since %al holds the number of vector registers used, only %r11 is available as a scratch register. This means that addressing modes using both base and index registers can't be folded into TCRETURNmi64. <rdar://problem/12282281> llvm-svn: 163761	2012-09-13 00:25:00 +00:00
Michael Liao	abb87d4857	Fix PR11985 - BlockAddress has no support of BA + offset form and there is no way to propagate that offset into machine operand; - Add BA + offset support and a new interface 'getTargetBlockAddress' to simplify target block address forming; - All targets are modified to use new interface and X86 backend is enhanced to support BA + offset addressing. llvm-svn: 163743	2012-09-12 21:43:09 +00:00
Nadav Rotem	8ff00989fc	Stack coloring: remove lifetime intervals which contain escaped allocas. The input program may contain instructions which are not inside lifetime markers. This can happen due to a bug in the compiler or due to a bug in user code (for example, returning a reference to a local variable). This commit adds checks that all of the instructions in the function and invalidates lifetime ranges which do not contain all of the instructions. llvm-svn: 163678	2012-09-12 04:57:37 +00:00
Chad Rosier	1778831a3d	[ms-inline asm] Split the parsing of IR asm strings into GCC and MS variants. Add support in the EmitMSInlineAsmStr() function for handling integer consts. llvm-svn: 163645	2012-09-11 19:09:56 +00:00
Chad Rosier	ab51c9de34	Formatting. No functional change intended. llvm-svn: 163627	2012-09-11 16:33:10 +00:00
Nadav Rotem	65ba95ebf9	Stack Coloring: Don't crash on dbg values which use stack frames. llvm-svn: 163616	2012-09-11 12:34:27 +00:00
NAKAMURA Takumi	8c72306cdb	test/CodeGen/X86/ms-inline-asm.ll: Relax for non-darwin x86 targets. '##InlineAsm' could not be seen in other hosts. llvm-svn: 163554	2012-09-10 22:04:54 +00:00
Chad Rosier	7641f58784	[ms-inline asm] Properly emit the asm directives when the AsmPrinterVariant and InlineAsmVariant don't match. llvm-svn: 163550	2012-09-10 21:36:05 +00:00
Chad Rosier	1c1319b9e7	Update test case for Release builds. llvm-svn: 163549	2012-09-10 21:31:43 +00:00
Chad Rosier	db20a41d99	[ms-inline asm] Pass the correct AsmVariant to the PrintAsmOperand() function and update the printOperand() function accordingly. llvm-svn: 163544	2012-09-10 21:10:49 +00:00
Nadav Rotem	3c86b78ae4	Stack Coloring: Handle the case where END markers come before BEGIN markers properly. llvm-svn: 163530	2012-09-10 18:51:09 +00:00
Michael Liao	400f7ef871	Enhance PR11334 fix to support extload from v2f32/v4f32 - Fix a remaining issue of PR11674 as well llvm-svn: 163528	2012-09-10 18:33:51 +00:00
Michael Liao	c3d5b21c39	Add boolean simplification support from CMOV - If a boolean value is generated from CMOV and tested as boolean value, simplify the use of test result by referencing the original condition. RDRAND intrinsic is one of such cases. llvm-svn: 163516	2012-09-10 16:36:16 +00:00
Nadav Rotem	6731363185	Stack Coloring: Add support for multiple regions of the same slot, within a single basic block. llvm-svn: 163507	2012-09-10 12:39:35 +00:00
Elena Demikhovsky	264fb0217e	The VPSHUFB 256-bit instruction may be generated when one of the input vectors is undefined or zeroinitializer. I've added the "zeroinitializer" case in this patch. llvm-svn: 163506	2012-09-10 12:13:11 +00:00
Nadav Rotem	d753a952ca	Teach the DAGBuilder about lifetime markers which are generated from PHINodes. llvm-svn: 163494	2012-09-10 08:43:23 +00:00
Craig Topper	03f39773e0	Teach DAG combiner to constant fold fneg of a BUILD_VECTOR of constants. llvm-svn: 163483	2012-09-09 22:58:45 +00:00
Craig Topper	4ed79bd7d7	Add instruction selection for ffloor of vectors when SSE4.1 or AVX is enabled. llvm-svn: 163473	2012-09-08 17:42:27 +00:00
Craig Topper	98f2e861a0	Add support for lowering FABS of vector types. llvm-svn: 163461	2012-09-08 07:31:51 +00:00
Jakob Stoklund Olesen	866908c42c	Allow overlaps between virtreg and physreg live ranges. The RegisterCoalescer understands overlapping live ranges where one register is defined as a copy of the other. With this change, register allocators using LiveRegMatrix can do the same, at least for copies between physical and virtual registers. When a physreg is defined by a copy from a virtreg, allow those live ranges to overlap: %CL<def> = COPY %vreg11:sub_8bit; GR32_ABCD:%vreg11 %vreg13<def,tied1> = SAR32rCL %vreg13<tied0>, %CL<imp-use,kill> We can assign %vreg11 to %ECX, overlapping the live range of %CL. llvm-svn: 163336	2012-09-06 18:15:23 +00:00
Nadav Rotem	9e3cc9f884	Disable stack coloring by default in order to resolve the i386 failures. llvm-svn: 163316	2012-09-06 14:27:06 +00:00
Elena Demikhovsky	42777877c2	AVX2 optimization. Added generation of VPSHUFB instruction for <32 x i8> vector shuffle when possible. llvm-svn: 163312	2012-09-06 12:42:01 +00:00
Nadav Rotem	ea0d36be95	Fix the test by specifying an exact cpu model. llvm-svn: 163307	2012-09-06 10:33:33 +00:00
Nadav Rotem	7c277da364	Add a new optimization pass: Stack Coloring, that merges disjoint static allocations (allocas). Allocas are known to be disjoint if they are marked by disjoint lifetime markers (@llvm.lifetime.XXX intrinsics). llvm-svn: 163299	2012-09-06 09:17:37 +00:00
Craig Topper	daa5ed1e0a	Add patterns for converting stores of subvector_extracts of lower 128-bits of a 256-bit vector to VMOVAPSmr/VMOVUPSmr. llvm-svn: 163292	2012-09-06 05:15:01 +00:00
Preston Gurd	cdf540d5d6	Generic Bypass Slow Div - CodeGenPrepare pass for identifying div/rem ops - Backend specifies the type mapping using addBypassSlowDivType - Enabled only for Intel Atom with O2 32-bit -> 8-bit - Replace IDIV with instructions which test its value and use DIVB if the value is positive and less than 256. - In the case when the quotient and remainder of a divide are used a DIV and a REM instruction will be present in the IR. In the non-Atom case they are both lowered to IDIVs and CSE removes the redundant IDIV instruction, using the quotient and remainder from the first IDIV. However, due to this optimization CSE is not able to eliminate redundant IDIV instructions because they are located in different basic blocks. This is overcome by calculating both the quotient (DIV) and remainder (REM) in each basic block that is inserted by the optimization and reusing the result values when a subsequent DIV or REM instruction uses the same operands. - Test cases check for the presence of the optimization when calculating either the quotient, remainder, or both. Patch by Tyler Nowicki! llvm-svn: 163150	2012-09-04 18:22:17 +00:00
Elena Demikhovsky	cbe99bbb36	This patch optimizes shuffle instruction - generates 2 instructions instead of 4. Since this specific shuffle is widely used in many workloads we have ~10% performance on them. shufflevector <8 x float> %A, <8 x float> %B, <8 x i32> <i32 0, i32 8, i32 2, i32 10, i32 4, i32 12, i32 6, i32 14> vmovaps (%rdx), %ymm0 vshufps $8, %ymm0, %ymm0, %ymm0 vmovaps (%rcx), %ymm1 vshufps $8, %ymm0, %ymm1, %ymm1 vunpcklps %ymm0, %ymm1, %ymm0 vmovaps (%rcx), %ymm0 vmovsldup (%rdx), %ymm1 vblendps $85, %ymm0, %ymm1, %ymm0 llvm-svn: 163134	2012-09-04 12:49:02 +00:00
Pete Cooper	2117ac40c9	Revert "Take account of boolean vector contents when promoting a build vector from i1 to some other type. rdar://problem/12210060" This reverts commit 5dd9e214fb92847e947f9edab170f9b4e52b908f. Thanks to Duncan for explaining how this should have been done. Conflicts: test/CodeGen/X86/vec_select.ll llvm-svn: 163064	2012-09-01 17:37:55 +00:00
NAKAMURA Takumi	d35a4ff88b	llvm/test/CodeGen/X86/fp-fast.ll: Suppress FMA4 on AMD Bulldozer host, corresponding to r162999. llvm-svn: 163041	2012-09-01 00:26:28 +00:00
Manman Ren	3590361bf0	Fix Atom bots for r163036. llvm-svn: 163040	2012-09-01 00:17:06 +00:00
Manman Ren	26c5d0f607	SelectionDAG: when constructing VZEXT_LOAD from other loads, make sure its output chain is correctly setup. As an example, if the original load must happen before later stores, we need to make sure the constructed VZEXT_LOAD is constrained to be before the stores. rdar://11457792 llvm-svn: 163036	2012-08-31 23:16:57 +00:00
Craig Topper	908e685102	Mark FMA4 instructions as commutable and add them to the folding tables. llvm-svn: 163035	2012-08-31 23:10:34 +00:00
Michael Liao	3224543bf9	Fix PR12359 - In addition to undefined, if V2 is zero vector, skip 2nd PSHUFB and POR as well as PSHUFB will zero elements with negative indices. Patch by Sriram Murali <sriram.murali@intel.com> llvm-svn: 163018	2012-08-31 20:12:31 +00:00
Craig Topper	c0387f6b23	Mark FMA3 instructions as commutable so that the operands to the multiply part can be commuted. llvm-svn: 163001	2012-08-31 16:31:13 +00:00
Craig Topper	c30fdbc46c	Add support for converting llvm.fma to fma4 instructions. llvm-svn: 162999	2012-08-31 15:40:30 +00:00
Jakob Stoklund Olesen	96f87069c4	Don't enforce ordered inline asm operands. I was too optimistic, inline asm can have tied operands that don't follow the def order. Fixes PR13742. llvm-svn: 162998	2012-08-31 15:34:59 +00:00
NAKAMURA Takumi	2762dadf2c	llvm/test/CodeGen/X86/vec_select.ll: Fix failure on xmm-less hosts, to add -mattr=+sse2. FIXME: Should this be tested with both +avx and -avx,+sse2? llvm-svn: 162983	2012-08-31 10:02:22 +00:00
Pete Cooper	e969340fea	Take account of boolean vector contents when promoting a build vector from i1 to some other type. rdar://problem/12210060 llvm-svn: 162960	2012-08-30 23:58:52 +00:00
Owen Anderson	d1545e3715	Try to make this test more generic to unbreak buildbots. llvm-svn: 162958	2012-08-30 23:51:20 +00:00
Owen Anderson	cc61f87cf7	Teach the DAG combiner to turn chains of FADDs (x+x+x+x+...) into FMULs by constants. This is only enabled in unsafe FP math mode, since it does not preserve rounding effects for all such constants. llvm-svn: 162956	2012-08-30 23:35:16 +00:00
Michael Liao	bbd10792c2	Introduce 'UseSSEx' to force SSE legacy encoding - Add 'UseSSEx' to force SSE legacy insn not being selected when AVX is enabled. Because of the penalty of inter-mixing SSE and AVX instructions, we need to prevent SSE legacy insn from being generated except when explicitly specified through some intrinsics. For patterns supported by both SSE and AVX, so far, we force the AVX insn to be tried first by relying on AddedComplexity or position in the td file. That is error-prone and introduces bugs accidentally. 'UseSSEx' is disabled when AVX is turned on. For SSE insns inherited by AVX, we need this predicate to force VEX encoding or SSE legacy encoding only. For insns not inherited by AVX, we still use the previous predicates, i.e. 'HasSSEx'. So far, these insns fall into the following categories: * SSE insns with MMX operands * SSE insns with GPR/MEM operands only (xFENCE, PREFETCH, CLFLUSH, CRC, etc.) * SSE4A insns. * MMX insns. * x87 insns added by SSE. 2 test cases are modified: - test/CodeGen/X86/fast-isel-x86-64.ll AVX code generation is different from SSE one. 'vcvtsi2sdq' cannot be selected by fast-isel due to complicated pattern and fast-isel falls back to materializing it from constant pool. - test/CodeGen/X86/widen_load-1.ll AVX code generation is different from SSE one after fixing SSE/AVX inter-mixing. Exec-domain fixing prefers 'vmovapd' instead of 'vmovaps'. llvm-svn: 162919	2012-08-30 16:54:46 +00:00
Michael Liao	271f11b571	Should put test case under test/ExecutionEngine/MCJIT/ llvm-svn: 162885	2012-08-30 00:43:57 +00:00
Michael Liao	3c8980646b	Fix PR13727 - The root cause is that target constant materialization in X86 fast-isel creates a PC-rel addressing which may overflow 32-bit range in non-Small code model if .rodata section is allocated too far away from code segment in MCJIT, which uses Large code model so far. - Follow the similar logic to fix non-Small code model in fast-isel by skipping non-Small code model. llvm-svn: 162881	2012-08-30 00:30:16 +00:00
Bill Wendling	cc56718038	The commutative flag is already correctly set within the multiclass. If we set it here, then a 'register-memory' version would wrongly get the commutative flag. <rdar://problem/12180135> llvm-svn: 162741	2012-08-28 07:36:46 +00:00
Craig Topper	bd509eea4a	Merge AVX_SET0PSY/AVX_SET0PDY/AVX2_SET0 into a single post-RA pseudo. llvm-svn: 162738	2012-08-28 07:05:28 +00:00
NAKAMURA Takumi	cdfe1d1cdb	llvm/test/CodeGen/X86/pr12312.ll: Add -mtriple=x86_64-unknown-unknown. llvm-svn: 162736	2012-08-28 04:04:29 +00:00
Michael Liao	b7d85b6328	Fix PR12312 - Add a target-specific DAG optimization to recognize a pattern PTEST-able. Such a pattern is a OR'd tree with X86ISD::OR as the root node. When X86ISD::OR node has only its flag result being used as a boolean value and all its leaves are extracted from the same vector, it could be folded into an X86ISD::PTEST node. llvm-svn: 162735	2012-08-28 03:34:40 +00:00
NAKAMURA Takumi	fee50c8cf1	llvm/test/CodeGen/X86/fma.ll: Add -march=x86, or two tests would fail on non-x86 hosts. llvm-svn: 162667	2012-08-27 11:50:26 +00:00
NAKAMURA Takumi	10eb4cfc3e	llvm/test/CodeGen/X86/fma_patterns.ll: Add -mtriple=x86_64. It was incompatible on i686 and Windows x64. llvm-svn: 162664	2012-08-27 09:37:54 +00:00
Craig Topper	bfc1d0ed48	Commit test change for r162658. llvm-svn: 162660	2012-08-27 07:55:50 +00:00
Anitha Boyapati	0dd589c5f1	FMA3 tests on bdver2 target for changes made in rev 162012. Also made corresponding changes to existing tests for darwin triple to ensure that same pattern is tested for bdver2 target. llvm-svn: 162655	2012-08-27 06:59:01 +00:00
Craig Topper	9e4f0aae17	Make sure that FMA3 is favored even when FMA4 is also enabled. Test case for r162454. llvm-svn: 162653	2012-08-27 03:38:15 +00:00
Michael Liao	10ff96ce8c	fix a case where all operands of BUILD_VECTOR are undefined llvm-svn: 162214	2012-08-20 17:59:18 +00:00
Nadav Rotem	178250ad87	When unsafe math is used, we can use commutative FMAX and FMIN. In some cases this allows for better code generation. Added a new DAGCombine transformation to convert FMAX and FMIN to FMAXC and FMINC, which are commutative. For example: movaps %xmm0, %xmm1 movsd LC(%rip), %xmm0 minsd %xmm1, %xmm0 becomes: minsd LC(%rip), %xmm0 llvm-svn: 162187	2012-08-19 13:06:16 +00:00
Nadav Rotem	a136939fa9	Reapply r162160 with a fix: Optimize Arith->Trunc->SETCC sequence to allow better compare/branch code. llvm-svn: 162172	2012-08-18 17:53:03 +00:00
Nadav Rotem	c324af609e	Revert r162160 because it made a few buildbots fail. llvm-svn: 162164	2012-08-18 05:02:36 +00:00
Nadav Rotem	2cb14a5c4b	The X86 backend has a number of optimizations for SETCC nodes which use arithmetic instructions. However, when small data types are used, a truncate node appears between the SETCC node and the arithmetic operation. This patch adds support for this pattern. Before: xorl %esi, %edi testb %dil, %dil setne %al ret After: xorb %dil, %sil setne %al ret rdar://12081007 llvm-svn: 162160	2012-08-18 02:43:28 +00:00
Eli Friedman	79a6b30d8a	Make atomic load and store of pointers work. Tighten verification of atomic operations so other unexpected operations don't slip through. Based on patch by Logan Chien. PR11786/PR13186. llvm-svn: 162146	2012-08-17 23:24:29 +00:00
Benjamin Kramer	ca7ca4f6c6	TargetLowering: Use the large shift amount during legalize types. The legalizer may call us with an overly large type. llvm-svn: 162101	2012-08-17 15:54:21 +00:00
Benjamin Kramer	2f47a3fb07	Fix broken check lines. I really need to find a way to automate this, but I can't come up with a regex that has no false positives while handling tricky cases like custom check prefixes. llvm-svn: 162097	2012-08-17 12:28:26 +00:00
Bill Wendling	d63f1f5a9c	Remove invalid test. This test requires that dead basic blocks be kept around. That's not how we do things. Besides, the commit message tells us that it is covered by the GCC test suite. ------------------------------------------------------------------------ r127497 | zwarich | 2011-03-11 13:51:56 -0800 (Fri, 11 Mar 2011) | 3 lines Fix the GCC test suite issue exposed by r127477, which was caused by stack protector insertion not working correctly with unreachable code. Since that revision was rolled out, this test doesn't actually fail before this fix. ------------------------------------------------------------------------ llvm-svn: 161985	2012-08-15 20:54:09 +00:00
Michael Liao	34107b9177	fix PR11334 - FP_EXTEND only support extending from vectors with matching elements. This results in the scalarization of extending to v2f64 from v2f32, which will be legalized to v4f32 not matching with v2f64. - add X86-specific VFPEXT supporting extending from v4f32 to v2f64. - add BUILD_VECTOR lowering helper to recover back the original extending from v4f32 to v2f64. - test case is enhanced to include different vector width. llvm-svn: 161894	2012-08-14 21:24:47 +00:00
Nadav Rotem	70409991bc	During the CodeGenPrepare we often lower intrinsics (such as objsize) and allow some optimizations to turn conditional branches into unconditional. This commit adds a simple control-flow optimization which merges two consecutive basic blocks which are connected by a single edge. This allows the codegen to operate on larger basic blocks. rdar://11973998 llvm-svn: 161852	2012-08-14 05:19:07 +00:00
Bill Wendling	72baa6eeae	Rename test since it's not linux-specific. llvm-svn: 161792	2012-08-13 21:32:42 +00:00
Jakob Stoklund Olesen	83a927d84a	Handle extra Tail predecessors in if-conversion. It is still possible to if-convert if the tail block has extra predecessors, but the tail phis must be rewritten instead of being removed. llvm-svn: 161781	2012-08-13 20:49:04 +00:00
Manman Ren	9746b33e26	Fix failure on Atom bot due to r161769 llvm-svn: 161777	2012-08-13 19:34:29 +00:00
Manman Ren	959acb106b	X86: move Int_CVTSD2SSrr, Int_CVTSI2SSrr, Int_CVTSI2SDrr, Int_CVTSS2SDrr from OpTbl1 to OpTbl2 since they have 3 operands and the last operand can be changed to a memory operand. PR13576 llvm-svn: 161769	2012-08-13 18:29:41 +00:00
Michael Liao	e7e828fd64	fix PR13577, an issue introduced by r161687 - FCMOV only supports a subset of X86 conditions. Skip boolean simplification if X86 condition is not valid for FCMOV. - add a minimal test case for PR13577. llvm-svn: 161732	2012-08-11 23:47:06 +00:00
Benjamin Kramer	ef6494f24d	PR13578: Teach MachineCSE that instructions that use a constant register can be CSE'd safely. This is common e.g. when doing rip-relative addressing on x86_64. llvm-svn: 161728	2012-08-11 19:05:13 +00:00
Manman Ren	1acb6707cd	X86: when we are auto-detecting the subtarget features, make sure we turn on FeatureFastUAMem for Nehalem, Westmere and Sandy Bridge. FeatureFastUAMem is already on if we pass in nehalem or westmere as a command argument. rdar: 7252306 llvm-svn: 161717	2012-08-10 23:43:32 +00:00
Michael Liao	5248e9913f	add X86-specific DAG optimization to simplify boolean test - if a boolean test (X86ISD::CMP or X86ISD::SUB) checks a boolean value generated from X86ISD::SETCC, try to simplify the boolean value generation and checking by reusing the original EFLAGS with proper condition code - add hooks to X86 specific SETCC/BRCOND/CMOV, the major 3 places consuming EFLAGS part of patches fixing PR12312 llvm-svn: 161687	2012-08-10 19:58:13 +00:00
Jakob Stoklund Olesen	8c28ac9ec9	Update edge weights correctly in replaceSuccessor(). When replacing Old with New, it can happen that New is already a successor. Add the old and new edge weights instead of creating a duplicate edge. llvm-svn: 161653	2012-08-10 03:23:27 +00:00
Jakob Stoklund Olesen	d9b66506a3	Reapply r161633-161634 "Partition use lists so defs always come before uses." No changes to these patches, MRI needed to be notified when changing uses into defs and vice versa. llvm-svn: 161644	2012-08-10 00:21:30 +00:00
Jakob Stoklund Olesen	acd27c9279	Revert r161633-161634 "Partition use lists so defs always come before uses." These commits broke a number of buildbots. llvm-svn: 161640	2012-08-09 23:31:36 +00:00
Jakob Stoklund Olesen	df01e00710	Partition use lists so defs always come before uses. This makes it possible to speed up def_iterator by stopping at the first use. This makes def_empty() and getUniqueVRegDef() much faster when there are many uses. In a +Asserts build, LiveVariables is 100x faster in one case because getVRegDef() has an assertion that would scan to the end of a def_iterator chain. Spill weight calculation is significantly faster (300x in one case) because isTriviallyReMaterializable() calls MRI->isConstantPhysReg(%RIP) which calls def_empty(%RIP). llvm-svn: 161634	2012-08-09 22:49:46 +00:00
Jakob Stoklund Olesen	7d7051ca3c	Don't use pointer-pointers for the register use lists. Use a more conventional doubly linked list where the Prev pointers form a cycle. This means it is no longer necessary to adjust the Prev pointers when reallocating the VRegInfo array. The test changes are required because the register allocation hint is using the use-list order to break ties. llvm-svn: 161633	2012-08-09 22:49:42 +00:00
Bob Wilson	4c65c505e0	Add test triples to fix win32 failures. Revert workaround from r161292. I don't have a win32 system to test, so hopefully I got them all fixed here. llvm-svn: 161519	2012-08-08 20:31:37 +00:00
Manman Ren	1be131ba27	X86: enable CSE between CMP and SUB We perform the following: 1> Use SUB instead of CMP for i8,i16,i32 and i64 in ISel lowering. 2> Modify MachineCSE to correctly handle implicit defs. 3> Convert SUB back to CMP if possible at peephole. Removed pattern matching of (a>b) ? (a-b):0 and the like, since they are handled by peephole now. rdar://11873276 llvm-svn: 161462	2012-08-08 00:51:41 +00:00
Evan Cheng	fbdd25c135	X86 cmp lowering is looking past truncate on the condition node. It should only do so when the high bits are known zero. This caused a subtle miscompilation. rdar://12027825 llvm-svn: 161451	2012-08-07 22:21:00 +00:00
Chandler Carruth	881d0a7966	Add a much more conservative strategy for aligning branch targets. Previously, MBP essentially aligned every branch target it could. This bloats code quite a bit, especially non-looping code which has no real reason to prefer aligned branch targets so heavily. As Andy said in review, it's still a bit odd to do this without a real cost model, but this at least has much more plausible heuristics. Fixes PR13265. llvm-svn: 161409	2012-08-07 09:45:24 +00:00
Manman Ren	cb36b8c2e6	MachineCSE: Update the heuristics for isProfitableToCSE. If the result of a common subexpression is used at all uses of the candidate expression, CSE should not increase the live range of the common subexpression. rdar://11393714 and rdar://11819721 llvm-svn: 161396	2012-08-07 06:16:46 +00:00
Craig Topper	ab47fe4e16	Implement proper handling for pcmpistri/pcmpestri intrinsics. Requires custom handling in DAGISelToDAG due to limitations in TableGen's implicit def handling. Fixes PR11305. llvm-svn: 161318	2012-08-06 06:22:36 +00:00
Craig Topper	812005e562	Update test to check for r161305 llvm-svn: 161307	2012-08-05 09:06:28 +00:00
Bob Wilson	874886cd66	Refactor and check "onlyReadsMemory" before optimizing builtins. This patch is mostly just refactoring a bunch of copy-and-pasted code, but it also adds a check that the call instructions are readnone or readonly. That check was already present for sin, cos, sqrt, log2, and exp2 calls, but it was missing for the rest of the builtins being handled in this code. llvm-svn: 161282	2012-08-03 23:29:17 +00:00
Bob Wilson	fa59485b94	Fix memcmp code-gen to honor -fno-builtin. I noticed that SelectionDAGBuilder::visitCall was missing a check for memcmp in TargetLibraryInfo, so that it would use custom code for memcmp calls even with -fno-builtin. I also had to add a new -disable-simplify-libcalls option to llc so that I could write a test for this. llvm-svn: 161262	2012-08-03 21:26:18 +00:00
Bob Wilson	3e6fa462f3	Fall back to selection DAG isel for calls to builtin functions. Fast isel doesn't currently have support for translating builtin function calls to target instructions. For embedded environments where the library functions are not available, this is a matter of correctness and not just optimization. Most of this patch is just arranging to make the TargetLibraryInfo available in fast isel. <rdar://problem/12008746> llvm-svn: 161232	2012-08-03 04:06:28 +00:00
Manman Ren	ba8122cc25	X86 Peephole: fold loads to the source register operand if possible. Add more comments and use early returns to reduce nesting in isLoadFoldable. Also disable folding for V_SET0 to avoid introducing a const pool entry and a const pool load. rdar://10554090 and rdar://11873276 llvm-svn: 161207	2012-08-02 19:37:32 +00:00
NAKAMURA Takumi	7020f51622	llvm/test/CodeGen/X86/fold-pcmpeqd-1.ll: Make sure this is testing without +avx. FIXME: Could +avx be checked here too? llvm-svn: 161156	2012-08-02 06:36:56 +00:00
NAKAMURA Takumi	aaca1e690d	llvm/test/CodeGen/X86/fold-pcmpeqd-1.ll: Rewrite expressions to pass regardless of PR11031. - Relax to match even if epilogue (pop %ebp) were emitted. - Assume the return value is stored to %xmm0. llvm-svn: 161155	2012-08-02 06:33:58 +00:00
Manman Ren	5759d01230	X86 Peephole: fold loads to the source register operand if possible. Machine CSE and other optimizations can remove instructions so folding is possible at peephole while not possible at ISel. This patch is a rework of r160919 and was tested on clang self-host on my local machine. rdar://10554090 and rdar://11873276 llvm-svn: 161152	2012-08-02 00:56:42 +00:00
Matt Beaumont-Gay	7947aecaf1	Line endings. llvm-svn: 161117	2012-08-01 16:42:35 +00:00
Elena Demikhovsky	3cb3b0045c	Added FMA functionality to X86 target. llvm-svn: 161110	2012-08-01 12:06:00 +00:00
Chad Rosier	710be7df71	[x86 frame lowering] In 32-bit mode, use ESI as the base pointer. Previously, we were using EBX, but PIC requires the GOT to be in EBX before function calls via PLT GOT pointer. llvm-svn: 161066	2012-07-31 18:29:21 +00:00
Manman Ren	8c549b586c	MachineSink: Sort the successors before trying to find SuccToSinkTo. One motivating example is to sink an instruction from a basic block which has two successors: one outside the loop, the other inside the loop. We should try to sink the instruction outside the loop. rdar://11980766 llvm-svn: 161062	2012-07-31 18:10:39 +00:00
Manman Ren	2b6a0dfd4c	Reverse order of the two branches at end of a basic block if it is profitable. We branch to the successor with higher edge weight first. Convert from je LBB4_8 --> to outer loop jmp LBB4_14 --> to inner loop to jne LBB4_14 jmp LBB4_8 PR12750 rdar: 11393714 llvm-svn: 161018	2012-07-31 01:11:07 +00:00
Pete Cooper	91244268d7	Consider address spaces for hashing and CSEing DAG nodes. Otherwise two loads from different x86 segments but the same address would get CSEd llvm-svn: 160987	2012-07-30 20:23:19 +00:00
Manman Ren	f87dd7c01b	Revert r160920 and r160919 due to dragonegg and clang selfhost failure llvm-svn: 160927	2012-07-29 02:44:09 +00:00
Manman Ren	9de95e779c	X86 Peephole: fold loads to the source register operand if possible. Trying to fix the bot by specifying a triple in the failing testing cases. llvm-svn: 160920	2012-07-28 17:51:24 +00:00
Manman Ren	0fa3ab88ba	X86 Peephole: fold loads to the source register operand if possible. Machine CSE and other optimizations can remove instructions so folding is possible at peephole while not possible at ISel. rdar://10554090 and rdar://11873276 llvm-svn: 160919	2012-07-28 16:48:01 +00:00
Manman Ren	32367c063b	X86 Peephole: fix PR13475 in optimizeCompare. It is possible that an instruction can use and update EFLAGS. When checking the safety, we should check the usage of EFLAGS first before declaring it is safe to optimize due to the update. llvm-svn: 160912	2012-07-28 03:15:46 +00:00
Evan Cheng	249716e8ae	Teach CodeGenPrep to look past bitcast when it's duplicating return instruction into predecessor blocks to enable tail call optimization. rdar://11958338 llvm-svn: 160894	2012-07-27 21:21:26 +00:00
Jakob Stoklund Olesen	bc65e8f94e	Add <imp-def> of super-register when lowering SUBREG_TO_REG. Patch by Tyler Nowicki! llvm-svn: 160888	2012-07-27 20:19:49 +00:00
Jakob Stoklund Olesen	ceee4a9d0c	Eliminate a batch of uses of sub_ss and sub_sd in the X86 target. These idempotent sub-register indices don't do anything --- They simply map XMM registers to themselves. They no longer affect register classes either since the SubRegClasses field has been removed from Target.td. This patch replaces XMM->XMM EXTRACT_SUBREG and INSERT_SUBREG patterns with COPY_TO_REGCLASS patterns which simply become COPY instructions. The number of IMPLICIT_DEF instructions before register allocation is reduced, and that is the cause of the test case changes. llvm-svn: 160816	2012-07-26 21:40:42 +00:00
Manman Ren	e8c6b15137	Update testing case for Atom when disabling rematerialization in TwoAddressInstructionPass. The generated code for Atom has a different code sequence. This is related to commit r160749. llvm-svn: 160755	2012-07-25 20:17:14 +00:00
Manman Ren	cc1dc6dc11	Disable rematerialization in TwoAddressInstructionPass. It is redundant; RegisterCoalescer will do the remat if it can't eliminate the copy. Collected instruction counts before and after this. A few extra instructions are generated due to spilling but it is normal to see these kinds of changes with almost any small codegen change, according to Jakob. This also fixed rdar://11830760 where xor is expected instead of movi0. llvm-svn: 160749	2012-07-25 18:28:13 +00:00
Rafael Espindola	11c38b9657	When a return struct pointer is passed in registers, the callee has nothing to pop. llvm-svn: 160725	2012-07-25 13:41:10 +00:00
Rafael Espindola	a92cf29f0d	Add a cpu to the test. Should fix the atom bot. llvm-svn: 160701	2012-07-24 22:56:06 +00:00
Rafael Espindola	f30e9bfb90	Add a triple to the test. llvm-svn: 160698	2012-07-24 21:55:04 +00:00
Rafael Espindola	a44e193a11	In order to correctly compile struct s { double x1; float x2; }; __attribute__((regparm(3))) struct s f(int a, int b, int c); void g(void) { f(41, 42, 43); } We need to be able to represent passing the address of s to f (sret) in a register (inreg). Turns out that all that is needed is to not mark them as mutually incompatible. llvm-svn: 160695	2012-07-24 21:40:17 +00:00
David Chisnall	5b8c1680de	ELF does not imply GNU/Linux. Do not assume GNU conventions just because we are targeting an ELF platform. Only fold gs-relative (and fs-relative) loads if it is actually sensible to do so for the target platform. This fixes PR13438. llvm-svn: 160687	2012-07-24 20:04:16 +00:00
Sylvestre Ledru	35521e2310	Fix a typo (the the => the) llvm-svn: 160621	2012-07-23 08:51:15 +00:00
Nadav Rotem	9056076cab	Fixed DAGCombine optimizations which generate select_cc for targets that do not support it (X86 does not lower select_cc). PR: 13428 Together with Michael Kuperstein <michael.m.kuperstein@intel.com> llvm-svn: 160619	2012-07-23 07:59:50 +00:00
Jakob Stoklund Olesen	e2cfd0d45a	Avoid folding loads that are unsafe to move. LiveRangeEdit::foldAsLoad() can eliminate a register by folding a load into its only use. Only do that when the load is safe to move, and it won't extend any live ranges. This fixes PR13414. llvm-svn: 160575	2012-07-20 21:29:31 +00:00
Jakob Stoklund Olesen	f62c07f147	Split loop exiting edges more aggressively. PHIElimination splits critical edges when it predicts it can resolve interference and eliminate copies. It doesn't split the edge if the interference wouldn't be resolved anyway because the phi-use register is live in the critical edge anyway. Teach PHIElimination to split loop exiting edges with interference, even if it wouldn't resolve the interference. This removes the necessary copies from the loop, which is still an improvement from injecting the copies into the loop. The test case demonstrates the improvement. Before: LBB0_1: cmpb $0, (%rdx) leaq 1(%rdx), %rdx movl %esi, %eax je LBB0_1 After: LBB0_1: cmpb $0, (%rdx) leaq 1(%rdx), %rdx je LBB0_1 movl %esi, %eax llvm-svn: 160571	2012-07-20 20:49:53 +00:00
Preston Gurd	f2ea70ae4a	Fix remaining lit tests which were failing when run on an Atom processor. Patches by Tyler Nowicki, Andy Zhang, and Preston Gurd! llvm-svn: 160520	2012-07-19 18:53:21 +00:00
Manman Ren	d0a4ee8427	X86: remove redundant cmp against zero. Updated OptimizeCompare in peephole to remove redundant cmp against zero. We only remove Compare if CF and OF are not used. rdar://11855129 llvm-svn: 160454	2012-07-18 21:40:01 +00:00
Preston Gurd	f0a48ec8f1	This patch fixes 8 out of 20 unexpected failures in "make check" when run on an Intel Atom processor. The failures have arisen due to changes elsewhere in the trunk over the past 8 weeks or so. These failures were not detected by the Atom buildbot because the CPU on the Atom buildbot was not being detected as an Atom CPU. The fix for this problem is in Host.cpp and X86Subtarget.cpp, but shall remain commented out until the current set of Atom test failures are fixed. Patch by Andy Zhang and Tyler Nowicki! llvm-svn: 160451	2012-07-18 20:49:17 +00:00
Evan Cheng	f73d7553cc	Add test case for r160387 llvm-svn: 160389	2012-07-17 19:40:05 +00:00
Nadav Rotem	277a40bc0a	Fix a crash in the legalization of large vectors. When truncating a result of a vector that is split we need to use the result of the split vector, and not re-split the dead node. llvm-svn: 160357	2012-07-17 09:07:37 +00:00
Evan Cheng	780f9b5f92	Implement r160312 as target independent dag combine. llvm-svn: 160354	2012-07-17 08:31:11 +00:00
Evan Cheng	f579beca6d	This is another case where instcombine demanded bits optimization created large immediates. Add dag combine logic to recover in case the large immediates don't fit in cmp immediate operand field. int foo(unsigned long l) { return (l>> 47) == 1; } we produce %shr.mask = and i64 %l, -140737488355328 %cmp = icmp eq i64 %shr.mask, 140737488355328 %conv = zext i1 %cmp to i32 ret i32 %conv which codegens to movq $0xffff800000000000,%rax andq %rdi,%rax movq $0x0000800000000000,%rcx cmpq %rcx,%rax sete %al movzbl %al,%eax ret TargetLowering::SimplifySetCC would transform (X & -256) == 256 -> (X >> 8) == 1 if the immediate fails the isLegalICmpImmediate() test. For x86, that's immediates which are not a signed 32-bit immediate. Based on a patch by Eli Friedman. PR10328 rdar://9758774 llvm-svn: 160346	2012-07-17 06:53:39 +00:00
Evan Cheng	75315b877c	For something like uint32_t hi(uint64_t res) { uint32_t hi = res >> 32; return !hi; } llvm IR looks like this: define i32 @hi(i64 %res) nounwind uwtable ssp { entry: %lnot = icmp ult i64 %res, 4294967296 %lnot.ext = zext i1 %lnot to i32 ret i32 %lnot.ext } The optimizer has optimized away the right shift and truncate but the resulting constant is too large to fit in the 32-bit immediate field. The resulting x86 code is worse as a result: movabsq $4294967296, %rax ## imm = 0x100000000 cmpq %rax, %rdi sbbl %eax, %eax andl $1, %eax This patch teaches the x86 lowering code to handle ult against a large immediate with trailing zeros. It will issue a right shift and a truncate followed by a comparison against a shifted immediate. shrq $32, %rdi testl %edi, %edi sete %al movzbl %al, %eax It also handles a ugt comparison against a large immediate with trailing bits set. i.e. X > 0x0ffffffff -> (X >> 32) >= 1 rdar://11866926 llvm-svn: 160312	2012-07-16 19:35:43 +00:00
Nadav Rotem	839a06e9d7	Make ComputeDemandedBits return a deterministic result when computing an AssertZext value. In the added testcase the constant 55 was behind an AssertZext of type i1, and ComputeDemandedBits reported that some of the bits were both known to be one and known to be zero. Together with Michael Kuperstein <michael.m.kuperstein@intel.com> llvm-svn: 160305	2012-07-16 18:34:53 +00:00
Alexey Samsonov	893d3d336a	Fix tests that failed on i686-win32 after r160248: 1. FileCheck-ize epilogue.ll and allow another asm instruction to restore %rsp. 2. Remove check in widen_arith-3.ll that was hitting instruction in epilogue instead of vector add. llvm-svn: 160274	2012-07-16 14:33:36 +00:00
Nadav Rotem	4968e45b9f	Fix a bug in the 3-address conversion of LEA when one of the operands is an undef virtual register. The problem is that ProcessImplicitDefs removes the definition of the register and marks all uses as undef. If we lose the undef marker then we get a register which has no def and is not marked as undef. The live interval analysis does not collect information for these virtual registers and we crash in later passes. Together with Michael Kuperstein <michael.m.kuperstein@intel.com> llvm-svn: 160260	2012-07-16 10:52:25 +00:00
Alexey Samsonov	dcc1291d17	This CL changes the function prologue and epilogue emitted on X86 when stack needs realignment. It is intended to fix PR11468. Old prologue and epilogue looked like this: push %rbp mov %rsp, %rbp and $alignment, %rsp push %r14 push %r15 ... pop %r15 pop %r14 mov %rbp, %rsp pop %rbp The problem was to reference the locations of callee-saved registers in exception handling: locations of callee-saved had to be re-calculated regarding the stack alignment operation. It would take some effort to implement this in LLVM, as currently MachineLocation can only have the form "Register + Offset". Function prologue and epilogue are now changed to: push %rbp mov %rsp, %rbp push %r14 push %r15 and $alignment, %rsp ... lea -$size_of_saved_registers(%rbp), %rsp pop %r15 pop %r14 pop %rbp Reviewed by Chad Rosier. llvm-svn: 160248	2012-07-16 06:54:09 +00:00
Nadav Rotem	eec74c7279	Teach getTargetVShiftNode about TargetConstant nodes. llvm-svn: 160234	2012-07-15 20:27:43 +00:00
NAKAMURA Takumi	032dc0a06c	llvm/test/CodeGen/X86/2012-07-15-broadcastfold.ll: Rewrite expressions to fit various targets. - Make sure existence of "barrier". - Confirm reload corresponding to spill. llvm-svn: 160232	2012-07-15 14:38:35 +00:00
Nadav Rotem	ee3552f88d	Rename VBROADCASTSDrm into VBROADCASTSDYrm to match the naming convention. Allow the folding of vbroadcastRR to vbroadcastRM, where the memory operand is a spill slot. PR12782. Together with Michael Kuperstein <michael.m.kuperstein@intel.com> llvm-svn: 160230	2012-07-15 12:26:30 +00:00
Nadav Rotem	9466e81df6	AVX: Fix a bug in getTargetVShiftNode. The shift amount has to be a 128bit vector with the same element type as the input vector. This is needed because of the patterns we have for the VP[SLL/SRA/SRL][W/D/Q] instructions. llvm-svn: 160222	2012-07-14 22:26:05 +00:00
Nadav Rotem	018921002e	Add a dagcombine optimization to convert concat_vectors of undefs into a single undef. The unoptimized concat_vectors isd prevented the canonicalization of the vector_shuffle node. llvm-svn: 160221	2012-07-14 21:30:27 +00:00
Duncan Sands	a9c373e49d	Restrict this to x86, hopefully fixing ARM buildbots. llvm-svn: 160163	2012-07-13 07:02:00 +00:00
Benjamin Kramer	4d0916788d	Give the rdrand instructions a SideEffect flag and a chain so MachineCSE and MachineLICM don't touch it. I already had the necessary things in place for IR-level passes but missed the machine passes. llvm-svn: 160137	2012-07-12 18:14:57 +00:00
Nadav Rotem	fdce33a495	The LIT tests below do not specify the exact cpu model and fail on AVX2 machines, because we select different instructions such as vbroadcast, new shuffles, etc. Patch by Michael Liao. llvm-svn: 160129	2012-07-12 13:45:15 +00:00
NAKAMURA Takumi	f415fe70f3	llvm/test/CodeGen/X86/rdrand.ll: Relax expression corresponding to Win64 CC. llvm-svn: 160124	2012-07-12 10:22:57 +00:00
Benjamin Kramer	cbac2f3bc9	Use %s instead of the explicit name, the latter doesn't work in out-of-tree builds. llvm-svn: 160120	2012-07-12 09:36:29 +00:00
Benjamin Kramer	0ab2794eda	Add intrinsics for Ivy Bridge's rdrand instruction. The rdrand/cmov sequence is the same that is emitted by both GCC and ICC. Fixes PR13284. llvm-svn: 160117	2012-07-12 09:31:43 +00:00
Duncan Sands	671cc2575d	The result type of EXTRACT_VECTOR_ELT doesn't have to match the element type of the input vector, it can be bigger (this is helpful for powerpc where <2 x i16> is a legal vector type but i16 isn't a legal type, IIRC). However this wasn't being taken into account by ExpandRes_EXTRACT_VECTOR_ELT, causing PR13220. Lightly tweaked version of a patch by Michael Liao. llvm-svn: 160116	2012-07-12 09:01:35 +00:00
Craig Topper	f7755df776	Update GATHER instructions to support 2 read-write operands. Patch from myself and Manman Ren. llvm-svn: 160110	2012-07-12 06:52:41 +00:00
Manman Ren	1553ce0e81	X86: Update to peephole optimization to move Movr0 before (Sub, Cmp) pair. When Movr0 is between sub and cmp, we move Movr0 before sub if it enables removal of Cmp. llvm-svn: 160066	2012-07-11 19:35:12 +00:00
Benjamin Kramer	3aab6a86a2	PR13326: Fix a subtle edge case in the udiv -> magic multiply generator. This caused 6 of 65k possible 8 bit udivs to be wrong. llvm-svn: 160058	2012-07-11 18:31:59 +00:00
Nadav Rotem	d2bdcebb14	When ext-loading and trunc-storing vectors to memory, on x86 32bit systems, allow loads/stores of 64bit values from xmm registers. llvm-svn: 160044	2012-07-11 13:27:05 +00:00
Chad Rosier	3ee9a4c29e	Add newline. llvm-svn: 160006	2012-07-10 17:57:00 +00:00
Chad Rosier	579b1fee6b	Add test case accidentally omitted from r160002. llvm-svn: 160004	2012-07-10 17:49:39 +00:00
Chad Rosier	bdb08ac50a	Add support for dynamic stack realignment in the presence of dynamic allocas on X86. Basically, this is a reapplication of r158087 with a few fixes. Specifically, (1) the stack pointer is restored from the base pointer before popping callee-saved registers and (2) in obscure cases (see comments in patch) we must cache the value of the original stack adjustment in the prologue and apply it in the epilogue. rdar://11496434 llvm-svn: 160002	2012-07-10 17:45:53 +00:00
Nadav Rotem	d908ddc186	Improve the loading of load-anyext vectors by allowing the codegen to load multiple scalars and insert them into a vector. Next, we shuffle the elements into the correct places, as before. Also fix a small dagcombine bug in SimplifyBinOpWithSameOpcodeHands, when the migration of bitcasts happened too late in the SelectionDAG process. llvm-svn: 159991	2012-07-10 13:25:08 +00:00
Manman Ren	5f6fa428fa	X86: implement functions to analyze & synthesize CMOV\|SET\|Jcc getCondFromSETOpc, getCondFromCMovOpc, getSETFromCond, getCMovFromCond No functional change intended. If we want to update the condition code of CMOV\|SET\|Jcc, we first analyze the opcode to get the condition code, then update the condition code, finally synthesize the new opcode from the new condition code. llvm-svn: 159955	2012-07-09 18:57:12 +00:00
Manman Ren	bb36074047	X86: Fix optimizeCompare to correctly check safe condition. It is safe if EFLAGS is killed or re-defined. When we are done with the basic block, check whether EFLAGS is live-out. Do not optimize away cmp if EFLAGS is live-out. llvm-svn: 159888	2012-07-07 03:34:46 +00:00
Manman Ren	c965673707	X86: peephole optimization to remove cmp instruction For each Cmp, we check whether there is an earlier Sub which makes Cmp redundant. We handle the case where SUB operates on the same source operands as Cmp, including the case where the two source operands are swapped. llvm-svn: 159838	2012-07-06 17:36:20 +00:00
Chad Rosier	88d53eae56	[fast-isel] Tell fast-isel to do nothing with the new donothing intrinsic. llvm-svn: 159837	2012-07-06 17:33:39 +00:00
Duncan Sands	c65aa3f6ae	Attempt to fix windows buildbots. Patch by James Benton. llvm-svn: 159826	2012-07-06 14:43:16 +00:00
NAKAMURA Takumi	4f934676fb	test/CodeGen/X86/sext-setcc-self.ll: Mark it as XFAIL: cygwin,mingw32,win32. Investigating. llvm-svn: 159820	2012-07-06 12:12:39 +00:00
Duncan Sands	0552a2cad2	Use the right kind of booleans: we were emitting 0/1 booleans, instead of 0/-1 booleans. Patch by James Benton. llvm-svn: 159739	2012-07-05 09:32:46 +00:00
Jakob Stoklund Olesen	2dee812445	Ensure CopyToReg nodes are always glued to the call instruction. The CopyToReg nodes that set up the argument registers before a call must be glued to the call instruction. Otherwise, the scheduler may emit the physreg copies long before the call, causing long live ranges for the fixed registers. Besides disabling good register allocation, that can also expose problems when EmitInstrWithCustomInserter() splits a basic block during the live range of a physreg. llvm-svn: 159721	2012-07-04 19:28:31 +00:00
Rafael Espindola	1a7cf13215	Add a testcase for pr13209. It is not a great test, but it still fails if 159509 and 159479 are reverted. It would be really nice to be able to run just the coalescer :-( llvm-svn: 159715	2012-07-04 16:06:00 +00:00
Jakob Stoklund Olesen	49e4d4b3ef	Add early if-conversion support to X86. Implement the TII hooks needed by EarlyIfConversion to create cmov instructions and estimate their latency. Early if-conversion is still not enabled by default. llvm-svn: 159695	2012-07-04 00:09:58 +00:00
NAKAMURA Takumi	dff1a78321	test/CodeGen/X86/sincos.ll: FileCheck-ize. llvm-svn: 159639	2012-07-03 03:59:22 +00:00
NAKAMURA Takumi	10dc235746	test/CodeGen/X86/fabs.ll: FileCheck-ize. llvm-svn: 159638	2012-07-03 03:59:15 +00:00
NAKAMURA Takumi	ff680b1db6	test/CodeGen/X86/2007-09-05-InvalidAsm.ll: FileCheck-ize. llvm-svn: 159637	2012-07-03 03:59:08 +00:00
NAKAMURA Takumi	e5e19e4f7b	test/CodeGen/X86/2004-03-30-Select-Max.ll: FileCheck-ize. llvm-svn: 159636	2012-07-03 03:58:59 +00:00
Chandler Carruth	ff123d5c63	Fix the remaining TCL-style quotes found in the testsuite. This is another mechanical change accomplished through the power of terrible Perl scripts. I have manually switched some "s to 's to make escaping simpler. While I started this to fix tests that aren't run in all configurations, the massive number of tests is due to a really frustrating fragility of our testing infrastructure: things like 'grep -v', 'not grep', and 'expected failures' can mask broken tests all too easily. Essentially, I'm deeply disturbed that I can change the testsuite so radically without causing any change in results for most platforms. =/ llvm-svn: 159547	2012-07-02 19:09:46 +00:00
Chandler Carruth	5da53436d5	Convert the uses of '\|&' to use '2>&1 \|' instead, which works on old versions of Bash. In addition, I can back out the change to the lit built-in shell test runner to support this. This should fix the majority of fallout on Darwin, but I suspect there will be a few straggling issues. llvm-svn: 159544	2012-07-02 18:37:59 +00:00
Chandler Carruth	a5a29f970e	Convert all tests using TCL-style quoting to use shell-style quoting. This was done through the aid of a terrible Perl creation. I will not paste any of the horrors here. Suffice to say, it required multiple staged rounds of replacements, state carried between, and a few nested-construct-parsing hacks that I'm not proud of. It happens, by luck, to be able to deal with all the TCL-quoting patterns in evidence in the LLVM test suite. If anyone is maintaining large out-of-tree test trees, feel free to poke me and I'll send you the steps I used to convert things, as well as answer any painful questions etc. IRC works best for this type of thing I find. Once converted, switch the LLVM lit config to use ShTests the same as Clang. In addition to being able to delete large amounts of Python code from 'lit', this will also simplify the entire test suite and some of lit's architecture. Finally, the test suite runs 33% faster on Linux now. ;] For my 16-hardware-thread (2x 4-core xeon e5520): 36s -> 24s llvm-svn: 159525	2012-07-02 12:47:22 +00:00
Elena Demikhovsky	9af899fa88	Optimization of shuffle node that can fit to the register form of VBROADCAST instruction on AVX2. llvm-svn: 159504	2012-07-01 06:12:26 +00:00
Jakob Stoklund Olesen	3e3cdecf98	Clear kill flags in InstrEmitter::EmitSubregNode(). When a local virtual register is made global, make sure to clear any existing kill flags. llvm-svn: 159461	2012-06-29 21:00:03 +00:00
Rafael Espindola	efdfb1e6b2	In the initial exec mode we always do a load to find the address of a variable. Before this patch in pic 32 bit code we would add the global base register and not load from that address. This is a really old bug, but before the introduction of the tls attributes we would never select initial exec for pic code. llvm-svn: 159409	2012-06-29 04:22:35 +00:00
Manman Ren	98a5bf24a9	X86: add more GATHER intrinsics in LLVM Corrected type for index of llvm.x86.avx2.gather.d.pd.256 from 256-bit to 128-bit. Corrected types for src\|dst\|mask of llvm.x86.avx2.gather.q.ps.256 from 256-bit to 128-bit. Support the following intrinsics: llvm.x86.avx2.gather.d.q, llvm.x86.avx2.gather.q.q llvm.x86.avx2.gather.d.q.256, llvm.x86.avx2.gather.q.q.256 llvm.x86.avx2.gather.d.d, llvm.x86.avx2.gather.q.d llvm.x86.avx2.gather.d.d.256, llvm.x86.avx2.gather.q.d.256 llvm-svn: 159402	2012-06-29 00:54:20 +00:00
Manman Ren	a09820414a	X86: add GATHER intrinsics (AVX2) in LLVM Support the following intrinsics: llvm.x86.avx2.gather.d.pd, llvm.x86.avx2.gather.q.pd llvm.x86.avx2.gather.d.pd.256, llvm.x86.avx2.gather.q.pd.256 llvm.x86.avx2.gather.d.ps, llvm.x86.avx2.gather.q.ps llvm.x86.avx2.gather.d.ps.256, llvm.x86.avx2.gather.q.ps.256 Modified Disassembler to handle VSIB addressing mode. llvm-svn: 159221	2012-06-26 19:47:59 +00:00
Elena Demikhovsky	26088d2e24	Shuffle optimization for AVX/AVX2. The current patch optimizes frequently used shuffle patterns and yields the following instruction sequence reduction. Before: vshufps $-35, %xmm1, %xmm0, %xmm2 ## xmm2 = xmm0[1,3],xmm1[1,3] vpermilps $-40, %xmm2, %xmm2 ## xmm2 = xmm2[0,2,1,3] vextractf128 $1, %ymm1, %xmm1 vextractf128 $1, %ymm0, %xmm0 vshufps $-35, %xmm1, %xmm0, %xmm0 ## xmm0 = xmm0[1,3],xmm1[1,3] vpermilps $-40, %xmm0, %xmm0 ## xmm0 = xmm0[0,2,1,3] vinsertf128 $1, %xmm0, %ymm2, %ymm0 After: vshufps $13, %ymm0, %ymm1, %ymm1 ## ymm1 = ymm1[1,3],ymm0[0,0],ymm1[5,7],ymm0[4,4] vshufps $13, %ymm0, %ymm0, %ymm0 ## ymm0 = ymm0[1,3,0,0,5,7,4,4] vunpcklps %ymm1, %ymm0, %ymm0 ## ymm0 = ymm0[0],ymm1[0],ymm0[1],ymm1[1],ymm0[4],ymm1[4],ymm0[5],ymm1[5] llvm-svn: 159188	2012-06-26 08:04:10 +00:00
Andrew Trick	fb2ba3e1cb	Enable the new LoopInfo algorithm by default. The primary advantage is that loop optimizations will be applied in a stable order. This helps debugging and unit test creation. It is also a better overall implementation without pathologically bad performance on deep functions. On large functions (llvm-stress --size=200000 \| opt -loops) Before: 0.1263s After: 0.0225s On deep functions (after tweaking llvm-stress, thanks Nadav): Before: 0.2281s After: 0.0227s See r158790 for more comments. The loop tree is now consistently generated in forward order, but loop passes are applied in reverse order over the program. If we have a loop optimization that prefers forward order, that can easily be achieved by adding a different type of LoopPassManager. llvm-svn: 159183	2012-06-26 04:11:38 +00:00
Eli Friedman	bbcd09cc00	Make some ugly hacks for inline asm operands which name a specific register a bit more thorough. PR13196. llvm-svn: 159176	2012-06-25 23:42:33 +00:00
Jakob Stoklund Olesen	eb49566447	Run ProcessImplicitDefs on SSA form where it can be much simpler. Implicitly defined virtual registers can simply have the <undef> bit set on all uses, and copies can be turned into implicit defs recursively. Physical registers are a bit trickier. We handle the common case where a physreg def is used by a nearby instruction in the same basic block. For more complicated cases, just leave the IMPLICIT_DEF instruction in. llvm-svn: 159149	2012-06-25 18:12:18 +00:00
Jakob Stoklund Olesen	2e22e6a361	%RCX is not a function live-out in eh.return functions. The function live-out registers must be live at all function returns, and %RCX is only used by eh.return. When a function also has a normal return, only %RAX holds a return value. This fixes PR13188. llvm-svn: 159116	2012-06-24 15:53:01 +00:00
Hans Wennborg	cbe34b4cc9	Extend the IL for selecting TLS models (PR9788) This allows the user/front-end to specify a model that is better than what LLVM would choose by default. For example, a variable might be declared as @x = thread_local(initialexec) global i32 42 if it will not be used in a shared library that is dlopen'ed. If the specified model isn't supported by the target, or if LLVM can make a better choice, a different model may be used. llvm-svn: 159077	2012-06-23 11:37:03 +00:00
Chad Rosier	1ce3805b23	FileCheckize tests. llvm-svn: 159044	2012-06-22 23:04:02 +00:00
Evan Cheng	f5bd6c6510	EmitZerofill should take a 64-bit size or else it's chopping off large zero-filled global. rdar://11729134 llvm-svn: 159023	2012-06-22 20:14:46 +00:00
Jakob Stoklund Olesen	321d41a871	Functions calling __builtin_eh_return must have a frame pointer. The code in X86TargetLowering::LowerEH_RETURN() assumes that a frame pointer exists, but the frame pointer was forced by the presence of llvm.eh.unwind.init which isn't guaranteed. If llvm.eh.unwind.init is actually required in functions calling eh.return (is it?), we should diagnose that instead of emitting bad machine code. This should fix the dragonegg-x86_64-linux-gcc-4.6-test bot. llvm-svn: 158961	2012-06-22 03:04:27 +00:00
Jakob Stoklund Olesen	51c63e64e3	Remove the -live-regunits command line option. Register allocators depend on it being permanently enabled now. llvm-svn: 158873	2012-06-20 23:31:34 +00:00
Jakob Stoklund Olesen	833308d785	Only update regunit live ranges that have been precomputed. Regunit live ranges are computed on demand, so when mi-sched calls handleMove, some regunits may not have live ranges yet. That makes updating them easier: Just skip the non-existing ranges. They will be computed correctly from the rescheduled machine code when they are needed. llvm-svn: 158831	2012-06-20 18:00:57 +00:00
Craig Topper	b9e8e18949	Don't insert 128-bit UNDEF into 256-bit vectors. Just keep the 256-bit vector. Original patch by Elena Demikhovsky. Tweaked by me to allow possibility of covering more cases. llvm-svn: 158792	2012-06-20 05:39:26 +00:00
Rafael Espindola	31567515ed	really add a triple :-( llvm-svn: 158696	2012-06-19 02:17:35 +00:00
Rafael Espindola	f2ae4075c8	Add a triple to the test. llvm-svn: 158695	2012-06-19 01:42:34 +00:00
Rafael Espindola	ca3e0ee8b3	Move the support for using .init_array from ARM to the generic TargetLoweringObjectFileELF. Use this to support it on X86. Unlike ARM, on X86 it is not easy to find out if .init_array should be used or not, so the decision is made via TargetOptions and defaults to off. Add a command line option to llc that enables it. llvm-svn: 158692	2012-06-19 00:48:28 +00:00
Chandler Carruth	a1da0bf5ef	Add a regression test for the bug exposed by r158087, which has been temporarily reverted. This test is annoyingly overspecified, but I don't know of another way to thoroughly test the saving and restoring of the registers. While this will have to be adjusted even with the issue fixed in order to re-apply r158087, those adjustments should very clearly indicate that it is still correct (%esp getting restored prior to pops), whereas without it, this case can easily slip under the radar. Still, any suggestions for improvements are very welcome. All credit to Matt Beaumont-Gay for reducing this out of an insane Address Sanitizer crash to a reasonably small seg-faulting C program when built with -mstackrealign. I just reduced it to IR, which was much simpler. =] llvm-svn: 158656	2012-06-18 09:15:04 +00:00
Chandler Carruth	2cc11fd8c7	Temporarily revert r158087. This patch causes problems when both dynamic stack realignment and dynamic allocas combine in the same function. With this patch, we no longer build the epilog correctly, and silently restore registers from the wrong position in the stack. Thanks to Matt for tracking this down, and getting at least an initial test case to Chad. I'm going to try to check a variation of that test case in so we can easily track the fixes required. llvm-svn: 158654	2012-06-18 07:03:12 +00:00
Craig Topper	71dc02d659	Fix intrinsics for XOP frczss/sd instructions. These instructions only take one source register and zero the upper bits of the destination rather than preserving them. llvm-svn: 158396	2012-06-13 07:18:53 +00:00
Craig Topper	3352ba55b9	Replace XOP vpcom intrinsics with fewer intrinsics that take the immediate as an argument. llvm-svn: 158278	2012-06-09 16:46:13 +00:00
Jakob Stoklund Olesen	33a1b416ac	Don't run RAFast in the optimizing regalloc pipeline. The fast register allocator is not supposed to work in the optimizing pipeline. It doesn't make sense to compute live intervals, run full copy coalescing, and then run RAFast. Fast register allocation in the optimizing pipeline is better done by RABasic. llvm-svn: 158242	2012-06-08 23:15:12 +00:00
Manman Ren	bf86b295bb	Test case for r158160 llvm-svn: 158218	2012-06-08 18:42:37 +00:00
Manman Ren	2cdc8afccf	X86: optimize generated code for integer ABS This patch will generate the following for integer ABS: movl %edi, %eax negl %eax cmovll %edi, %eax INSTEAD OF movl %edi, %ecx sarl $31, %ecx leal (%rdi,%rcx), %eax xorl %ecx, %eax There exists a target-independent DAG combine for integer ABS, which converts integer ABS to sar+add+xor. For X86, we match this pattern back to neg+cmov. This is implemented in PerformXorCombine. rdar://10695237 llvm-svn: 158175	2012-06-07 22:39:10 +00:00
Rafael Espindola	55d1145bd5	Use a base register instead of an index register with the local dynamic model. Fixes pr13048. llvm-svn: 158158	2012-06-07 18:39:19 +00:00
Manman Ren	ae02c5a93e	X86: replace SUB with CMP if possible This patch will optimize the following movq %rdi, %rax subq %rsi, %rax cmovsq %rsi, %rdi movq %rdi, %rax to cmpq %rsi, %rdi cmovsq %rsi, %rdi movq %rdi, %rax Perform this optimization if the actual result of SUB is not used. rdar: 11540023 llvm-svn: 158126	2012-06-07 00:42:47 +00:00
Manman Ren	9c9641812c	Revert r157755. The commit is intended to fix rdar://11540023. It is implemented as part of peephole optimization. We can actually implement this in the SelectionDAG lowering phase. llvm-svn: 158122	2012-06-06 23:53:03 +00:00
Chad Rosier	5d6f01ad77	Add support for dynamic stack realignment in the presence of dynamic allocas on X86. rdar://11496434 llvm-svn: 158087	2012-06-06 17:37:40 +00:00
Nadav Rotem	b7bb72e4f3	Remove the "-promote-elements" flag. This flag is now enabled by default. llvm-svn: 157925	2012-06-04 11:27:21 +00:00
Craig Topper	79dbb0c6e4	Rename FMA3 feature flag to just FMA to match gcc so it can be added to clang. llvm-svn: 157903	2012-06-03 18:58:46 +00:00
Craig Topper	fd53b80219	Rename fma4 intrinsics to just fma since they are now used for both FMA4 and FMA3. Autoupgrade support coming in a separate commit. llvm-svn: 157898	2012-06-03 07:26:46 +00:00
Manman Ren	5097e4f38a	Revert r157831 llvm-svn: 157896	2012-06-03 03:14:24 +00:00
Craig Topper	29eafea292	Use sse_load_f32/64 for scalar FMA3 intrinsic patterns instead of 128-bit loads to match instruction behavior. llvm-svn: 157895	2012-06-03 01:40:43 +00:00
Manman Ren	879ca9d47d	X86: peephole optimization to remove cmp instruction This patch will optimize the following: sub r1, r3 cmp r3, r1 or cmp r1, r3 bge L1 TO sub r1, r3 bge L1 or ble L1 If the branch instruction can use flag from "sub", then we can eliminate the "cmp" instruction. llvm-svn: 157831	2012-06-01 19:49:33 +00:00
Chris Lattner	b1359894f3	testcase for PR13006, thanks to Duncan for filing it. llvm-svn: 157824	2012-06-01 18:19:46 +00:00
Hans Wennborg	789acfb63d	Implement the local-dynamic TLS model for x86 (PR3985) This implements codegen support for accesses to thread-local variables using the local-dynamic model, and adds a clean-up pass so that the base address for the TLS block can be re-used between local-dynamic access on an execution path. llvm-svn: 157818	2012-06-01 16:27:21 +00:00
Craig Topper	00649d5111	Remove fadd(fmul) patterns for FMA3. This needs to be implemented by paying attention to FP_CONTRACT and matching @llvm.fma which is not available yet. This will allow us to enable intrinsic use at least, though. llvm-svn: 157804	2012-06-01 06:07:48 +00:00
Chris Lattner	466076b95f	enhance the logic for looking through tailcalls to look through transparent casts in multiple-return value scenarios, like what happens on X86-64 when returning small structs. llvm-svn: 157800	2012-06-01 05:29:15 +00:00
Chris Lattner	182fe3eef1	enhance getNoopInput to know about vector<->vector bitcasts of legal types, as well as int<->ptr casts. This allows us to tailcall functions with some trivial casts between the call and return (i.e. because the return types disagree). llvm-svn: 157798	2012-06-01 05:16:33 +00:00
Chris Lattner	22afea7689	add some simple 64-bit tail call tests. llvm-svn: 157797	2012-06-01 05:03:31 +00:00
Chris Lattner	21b1e6bbdc	merge some tests. llvm-svn: 157795	2012-06-01 05:00:54 +00:00
Chris Lattner	d82ae12d8c	rename test llvm-svn: 157794	2012-06-01 04:58:50 +00:00
Manman Ren	9bccb64e56	X86: replace SUB with CMP if possible This patch will optimize the following movq %rdi, %rax subq %rsi, %rax cmovsq %rsi, %rdi movq %rdi, %rax to cmpq %rsi, %rdi cmovsq %rsi, %rdi movq %rdi, %rax Perform this optimization if the actual result of SUB is not used. rdar: 11540023 llvm-svn: 157755	2012-05-31 17:20:29 +00:00
Elena Demikhovsky	602f3a26d6	Added FMA3 Intel instructions. I disabled FMA3 autodetection, since the result may differ from expected for some benchmarks. I added tests for CodeGen and intrinsics. I did not change llvm.fma.f32/64 - it may be done later. llvm-svn: 157737	2012-05-31 09:20:20 +00:00
Craig Topper	c1ac05dad5	Add intrinsic for pclmulqdq instruction. llvm-svn: 157731	2012-05-31 04:37:40 +00:00
Jakob Stoklund Olesen	05e2245fc6	Prioritize smaller register classes for urgent evictions. It helps compile exotic inline asm. In the test case, normal GR32 virtual registers use up eax-edx so the final GR32_ABCD live range has no registers left. Since all the live ranges were tiny, we had no way of prioritizing the smaller register class. This patch allows tiny unspillable live ranges to be evicted by tiny unspillable live ranges from a smaller register class. <rdar://problem/11542429> llvm-svn: 157715	2012-05-30 21:46:58 +00:00
Chris Lattner	1622a99e58	it's pointed out that R11 can be used for magic things, and doing things just for 64-bit registers is silly. Just optimize 3 more. llvm-svn: 157699	2012-05-30 18:08:02 +00:00
Chris Lattner	04d722a68d	Extend the (abi-irrelevant) return convention to be able to return more than two values in integer registers. This is already supported by the fastcc convention, but it doesn't hurt to support it in the standard conventions as well. In cases where we can cheat at the calling convention, this allows us to avoid returning things through memory in more cases. llvm-svn: 157698	2012-05-30 17:50:14 +00:00
Benjamin Kramer	ef479ea854	Add intrinsics, code gen, assembler and disassembler support for the SSE4a extrq and insertq instructions. This required light surgery on the assembler and disassembler because the instructions use an uncommon encoding. They are the only two instructions in x86 that use register operands and two immediates. llvm-svn: 157634	2012-05-29 19:05:25 +00:00
Chris Lattner	f7f59b15aa	These tests used intrinsics with the wrong prototype. They weren't caught because the old verifier just checked that something "was a pointer", but not that the pointee was correct. llvm-svn: 157544	2012-05-27 19:35:41 +00:00
Benjamin Kramer	f2beccf6b4	SelectionDAGBuilder: When emitting small compare chains for switches order them by using edge weights. SimplifyCFG tends to form a lot of 2-3 case switches when merging branches. Move the most likely condition to the front so it is checked first and the others can be skipped. This is currently not as effective as it could be because SimplifyCFG destroys profiling metadata when merging branches and switches. Merging branch weight metadata is tricky though. This code touches at most 3 cases so I didn't use a proper sorting algorithm. llvm-svn: 157521	2012-05-26 20:01:32 +00:00
NAKAMURA Takumi	3eca973bf8	test/CodeGen/X86/bigstructret.ll: Suppress one test. It is msvc-incompatible. (compatible to mingw32 and netbsd, though) llvm-svn: 157474	2012-05-25 15:40:54 +00:00
NAKAMURA Takumi	501dbd06ae	test/CodeGen/X86/bigstructret.ll: Relax stack offsets for hosts of stack-align=8, eg. win32 and netbsd. llvm-svn: 157471	2012-05-25 15:12:21 +00:00
Eli Friedman	315a0c79f3	Simplify code for calling a function where CanLowerReturn fails, fixing a small bug in the process. llvm-svn: 157446	2012-05-25 00:09:29 +00:00
David Blaikie	c575c80c3b	Fix for CHECK-NOT misspelling. Patch by Nicklas Bo Jensen. llvm-svn: 157421	2012-05-24 22:08:29 +00:00
Jakob Stoklund Olesen	5b8f476037	Correctly deal with identity copies in RegisterCoalescer. Now that the coalescer keeps live intervals and machine code in sync at all times, it needs to deal with identity copies differently. When merging two virtual registers, all identity copies are removed right away. This means that other identity copies must come from somewhere else, and they are going to have a value number. Deal with such copies by merging the value numbers before erasing the copy instruction. Otherwise, we leave dangling value numbers in the live interval. This fixes PR12927. llvm-svn: 157340	2012-05-23 20:21:06 +00:00
Nuno Lopes	ad40c0a425	revert my previous patches that introduced an additional parameter to the objectsize intrinsic. After a lot of discussion, we realized it's not the best option for run-time bounds checking llvm-svn: 157255	2012-05-22 15:25:31 +00:00
Jakob Stoklund Olesen	924279ca0e	Only erase virtregs with no uses left. Also make sure registers aren't erased twice if the dead def mentions the register twice. This fixes PR12911. llvm-svn: 157254	2012-05-22 14:52:12 +00:00
Craig Topper	e88f2fd4f7	Allow 256-bit shuffles to still be split even if only half of the shuffle comes from two 128-bit pieces. llvm-svn: 157175	2012-05-21 06:40:16 +00:00
Peter Collingbourne	8eb05fd093	When legalising shifts, do not pre-build a list of operands which may be RAUW'd by the recursive call to LegalizeOps; instead, retrieve the other operands when calling UpdateNodeOperands. Fixes PR12889. llvm-svn: 157162	2012-05-20 18:36:15 +00:00
Jakob Stoklund Olesen	1f1c6add10	Properly constrain register classes for sub-registers. Not all GR64 registers have sub_8bit sub-registers. llvm-svn: 157150	2012-05-20 06:38:37 +00:00
Jakob Stoklund Olesen	a103a516c6	Properly constrain register classes in 2-addr. X86 has 2-addr instructions with different constraints on the tied def and use operands. One is GR32, one is GR32_NOSP. llvm-svn: 157149	2012-05-20 06:38:32 +00:00
Jakob Stoklund Olesen	a34a69ce0c	Fix 12892. Dead code elimination during coalescing could cause a virtual register to be split into connected components. The following rewriting would be confused about the already joined copies present in the code, but without a corresponding value number in the live range. Erase all joined copies instantly when joining intervals such that the MI and LiveInterval representations are always in sync. llvm-svn: 157135	2012-05-19 23:34:59 +00:00
Jakob Stoklund Olesen	25ced18407	Erase joined copies immediately. The late dead code elimination is no longer necessary. The test changes are caused by a register hint that can be either %rdi or %rax. The choice depends on the use list order, which this patch changes. llvm-svn: 157131	2012-05-19 20:54:07 +00:00
Nadav Rotem	c93e91da27	On Haswell, prefer storing YMM registers using a single instruction. llvm-svn: 157129	2012-05-19 20:30:08 +00:00
Nadav Rotem	900c7cb7ce	Add support for additional in-reg vbroadcast patterns llvm-svn: 157127	2012-05-19 19:57:37 +00:00

3776 Commits