llvm-project

Commit Graph

Author	SHA1	Message	Date
Michael Liao	a880186030	Add missing i8 max/min/umax/umin support - Fix PR5145 and turn on test 8-bit atomic ops llvm-svn: 164358	2012-09-21 03:18:52 +00:00
NAKAMURA Takumi	1a38004c1b	llvm/test/CodeGen/ARM/fast-isel.ll: Fix possible typos, s/@unaligned_i16_store/@unaligned_i16_load/g. I guess this had apparently passed in +Asserts possibly due to verbosity. llvm-svn: 164350	2012-09-21 01:15:05 +00:00
Chad Rosier	8ff5a4aa79	Testcase does not need to be this strict. llvm-svn: 164347	2012-09-21 00:47:08 +00:00
Chad Rosier	1fb301aa41	Add newline. llvm-svn: 164346	2012-09-21 00:43:18 +00:00
Chad Rosier	2364f58326	[fast-isel] Fallback to SelectionDAG isel if we require strict alignment for non-halfword-aligned i16 loads/stores. rdar://12304911 llvm-svn: 164345	2012-09-21 00:41:42 +00:00
Jim Grosbach	74b61c398c	ARM: Use a dedicated intrinsic for vector bitwise select. The expression based expansion too often results in IR level optimizations splitting the intermediate values into separate basic blocks, preventing the formation of the VBSL instruction as the code author intended. In particular, LICM would often hoist part of the computation out of a loop. rdar://11011471 llvm-svn: 164340	2012-09-21 00:18:20 +00:00
Jakob Stoklund Olesen	b8707faba3	Ignore PHI-defs for -new-coalescer interference checks. A PHI can't create interference on its own. If two live ranges interfere at a PHI, they must also interfere when leaving one of the PHI predecessors. llvm-svn: 164330	2012-09-20 23:08:42 +00:00
Evan Cheng	363d73c518	Try to make these tests more portable. llvm-svn: 164320	2012-09-20 21:35:21 +00:00
Benjamin Kramer	8554206652	Fix broken check lines. llvm-svn: 164317	2012-09-20 19:54:13 +00:00
Roman Divacky	264f504077	Specify cpu to get the correct instruction ordering. Remove XFAIL. llvm-svn: 164306	2012-09-20 14:59:42 +00:00
Michael Liao	83bc2119dc	Specify CPU to prevent failure on ATOM due to different code scheduling llvm-svn: 164283	2012-09-20 03:34:04 +00:00
Michael Liao	3237662b65	Re-work X86 code generation of atomic ops with spin-loop - Rewrite/merge pseudo-atomic instruction emitters to address the following issue: * Reduce one unnecessary load in spin-loop. Previously the spin-loop looked like: thisMBB: newMBB: ld t1 = [bitinstr.addr] op t2 = t1, [bitinstr.val] not t3 = t2 (if Invert) mov EAX = t1 lcs dest = [bitinstr.addr], t3 [EAX is implicit] bz newMBB fallthrough -->nextMBB The 'ld' at the beginning of newMBB should be lifted out of the loop, as lcs (or CMPXCHG on x86) will load the current memory value into EAX. This loop is refined as: thisMBB: EAX = LOAD [MI.addr] mainMBB: t1 = OP [MI.val], EAX LCMPXCHG [MI.addr], t1, [EAX is implicitly used & defined] JNE mainMBB sinkMBB: * Remove immopc as, so far, all pseudo-atomic instructions have all-register form only; there is no immediate operand. * Remove unnecessary attributes/modifiers in pseudo-atomic instruction td * Fix issues in PR13458 - Add comprehensive tests on atomic ops on various data types. NOTE: Some of them are turned off due to missing functionality. - Revise tests due to the new spin-loop generated. llvm-svn: 164281	2012-09-20 03:06:15 +00:00
Jakob Stoklund Olesen	7d3c9c0a2a	Resolve conflicts involving dead vector lanes for -new-coalescer. A common coalescing conflict in vector code is lane insertion: %dst = FOO %src = BAR %dst:ssub0 = COPY %src The live range of %src interferes with the ssub0 lane of %dst, but that lane is never read after %src would have clobbered it. That makes it safe to merge the live ranges and eliminate the COPY: %dst = FOO %dst:ssub0 = BAR This patch teaches the new coalescer to resolve conflicts where dead vector lanes would be clobbered, at least as long as the clobbered vector lanes don't escape the basic block. llvm-svn: 164250	2012-09-19 21:29:18 +00:00
Michael Liao	8372539543	Unify the logic in SelectAtomicLoadAdd and SelectAtomicLoadArith - Merge the processing of LOAD_ADD with other atomic load-arith operations - Separate the logic getting target constant for atomic-load-op and add an optimization for atomic-load-add on i16 with negative value - Optimize a minor case for atomic-fetch-add i16 with negative operand. Test case is revised. llvm-svn: 164243	2012-09-19 19:36:58 +00:00
Jordan Rose	b64c123453	Really XFAIL test/CodeGen/PowerPC/structsinregs.ll. XFAIL needs a trailing colon. Hopefully this will get the buildbots happy again while Bill works on getting it passing. llvm-svn: 164237	2012-09-19 17:03:11 +00:00
Bill Schmidt	479a4588b9	XFAIL test/CodeGen/PowerPC/structsinregs.ll llvm-svn: 164233	2012-09-19 16:18:23 +00:00
Bill Schmidt	019cc6fe03	Small structs for PPC64 SVR4 must be passed right-justified in registers. lib/Target/PowerPC/PPCISelLowering.{h,cpp} Rename LowerFormalArguments_Darwin to LowerFormalArguments_Darwin_Or_64SVR4. Rename LowerFormalArguments_SVR4 to LowerFormalArguments_32SVR4. Receive small structs right-justified in LowerFormalArguments_Darwin_Or_64SVR4. Rename LowerCall_Darwin to LowerCall_Darwin_Or_64SVR4. Rename LowerCall_SVR4 to LowerCall_32SVR4. Pass small structs right-justified in LowerCall_Darwin_Or_64SVR4. test/CodeGen/PowerPC/structsinregs.ll New test. llvm-svn: 164228	2012-09-19 15:42:13 +00:00
Hans Wennborg	ff9b5a8465	Move load_to_switch.ll to test/CodeGen/SPARC/ Because the test invokes llc -march=sparc, it needs to be in a directory which is only run when the sparc target is built. llvm-svn: 164211	2012-09-19 09:25:03 +00:00
Evan Cheng	1de7ec8c7c	MOVi16 (movw) is only legal on cpus with V6T2 support. rdar://12300648 llvm-svn: 164169	2012-09-18 21:24:16 +00:00
Roman Divacky	947148aa45	Add test for r164155 and remove two tests superseded by ppc64-calls.ll. llvm-svn: 164162	2012-09-18 19:51:44 +00:00
Roman Divacky	0be33598ce	Avoid symbol name clash when filling TOC. Patch by Adhemerval Zanella. llvm-svn: 164141	2012-09-18 17:10:37 +00:00
Roman Divacky	d4f6f421a9	On PPC64 emit the environment pointer. Patch by Adhemerval Zanella. llvm-svn: 164139	2012-09-18 16:55:29 +00:00
Roman Divacky	762930637c	Optimize local func calls to not emit nop for TOC restoration. Patch by Adhemerval Zanella. llvm-svn: 164138	2012-09-18 16:47:58 +00:00
James Molloy	ea05256b58	More domain conversion; convert VFP VMOVS to NEON instructions in more cases - when we may clobber the other S-lane by converting an S to a D instruction, make an effort to work out if the S lane is clobberable or not. llvm-svn: 164114	2012-09-18 08:31:15 +00:00
Evan Cheng	90ae8f8442	Use vld1 / vst1 for unaligned v2f64 load / store. e.g. Use vld1.16 for 2-byte aligned address. Based on patch by David Peixotto. Also use vld1.64 / vst1.64 with 128-bit alignment to take advantage of alignment hints. rdar://12090772, rdar://12238782 llvm-svn: 164089	2012-09-18 01:42:45 +00:00
Jakob Stoklund Olesen	0bb3dd78c4	Merge into undefined lanes under -new-coalescer. Add LIS::pruneValue() and extendToIndices(). These two functions are used by the register coalescer when merging two live ranges requires more than a trivial value mapping as supported by LiveInterval::join(). The pruneValue() function can remove the part of a value number that is going to conflict in join(). Afterwards, extendToIndices can restore the live range, using any new dominating value numbers and updating the SSA form. Use this complex value mapping to support merging a register into a vector lane that has a conflicting value, but the clobbered lane is undef. llvm-svn: 164074	2012-09-17 23:03:25 +00:00
Jan Wen Voung	4ce1d7b4f1	Add some cases to x86 OptimizeCompare to handle DEC and INC, too. While we are setting the earlier def to true, also make it live. llvm-svn: 164056	2012-09-17 22:04:23 +00:00
Michael Liao	b503b323f3	Fix PR13859 - Preserve the original NOutVT during casting from vector to integer by extracting vector elements. llvm-svn: 164042	2012-09-17 18:05:20 +00:00
Silviu Baranga	7bd2914683	Removed the VMLxForwarding feature for the Cortex-A15 target. llvm-svn: 164030	2012-09-17 14:10:54 +00:00
Nadav Rotem	ae6809b19a	Fix the testcase to work on all platforms. llvm-svn: 163997	2012-09-16 07:58:47 +00:00
Nadav Rotem	37521aa89c	The PMOVZXWD family of functions had patterns that extend narrow vector types to wide vector types. It had patterns for zext-loading and extending. This commit adds patterns for loading a wide type, performing a bitcast, and extending. This is an odd pattern, but it is commonly used when writing code with intrinsics. rdar://11897677 llvm-svn: 163995	2012-09-16 07:39:07 +00:00
Benjamin Kramer	ece434252c	X86: Emitting x87 fsin/fcos for sinf/cosf is not safe without unsafe fp math. This was only an issue if sse is disabled. llvm-svn: 163967	2012-09-15 12:44:27 +00:00
Akira Hatanaka	189d0adde9	Handled unaligned load/stores properly in Mips16. Patch by Reed Kotler. llvm-svn: 163956	2012-09-15 01:02:03 +00:00
Eric Christopher	b83dba2b84	Fix both the test for zero and what we do if we have a zero for umulo legalization. Fixes PR13839 llvm-svn: 163856	2012-09-13 23:24:02 +00:00
Michael Liao	137f8aedea	Add wider vector/integer support for PR12312 - Enhance the fix to PR12312 to support wider integer, such as 256-bit integer. If more than 1 fully evaluated vectors are found, POR them first followed by the final PTEST. llvm-svn: 163832	2012-09-13 20:24:54 +00:00
Michael Liao	460fc46e0f	Enhance type legalization on bitcast from vector to integer - Find a legal vector type before casting and extracting element from it. - As the new vector type may have more than 2 elements, build the final hi/lo pair by BFS pairing them from bottom to top. llvm-svn: 163830	2012-09-13 19:58:21 +00:00
Jakob Stoklund Olesen	32a56fa3ba	Fix test case to avoid PIC magic. llvm-svn: 163827	2012-09-13 19:47:45 +00:00
Jakob Stoklund Olesen	3cf3ffce24	Fix the TCRETURNmi64 bug differently. Add a PatFrag to match X86tcret using 6 fixed registers or less. This avoids folding loads into TCRETURNmi64 using 7 or more volatile registers. <rdar://problem/12282281> llvm-svn: 163819	2012-09-13 18:31:27 +00:00
Jakob Stoklund Olesen	78b9f8fc67	Revert r163761 "Don't fold indexed loads into TCRETURNmi64." The patch caused "Wrong topological sorting" assertions. llvm-svn: 163810	2012-09-13 16:52:17 +00:00
Silviu Baranga	b47bb94f93	This patch introduces A15 as a target in LLVM. llvm-svn: 163803	2012-09-13 15:05:10 +00:00
Nadav Rotem	24a822a5cb	Fix a dagcombine optimization. The optimization attempts to optimize a bitcast of fneg to integers by xoring the high-bit. This fails if the source operand is a vector because we need to negate each of the elements in the vector. Fix rdar://12281066 PR13813. llvm-svn: 163802	2012-09-13 14:54:28 +00:00
Nadav Rotem	4e9ad06617	Stack Coloring: We have code that checks that all of the uses of allocas are within the lifetime zone. Sometimes legitimate usages of allocas are hoisted outside of the lifetime zone. For example, GEPs may calculate the address of a member of an allocated struct. This commit makes sure that we only check (abort regions or assert) for instructions that read and write memory using stack frames directly. Notice that by allowing legitimate usages outside the lifetime zone we also stop checking for instructions which use derivatives of allocas. We will catch fewer bugs in user code and in the compiler itself. llvm-svn: 163791	2012-09-13 12:38:37 +00:00
Jakob Stoklund Olesen	bfacef45eb	Don't fold indexed loads into TCRETURNmi64. We don't have enough GR64_TC registers when calling a varargs function with 6 arguments. Since %al holds the number of vector registers used, only %r11 is available as a scratch register. This means that addressing modes using both base and index registers can't be folded into TCRETURNmi64. <rdar://problem/12282281> llvm-svn: 163761	2012-09-13 00:25:00 +00:00
Michael Liao	abb87d4857	Fix PR11985 - BlockAddress has no support of BA + offset form and there is no way to propagate that offset into machine operand; - Add BA + offset support and a new interface 'getTargetBlockAddress' to simplify target block address forming; - All targets are modified to use new interface and X86 backend is enhanced to support BA + offset addressing. llvm-svn: 163743	2012-09-12 21:43:09 +00:00
Roman Divacky	c9e23d93ae	This patch corrects logic in PPCFrameLowering for save and restore of nonvolatile condition register fields across calls under the SVR4 ABIs. * With the 64-bit ABI, the save location is at a fixed offset of 8 from the stack pointer. The frame pointer cannot be used to access this portion of the stack frame since the distance from the frame pointer may change with alloca calls. * With the 32-bit ABI, the save location is just below the general register save area, and is accessed via the frame pointer like the rest of the save areas. This is an optional slot, so it must only be created if any of CR2, CR3, and CR4 were modified. * For both ABIs, save/restore logic is generated only if one of the nonvolatile CR fields were modified. I also took this opportunity to clean up an extra FIXME in PPCFrameLowering.h. Save area offsets for 32-bit GPRs are meaningless for the 64-bit ABI, so I removed them for correctness and efficiency. Fixes PR13708 and partially also PR13623. It lets us enable exception handling on PPC64. Patch by William J. Schmidt! llvm-svn: 163713	2012-09-12 14:47:47 +00:00
Kristof Beyls	e6b876f4e5	Fix constant folding through bitcasts by no longer relying on undefined behaviour (converting NaN values between float and double). SelectionDAG::getConstantFP(double Val, EVT VT, bool isTarget); should not be used when Val is not a simple constant (as the comment in SelectionDAG.h indicates). This patch avoids using this function when folding an unknown constant through a bitcast, where it cannot be guaranteed that Val will be a simple constant. llvm-svn: 163703	2012-09-12 11:25:02 +00:00
Nadav Rotem	8ff00989fc	Stack coloring: remove lifetime intervals which contain escaped allocas. The input program may contain instructions which are not inside lifetime markers. This can happen due to a bug in the compiler or due to a bug in user code (for example, returning a reference to a local variable). This commit adds checks over all of the instructions in the function and invalidates lifetime ranges which do not contain all of the instructions. llvm-svn: 163678	2012-09-12 04:57:37 +00:00
Chad Rosier	1778831a3d	[ms-inline asm] Split the parsing of IR asm strings into GCC and MS variants. Add support in the EmitMSInlineAsmStr() function for handling integer consts. llvm-svn: 163645	2012-09-11 19:09:56 +00:00
Chad Rosier	ab51c9de34	Formatting. No functional change intended. llvm-svn: 163627	2012-09-11 16:33:10 +00:00
Nadav Rotem	65ba95ebf9	Stack Coloring: Don't crash on dbg values which use stack frames. llvm-svn: 163616	2012-09-11 12:34:27 +00:00
NAKAMURA Takumi	8c72306cdb	test/CodeGen/X86/ms-inline-asm.ll: Relax for non-darwin x86 targets. '##InlineAsm' could not be seen in other hosts. llvm-svn: 163554	2012-09-10 22:04:54 +00:00
Chad Rosier	7641f58784	[ms-inline asm] Properly emit the asm directives when the AsmPrinterVariant and InlineAsmVariant don't match. llvm-svn: 163550	2012-09-10 21:36:05 +00:00
Chad Rosier	1c1319b9e7	Update test case for Release builds. llvm-svn: 163549	2012-09-10 21:31:43 +00:00
Chad Rosier	db20a41d99	[ms-inline asm] Pass the correct AsmVariant to the PrintAsmOperand() function and update the printOperand() function accordingly. llvm-svn: 163544	2012-09-10 21:10:49 +00:00
Jakob Stoklund Olesen	8b9dce5c18	Don't attempt to use flags from predicated instructions. The ARM backend can eliminate cmp instructions by reusing flags from a nearby sub instruction with similar arguments. Don't do that if the sub is predicated - the flags are not written unconditionally. <rdar://problem/12263428> llvm-svn: 163535	2012-09-10 19:17:25 +00:00
Nadav Rotem	3c86b78ae4	Stack Coloring: Handle the case where END markers come before BEGIN markers properly. llvm-svn: 163530	2012-09-10 18:51:09 +00:00
Michael Liao	400f7ef871	Enhance PR11334 fix to support extload from v2f32/v4f32 - Fix a remaining issue of PR11674 as well llvm-svn: 163528	2012-09-10 18:33:51 +00:00
Michael Liao	c3d5b21c39	Add boolean simplification support from CMOV - If a boolean value is generated from CMOV and tested as boolean value, simplify the use of test result by referencing the original condition. RDRAND intrinsic is one of such cases. llvm-svn: 163516	2012-09-10 16:36:16 +00:00
James Molloy	1e5c611815	Fix an assertion failure when optimising a shufflevector incorrectly into concat_vectors, and a followup bug with SelectionDAG::getNode() creating nodes with invalid types. llvm-svn: 163511	2012-09-10 14:01:21 +00:00
Nadav Rotem	6731363185	Stack Coloring: Add support for multiple regions of the same slot, within a single basic block. llvm-svn: 163507	2012-09-10 12:39:35 +00:00
Elena Demikhovsky	264fb0217e	The VPSHUFB 256-bit instruction may be generated when one of the input vectors is undefined or zeroinitializer. I've added the "zeroinitializer" case in this patch. llvm-svn: 163506	2012-09-10 12:13:11 +00:00
Nadav Rotem	d753a952ca	Teach the DAGBuilder about lifetime markers which are generated from PHINodes. llvm-svn: 163494	2012-09-10 08:43:23 +00:00
Craig Topper	03f39773e0	Teach DAG combiner to constant fold fneg of a BUILD_VECTOR of constants. llvm-svn: 163483	2012-09-09 22:58:45 +00:00
Craig Topper	4ed79bd7d7	Add instruction selection for ffloor of vectors when SSE4.1 or AVX is enabled. llvm-svn: 163473	2012-09-08 17:42:27 +00:00
Craig Topper	98f2e861a0	Add support for lowering FABS of vector types. llvm-svn: 163461	2012-09-08 07:31:51 +00:00
Craig Topper	3e41a5bb31	Set operation action for FFLOOR to Expand for all vector types for X86. Set FFLOOR of v4f32 to Expand for ARM. v2f64 was already correct. llvm-svn: 163458	2012-09-08 04:58:43 +00:00
Jakob Stoklund Olesen	866908c42c	Allow overlaps between virtreg and physreg live ranges. The RegisterCoalescer understands overlapping live ranges where one register is defined as a copy of the other. With this change, register allocators using LiveRegMatrix can do the same, at least for copies between physical and virtual registers. When a physreg is defined by a copy from a virtreg, allow those live ranges to overlap: %CL<def> = COPY %vreg11:sub_8bit; GR32_ABCD:%vreg11 %vreg13<def,tied1> = SAR32rCL %vreg13<tied0>, %CL<imp-use,kill> We can assign %vreg11 to %ECX, overlapping the live range of %CL. llvm-svn: 163336	2012-09-06 18:15:23 +00:00
Nadav Rotem	9e3cc9f884	Disable stack coloring by default in order to resolve the i386 failures. llvm-svn: 163316	2012-09-06 14:27:06 +00:00
Elena Demikhovsky	42777877c2	AVX2 optimization. Added generation of VPSHUFB instruction for <32 x i8> vector shuffle when possible. llvm-svn: 163312	2012-09-06 12:42:01 +00:00
Nadav Rotem	ea0d36be95	Fix the test by specifying an exact cpu model. llvm-svn: 163307	2012-09-06 10:33:33 +00:00
James Molloy	49bdbce8e1	Improve codegen for BUILD_VECTORs on ARM. If we have a BUILD_VECTOR that is mostly a constant splat, it is often better to splat that constant then insertelement the non-constant lanes instead of insertelementing every lane from an undef base. llvm-svn: 163304	2012-09-06 09:55:02 +00:00
Nadav Rotem	7c277da364	Add a new optimization pass: Stack Coloring, that merges disjoint static allocations (allocas). Allocas are known to be disjoint if they are marked by disjoint lifetime markers (@llvm.lifetime.XXX intrinsics). llvm-svn: 163299	2012-09-06 09:17:37 +00:00
James Molloy	34e9931bec	Optimize codegen for VSETLNi{8,16,32} operating on Q registers. Degenerate to a VSETLN on D registers, instead of an (INSERT_SUBREG (VSETLN (EXTRACT_SUBREG ))) sequence to help the register coalescer. llvm-svn: 163298	2012-09-06 09:16:01 +00:00
Craig Topper	daa5ed1e0a	Add patterns for converting stores of subvector_extracts of lower 128-bits of a 256-bit vector to VMOVAPSmr/VMOVUPSmr. llvm-svn: 163292	2012-09-06 05:15:01 +00:00
Jakob Stoklund Olesen	f831059f60	Use predication instead of pseudo-opcodes when folding into MOVCC. Now that it is possible to dynamically tie MachineInstr operands, predicated instructions are possible in SSA form: %vreg3<def> = SUBri %vreg1, -2147483647, pred:14, pred:%noreg, %opt:%noreg %vreg4<def,tied1> = MOVCCr %vreg3<tied0>, %vreg1, %pred:12, pred:%CPSR Becomes a predicated SUBri with a tied imp-use: SUBri %vreg1, -2147483647, pred:13, pred:%CPSR, opt:%noreg, %vreg1<imp-use,tied0> This means that any instruction that is safe to move can be folded into a MOVCC, and the *CC pseudo-instructions are no longer needed. The test case changes reflect that Thumb2SizeReduce recognizes the predicated instructions. It didn't understand the pseudos. llvm-svn: 163274	2012-09-05 23:58:02 +00:00
Tim Northover	c8d867d42d	Strip old MachineInstrs after we know we can put them back. Previous patch accidentally decided it couldn't convert a VFP to a NEON instruction after it had already destroyed the old one. Not a good move. llvm-svn: 163230	2012-09-05 18:37:53 +00:00
Pranav Bhandarkar	823f9ebaa3	LLVM Bug Fix 13709: Remove needless lsr(Rp, #32) instruction to access the subreg_hireg of register pair Rp. * lib/Target/Hexagon/HexagonPeephole.cpp(PeepholeDoubleRegsMap): New DenseMap similar to PeepholeMap that additionally records subreg info too. (runOnMachineFunction): Record information in PeepholeDoubleRegsMap and copy propagate the high sub-reg of Rp0 in Rp1 = lsr(Rp0, #32) to the instruction Rx = COPY Rp1:logreg_subreg. * test/CodeGen/Hexagon/remove_lsr.ll: New test. llvm-svn: 163214	2012-09-05 16:01:40 +00:00
Silviu Baranga	3f40d87207	Fixed the DAG combiner to better handle the folding of AND nodes for vector types. The previous code was making the assumption that the length of the bitmask returned by isConstantSplat was equal to the size of the vector type. Now we first make sure that the splat value has at least the length of the vector lane type, then we only use as many fields as we have available in the splat value. llvm-svn: 163203	2012-09-05 08:57:21 +00:00
Logan Chien	eeaaf65cb6	Fix UseInitArray option for MIPS target. llvm-svn: 163193	2012-09-05 06:17:17 +00:00
Jakob Stoklund Olesen	c7579cdded	Move tie checks into MachineVerifier::visitMachineOperand. llvm-svn: 163152	2012-09-04 18:38:28 +00:00
Preston Gurd	cdf540d5d6	Generic Bypass Slow Div - CodeGenPrepare pass for identifying div/rem ops - Backend specifies the type mapping using addBypassSlowDivType - Enabled only for Intel Atom with O2 32-bit -> 8-bit - Replace IDIV with instructions which test its value and use DIVB if the value is positive and less than 256. - In the case when the quotient and remainder of a divide are used a DIV and a REM instruction will be present in the IR. In the non-Atom case they are both lowered to IDIVs and CSE removes the redundant IDIV instruction, using the quotient and remainder from the first IDIV. However, due to this optimization CSE is not able to eliminate redundant IDIV instructions because they are located in different basic blocks. This is overcome by calculating both the quotient (DIV) and remainder (REM) in each basic block that is inserted by the optimization and reusing the result values when a subsequent DIV or REM instruction uses the same operands. - Test cases check for the presence of the optimization when calculating either the quotient, remainder, or both. Patch by Tyler Nowicki! llvm-svn: 163150	2012-09-04 18:22:17 +00:00
Sergei Larin	4d8986af12	Porting Hexagon MI Scheduler to the new API. Change current Hexagon MI scheduler to use new converging scheduler. Integrates DFA resource model into it. llvm-svn: 163137	2012-09-04 14:49:56 +00:00
Arnold Schwaighofer	f00fb1c581	Patch to implement UMLAL/SMLAL instructions for the ARM architecture This patch corrects the definition of umlal/smlal instructions and adds support for matching them to the ARM dag combiner. Bug 12213 Patch by Yin Ma! llvm-svn: 163136	2012-09-04 14:37:49 +00:00
Elena Demikhovsky	cbe99bbb36	This patch optimizes shuffle instruction - generates 2 instructions instead of 4. Since this specific shuffle is widely used in many workloads we see a ~10% performance gain on them. shufflevector <8 x float> %A, <8 x float> %B, <8 x i32> <i32 0, i32 8, i32 2, i32 10, i32 4, i32 12, i32 6, i32 14> vmovaps (%rdx), %ymm0 vshufps $8, %ymm0, %ymm0, %ymm0 vmovaps (%rcx), %ymm1 vshufps $8, %ymm0, %ymm1, %ymm1 vunpcklps %ymm0, %ymm1, %ymm0 vmovaps (%rcx), %ymm0 vmovsldup (%rdx), %ymm1 vblendps $85, %ymm0, %ymm1, %ymm0 llvm-svn: 163134	2012-09-04 12:49:02 +00:00
Nadav Rotem	9d83202620	Not all targets have efficient ISel code generation for select instructions. For example, the ARM target does not have efficient ISel handling for vector selects with scalar conditions. This patch adds a TLI hook which allows the different targets to report which selects are supported well and which selects should be converted to CF during codegen prepare. llvm-svn: 163093	2012-09-02 12:10:19 +00:00
Nadav Rotem	500d691d4a	Generate better select code by allowing the target to use scalar select, and not sign-extend. llvm-svn: 163086	2012-09-02 08:20:07 +00:00
Pete Cooper	2117ac40c9	Revert "Take account of boolean vector contents when promoting a build vector from i1 to some other type. rdar://problem/12210060" This reverts commit 5dd9e214fb92847e947f9edab170f9b4e52b908f. Thanks to Duncan for explaining how this should have been done. Conflicts: test/CodeGen/X86/vec_select.ll llvm-svn: 163064	2012-09-01 17:37:55 +00:00
Logan Chien	cea0354c1b	Fix Thumb2 fixup kind in the integrated-as. llvm-svn: 163063	2012-09-01 15:06:36 +00:00
Owen Anderson	90e0eaffa8	Teach DAG combine a number of tricks to simplify FMA expressions in fast-math mode. llvm-svn: 163051	2012-09-01 06:04:27 +00:00
NAKAMURA Takumi	d35a4ff88b	llvm/test/CodeGen/X86/fp-fast.ll: Suppress FMA4 on AMD Bulldozer host, corresponding to r162999. llvm-svn: 163041	2012-09-01 00:26:28 +00:00
Manman Ren	3590361bf0	Fix Atom bots for r163036. llvm-svn: 163040	2012-09-01 00:17:06 +00:00
Manman Ren	26c5d0f607	SelectionDAG: when constructing VZEXT_LOAD from other loads, make sure its output chain is correctly setup. As an example, if the original load must happen before later stores, we need to make sure the constructed VZEXT_LOAD is constrained to be before the stores. rdar://11457792 llvm-svn: 163036	2012-08-31 23:16:57 +00:00
Craig Topper	908e685102	Mark FMA4 instructions as commutable and add them to the folding tables. llvm-svn: 163035	2012-08-31 23:10:34 +00:00
Michael Liao	3224543bf9	Fix PR12359 - In addition to undefined, if V2 is zero vector, skip 2nd PSHUFB and POR as well as PSHUFB will zero elements with negative indices. Patch by Sriram Murali <sriram.murali@intel.com> llvm-svn: 163018	2012-08-31 20:12:31 +00:00
Craig Topper	c0387f6b23	Mark FMA3 instructions as commutable so that the operands to the multiply part can be commuted. llvm-svn: 163001	2012-08-31 16:31:13 +00:00
Craig Topper	c30fdbc46c	Add support for converting llvm.fma to fma4 instructions. llvm-svn: 162999	2012-08-31 15:40:30 +00:00
Jakob Stoklund Olesen	96f87069c4	Don't enforce ordered inline asm operands. I was too optimistic, inline asm can have tied operands that don't follow the def order. Fixes PR13742. llvm-svn: 162998	2012-08-31 15:34:59 +00:00
NAKAMURA Takumi	2762dadf2c	llvm/test/CodeGen/X86/vec_select.ll: Fix failure on xmm-less hosts, to add -mattr=+sse2. FIXME: Should this be tested with both +avx and -avx,+sse2? llvm-svn: 162983	2012-08-31 10:02:22 +00:00
Jakob Stoklund Olesen	d3bda3c5b9	Fix a couple of typos in EmitAtomic. Thumb2 instructions are mostly constrained to rGPR, not tGPR which is for Thumb1. rdar://problem/12203728 llvm-svn: 162968	2012-08-31 02:08:34 +00:00
Pete Cooper	e969340fea	Take account of boolean vector contents when promoting a build vector from i1 to some other type. rdar://problem/12210060 llvm-svn: 162960	2012-08-30 23:58:52 +00:00
Owen Anderson	d1545e3715	Try to make this test more generic to unbreak buildbots. llvm-svn: 162958	2012-08-30 23:51:20 +00:00
Owen Anderson	cc61f87cf7	Teach the DAG combiner to turn chains of FADDs (x+x+x+x+...) into FMULs by constants. This is only enabled in unsafe FP math mode, since it does not preserve rounding effects for all such constants. llvm-svn: 162956	2012-08-30 23:35:16 +00:00
Nadav Rotem	ea973bda26	Currently targets that do not support selects with scalar conditions and vector operands - scalarize the code. ARM is such a target because it does not support CMOV of vectors. To implement this efficiently, we broadcast the condition bit and use a sequence of NAND-OR to select between the two operands. This is the same sequence we use for targets that don't have vector BLENDs (like SSE2). rdar://12201387 llvm-svn: 162926	2012-08-30 19:17:29 +00:00
Michael Liao	bbd10792c2	Introduce 'UseSSEx' to force SSE legacy encoding - Add 'UseSSEx' to force SSE legacy insn not being selected when AVX is enabled. Because of the penalty of intermixing SSE and AVX instructions, we need to prevent SSE legacy insns from being generated unless explicitly requested through some intrinsics. For patterns supported by both SSE and AVX, so far, we have forced the AVX insn to be tried first by relying on AddedComplexity or position in the td file. That is error-prone and introduces bugs accidentally. 'UseSSEx' is disabled when AVX is turned on. For SSE insns inherited by AVX, we need this predicate to force VEX encoding or SSE legacy encoding only. For insns not inherited by AVX, we still use the previous predicates, i.e. 'HasSSEx'. So far, these insns fall into the following categories: * SSE insns with MMX operands * SSE insns with GPR/MEM operands only (xFENCE, PREFETCH, CLFLUSH, CRC, etc.) * SSE4A insns. * MMX insns. * x87 insns added by SSE. 2 test cases are modified: - test/CodeGen/X86/fast-isel-x86-64.ll AVX code generation is different from SSE one. 'vcvtsi2sdq' cannot be selected by fast-isel due to complicated pattern and fast-isel fallback to materialize it from constant pool. - test/CodeGen/X86/widen_load-1.ll AVX code generation is different from SSE one after fixing SSE/AVX inter-mixing. Exec-domain fixing prefers 'vmovapd' instead of 'vmovaps'. llvm-svn: 162919	2012-08-30 16:54:46 +00:00
Tim Northover	ca9f384ff8	Add support for moving pure S-register to NEON pipeline if desired llvm-svn: 162898	2012-08-30 10:17:45 +00:00
Michael Liao	271f11b571	Should put test case under test/ExecutionEngine/MCJIT/ llvm-svn: 162885	2012-08-30 00:43:57 +00:00
Michael Liao	3c8980646b	Fix PR13727 - The root cause is that target constant materialization in X86 fast-isel creates a PC-rel addressing which may overflow 32-bit range in non-Small code model if .rodata section is allocated too far away from code segment in MCJIT, which uses Large code model so far. - Follow the similar logic to fix non-Small code model in fast-isel by skipping non-Small code model. llvm-svn: 162881	2012-08-30 00:30:16 +00:00
Hal Finkel	1859d26528	Reserve space for the mandatory traceback fields on PPC64. We need to reserve space for the mandatory traceback fields, though leaving them as zero is appropriate for now. Although the ABI calls for these fields to be filled in fully, no compiler on Linux currently does this, and GDB does not read these fields. GDB uses the first word of zeroes during exception handling to find the end of the function and the size field, allowing it to compute the beginning of the function. DWARF information is used for everything else. We need the extra 8 bytes of pad so the size field is found in the right place. As a comparison, GCC fills in a few of the fields -- language, number of saved registers -- but ignores the rest. IBM's proprietary OSes do make use of the full traceback table facility. Patch by Bill Schmidt. llvm-svn: 162854	2012-08-29 20:22:24 +00:00
Jush Lu	e87e559e62	[arm-fast-isel] Add support for ARM PIC. llvm-svn: 162823	2012-08-29 02:41:21 +00:00
Roman Divacky	8c4b6a307e	Emit word of zeroes after the last instruction as a start of the mandatory traceback table on PowerPC64. This helps gdb handle exceptions. The other mandatory fields are ignored by gdb and harder to implement so just add a FIXME there. Patch by Bill Schmidt. PR13641. llvm-svn: 162778	2012-08-28 19:06:55 +00:00
Hal Finkel	742b535e40	Add PPC Freescale e500mc and e5500 subtargets. Add subtargets for Freescale e500mc (32-bit) and e5500 (64-bit) to the PowerPC backend. Patch by Tobias von Koch. llvm-svn: 162764	2012-08-28 16:12:39 +00:00
Bill Wendling	cc56718038	The commutative flag is already correctly set within the multiclass. If we set it here, then a 'register-memory' version would wrongly get the commutative flag. <rdar://problem/12180135> llvm-svn: 162741	2012-08-28 07:36:46 +00:00
Craig Topper	bd509eea4a	Merge AVX_SET0PSY/AVX_SET0PDY/AVX2_SET0 into a single post-RA pseudo. llvm-svn: 162738	2012-08-28 07:05:28 +00:00
NAKAMURA Takumi	cdfe1d1cdb	llvm/test/CodeGen/X86/pr12312.ll: Add -mtriple=x86_64-unknown-unknown. llvm-svn: 162736	2012-08-28 04:04:29 +00:00
Michael Liao	b7d85b6328	Fix PR12312 - Add a target-specific DAG optimization to recognize a pattern PTEST-able. Such a pattern is a OR'd tree with X86ISD::OR as the root node. When X86ISD::OR node has only its flag result being used as a boolean value and all its leaves are extracted from the same vector, it could be folded into an X86ISD::PTEST node. llvm-svn: 162735	2012-08-28 03:34:40 +00:00
Jakob Stoklund Olesen	87cb471e52	Remove extra MayLoad/MayStore flags from atomic_load/store. These extra flags are not required to properly order the atomic load/store instructions. SelectionDAGBuilder chains atomics as if they were volatile, and SelectionDAG::getAtomic() sets the isVolatile bit on the memory operands of all atomic operations. The volatile bit is enough to order atomic loads and stores during and after SelectionDAG. This means we set mayLoad on atomic_load, mayStore on atomic_store, and mayLoad+mayStore on the remaining atomic read-modify-write operations. llvm-svn: 162733	2012-08-28 03:11:32 +00:00
Akira Hatanaka	b5af7121b1	Fix mips' long branch pass. Instructions emitted to compute branch offsets now use immediate operands instead of symbolic labels. This change was needed because there were problems when R_MIPS_HI16/LO16 relocations were used to make shared objects. llvm-svn: 162731	2012-08-28 03:03:05 +00:00
Akira Hatanaka	adb14f56c7	Fix bug 13532. In SelectionDAGLegalize::ExpandLegalINT_TO_FP, expand INT_TO_FP nodes without using any f64 operations if f64 is not a legal type. Patch by Stefan Kristiansson. llvm-svn: 162728	2012-08-28 02:12:42 +00:00
Hal Finkel	686f2ee226	Allow remat of LI on PPC. Allow load-immediates to be rematerialised in the register coalescer for PPC. This makes test/CodeGen/PowerPC/big-endian-formal-args.ll fail, because it relies on a register move getting emitted. The immediate load is equivalent, so change this test case. Patch by Tobias von Koch. llvm-svn: 162727	2012-08-28 02:10:33 +00:00
Hal Finkel	5ab378037f	Eliminate redundant CR moves on PPC32. The 32-bit ABI requires CR bit 6 to be set if the call has fp arguments and unset if it doesn't. The solution up to now was to insert a MachineNode to set/unset the CR bit, which produces a CR vreg. This vreg was then copied into CR bit 6. When the register allocator saw a bunch of these in the same function, it allocated the set/unset CR bit in some random CR register (1 extra instruction) and then emitted CR moves before every vararg function call, rather than just setting and unsetting CR bit 6 directly before every vararg function call. This patch instead inserts a PPCcrset/PPCcrunset instruction which are then matched by a dedicated instruction pattern. Patch by Tobias von Koch. llvm-svn: 162725	2012-08-28 02:10:27 +00:00
Hal Finkel	e39526a789	Optimize zext on PPC64. The zeroextend IR instruction is lowered to an 'and' node with an immediate mask operand, which in turn gets legalised to a sequence of ori's & ands. This can be done more efficiently using the rldicl instruction. Patch by Tobias von Koch. llvm-svn: 162724	2012-08-28 02:10:15 +00:00
Bill Wendling	988a47d7e5	Make sure we add the predicate after all of the registers are added. <rdar://problem/12183003> llvm-svn: 162703	2012-08-27 22:12:44 +00:00
NAKAMURA Takumi	fee50c8cf1	llvm/test/CodeGen/X86/fma.ll: Add -march=x86, or two tests would fail on non-x86 hosts. llvm-svn: 162667	2012-08-27 11:50:26 +00:00
NAKAMURA Takumi	10eb4cfc3e	llvm/test/CodeGen/X86/fma_patterns.ll: Add -mtriple=x86_64. It was incompatible on i686 and Windows x64. llvm-svn: 162664	2012-08-27 09:37:54 +00:00
Craig Topper	bfc1d0ed48	Commit test change for r162658. llvm-svn: 162660	2012-08-27 07:55:50 +00:00
Anitha Boyapati	0dd589c5f1	FMA3 tests on bdver2 target for changes made in rev 162012. Also made corresponding changes to existing tests for darwin triple to ensure that same pattern is tested for bdver2 target. llvm-svn: 162655	2012-08-27 06:59:01 +00:00
Craig Topper	9e4f0aae17	Make sure that FMA3 is favored even when FMA4 is also enabled. Test case for r162454. llvm-svn: 162653	2012-08-27 03:38:15 +00:00
Jakob Stoklund Olesen	c2272df1be	Infer instruction properties from single-instruction patterns. Previously, instructions without a primary pattern wouldn't get their properties inferred. Now, we use all single-instruction patterns for inference, including 'def : Pat<>' instances. This causes a lot of instruction flags to change. - Many instructions no longer have the UnmodeledSideEffects flag because their flags are now inferred from a pattern. - Instructions with intrinsics will get a mayStore flag if they already have UnmodeledSideEffects and a mayLoad flag if they already have mayStore. This is because intrinsics properties are linear. - Instructions with atomic_load patterns get a mayStore flag because atomic loads can't be reordered. The correct workaround is to create pseudo-instructions instead of using normal loads. PR13693. llvm-svn: 162614	2012-08-24 22:46:53 +00:00
Akira Hatanaka	4a08a4a8b6	Disable Mips' delay slot filler when optimization level is O0. llvm-svn: 162589	2012-08-24 20:40:15 +00:00
Akira Hatanaka	e8e4ef102d	In MipsDAGToDAGISel::SelectAddr, fold add node into address operand, if its second operand is MipsISD::GPRel. llvm-svn: 162584	2012-08-24 20:21:49 +00:00
Manman Ren	cf10446ffa	BranchProb: modify the definition of an edge in BranchProbabilityInfo to handle the case of multiple edges from one block to another. A simple example is a switch statement with multiple values to the same destination. The definition of an edge is modified from a pair of blocks to a pair of PredBlock and an index into the successors. Also set the weight correctly when building SelectionDAG from LLVM IR, especially when converting a Switch. IntegersSubsetMapping is updated to calculate the weight for each cluster. llvm-svn: 162572	2012-08-24 18:14:27 +00:00
Roman Divacky	ace4707ea6	Lower constant pools and jump tables via TOC on PPC64/SVR4. In collaboration with Adhemerval Zanella. llvm-svn: 162562	2012-08-24 16:26:02 +00:00
Stepan Dyatkovskiy	99120e04be	Rejected 169195. As Duncan commented, bitcasting to proper type is the wrong approach. We need to insert some valid TRUNCATE node here. llvm-svn: 162354	2012-08-22 09:33:55 +00:00
Akira Hatanaka	ad4950258b	Add register Mips::GP to the list of reserved registers if target is bare-metal to prevent it from being clobbered. mips uses $gp to access small data section. This bug was originally reported by Carl Norum. llvm-svn: 162340	2012-08-22 03:18:13 +00:00
Akira Hatanaka	9d957842e1	Add option disable-mips-delay-filler. Turn on mips' delay slot filler by default. Patch by Carl Norum. llvm-svn: 162339	2012-08-22 02:51:28 +00:00
Tim Northover	f39c1a3f72	Add correct set of regression tests for r162094 commit. llvm-svn: 162276	2012-08-21 12:43:03 +00:00
Jakob Stoklund Olesen	74e6f9fc65	Add a missing def flag. * Bad machine code: Explicit definition marked as use * - function: test_cos - basic block: BB#0 L.entry (0x7ff2a2024fd0) - instruction: VSETLNi32 %D11, %D11<undef>, %R0, 0, pred:14, pred:%noreg, %Q5<imp-use,kill>, %Q5<imp-def> - operand 0: %D11 llvm-svn: 162247	2012-08-21 00:34:53 +00:00
Jakob Stoklund Olesen	7d33c5739f	Don't add CFG edges for redundant conditional branches. IR that hasn't been through SimplifyCFG can look like this: br i1 %b, label %r, label %r Make sure we don't create duplicate Machine CFG edges in this case. Fix the machine code verifier to accept conditional branches with a single CFG edge. llvm-svn: 162230	2012-08-20 21:39:52 +00:00
Michael Liao	10ff96ce8c	fix a case where all operands of BUILD_VECTOR are undefined llvm-svn: 162214	2012-08-20 17:59:18 +00:00
Stepan Dyatkovskiy	6ee89aafc8	Forgot to add testcase for r162195. Sorry. llvm-svn: 162196	2012-08-20 08:03:18 +00:00
Nadav Rotem	178250ad87	When unsafe math is used, we can use commutative FMAX and FMIN. In some cases this allows for better code generation. Added a new DAGCombine transformation to convert FMAX and FMIN to FMAXC and FMINC, which are commutative. For example: movaps %xmm0, %xmm1 movsd LC(%rip), %xmm0 minsd %xmm1, %xmm0 becomes: minsd LC(%rip), %xmm0 llvm-svn: 162187	2012-08-19 13:06:16 +00:00
Jakob Stoklund Olesen	dded061f85	Also combine zext/sext into selects for ARM. This turns common i1 patterns into predicated instructions: (add (zext cc), x) -> (select cc (add x, 1), x) (add (sext cc), x) -> (select cc (add x, -1), x) For a function like: unsigned f(unsigned s, int x) { return s + (x>0); } We now produce: cmp r1, #0 it gt addgt.w r0, r0, #1 Instead of: movs r2, #0 cmp r1, #0 it gt movgt r2, #1 add r0, r2 llvm-svn: 162177	2012-08-18 21:25:22 +00:00
Jakob Stoklund Olesen	aab43dbfbb	Also pass logical ops to combineSelectAndUse. Add these transformations to the existing add/sub ones: (and (select cc, -1, c), x) -> (select cc, x, (and, x, c)) (or (select cc, 0, c), x) -> (select cc, x, (or, x, c)) (xor (select cc, 0, c), x) -> (select cc, x, (xor, x, c)) The selects can then be transformed to a single predicated instruction by peephole. This transformation will make it possible to eliminate the ISD::CAND, COR, and CXOR custom DAG nodes. llvm-svn: 162176	2012-08-18 21:25:16 +00:00
Nadav Rotem	a136939fa9	Reapply r162160 with a fix: Optimize Arith->Trunc->SETCC sequence to allow better compare/branch code. llvm-svn: 162172	2012-08-18 17:53:03 +00:00
Nadav Rotem	c324af609e	Revert r162160 because it made a few buildbots fail. llvm-svn: 162164	2012-08-18 05:02:36 +00:00
Nadav Rotem	2cb14a5c4b	The X86 backend has a number of optimizations for SETCC nodes which use arithmetic instructions. However, when small data types are used, a truncate node appears between the SETCC node and the arithmetic operation. This patch adds support for this pattern. Before: xorl %esi, %edi testb %dil, %dil setne %al ret After: xorb %dil, %sil setne %al ret rdar://12081007 llvm-svn: 162160	2012-08-18 02:43:28 +00:00
Eli Friedman	79a6b30d8a	Make atomic load and store of pointers work. Tighten verification of atomic operations so other unexpected operations don't slip through. Based on patch by Logan Chien. PR11786/PR13186. llvm-svn: 162146	2012-08-17 23:24:29 +00:00
Jakob Stoklund Olesen	7b1a2e8f02	Avoid folding ADD instructions with FI operands. PEI can't handle the pseudo-instructions. This can be removed when the pseudo-instructions are replaced by normal predicated instructions. Fixes PR13628. llvm-svn: 162130	2012-08-17 20:55:34 +00:00
Benjamin Kramer	ca7ca4f6c6	TargetLowering: Use the large shift amount during legalize types. The legalizer may call us with an overly large type. llvm-svn: 162101	2012-08-17 15:54:21 +00:00
Benjamin Kramer	2f47a3fb07	Fix broken check lines. I really need to find a way to automate this, but I can't come up with a regex that has no false positives while handling tricky cases like custom check prefixes. llvm-svn: 162097	2012-08-17 12:28:26 +00:00
Tim Northover	f66181530f	Implement NEON domain switching for scalar <-> S-register vmovs on ARM llvm-svn: 162094	2012-08-17 11:32:52 +00:00
Jakob Stoklund Olesen	0ea1fce6b4	Add ADD and SUB to the predicable ARM instructions. It is not my plan to duplicate the entire ARM instruction set with predicated versions. We need a way of representing predicated instructions in SSA form without requiring a separate opcode. Then the pseudo-instructions can go away. llvm-svn: 162061	2012-08-16 23:21:55 +00:00
Jush Lu	26088cb30e	[arm-fast-isel] Add support for fastcc. Without fastcc support, the caller just falls through to CallingConv::C for fastcc, but callee still uses fastcc, this inconsistency of calling convention is a problem, and fastcc support can fix it. llvm-svn: 162013	2012-08-16 05:15:53 +00:00
Akira Hatanaka	269d3fd101	Test case for r162008. llvm-svn: 162009	2012-08-16 03:48:41 +00:00
Jakob Stoklund Olesen	6cb96120f1	Fold predicable instructions into MOVCC / t2MOVCC. The ARM select instructions are just predicated moves. If the select is the only use of an operand, the instruction defining the operand can be predicated instead, saving one instruction and decreasing register pressure. This implementation can turn AND/ORR/EOR instructions into their corresponding ANDCC/ORRCC/EORCC variants. Ideally, we should be able to predicate any instruction, but we don't yet support predicated instructions in SSA form. llvm-svn: 161994	2012-08-15 22:16:39 +00:00
Bill Wendling	2c8685e327	Rework test so that it reproduces the error without the horrible flag. llvm-svn: 161989	2012-08-15 21:10:18 +00:00
Bill Wendling	d63f1f5a9c	Remove invalid test. This test requires that dead basic blocks be kept around. That's not how we do things. Besides, the commit message tells us that it is covered by the GCC test suite. ------------------------------------------------------------------------ r127497 | zwarich | 2011-03-11 13:51:56 -0800 (Fri, 11 Mar 2011) | 3 lines Fix the GCC test suite issue exposed by r127477, which was caused by stack protector insertion not working correctly with unreachable code. Since that revision was rolled out, this test doesn't actual fail before this fix. ------------------------------------------------------------------------ llvm-svn: 161985	2012-08-15 20:54:09 +00:00
Evan Cheng	eec6bc6270	Use vld1/vst1 to load/store f64 if alignment is < 4 and the target allows unaligned access. rdar://12091029 llvm-svn: 161962	2012-08-15 17:44:53 +00:00
Anton Korobeynikov	c6d945b11a	The names of VFP variants of half-to-float conversion instructions were reversed. This leads to wrong codegen for float-to-half conversion intrinsics which are used to support storage-only fp16 type. NEON variants of same instructions are fine. llvm-svn: 161907	2012-08-14 23:36:01 +00:00
Michael Liao	34107b9177	fix PR11334 - FP_EXTEND only supports extending from vectors with matching elements. This results in the scalarization of extending to v2f64 from v2f32, which will be legalized to v4f32 not matching with v2f64. - add X86-specific VFPEXT supporting extending from v4f32 to v2f64. - add BUILD_VECTOR lowering helper to recover back the original extending from v4f32 to v2f64. - test case is enhanced to include different vector width. llvm-svn: 161894	2012-08-14 21:24:47 +00:00
Nadav Rotem	70409991bc	During the CodeGenPrepare we often lower intrinsics (such as objsize) and allow some optimizations to turn conditional branches into unconditional. This commit adds a simple control-flow optimization which merges two consecutive basic blocks which are connected by a single edge. This allows the codegen to operate on larger basic blocks. rdar://11973998 llvm-svn: 161852	2012-08-14 05:19:07 +00:00
NAKAMURA Takumi	245920463d	llvm/test/CodeGen/ARM/floorf.ll: Add explicit -mtriple=arm-unknown-unknown, or it fails on msvc. llvm-svn: 161825	2012-08-14 00:56:06 +00:00
Owen Anderson	a40319b7f1	Add a roundToIntegral method to APFloat, which can be parameterized over various rounding modes. Use this to implement SelectionDAG constant folding of FFLOOR, FCEIL, and FTRUNC. llvm-svn: 161807	2012-08-13 23:32:49 +00:00
Bill Wendling	72baa6eeae	Rename test since it's not linux-specific. llvm-svn: 161792	2012-08-13 21:32:42 +00:00
Jakob Stoklund Olesen	83a927d84a	Handle extra Tail predecessors in if-conversion. It is still possible to if-convert if the tail block has extra predecessors, but the tail phis must be rewritten instead of being removed. llvm-svn: 161781	2012-08-13 20:49:04 +00:00
Arnold Schwaighofer	0bb7f23cfc	[Hexagon] Don't mark callee saved registers as clobbered by a tail call This was causing unnecessary spills/restores of callee saved registers. Fixes PR13572. Patch by Pranav Bhandarkar! llvm-svn: 161778	2012-08-13 19:54:01 +00:00
Manman Ren	9746b33e26	Fix failure on Atom bot due to r161769 llvm-svn: 161777	2012-08-13 19:34:29 +00:00
Nadav Rotem	3a94c545cf	Do not optimize (or (and X,Y), Z) into BFI and other sequences if the AND ISDNode has more than one user. rdar://11876519 llvm-svn: 161775	2012-08-13 18:52:44 +00:00
Manman Ren	959acb106b	X86: move Int_CVTSD2SSrr, Int_CVTSI2SSrr, Int_CVTSI2SDrr, Int_CVTSS2SDrr from OpTbl1 to OpTbl2 since they have 3 operands and the last operand can be changed to a memory operand. PR13576 llvm-svn: 161769	2012-08-13 18:29:41 +00:00
Eric Christopher	7d8b53c1f8	Add support for the %H output modifier. Patch by Weiming Zhao. llvm-svn: 161768	2012-08-13 18:18:52 +00:00
Tim Northover	b4abb84d9c	Add test for previous commit correcting NEON load patterns. llvm-svn: 161750	2012-08-13 10:38:45 +00:00
Arnold Schwaighofer	b73da9453c	Revert 161581: Patch to implement UMLAL/SMLAL instructions for the ARM architecture It broke MultiSource/Applications/JM/ldecod/ldecod on armv7 thumb O0 g and armv7 thumb O3. llvm-svn: 161736	2012-08-12 05:11:56 +00:00
Michael Liao	e7e828fd64	fix PR13577, an issue introduced by r161687 - FCMOV only supports a subset of X86 conditions. Skip boolean simplification if X86 condition is not valid for FCMOV. - add a minimal test case for PR13577. llvm-svn: 161732	2012-08-11 23:47:06 +00:00
Benjamin Kramer	ef6494f24d	PR13578: Teach MachineCSE that instructions that use a constant register can be CSE'd safely. This is common e.g. when doing rip-relative addressing on x86_64. llvm-svn: 161728	2012-08-11 19:05:13 +00:00
Manman Ren	1acb6707cd	X86: when we are auto-detecting the subtarget features, make sure we turn on FeatureFastUAMem for Nehalem, Westmere and Sandy Bridge. FeatureFastUAMem is already on if we pass in nehalem or westmere as a command argument. rdar: 7252306 llvm-svn: 161717	2012-08-10 23:43:32 +00:00
Eli Friedman	4c923b3b3f	The normal edge of an invoke is not allowed to branch to a block with a landingpad. Enforce it in the verifier, and fix the regression tests to match. llvm-svn: 161697	2012-08-10 20:55:20 +00:00
Michael Liao	5248e9913f	add X86-specific DAG optimization to simplify boolean test - if a boolean test (X86ISD::CMP or X86ISD::SUB) checks a boolean value generated from X86ISD::SETCC, try to simplify the boolean value generation and checking by reusing the original EFLAGS with proper condition code - add hooks to X86 specific SETCC/BRCOND/CMOV, the major 3 places consuming EFLAGS part of patches fixing PR12312 llvm-svn: 161687	2012-08-10 19:58:13 +00:00
Jakob Stoklund Olesen	8c28ac9ec9	Update edge weights correctly in replaceSuccessor(). When replacing Old with New, it can happen that New is already a successor. Add the old and new edge weights instead of creating a duplicate edge. llvm-svn: 161653	2012-08-10 03:23:27 +00:00
Jakob Stoklund Olesen	d9b66506a3	Reapply r161633-161634 "Partition use lists so defs always come before uses." No changes to these patches, MRI needed to be notified when changing uses into defs and vice versa. llvm-svn: 161644	2012-08-10 00:21:30 +00:00
Jakob Stoklund Olesen	acd27c9279	Revert r161633-161634 "Partition use lists so defs always come before uses." These commits broke a number of buildbots. llvm-svn: 161640	2012-08-09 23:31:36 +00:00
Jakob Stoklund Olesen	df01e00710	Partition use lists so defs always come before uses. This makes it possible to speed up def_iterator by stopping at the first use. This makes def_empty() and getUniqueVRegDef() much faster when there are many uses. In a +Asserts build, LiveVariables is 100x faster in one case because getVRegDef() has an assertion that would scan to the end of a def_iterator chain. Spill weight calculation is significantly faster (300x in one case) because isTriviallyReMaterializable() calls MRI->isConstantPhysReg(%RIP) which calls def_empty(%RIP). llvm-svn: 161634	2012-08-09 22:49:46 +00:00
Jakob Stoklund Olesen	7d7051ca3c	Don't use pointer-pointers for the register use lists. Use a more conventional doubly linked list where the Prev pointers form a cycle. This means it is no longer necessary to adjust the Prev pointers when reallocating the VRegInfo array. The test changes are required because the register allocation hint is using the use-list order to break ties. llvm-svn: 161633	2012-08-09 22:49:42 +00:00
Jakob Stoklund Olesen	4238a89db8	Don't modify MO while use_iterator is still pointing to it. llvm-svn: 161626	2012-08-09 22:08:24 +00:00
Arnold Schwaighofer	81b2eec1ab	Patch to implement UMLAL/SMLAL instructions for the ARM architecture This patch corrects the definition of umlal/smlal instructions and adds support for matching them to the ARM dag combiner. Bug 12213 Patch by Yin Ma! llvm-svn: 161581	2012-08-09 15:25:52 +00:00
Nadav Rotem	e0f84d31c8	Fix the legalization of ExtLoad on ARM. ExpandUnalignedLoad did not properly handle the cases where the memory value type was illegal. PR 13111. llvm-svn: 161565	2012-08-09 01:56:44 +00:00
Bob Wilson	4c65c505e0	Add test triples to fix win32 failures. Revert workaround from r161292. I don't have a win32 system to test, so hopefully I got them all fixed here. llvm-svn: 161519	2012-08-08 20:31:37 +00:00
Manman Ren	1be131ba27	X86: enable CSE between CMP and SUB We perform the following: 1> Use SUB instead of CMP for i8,i16,i32 and i64 in ISel lowering. 2> Modify MachineCSE to correctly handle implicit defs. 3> Convert SUB back to CMP if possible at peephole. Removed pattern matching of (a>b) ? (a-b):0 and like, since they are handled by peephole now. rdar://11873276 llvm-svn: 161462	2012-08-08 00:51:41 +00:00
Evan Cheng	fbdd25c135	X86 cmp lowering is looking past truncate on the condition node. It should only do so when the high bits are known zero. This caused a subtle miscompilation. rdar://12027825 llvm-svn: 161451	2012-08-07 22:21:00 +00:00
Chandler Carruth	881d0a7966	Add a much more conservative strategy for aligning branch targets. Previously, MBP essentially aligned every branch target it could. This bloats code quite a bit, especially non-looping code which has no real reason to prefer aligned branch targets so heavily. As Andy said in review, it's still a bit odd to do this without a real cost model, but this at least has much more plausible heuristics. Fixes PR13265. llvm-svn: 161409	2012-08-07 09:45:24 +00:00
Manman Ren	cb36b8c2e6	MachineCSE: Update the heuristics for isProfitableToCSE. If the result of a common subexpression is used at all uses of the candidate expression, CSE should not increase the live range of the common subexpression. rdar://11393714 and rdar://11819721 llvm-svn: 161396	2012-08-07 06:16:46 +00:00
Hal Finkel	33e529d56b	MFTB on PPC64 should really be encoded using MFSPR. The MFTB instruction itself is being phased out, and its functionality is provided by MFSPR. According to the ISA docs, using MFSPR works on all known chips except for the 601 (which did not have a timebase register anyway) and the POWER3. Thanks to Adhemerval Zanella for pointing this out! llvm-svn: 161346	2012-08-06 21:21:44 +00:00
Craig Topper	ab47fe4e16	Implement proper handling for pcmpistri/pcmpestri intrinsics. Requires custom handling in DAGISelToDAG due to limitations in TableGen's implicit def handling. Fixes PR11305. llvm-svn: 161318	2012-08-06 06:22:36 +00:00
Craig Topper	812005e562	Update test to check for r161305 llvm-svn: 161307	2012-08-05 09:06:28 +00:00
Hal Finkel	70381a7b18	Add readcyclecounter lowering on PPC64. On PPC64, this can be done with a simple TableGen pattern. To enable this, I've added the (otherwise missing) readcyclecounter SDNode definition to TargetSelectionDAG.td. llvm-svn: 161302	2012-08-04 14:10:46 +00:00
Anton Korobeynikov	218aaf6d04	Add stack spill / reload instructions for DTriple and DQuad register classes, which were missed for no reason. This fixes PR13377 llvm-svn: 161299	2012-08-04 13:16:12 +00:00
Bob Wilson	874886cd66	Refactor and check "onlyReadsMemory" before optimizing builtins. This patch is mostly just refactoring a bunch of copy-and-pasted code, but it also adds a check that the call instructions are readnone or readonly. That check was already present for sin, cos, sqrt, log2, and exp2 calls, but it was missing for the rest of the builtins being handled in this code. llvm-svn: 161282	2012-08-03 23:29:17 +00:00
Akira Hatanaka	22bec282e9	1. Redo mips16 instructions to avoid multiple opcodes for same instruction. Change these to patterns. 2. Add another 16 instructions. Patch by Reed Kotler. llvm-svn: 161272	2012-08-03 22:57:02 +00:00
Bob Wilson	fa59485b94	Fix memcmp code-gen to honor -fno-builtin. I noticed that SelectionDAGBuilder::visitCall was missing a check for memcmp in TargetLibraryInfo, so that it would use custom code for memcmp calls even with -fno-builtin. I also had to add a new -disable-simplify-libcalls option to llc so that I could write a test for this. llvm-svn: 161262	2012-08-03 21:26:18 +00:00
Bob Wilson	3e6fa462f3	Fall back to selection DAG isel for calls to builtin functions. Fast isel doesn't currently have support for translating builtin function calls to target instructions. For embedded environments where the library functions are not available, this is a matter of correctness and not just optimization. Most of this patch is just arranging to make the TargetLibraryInfo available in fast isel. <rdar://problem/12008746> llvm-svn: 161232	2012-08-03 04:06:28 +00:00
Jush Lu	4705da9020	[arm-fast-isel] Add support for shl, lshr, and ashr. llvm-svn: 161230	2012-08-03 02:37:48 +00:00
Manman Ren	ba8122cc25	X86 Peephole: fold loads to the source register operand if possible. Add more comments and use early returns to reduce nesting in isLoadFoldable. Also disable folding for V_SET0 to avoid introducing a const pool entry and a const pool load. rdar://10554090 and rdar://11873276 llvm-svn: 161207	2012-08-02 19:37:32 +00:00
Akira Hatanaka	fffad897f2	Set transient stack alignment in constructor of MipsFrameLowering and re-enable test o32_cc_vararg.ll. llvm-svn: 161189	2012-08-02 18:15:13 +00:00
NAKAMURA Takumi	7020f51622	llvm/test/CodeGen/X86/fold-pcmpeqd-1.ll: Make sure this is testing without +avx. FIXME: Could +avx be checked here too? llvm-svn: 161156	2012-08-02 06:36:56 +00:00
NAKAMURA Takumi	aaca1e690d	llvm/test/CodeGen/X86/fold-pcmpeqd-1.ll: Rewrite expressions to pass regardless of PR11031. - Relax to match even if epilogue (pop %ebp) were emitted. - Assume the return value is stored to %xmm0. llvm-svn: 161155	2012-08-02 06:33:58 +00:00
Manman Ren	5759d01230	X86 Peephole: fold loads to the source register operand if possible. Machine CSE and other optimizations can remove instructions so folding is possible at peephole while not possible at ISel. This patch is a rework of r160919 and was tested on clang self-host on my local machine. rdar://10554090 and rdar://11873276 llvm-svn: 161152	2012-08-02 00:56:42 +00:00
Matt Beaumont-Gay	7947aecaf1	Line endings. llvm-svn: 161117	2012-08-01 16:42:35 +00:00
Elena Demikhovsky	3cb3b0045c	Added FMA functionality to X86 target. llvm-svn: 161110	2012-08-01 12:06:00 +00:00
Akira Hatanaka	d1c43cee24	Add definitions of two subclasses of MipsFrameLowering, Mips16FrameLowering and MipsSEFrameLowering. Implement MipsSEFrameLowering::hasReservedCallFrame. Call frames will not be reserved if there is a call with a large call frame or there are variable sized objects on the stack. llvm-svn: 161090	2012-07-31 22:50:19 +00:00
Akira Hatanaka	02de0e4425	Let PEI::calculateFrameObjectOffsets compute the final stack size rather than computing it in MipsFrameLowering::emitPrologue. llvm-svn: 161078	2012-07-31 21:28:49 +00:00
Akira Hatanaka	33a25af5a8	Expand DYNAMIC_STACKALLOC nodes rather than doing custom-lowering. The frame object which points to the dynamically allocated area will not be needed after changes are made to cease reserving call frames. llvm-svn: 161076	2012-07-31 20:54:48 +00:00
Akira Hatanaka	beda2241a4	When store nodes or memcpy nodes are created to copy the function call arguments to the stack in MipsISelLowering::LowerCall, use stack pointer and integer offset operands rather than frame object operands. llvm-svn: 161068	2012-07-31 18:46:41 +00:00
Chad Rosier	710be7df71	[x86 frame lowering] In 32-bit mode, use ESI as the base pointer. Previously, we were using EBX, but PIC requires the GOT to be in EBX before function calls via PLT GOT pointer. llvm-svn: 161066	2012-07-31 18:29:21 +00:00
Akira Hatanaka	4ce7c4060d	Fix type of LUXC1 and SUXC1. These instructions were incorrectly defined as single-precision load and store. Also avoid selecting LUXC1 and SUXC1 instructions during isel. It is incorrect to map unaligned floating point load/store nodes to these instructions. llvm-svn: 161063	2012-07-31 18:16:49 +00:00
Manman Ren	8c549b586c	MachineSink: Sort the successors before trying to find SuccToSinkTo. One motivating example is to sink an instruction from a basic block which has two successors: one outside the loop, the other inside the loop. We should try to sink the instruction outside the loop. rdar://11980766 llvm-svn: 161062	2012-07-31 18:10:39 +00:00
Jakob Stoklund Olesen	0c807dfae2	Clear kill flags in removeCopyByCommutingDef(). We are extending live ranges, so kill flags are not accurate. They aren't needed until they are recomputed after RA anyway. <rdar://problem/11950722> llvm-svn: 161023	2012-07-31 02:47:24 +00:00
Manman Ren	2b6a0dfd4c	Reverse order of the two branches at end of a basic block if it is profitable. We branch to the successor with higher edge weight first. Convert from je LBB4_8 --> to outer loop jmp LBB4_14 --> to inner loop to jne LBB4_14 jmp LBB4_8 PR12750 rdar: 11393714 llvm-svn: 161018	2012-07-31 01:11:07 +00:00
Pete Cooper	91244268d7	Consider address spaces for hashing and CSEing DAG nodes. Otherwise two loads from different x86 segments but the same address would get CSEd llvm-svn: 160987	2012-07-30 20:23:19 +00:00
Manman Ren	f87dd7c01b	Revert r160920 and r160919 due to dragonegg and clang selfhost failure llvm-svn: 160927	2012-07-29 02:44:09 +00:00
Manman Ren	9de95e779c	X86 Peephole: fold loads to the source register operand if possible. Trying to fix the bot by specifying a triple in the failing testing cases. llvm-svn: 160920	2012-07-28 17:51:24 +00:00
Manman Ren	0fa3ab88ba	X86 Peephole: fold loads to the source register operand if possible. Machine CSE and other optimizations can remove instructions so folding is possible at peephole while not possible at ISel. rdar://10554090 and rdar://11873276 llvm-svn: 160919	2012-07-28 16:48:01 +00:00
Manman Ren	32367c063b	X86 Peephole: fix PR13475 in optimizeCompare. It is possible that an instruction can use and update EFLAGS. When checking the safety, we should check the usage of EFLAGS first before declaring it is safe to optimize due to the update. llvm-svn: 160912	2012-07-28 03:15:46 +00:00
Evan Cheng	249716e8ae	Teach CodeGenPrep to look past bitcast when it's duplicating return instruction into predecessor blocks to enable tail call optimization. rdar://11958338 llvm-svn: 160894	2012-07-27 21:21:26 +00:00
Jakob Stoklund Olesen	bc65e8f94e	Add <imp-def> of super-register when lowering SUBREG_TO_REG. Patch by Tyler Nowicki! llvm-svn: 160888	2012-07-27 20:19:49 +00:00
Jakob Stoklund Olesen	ceee4a9d0c	Eliminate a batch of uses of sub_ss and sub_sd in the X86 target. These idempotent sub-register indices don't do anything --- They simply map XMM registers to themselves. They no longer affect register classes either since the SubRegClasses field has been removed from Target.td. This patch replaces XMM->XMM EXTRACT_SUBREG and INSERT_SUBREG patterns with COPY_TO_REGCLASS patterns which simply become COPY instructions. The number of IMPLICIT_DEF instructions before register allocation is reduced, and that is the cause of the test case changes. llvm-svn: 160816	2012-07-26 21:40:42 +00:00
Akira Hatanaka	64626fc20f	Fix call setup for PIC. Patch by Reed Kotler. llvm-svn: 160774	2012-07-26 02:24:43 +00:00
Manman Ren	e8c6b15137	Update testing case for Atom when disabling rematerialization in TwoAddressInstructionPass. The generated code for Atom has a different code sequence. This is related to commit r160749. llvm-svn: 160755	2012-07-25 20:17:14 +00:00
Manman Ren	cc1dc6dc11	Disable rematerialization in TwoAddressInstructionPass. It is redundant; RegisterCoalescer will do the remat if it can't eliminate the copy. Collected instruction counts before and after this. A few extra instructions are generated due to spilling but it is normal to see these kinds of changes with almost any small codegen change, according to Jakob. This also fixed rdar://11830760 where xor is expected instead of movi0. llvm-svn: 160749	2012-07-25 18:28:13 +00:00
Rafael Espindola	11c38b9657	When a return struct pointer is passed in registers, the callee has nothing to pop. llvm-svn: 160725	2012-07-25 13:41:10 +00:00
Akira Hatanaka	5a69c235ae	Eliminate the stack slot used to save the global base register. The long branch pass (fixed in r160601) no longer uses the global base register to compute addresses of branch destinations, so it is not necessary to reserve a slot on the stack. llvm-svn: 160703	2012-07-25 03:16:47 +00:00
Rafael Espindola	a92cf29f0d	Add a cpu to the test. Should fix the atom bot. llvm-svn: 160701	2012-07-24 22:56:06 +00:00
Rafael Espindola	f30e9bfb90	Add a triple to the test. llvm-svn: 160698	2012-07-24 21:55:04 +00:00
Rafael Espindola	a44e193a11	In order to correctly compile struct s { double x1; float x2; }; __attribute__((regparm(3))) struct s f(int a, int b, int c); void g(void) { f(41, 42, 43); } We need to be able to represent passing the address of s to f (sret) in a register (inreg). Turns out that all that is needed is to not mark them as mutually incompatible. llvm-svn: 160695	2012-07-24 21:40:17 +00:00
David Chisnall	5b8c1680de	ELF does not imply GNU/Linux. Do not assume GNU conventions just because we are targeting an ELF platform. Only fold gs-relative (and fs-relative) loads if it is actually sensible to do so for the target platform. This fixes PR13438. llvm-svn: 160687	2012-07-24 20:04:16 +00:00
Akira Hatanaka	26e9ecb7a3	Add basic ability to setup call frame, and make procedure calls. Hello world will compile and execute with this patch. Patch by Reed Kotler. llvm-svn: 160651	2012-07-23 23:45:54 +00:00
Sylvestre Ledru	35521e2310	Fix a typo (the the => the) llvm-svn: 160621	2012-07-23 08:51:15 +00:00
Nadav Rotem	9056076cab	Fixed DAGCombine optimizations which generate select_cc for targets that do not support it (X86 does not lower select_cc). PR: 13428 Together with Michael Kuperstein <michael.m.kuperstein@intel.com> llvm-svn: 160619	2012-07-23 07:59:50 +00:00
Akira Hatanaka	f72efdb62f	Fix Mips long branch pass. This pass no longer requires that the global pointer value be saved to the stack or register since it uses bal instruction to compute branch distance. llvm-svn: 160601	2012-07-21 03:30:44 +00:00
Jakob Stoklund Olesen	e2cfd0d45a	Avoid folding loads that are unsafe to move. LiveRangeEdit::foldAsLoad() can eliminate a register by folding a load into its only use. Only do that when the load is safe to move, and it won't extend any live ranges. This fixes PR13414. llvm-svn: 160575	2012-07-20 21:29:31 +00:00
Jakob Stoklund Olesen	f62c07f147	Split loop exiting edges more aggressively. PHIElimination splits critical edges when it predicts it can resolve interference and eliminate copies. It doesn't split the edge if the interference wouldn't be resolved anyway because the phi-use register is live in the critical edge anyway. Teach PHIElimination to split loop exiting edges with interference, even if it wouldn't resolve the interference. This removes the necessary copies from the loop, which is still an improvement from injecting the copies into the loop. The test case demonstrates the improvement. Before: LBB0_1: cmpb $0, (%rdx) leaq 1(%rdx), %rdx movl %esi, %eax je LBB0_1 After: LBB0_1: cmpb $0, (%rdx) leaq 1(%rdx), %rdx je LBB0_1 movl %esi, %eax llvm-svn: 160571	2012-07-20 20:49:53 +00:00
Preston Gurd	f2ea70ae4a	Fix remaining lit tests which were failing when run on an Atom processor. Patches by Tyler Nowicki, Andy Zhang, and Preston Gurd! llvm-svn: 160520	2012-07-19 18:53:21 +00:00
Jush Lu	e67e07b901	[arm-fast-isel] Add support for vararg function calls. llvm-svn: 160500	2012-07-19 09:49:00 +00:00
Manman Ren	d0a4ee8427	X86: remove redundant cmp against zero. Updated OptimizeCompare in peephole to remove redundant cmp against zero. We only remove Compare if CF and OF are not used. rdar://11855129 llvm-svn: 160454	2012-07-18 21:40:01 +00:00
Preston Gurd	f0a48ec8f1	This patch fixes 8 out of 20 unexpected failures in "make check" when run on an Intel Atom processor. The failures have arisen due to changes elsewhere in the trunk over the past 8 weeks or so. These failures were not detected by the Atom buildbot because the CPU on the Atom buildbot was not being detected as an Atom CPU. The fix for this problem is in Host.cpp and X86Subtarget.cpp, but shall remain commented out until the current set of Atom test failures are fixed. Patch by Andy Zhang and Tyler Nowicki! llvm-svn: 160451	2012-07-18 20:49:17 +00:00
Chandler Carruth	985454e0ac	Fix a somewhat nasty crasher in PR13378. This crashes inside of LiveIntervals due to the two-addr pass generating bogus MI code. The crux of the issue was a loop nesting problem. The intent of the code which attempts to transform instructions before converting them to two-addr form is to defer and reprocess any transformed instructions as the second processing is likely to have more opportunities to coalesce copies, etc. Unfortunately, there was one section of processing that was not deferred -- the INSERT_SUBREG rewriting. Due to quirks of how this rewriting proceeded, not only did it occur early, it removed the bits of information needed for the deferred processing to correctly generate the necessary two address form (specifically inserting a copy), but didn't trigger any immediate assertions and produced what appeared to be already valid two-address form code. Thus, the assertion only fired much later in the pipeline. The fix is to hoist the transformation logic up a layer to where it can more firmly defer all further processing, and to teach the normal processing to handle an edge case previously handled as part of the transformation logic. This edge case (already matched tied register operands) needs to not defer any steps. As has been brought up repeatedly in the process: wow does this code need refactoring. I may squeeze in some time to at least bring sanity to this loop... but wow... =] Thanks to Jakob for helpful hints on the way here, and the review. llvm-svn: 160443	2012-07-18 18:58:22 +00:00
Victor Oliveira	a1de408aa7	test commit llvm-svn: 160438	2012-07-18 17:53:05 +00:00
Jack Carter	a62ba82825	Mips specific inline asm operand modifier 'M': Print the high order register of a double word register operand. In 32 bit mode, a 64 bit double word integer will be represented by 2 32 bit registers. This modifier causes the high order register to be used in the asm expression. It is useful if you are using doubles in assembler and continue to control register to variable relationships. This patch also fixes a related bug in a previous patch: case 'D': // Second part of a double word register operand case 'L': // Low order register of a double word register operand case 'M': // High order register of a double word register operand I got 'D' and 'M' confused. The second part of a double word operand will only match 'M' for one of the endiannesses. I had 'L' and 'D' be the opposite twins when 'L' and 'M' are. llvm-svn: 160429	2012-07-18 06:41:36 +00:00
Joel Jones	b84f7bea09	More replacing of target-dependent intrinsics with target-independent intrinsics. The second instruction(s) to be handled are the vector versions of count set bits (ctpop). The changes here are to clang so that it generates a target independent vector ctpop when it sees an ARM dependent vector bits set count. The changes in llvm are to match the target independent vector ctpop and in VMCore/AutoUpgrade.cpp to update any existing bc files containing ARM dependent vector pop counts with target-independent ctpops. There are also changes to an existing test case in llvm for ARM vector count instructions and to a test for the bitcode upgrade. <rdar://problem/11892519> There is deliberately no test for the change to clang, as so far as I know, no consensus has been reached regarding how to test neon instructions in clang; q.v. <rdar://problem/8762292> llvm-svn: 160410	2012-07-18 00:02:16 +00:00
Evan Cheng	f73d7553cc	Add test case for r160387 llvm-svn: 160389	2012-07-17 19:40:05 +00:00
Nadav Rotem	277a40bc0a	Fix a crash in the legalization of large vectors. When truncating a result of a vector that is split we need to use the result of the split vector, and not re-split the dead node. llvm-svn: 160357	2012-07-17 09:07:37 +00:00
Evan Cheng	780f9b5f92	Implement r160312 as target-independent dag combine. llvm-svn: 160354	2012-07-17 08:31:11 +00:00
Evan Cheng	f579beca6d	This is another case where instcombine demanded bits optimization created large immediates. Add dag combine logic to recover in case the large immediates don't fit in cmp immediate operand field. int foo(unsigned long l) { return (l>> 47) == 1; } we produce %shr.mask = and i64 %l, -140737488355328 %cmp = icmp eq i64 %shr.mask, 140737488355328 %conv = zext i1 %cmp to i32 ret i32 %conv which codegens to movq $0xffff800000000000,%rax andq %rdi,%rax movq $0x0000800000000000,%rcx cmpq %rcx,%rax sete %al movzbl %al,%eax ret TargetLowering::SimplifySetCC would transform (X & -256) == 256 -> (X >> 8) == 1 if the immediate fails the isLegalICmpImmediate() test. For x86, that's immediates which are not a signed 32-bit immediate. Based on a patch by Eli Friedman. PR10328 rdar://9758774 llvm-svn: 160346	2012-07-17 06:53:39 +00:00
Akira Hatanaka	046744467d	Fix function select_cc_f32 in test/CodeGen/Mips/selectcc.ll. llvm-svn: 160329	2012-07-16 23:56:51 +00:00
Evan Cheng	75315b877c	For something like uint32_t hi(uint64_t res) { uint32_t hi = res >> 32; return !hi; } llvm IR looks like this: define i32 @hi(i64 %res) nounwind uwtable ssp { entry: %lnot = icmp ult i64 %res, 4294967296 %lnot.ext = zext i1 %lnot to i32 ret i32 %lnot.ext } The optimizer has optimized away the right shift and truncate but the resulting constant is too large to fit in the 32-bit immediate field. The resulting x86 code is worse as a result: movabsq $4294967296, %rax ## imm = 0x100000000 cmpq %rax, %rdi sbbl %eax, %eax andl $1, %eax This patch teaches the x86 lowering code to handle ult against a large immediate with trailing zeros. It will issue a right shift and a truncate followed by a comparison against a shifted immediate. shrq $32, %rdi testl %edi, %edi sete %al movzbl %al, %eax It also handles a ugt comparison against a large immediate with trailing bits set. i.e. X > 0x0ffffffff -> (X >> 32) >= 1 rdar://11866926 llvm-svn: 160312	2012-07-16 19:35:43 +00:00
Nadav Rotem	839a06e9d7	Make ComputeDemandedBits return a deterministic result when computing an AssertZext value. In the added testcase the constant 55 was behind an AssertZext of type i1, and ComputeDemandedBits reported that some of the bits were both known to be one and known to be zero. Together with Michael Kuperstein <michael.m.kuperstein@intel.com> llvm-svn: 160305	2012-07-16 18:34:53 +00:00
Tom Stellard	fc3db614c0	Revert "test/CodeGen/R600: Add some basic tests v6" This reverts commit 11d3457afcda7848448dd7f11b2ede6552ffb9ea. llvm-svn: 160300	2012-07-16 18:19:43 +00:00
Alexey Samsonov	893d3d336a	Fix tests that failed on i686-win32 after r160248: 1. FileCheck-ize epilogue.ll and allow another asm instruction to restore %rsp. 2. Remove check in widen_arith-3.ll that was hitting instruction in epilogue instead of vector add. llvm-svn: 160274	2012-07-16 14:33:36 +00:00
Tom Stellard	6693fbe3eb	test/CodeGen/R600: Add some basic tests v6 llvm-svn: 160273	2012-07-16 14:17:19 +00:00
Nadav Rotem	4968e45b9f	Fix a bug in the 3-address conversion of LEA when one of the operands is an undef virtual register. The problem is that ProcessImplicitDefs removes the definition of the register and marks all uses as undef. If we lose the undef marker then we get a register which has no def, is not marked as undef. The live interval analysis does not collect information for these virtual registers and we crash in later passes. Together with Michael Kuperstein <michael.m.kuperstein@intel.com> llvm-svn: 160260	2012-07-16 10:52:25 +00:00
Alexey Samsonov	dcc1291d17	This CL changes the function prologue and epilogue emitted on X86 when stack needs realignment. It is intended to fix PR11468. Old prologue and epilogue looked like this: push %rbp mov %rsp, %rbp and $alignment, %rsp push %r14 push %r15 ... pop %r15 pop %r14 mov %rbp, %rsp pop %rbp The problem was to reference the locations of callee-saved registers in exception handling: locations of callee-saved had to be re-calculated regarding the stack alignment operation. It would take some effort to implement this in LLVM, as currently MachineLocation can only have the form "Register + Offset". Function prologue and epilogue are now changed to: push %rbp mov %rsp, %rbp push %r14 push %r15 and $alignment, %rsp ... lea -$size_of_saved_registers(%rbp), %rsp pop %r15 pop %r14 pop %rbp Reviewed by Chad Rosier. llvm-svn: 160248	2012-07-16 06:54:09 +00:00
Nadav Rotem	3050e07108	Fix a bug in the scalarization of BUILD_VECTOR. BUILD_VECTOR elements may be wider than the output element type. Make sure to trunc them if needed. Together with Michael Kuperstein <michael.m.kuperstein@intel.com> llvm-svn: 160235	2012-07-15 20:39:08 +00:00
Nadav Rotem	eec74c7279	Teach getTargetVShiftNode about TargetConstant nodes. llvm-svn: 160234	2012-07-15 20:27:43 +00:00
NAKAMURA Takumi	032dc0a06c	llvm/test/CodeGen/X86/2012-07-15-broadcastfold.ll: Rewrite expressions to fit various targets. - Make sure existence of "barrier". - Confirm reload corresponding to spill. llvm-svn: 160232	2012-07-15 14:38:35 +00:00
Nadav Rotem	ee3552f88d	Rename VBROADCASTSDrm into VBROADCASTSDYrm to match the naming convention. Allow the folding of vbroadcastRR to vbroadcastRM, where the memory operand is a spill slot. PR12782. Together with Michael Kuperstein <michael.m.kuperstein@intel.com> llvm-svn: 160230	2012-07-15 12:26:30 +00:00
Nadav Rotem	9466e81df6	AVX: Fix a bug in getTargetVShiftNode. The shift amount has to be a 128bit vector with the same element type as the input vector. This is needed because of the patterns we have for the VP[SLL/SRA/SRL][W/D/Q] instructions. llvm-svn: 160222	2012-07-14 22:26:05 +00:00
Nadav Rotem	018921002e	Add a dagcombine optimization to convert concat_vectors of undefs into a single undef. The unoptimized concat_vectors isd prevented the canonicalization of the vector_shuffle node. llvm-svn: 160221	2012-07-14 21:30:27 +00:00
Joel Jones	43cb87839c	This is one of the first steps at moving to replace target-dependent intrinsics with target-independent intrinsics. The first instruction(s) to be handled are the vector versions of count leading zeros (ctlz). The changes here are to clang so that it generates a target independent vector ctlz when it sees an ARM dependent vector ctlz. The changes in llvm are to match the target independent vector ctlz and in VMCore/AutoUpgrade.cpp to update any existing bc files containing ARM dependent vector ctlzs with target-independent ctlzs. There are also changes to an existing test case in llvm for ARM vector count instructions and a new test for the bitcode upgrade. <rdar://problem/11831778> There is deliberately no test for the change to clang, as so far as I know, no consensus has been reached regarding how to test neon instructions in clang; q.v. <rdar://problem/8762292> llvm-svn: 160200	2012-07-13 23:25:25 +00:00
Duncan Sands	a9c373e49d	Restrict this to x86, hopefully fixing ARM buildbots. llvm-svn: 160163	2012-07-13 07:02:00 +00:00
Benjamin Kramer	4d0916788d	Give the rdrand instructions a SideEffect flag and a chain so MachineCSE and MachineLICM don't touch it. I already had the necessary things in place for IR-level passes but missed the machine passes. llvm-svn: 160137	2012-07-12 18:14:57 +00:00
Nadav Rotem	fdce33a495	The LIT tests below do not specify the exact cpu model and fail on AVX2 machines, because we select different instructions such as vbroadcast, new shuffles, etc. Patch by Michael Liao. llvm-svn: 160129	2012-07-12 13:45:15 +00:00
NAKAMURA Takumi	f415fe70f3	llvm/test/CodeGen/X86/rdrand.ll: Relax expression corresponding to Win64 CC. llvm-svn: 160124	2012-07-12 10:22:57 +00:00
Benjamin Kramer	cbac2f3bc9	Use %s instead of the explicit name, the latter doesn't work in out-of-tree builds. llvm-svn: 160120	2012-07-12 09:36:29 +00:00
Benjamin Kramer	0ab2794eda	Add intrinsics for Ivy Bridge's rdrand instruction. The rdrand/cmov sequence is the same that is emitted by both GCC and ICC. Fixes PR13284. llvm-svn: 160117	2012-07-12 09:31:43 +00:00
Duncan Sands	671cc2575d	The result type of EXTRACT_VECTOR_ELT doesn't have to match the element type of the input vector, it can be bigger (this is helpful for powerpc where <2 x i16> is a legal vector type but i16 isn't a legal type, IIRC). However this wasn't being taken into account by ExpandRes_EXTRACT_VECTOR_ELT, causing PR13220. Lightly tweaked version of a patch by Michael Liao. llvm-svn: 160116	2012-07-12 09:01:35 +00:00
Craig Topper	f7755df776	Update GATHER instructions to support 2 read-write operands. Patch from myself and Manman Ren. llvm-svn: 160110	2012-07-12 06:52:41 +00:00
Manman Ren	34cb93e192	ARM: Fix optimizeCompare to correctly check safe condition. It is safe if CPSR is killed or re-defined. When we are done with the basic block, check whether CPSR is live-out. Do not optimize away cmp if CPSR is live-out. llvm-svn: 160090	2012-07-11 22:51:44 +00:00
Akira Hatanaka	20dced4dbb	Test case for r160036. llvm-svn: 160067	2012-07-11 19:50:46 +00:00
Manman Ren	1553ce0e81	X86: Update to peephole optimization to move Movr0 before (Sub, Cmp) pair. When Movr0 is between sub and cmp, we move Movr0 before sub if it enables removal of Cmp. llvm-svn: 160066	2012-07-11 19:35:12 +00:00
Akira Hatanaka	24cf4e36e5	Implement MipsTargetLowering::LowerSELECT_CC to custom lower SELECT_CC. llvm-svn: 160064	2012-07-11 19:32:27 +00:00
Benjamin Kramer	3aab6a86a2	PR13326: Fix a subtle edge case in the udiv -> magic multiply generator. This caused 6 of 65k possible 8 bit udivs to be wrong. llvm-svn: 160058	2012-07-11 18:31:59 +00:00
Nadav Rotem	d2bdcebb14	When ext-loading and trunc-storing vectors to memory, on x86 32bit systems, allow loads/stores of 64bit values from xmm registers. llvm-svn: 160044	2012-07-11 13:27:05 +00:00
Akira Hatanaka	878ad8b28d	Lower RETURNADDR node in Mips backend. Patch by Sasa Stankovic. llvm-svn: 160031	2012-07-11 00:53:32 +00:00
Jack Carter	e8cb2fc616	Mips specific inline asm operand modifier 'L'. Low order register of a double word register operand. Operands are defined by the name of the variable they are marked with in the inline assembler code. This is a way to specify that the operand just refers to the low order register for that variable. It is the opposite of modifier 'D' which specifies the high order register. Example: main() { long long ll_input = 0x1111222233334444LL; long long ll_val = 3; int i_result = 0; __asm__ __volatile__( "or %0, %L1, %2" : "=r" (i_result) : "r" (ll_input), "r" (ll_val)); } Which results in: lui $2, %hi(_gp_disp) addiu $2, $2, %lo(_gp_disp) addiu $sp, $sp, -8 addu $2, $2, $25 sw $2, 0($sp) lui $2, 13107 ori $3, $2, 17476 <-- Low 32 bits of ll_input lui $2, 4369 ori $4, $2, 8738 <-- High 32 bits of ll_input addiu $5, $zero, 3 <-- Low 32 bits of ll_val addiu $2, $zero, 0 <-- High 32 bits of ll_val #APP or $3, $4, $5 <-- or i_result, high 32 ll_input, low 32 of ll_val #NO_APP addiu $sp, $sp, 8 jr $ra If no modifier is given for a long long operand in 32 bit mode, the low 32 bits are used, as ll_val shows. There is an existing bug if 'L' or 'D' is used for the destination register for 32 bit long longs in that the target value will be updated incorrectly for the non-specified part unless explicitly set within the inline asm code. llvm-svn: 160028	2012-07-10 22:41:20 +00:00
Chad Rosier	3ee9a4c29e	Add newline. llvm-svn: 160006	2012-07-10 17:57:00 +00:00
Chad Rosier	579b1fee6b	Add test case accidentally omitted from r160002. llvm-svn: 160004	2012-07-10 17:49:39 +00:00
Chad Rosier	bdb08ac50a	Add support for dynamic stack realignment in the presence of dynamic allocas on X86. Basically, this is a reapplication of r158087 with a few fixes. Specifically, (1) the stack pointer is restored from the base pointer before popping callee-saved registers and (2) in obscure cases (see comments in patch) we must cache the value of the original stack adjustment in the prologue and apply it in the epilogue. rdar://11496434 llvm-svn: 160002	2012-07-10 17:45:53 +00:00
Nadav Rotem	d908ddc186	Improve the loading of load-anyext vectors by allowing the codegen to load multiple scalars and insert them into a vector. Next, we shuffle the elements into the correct places, as before. Also fix a small dagcombine bug in SimplifyBinOpWithSameOpcodeHands, when the migration of bitcasts happened too late in the SelectionDAG process. llvm-svn: 159991	2012-07-10 13:25:08 +00:00
Akira Hatanaka	efff7b763b	Make register Mips::RA allocatable if not in mips16 mode. llvm-svn: 159971	2012-07-10 00:19:06 +00:00
Owen Anderson	d4b841f8f9	Teach the DAG combiner to turn sitofp/uitofp from i1 into a conditional move, since there are only two possible values. Previously, this would become an integer extension operation, followed by a real integer->float conversion. llvm-svn: 159957	2012-07-09 20:31:12 +00:00
Manman Ren	5f6fa428fa	X86: implement functions to analyze & synthesize CMOV\|SET\|Jcc getCondFromSETOpc, getCondFromCMovOpc, getSETFromCond, getCMovFromCond No functional change intended. If we want to update the condition code of CMOV\|SET\|Jcc, we first analyze the opcode to get the condition code, then update the condition code, finally synthesize the new opcode from the new condition code. llvm-svn: 159955	2012-07-09 18:57:12 +00:00
Manman Ren	bb36074047	X86: Fix optimizeCompare to correctly check safe condition. It is safe if EFLAGS is killed or re-defined. When we are done with the basic block, check whether EFLAGS is live-out. Do not optimize away cmp if EFLAGS is live-out. llvm-svn: 159888	2012-07-07 03:34:46 +00:00
Manman Ren	c965673707	X86: peephole optimization to remove cmp instruction For each Cmp, we check whether there is an earlier Sub which make Cmp redundant. We handle the case where SUB operates on the same source operands as Cmp, including the case where the two source operands are swapped. llvm-svn: 159838	2012-07-06 17:36:20 +00:00
Chad Rosier	88d53eae56	[fast-isel] Tell fast-isel to do nothing with the new donothing intrinsic. llvm-svn: 159837	2012-07-06 17:33:39 +00:00
Duncan Sands	c65aa3f6ae	Attempt to fix windows buildbots. Patch by James Benton. llvm-svn: 159826	2012-07-06 14:43:16 +00:00
NAKAMURA Takumi	4f934676fb	test/CodeGen/X86/sext-setcc-self.ll: Mark it as XFAIL: cygwin,mingw32,win32. Investigating. llvm-svn: 159820	2012-07-06 12:12:39 +00:00
NAKAMURA Takumi	0246724cd6	Revert r159804, "[arm-fast-isel] Add support for vararg function calls." It broke LLVM :: CodeGen/Thumb2/large-call.ll on several hosts. llvm-svn: 159817	2012-07-06 11:12:44 +00:00
Jush Lu	5e6e6264f4	[arm-fast-isel] Add support for vararg function calls. llvm-svn: 159804	2012-07-06 03:02:37 +00:00
Jack Carter	b2af512cef	Mips specific inline asm operand modifier D. Print the second half of a double word operand. The include list was cleaned up a bit as well. Also the test case was modified to test for both big and little patterns. llvm-svn: 159787	2012-07-05 23:58:21 +00:00
Akira Hatanaka	bbf374c4c6	test case for r159770. llvm-svn: 159771	2012-07-05 19:29:31 +00:00
Duncan Sands	0552a2cad2	Use the right kind of booleans: we were emitting 0/1 booleans, instead of 0/-1 booleans. Patch by James Benton. llvm-svn: 159739	2012-07-05 09:32:46 +00:00
Jakob Stoklund Olesen	2dee812445	Ensure CopyToReg nodes are always glued to the call instruction. The CopyToReg nodes that set up the argument registers before a call must be glued to the call instruction. Otherwise, the scheduler may emit the physreg copies long before the call, causing long live ranges for the fixed registers. Besides disabling good register allocation, that can also expose problems when EmitInstrWithCustomInserter() splits a basic block during the live range of a physreg. llvm-svn: 159721	2012-07-04 19:28:31 +00:00
Rafael Espindola	1a7cf13215	Add a testcase for pr13209. It is not a great test, but it still fails if 159509 and 159479 are reverted. It would be really nice to be able to run just the coalescer :-( llvm-svn: 159715	2012-07-04 16:06:00 +00:00
Jakob Stoklund Olesen	49e4d4b3ef	Add early if-conversion support to X86. Implement the TII hooks needed by EarlyIfConversion to create cmov instructions and estimate their latency. Early if-conversion is still not enabled by default. llvm-svn: 159695	2012-07-04 00:09:58 +00:00
NAKAMURA Takumi	2338556320	test/CodeGen/SPARC/private.ll: Fixup. Forgot to prune old RUN lines. llvm-svn: 159643	2012-07-03 04:29:20 +00:00
NAKAMURA Takumi	c2a5bd6822	test/CodeGen/SPARC/private.ll: FileCheck-ize. llvm-svn: 159642	2012-07-03 04:21:57 +00:00
NAKAMURA Takumi	dff1a78321	test/CodeGen/X86/sincos.ll: FileCheck-ize. llvm-svn: 159639	2012-07-03 03:59:22 +00:00
NAKAMURA Takumi	10dc235746	test/CodeGen/X86/fabs.ll: FileCheck-ize. llvm-svn: 159638	2012-07-03 03:59:15 +00:00
NAKAMURA Takumi	ff680b1db6	test/CodeGen/X86/2007-09-05-InvalidAsm.ll: FileCheck-ize. llvm-svn: 159637	2012-07-03 03:59:08 +00:00
NAKAMURA Takumi	e5e19e4f7b	test/CodeGen/X86/2004-03-30-Select-Max.ll: FileCheck-ize. llvm-svn: 159636	2012-07-03 03:58:59 +00:00
Jack Carter	b353094f27	mips32 long long register inline asm constraint support. inlineasm-cnstrnt-bad-r-1.ll is NOT supposed to fail, so it was removed. This resulted in the removal of a negative test (inlineasm-cnstrnt-bad-r-1.ll) llvm-svn: 159625	2012-07-02 23:35:23 +00:00
Eric Christopher	dfc3e68c40	Revert " mips32 long long register inline asm constraint support." as it appears to be breaking the bots. This reverts commit 1b055ce320fa13f6f1ac81670d11b45e01f79876. llvm-svn: 159619	2012-07-02 23:22:25 +00:00
Jack Carter	939236c2eb	deleted test/CodeGen/Mips/inlineasm-cnstrnt-bad-r-1.ll llvm-svn: 159617	2012-07-02 23:21:22 +00:00
Jack Carter	5c1a01a625	mips32 long long register inline asm constraint support. inlineasm-cnstrnt-bad-r-1.ll is NOT supposed to fail, so it was removed. This resulted in the removal of a negative test (inlineasm-cnstrnt-bad-r-1.ll) llvm-svn: 159610	2012-07-02 22:39:45 +00:00
Bob Wilson	cac3b90633	Extend TargetPassConfig to allow running only a subset of the normal passes. This is still a work in progress but I believe it is currently good enough to fix PR13122 "Need unit test driver for codegen IR passes". For example, you can run llc with -stop-after=loop-reduce to have it dump out the IR after running LSR. Serializing machine-level IR is not yet supported but we have some patches in progress for that. The plan is to serialize the IR to a YAML file, containing separate sections for the LLVM IR, machine-level IR, and whatever other info is needed. Chad suggested that we stash the stop-after pass in the YAML file and use that instead of the start-after option to figure out where to restart the compilation. I think that's a great idea, but since it's not implemented yet I put the -start-after option into this patch for testing purposes. llvm-svn: 159570	2012-07-02 19:48:45 +00:00
Chandler Carruth	ff123d5c63	Fix the remaining TCL-style quotes found in the testsuite. This is another mechanical change accomplished through the power of terrible Perl scripts. I have manually switched some "s to 's to make escaping simpler. While I started this to fix tests that aren't run in all configurations, the massive number of tests is due to a really frustrating fragility of our testing infrastructure: things like 'grep -v', 'not grep', and 'expected failures' can mask broken tests all too easily. Essentially, I'm deeply disturbed that I can change the testsuite so radically without causing any change in results for most platforms. =/ llvm-svn: 159547	2012-07-02 19:09:46 +00:00
Chandler Carruth	5da53436d5	Convert the uses of '\|&' to use '2>&1 \|' instead, which works on old versions of Bash. In addition, I can back out the change to the lit built-in shell test runner to support this. This should fix the majority of fallout on Darwin, but I suspect there will be a few straggling issues. llvm-svn: 159544	2012-07-02 18:37:59 +00:00
Bob Wilson	2297221028	Do not attempt to use ROR for Thumb1. Patch by Matt Fischer! llvm-svn: 159538	2012-07-02 17:22:47 +00:00
Chandler Carruth	872ac7cfad	Fix the TCL-style quoting in one random test that somehow slipped through my perl nets. With this, the test suite passes even if I force it to run with the built-in shell test logic, except for a test which REQUIREs shell. llvm-svn: 159529	2012-07-02 13:29:47 +00:00
Chandler Carruth	a5a29f970e	Convert all tests using TCL-style quoting to use shell-style quoting. This was done through the aid of a terrible Perl creation. I will not paste any of the horrors here. Suffice to say, it required multiple staged rounds of replacements, state carried between, and a few nested-construct-parsing hacks that I'm not proud of. It happens, by luck, to be able to deal with all the TCL-quoting patterns in evidence in the LLVM test suite. If anyone is maintaining large out-of-tree test trees, feel free to poke me and I'll send you the steps I used to convert things, as well as answer any painful questions etc. IRC works best for this type of thing I find. Once converted, switch the LLVM lit config to use ShTests the same as Clang. In addition to being able to delete large amounts of Python code from 'lit', this will also simplify the entire test suite and some of lit's architecture. Finally, the test suite runs 33% faster on Linux now. ;] For my 16-hardware-thread (2x 4-core xeon e5520): 36s -> 24s llvm-svn: 159525	2012-07-02 12:47:22 +00:00
Chandler Carruth	ae00a80869	Rewrite three tests that had truly egregious abuses of 'grep' in them to use FileCheck. Aside from removing a dependence on TCL-style quoting, this also makes the tests ... significantly more robust. =] It would be really, really great if the maintainer(s) of the CellSPU backend went through and systematically rewrote these tests to use FileCheck. There are a lot more that have nearly this bad of abuses. Another step along the path to a TclTest-free testsuite. llvm-svn: 159523	2012-07-02 12:20:14 +00:00
Rafael Espindola	a77d31d7fd	Now that RegistersDefinedFromSameValue handles one instruction being an implicit_def, the other instruction can be anything, including instructions that define multiple values. Be careful about that and don't assume what operand 0 is. Fixes pr13249. llvm-svn: 159509	2012-07-01 17:08:01 +00:00
Elena Demikhovsky	9af899fa88	Optimization of shuffle node that can fit to the register form of VBROADCAST instruction on AVX2. llvm-svn: 159504	2012-07-01 06:12:26 +00:00
Jakob Stoklund Olesen	3e3cdecf98	Clear kill flags in InstrEmitter::EmitSubregNode(). When a local virtual register is made global, make sure to clear any existing kill flags. llvm-svn: 159461	2012-06-29 21:00:03 +00:00
Rafael Espindola	efdfb1e6b2	In the initial exec mode we always do a load to find the address of a variable. Before this patch in pic 32 bit code we would add the global base register and not load from that address. This is a really old bug, but before the introduction of the tls attributes we would never select initial exec for pic code. llvm-svn: 159409	2012-06-29 04:22:35 +00:00
Manman Ren	98a5bf24a9	X86: add more GATHER intrinsics in LLVM Corrected type for index of llvm.x86.avx2.gather.d.pd.256 from 256-bit to 128-bit. Corrected types for src|dst|mask of llvm.x86.avx2.gather.q.ps.256 from 256-bit to 128-bit. Support the following intrinsics: llvm.x86.avx2.gather.d.q, llvm.x86.avx2.gather.q.q llvm.x86.avx2.gather.d.q.256, llvm.x86.avx2.gather.q.q.256 llvm.x86.avx2.gather.d.d, llvm.x86.avx2.gather.q.d llvm.x86.avx2.gather.d.d.256, llvm.x86.avx2.gather.q.d.256 llvm-svn: 159402	2012-06-29 00:54:20 +00:00
Nuno Lopes	ec9653b363	add a new @llvm.donothing intrinsic that, well, does nothing, and teach CodeGen to ignore calls to it llvm-svn: 159383	2012-06-28 22:30:12 +00:00
Jack Carter	6c0bc0b378	The Mips specific inline asm operand modifier 'z' has the following description in the gnu sources: Print $0 if operand is zero otherwise print the op normally. llvm-svn: 159324	2012-06-28 01:33:40 +00:00
Akira Hatanaka	ad31cd9a01	Test case for r159240. llvm-svn: 159242	2012-06-27 00:40:34 +00:00
Rafael Espindola	e0eaa043eb	Fix llc's -print-before=pass and -print-after=pass. llvm-svn: 159227	2012-06-26 21:33:36 +00:00
Manman Ren	a09820414a	X86: add GATHER intrinsics (AVX2) in LLVM Support the following intrinsics: llvm.x86.avx2.gather.d.pd, llvm.x86.avx2.gather.q.pd llvm.x86.avx2.gather.d.pd.256, llvm.x86.avx2.gather.q.pd.256 llvm.x86.avx2.gather.d.ps, llvm.x86.avx2.gather.q.ps llvm.x86.avx2.gather.d.ps.256, llvm.x86.avx2.gather.q.ps.256 Modified Disassembler to handle VSIB addressing mode. llvm-svn: 159221	2012-06-26 19:47:59 +00:00
Jack Carter	5e69cffed5	There are a number of generic inline asm operand modifiers that up to r158925 were handled as processor specific. Making them generic and putting tests for these modifiers in the CodeGen/Generic directory caused a number of targets to fail. This commit addresses that problem by having the targets call the generic routine for generic modifiers that they don't currently have explicit code for. For now only generic print operands 'c' and 'n' are supported. Affected files: test/CodeGen/Generic/asm-large-immediate.ll lib/Target/PowerPC/PPCAsmPrinter.cpp lib/Target/NVPTX/NVPTXAsmPrinter.cpp lib/Target/ARM/ARMAsmPrinter.cpp lib/Target/XCore/XCoreAsmPrinter.cpp lib/Target/X86/X86AsmPrinter.cpp lib/Target/Hexagon/HexagonAsmPrinter.cpp lib/Target/CellSPU/SPUAsmPrinter.cpp lib/Target/Sparc/SparcAsmPrinter.cpp lib/Target/MBlaze/MBlazeAsmPrinter.cpp lib/Target/Mips/MipsAsmPrinter.cpp MSP430 isn't represented because it did not even run with the long existing 'c' modifier and it was not apparent what needs to be done to get it inline asm ready. Contributor: Jack Carter llvm-svn: 159203	2012-06-26 13:49:27 +00:00
Elena Demikhovsky	26088d2e24	Shuffle optimization for AVX/AVX2. The current patch optimizes frequently used shuffle patterns and gives these instruction sequence reduction. Before: vshufps $-35, %xmm1, %xmm0, %xmm2 ## xmm2 = xmm0[1,3],xmm1[1,3] vpermilps $-40, %xmm2, %xmm2 ## xmm2 = xmm2[0,2,1,3] vextractf128 $1, %ymm1, %xmm1 vextractf128 $1, %ymm0, %xmm0 vshufps $-35, %xmm1, %xmm0, %xmm0 ## xmm0 = xmm0[1,3],xmm1[1,3] vpermilps $-40, %xmm0, %xmm0 ## xmm0 = xmm0[0,2,1,3] vinsertf128 $1, %xmm0, %ymm2, %ymm0 After: vshufps $13, %ymm0, %ymm1, %ymm1 ## ymm1 = ymm1[1,3],ymm0[0,0],ymm1[5,7],ymm0[4,4] vshufps $13, %ymm0, %ymm0, %ymm0 ## ymm0 = ymm0[1,3,0,0,5,7,4,4] vunpcklps %ymm1, %ymm0, %ymm0 ## ymm0 = ymm0[0],ymm1[0],ymm0[1],ymm1[1],ymm0[4],ymm1[4],ymm0[5],ymm1[5] llvm-svn: 159188	2012-06-26 08:04:10 +00:00
Andrew Trick	fb2ba3e1cb	Enable the new LoopInfo algorithm by default. The primary advantage is that loop optimizations will be applied in a stable order. This helps debugging and unit test creation. It is also a better overall implementation without pathologically bad performance on deep functions. On large functions (llvm-stress --size=200000 | opt -loops) Before: 0.1263s After: 0.0225s On deep functions (after tweaking llvm-stress, thanks Nadav): Before: 0.2281s After: 0.0227s See r158790 for more comments. The loop tree is now consistently generated in forward order, but loop passes are applied in reverse order over the program. If we have a loop optimization that prefers forward order, that can easily be achieved by adding a different type of LoopPassManager. llvm-svn: 159183	2012-06-26 04:11:38 +00:00
Eli Friedman	bbcd09cc00	Make some ugly hacks for inline asm operands which name a specific register a bit more thorough. PR13196. llvm-svn: 159176	2012-06-25 23:42:33 +00:00
Manman Ren	606953fbe7	ARM: update peephole optimization. More condition codes are included when deciding whether to remove cmp after a sub instruction. Specifically, we extend from GE|LT|GT|LE to GE|LT|GT|LE|HS|LS|HI|LO|EQ|NE. If we have "sub a, b; cmp b, a; movhs", we should be able to replace with "sub a, b; movls". rdar: 11725965 llvm-svn: 159166	2012-06-25 21:49:38 +00:00
Jakob Stoklund Olesen	a57fc12ec9	Enforce stricter liveness rules for PHIs. Verify that all paths from the entry block to a virtual register read pass through a def. Enable this check even when MRI->isSSA() is false. Verify that the live range of a virtual register is live out of all predecessor blocks, even for PHI-values. This requires that PHIElimination sometimes inserts IMPLICIT_DEF instruction in predecessor blocks. llvm-svn: 159150	2012-06-25 18:18:27 +00:00
Jakob Stoklund Olesen	eb49566447	Run ProcessImplicitDefs on SSA form where it can be much simpler. Implicitly defined virtual registers can simply have the <undef> bit set on all uses, and copies can be turned into implicit defs recursively. Physical registers are a bit trickier. We handle the common case where a physreg def is used by a nearby instruction in the same basic block. For more complicated cases, just leave the IMPLICIT_DEF instruction in. llvm-svn: 159149	2012-06-25 18:12:18 +00:00
Jakob Stoklund Olesen	2e22e6a361	%RCX is not a function live-out in eh.return functions. The function live-out registers must be live at all function returns, and %RCX is only used by eh.return. When a function also has a normal return, only %RAX holds a return value. This fixes PR13188. llvm-svn: 159116	2012-06-24 15:53:01 +00:00
Pete Cooper	fe212e762f	DAG legalisation can now handle illegal fma vector types by scalarisation llvm-svn: 159092	2012-06-24 00:05:44 +00:00
Hans Wennborg	cbe34b4cc9	Extend the IL for selecting TLS models (PR9788) This allows the user/front-end to specify a model that is better than what LLVM would choose by default. For example, a variable might be declared as @x = thread_local(initialexec) global i32 42 if it will not be used in a shared library that is dlopen'ed. If the specified model isn't supported by the target, or if LLVM can make a better choice, a different model may be used. llvm-svn: 159077	2012-06-23 11:37:03 +00:00
Rafael Espindola	a3088f09b3	Handle aliases to tls variables in all architectures, not just x86. llvm-svn: 159058	2012-06-23 00:30:03 +00:00
Evan Cheng	68c2f9a9a7	(sub X, imm) gets canonicalized to (add X, -imm) There are patterns to handle immediates when they fit in the immediate field. e.g. %sub = add i32 %x, -123 => sub r0, r0, #123 Add patterns to catch immediates that do not fit but should be materialized with a single movw instruction rather than movw + movt pair. e.g. %sub = add i32 %x, -65535 => movw r1, #65535 sub r0, r0, r1 rdar://11726136 llvm-svn: 159057	2012-06-23 00:29:06 +00:00
Hal Finkel	460e94d842	Add support for the PPC isel instruction. The isel (integer select) instruction is supported on the 440 and A2 embedded cores and on the POWER7. llvm-svn: 159045	2012-06-22 23:10:08 +00:00
Chad Rosier	1ce3805b23	FileCheckize tests. llvm-svn: 159044	2012-06-22 23:04:02 +00:00
Lang Hames	c98ebda325	Rename fp-op fusion option (yet again) for compatibility with GCC option. llvm-svn: 159042	2012-06-22 22:31:00 +00:00
Evan Cheng	f5bd6c6510	EmitZerofill should take a 64-bit size or else it's chopping off large zero-filled global. rdar://11729134 llvm-svn: 159023	2012-06-22 20:14:46 +00:00
NAKAMURA Takumi	c384b95939	test/CodeGen/Generic/asm-large-immediate.ll: Mark it as XFAIL: powerpc, possibly due to r158939. llvm-svn: 158994	2012-06-22 13:41:00 +00:00
Jakob Stoklund Olesen	321d41a871	Functions calling __builtin_eh_return must have a frame pointer. The code in X86TargetLowering::LowerEH_RETURN() assumes that a frame pointer exists, but the frame pointer was forced by the presence of llvm.eh.unwind.init which isn't guaranteed. If llvm.eh.unwind.init is actually required in functions calling eh.return (is it?), we should diagnose that instead of emitting bad machine code. This should fix the dragonegg-x86_64-linux-gcc-4.6-test bot. llvm-svn: 158961	2012-06-22 03:04:27 +00:00
Andrew Trick	3ccb1b8cf9	ARM scheduling fix: compute predicated implicit use properly. Minor drive by fix to cleanup latency computation. Calling getOperandLatency with a deliberately incorrect operand index does not give you the latency you want. llvm-svn: 158959	2012-06-22 02:50:31 +00:00
Lang Hames	b8650f106a	Rename -allow-excess-fp-precision flag to -fuse-fp-ops, and switch from a boolean flag to an enum: { Fast, Standard, Strict } (default = Standard). This option controls the creation by optimizations of fused FP ops that store intermediate results in higher precision than IEEE allows (E.g. FMAs). The behavior of this option is intended to match the behaviour specified by a soon-to-be-introduced frontend flag: '-ffuse-fp-ops'. Fast mode - allows formation of fused FP ops whenever they're profitable. Standard mode - allow fusion only for 'blessed' FP ops. At present the only blessed op is the fmuladd intrinsic. In the future more blessed ops may be added. Strict mode - allow fusion only if/when it can be proven that the excess precision won't affect the result. Note: This option only controls formation of fused ops by the optimizers. Fused operations that are explicitly requested (e.g. FMA via the llvm.fma.* intrinsic) will always be honored, regardless of the value of this option. Internally TargetOptions::AllowExcessFPPrecision has been replaced by TargetOptions::AllowFPOpFusion. llvm-svn: 158956	2012-06-22 01:09:09 +00:00
Jack Carter	c457f62033	The inline asm operand modifier 'n' is supposed to be generic across architectures. It has the following description in the gnu sources: Negate the immediate constant Several Architectures such as x86 have local implementations of operand modifier 'n' which go beyond the above description slightly. This won't affect them. Affected files: lib/CodeGen/AsmPrinter/AsmPrinterInlineAsm.cpp Added 'n' to the switch cases. test/CodeGen/Generic/asm-large-immediate.ll Generic compiled test (x86 for me) test/CodeGen/Mips/asm-large-immediate.ll Mips compiled version of the generic one Contributor: Jack Carter llvm-svn: 158939	2012-06-21 21:37:54 +00:00


6687 Commits