llvm-project

Commit Graph

Author	SHA1	Message	Date
Chad Rosier	282edd7caa	[ms-inline-asm] Add support for memory references that have non-immediate displacements. rdar://12974533 llvm-svn: 175083	2013-02-13 21:33:44 +00:00
Reed Kotler	f662cff689	For Mips 16, add the optimization where the 16 bit form of addiu sp can be used if the offset fits in 11 bits. This makes use of the fact that the abi requires sp to be 8 byte aligned so the actual offset can fit in 8 bits. It will be shifted left and sign extended before being actually used. The assembler or direct object emitter will shift right the 11 bit signed field by 3 bits. We don't need to deal with that here. llvm-svn: 175073	2013-02-13 20:28:27 +00:00
David Peixotto	6eecb28d3a	PR14992 - Tablegen incorrectly converts ARM tLDMIA_UPD pseudo to tLDMIA Fixed bug in tablegen conversion when source pseudo instruction has a different number of arguments than the destination instruction. llvm-svn: 175066	2013-02-13 19:21:47 +00:00
Benjamin Kramer	8e2637e2b0	X86: Disable generation of rep;movsl when %esi is used as a base pointer. This happens when there is both stack realignment and a dynamic alloca in the function. If we overwrite %esi (rep;movsl uses fixed registers) we'll lose the base pointer and the next register spill will write into oblivion. Fixes PR15249 and unbreaks firefox on i386/freebsd. Mozilla uses dynamic allocas and freebsd a 4 byte stack alignment. llvm-svn: 175057	2013-02-13 13:40:35 +00:00
Reed Kotler	9cb8e7b9f5	Make jumptables work for -static llvm-svn: 175044	2013-02-13 08:32:14 +00:00
Elena Demikhovsky	9e0df7cb01	Prevent insertion of "vzeroupper" before call that preserves YMM registers, since a caller uses preserved registers across the call. llvm-svn: 175043	2013-02-13 08:02:04 +00:00
Eric Christopher	389ee71b0a	Check i1 as well as i8 variables for 8 bit registers for x86 inline assembly. llvm-svn: 175036	2013-02-13 06:01:05 +00:00
Eric Christopher	2398a9a175	Finish obviously broken thought. llvm-svn: 175035	2013-02-13 06:01:00 +00:00
Paul Redmond	7e7e3de43d	Fix the lit test added in r174972 Patch by: Kevin Schoedel llvm-svn: 174974	2013-02-12 16:07:27 +00:00
Jyotsna Verma	39f7a2b7a0	Hexagon: Add support to generate predicated absolute addressing mode instructions. llvm-svn: 174973	2013-02-12 16:06:23 +00:00
Paul Redmond	288604ed0c	PR14562 - Truncation of left shift became undef DAGCombiner::ReduceLoadWidth was converting (trunc i32 (shl i64 v, 32)) into (shl i32 v, 32) into undef. To prevent this, check the shift count against the final result size. Patch by: Kevin Schoedel Reviewed by: Nadav Rotem llvm-svn: 174972	2013-02-12 15:21:21 +00:00
Justin Holewinski	be8dc6499a	[NVPTX] Disable vector registers Vectors were being manually scalarized by the backend. Instead, let the target-independent code do all of the work. The manual scalarization was from a time before good target-independent support for scalarization in LLVM. However, this forces us to specially-handle vector loads and stores, which we can turn into PTX instructions that produce/consume multiple operands. llvm-svn: 174968	2013-02-12 14:18:49 +00:00
Arnold Schwaighofer	1f3d3ca769	ARM NEON: Handle v16i8 and v8i16 reverse shuffles Lower reverse shuffles to a vrev64 and a vext instruction instead of the default legalization of storing and loading to the stack. This is important because we generate reverse shuffles in the loop vectorizer when we reverse store to an array. uint8_t Arr[N]; for (i = 0; i < N; ++i) Arr[N - i - 1] = ... radar://13171760 llvm-svn: 174929	2013-02-12 01:58:32 +00:00
Krzysztof Parzyszek	9a278f108a	Extend Hexagon hardware loop generation to handle various additional cases: - variety of compare instructions, - loops with no preheader, - arbitrary lower and upper bounds. llvm-svn: 174904	2013-02-11 21:37:55 +00:00
Justin Holewinski	4f12f53353	[NVPTX] Remove NoCapture from address space conversion intrinsics. NoCapture is not valid in this case, and was causing incorrect optimizations. llvm-svn: 174896	2013-02-11 18:56:35 +00:00
Reed Kotler	b9bf8dca47	Add the 16 bit version of addiu. To the assembler, the 16 and 32 bit are the same so we put in the comment field an indicator when we think we are emitting the 16 bit version. For the direct object emitter, the difference is important as well as for other passes which need an accurate count of program size. There will be other similar putbacks to this for various instructions. llvm-svn: 174747	2013-02-08 21:42:56 +00:00
Hal Finkel	2581905f81	DAGCombiner: Constant folding around pre-increment loads/stores Previously, even when a pre-increment load or store was generated, we often needed to keep a copy of the original base register for use with other offsets. If all of these offsets are constants (including the offset which was combined into the addressing mode), then this is clearly unnecessary. This change adjusts these other offsets to use the new incremented address. llvm-svn: 174746	2013-02-08 21:35:47 +00:00
Bob Wilson	67bbf3aa0c	Revert 172027 and 174336. Remove diagnostics about over-aligned stack objects. Aside from the question of whether we report a warning or an error when we can't satisfy a requested stack object alignment, the current implementation of this is not good. We're not providing any source location in the diagnostics and the current warning is not connected to any warning group so you can't control it. We could improve the source location somewhat, but we can do a much better job if this check is implemented in the front-end, so let's do that instead. <rdar://problem/13127907> llvm-svn: 174741	2013-02-08 20:35:15 +00:00
Reed Kotler	66165c8f96	When Mips16 frames grow large, the immediate field may exceed the maximum allowed size for the instruction. This code uses RegScavenger to fix this. We sometimes need 2 registers for Mips16 so we must handle things differently than how register scavenger is normally used. llvm-svn: 174696	2013-02-08 03:57:41 +00:00
Tom Stellard	e06163a9a6	R600: Add support for SET_DX10 instructions These instructions compare two floating point values and return an integer true (-1) or false (0) value. When compiling code generated by the Mesa GLSL frontend, the SET_DX10 instructions save us four instructions for most branch decisions that use floating-point comparisons. llvm-svn: 174609	2013-02-07 14:02:35 +00:00
Tom Stellard	6d867e8d4d	R600: Add tests for unsupported condition codes. All of the le and lt variants are unsupported. llvm-svn: 174608	2013-02-07 14:02:33 +00:00
Tom Stellard	b40ada9b85	R600: Fix assembly name for SETGT_INT llvm-svn: 174607	2013-02-07 14:02:27 +00:00
Reed Kotler	4a230ffa96	Make sure we call externals from libraries properly when -static. For example, when we are doing mips16 hard float or soft float. llvm-svn: 174583	2013-02-07 04:34:51 +00:00
Reed Kotler	ec60f7d335	Enable jumps when in -static mode. llvm-svn: 174580	2013-02-07 03:49:51 +00:00
Eli Bendersky	ef4558abd3	This is a follow-up on r174446, now taking Atom processors into account. Atoms use LEA for updating SP in prologs/epilogs, and the exact LEA opcode depends on the data model. Also reapplying the test case which was added and then reverted (because of Atom failures), this time specifying explicitly the CPU in addition to the triple. The test case now checks all variations (data mode, cpu Atom vs. Core). llvm-svn: 174542	2013-02-06 20:43:57 +00:00
Tim Northover	228d9d3aa2	Implement external weak (ELF) symbols on AArch64 Weakly defined symbols should evaluate to 0 if they're undefined at link-time. This is impossible to do with the usual address generation patterns, so we should use a literal pool entry to materlialise the address. llvm-svn: 174518	2013-02-06 16:43:33 +00:00
Eli Bendersky	c4446856e3	Remove this test in the meantime, since it won't pass on Atom. Atom uses lea to move the stack pointer in prologs/epilogs. I will fix the test and add it back later. llvm-svn: 174484	2013-02-06 03:15:00 +00:00
Manman Ren	d2c38d684a	Attempt to recover gdb bot after r174445. Failure: undefined symbol 'Lline_table_start0'. Root-cause: we use a symbol subtraction to calculate at_stmt_list, but the line table entries are not dumped in the assembly. Fix: use zero instead of a symbol subtraction for Compile Unit 0. llvm-svn: 174479	2013-02-06 00:59:41 +00:00
Eli Bendersky	59a6fb0381	Test for r174446 llvm-svn: 174464	2013-02-05 23:31:48 +00:00
Manman Ren	4e042a6be6	Dwarf: support for LTO where a single object file can have multiple line tables We generate one line table for each compilation unit in the object file. Reviewed by Eric and Kevin. rdar://problem/13067005 llvm-svn: 174445	2013-02-05 21:52:47 +00:00
Akira Hatanaka	dec25266d7	[mips] Do not use function CC_MipsN_VarArg unless the function being analyzed is a vararg function. The original code was examining flag OutputArg::IsFixed to determine whether CC_MipsN_VarArg or CC_MipsN should be called. This is not correct, since this flag is often set to false when the function being analyzed is a non-variadic function. llvm-svn: 174442	2013-02-05 21:18:11 +00:00
Owen Anderson	de89ecf1fc	Reapply r174343, with a fix for a scary DAG combine bug where it failed to differentiate between the alignment of the base point of a load, and the overall alignment of the load. This caused infinite loops in DAG combine with the original application of this patch. ORIGINAL COMMIT LOG: When the target-independent DAGCombiner inferred a higher alignment for a load, it would replace the load with one with the higher alignment. However, it did not place the new load in the worklist, which prevented later DAG combines in the same phase (for example, target-specific combines) from ever seeing it. This patch corrects that oversight, and updates some tests whose output changed due to slightly different DAGCombine outputs. llvm-svn: 174431	2013-02-05 19:24:39 +00:00
Jyotsna Verma	6031625b03	Hexagon: Use TFR_cond with cmpb.[eq,gt,gtu] to handle zext( set[ne,eq,gt,ugt] (...) ) type of dag patterns. llvm-svn: 174429	2013-02-05 19:20:45 +00:00
Jyotsna Verma	d53b25b47e	Hexagon: Add testcase for post-increment store instructions. llvm-svn: 174419	2013-02-05 18:23:51 +00:00
Chad Rosier	92a54f6d4c	[SjLj Prepare] When demoting an invoke instructions to the stack, if the normal edge is critical, then split it so we can insert the store. rdar://13126179 llvm-svn: 174418	2013-02-05 18:23:10 +00:00
Jyotsna Verma	50ca6dd8a7	Hexagon: Use multiclass for absolute addressing mode stores. llvm-svn: 174412	2013-02-05 18:15:34 +00:00
Jakob Stoklund Olesen	eb1084ee54	Add a test case for PR14750. This was fixed by r174402. llvm-svn: 174405	2013-02-05 18:04:15 +00:00
Tom Stellard	7d41161a2d	R600: Add tests for instruction predicates llvm-svn: 174393	2013-02-05 17:09:13 +00:00
Tom Stellard	2e5e7a5bef	R600: Emit function name in the AsmPrinter Emitting the function name allows us to check for it in the FileCheck tests so we can make sure FileCheck is checking the output of the correct function. llvm-svn: 174392	2013-02-05 17:09:11 +00:00
Jyotsna Verma	6f635b5488	Hexagon: Add V4 compare instructions. Enable relationship mapping for the existing instructions. llvm-svn: 174389	2013-02-05 16:42:24 +00:00
NAKAMURA Takumi	3753b28cd2	Revert r174343, "When the target-independent DAGCombiner inferred a higher alignment for a load," It caused hangups in compiling clang/lib/Parse/ParseDecl.cpp and clang/lib/Driver/Tools.cpp in stage2 on some hosts. llvm-svn: 174374	2013-02-05 14:44:16 +00:00
Logan Chien	4b724429b8	Link .ARM.exidx with corresponding text section. The sh_link in the ELF section header of .ARM.exidx should be filled with the section index of the corresponding text section. llvm-svn: 174372	2013-02-05 14:18:59 +00:00
Jack Carter	9c1a027fe8	This patch that sets the EmitAlias flag in td files and enables the instruction printer to print aliased instructions. Due to usage of RegisterOperands a change in common code (utils/TableGen/AsmWriterEmitter.cpp) is required to get the correct register value if it is a RegisterOperand. Contributer: Vladimir Medic llvm-svn: 174358	2013-02-05 08:32:10 +00:00
Owen Anderson	a47fdbb032	When the target-independent DAGCombiner inferred a higher alignment for a load, it would replace the load with one with the higher alignment. However, it did not place the new load in the worklist, which prevented later DAG combines in the same phase (for example, target-specific combines) from ever seeing it. This patch corrects that oversight, and updates some tests whose output changed due to slightly different DAGCombine outputs. llvm-svn: 174343	2013-02-05 06:25:30 +00:00
Manman Ren	86b1d868ba	[Stack Alignment] emit warning instead of a hard error Per discussion in rdar://13127907, we should emit a hard error only if people write code where the requested alignment is larger than achievable and assumes the low bits are zeros. A warning should be good enough when we are not sure if the source code assumes the low bits are zeros. rdar://13127907 llvm-svn: 174336	2013-02-04 23:45:08 +00:00
Jyotsna Verma	7ab68fbd1d	Hexagon: Add V4 combine instructions and some more Def Pats for V2. llvm-svn: 174331	2013-02-04 15:52:56 +00:00
Benjamin Kramer	c35d526489	Disable a couple more vector splat optimizations on PPC. I didn't see those because the test case used "not grep". FileCheck the test and XFAIL it, preserving the old optimization, so this can be fixed eventually. llvm-svn: 174330	2013-02-04 15:52:32 +00:00
Benjamin Kramer	2c9da989c2	X86: Open up some opportunities for constant folding by postponing shift lowering. Fixes PR15141. llvm-svn: 174327	2013-02-04 15:19:33 +00:00
Benjamin Kramer	548ffa274a	SelectionDAG: Teach FoldConstantArithmetic how to deal with vectors. This required disabling a PowerPC optimization that did the following: input: x = BUILD_VECTOR <i32 16, i32 16, i32 16, i32 16> lowered to: tmp = BUILD_VECTOR <i32 8, i32 8, i32 8, i32 8> x = ADD tmp, tmp The add now gets folded immediately and we're back at the BUILD_VECTOR we started from. I don't see a way to fix this currently so I left it disabled for now. Fix some trivially foldable X86 tests too. llvm-svn: 174325	2013-02-04 15:19:18 +00:00
David Blaikie	33111dfea0	Remove the (apparently) unnecessary debug info metadata indirection. The main lists of debug info metadata attached to the compile_unit had an extra layer of metadata nodes they went through for no apparent reason. This patch removes that (& still passes just as much of the GDB 7.5 test suite). If anyone can show evidence as to why these extra metadata nodes are there I'm open to reverting this patch & documenting why they're there. llvm-svn: 174266	2013-02-02 05:56:24 +00:00
Reed Kotler	f8933f83f0	Start static relocation implementation for mips16. This checkin makes hello world work. llvm-svn: 174264	2013-02-02 04:07:35 +00:00
Shuxin Yang	cadd8a068e	rdar://13126763 Fix a bug in DAGCombine. The symptom is mistakenly optimizing expression "x + xx" into "x 3.0". llvm-svn: 174239	2013-02-02 00:22:03 +00:00
Bill Schmidt	52742c25ae	LLVM enablement for some older PowerPC CPUs llvm-svn: 174230	2013-02-01 22:59:51 +00:00
David Sehr	8114a7a651	Two changes relevant to LEA and x32: 1) allows the use of RIP-relative addressing in 32-bit LEA instructions under x86-64 (ILP32 and LP64) 2) separates the size of address registers in 64-bit LEA instructions from control by ILP32/LP64. llvm-svn: 174208	2013-02-01 19:28:09 +00:00
Jyotsna Verma	10f5c2db4e	Hexagon: Test case to confirm generation of indexed loads with zero offset. llvm-svn: 174196	2013-02-01 16:40:06 +00:00
Tim Northover	e3d4236402	Add explicit triples to AArch64 tests Only Linux is supported at the moment, and other platforms quickly fault. As a result these tests would fail on non-Linux hosts. It may be worth making the tests more generic again as more platforms are supported. llvm-svn: 174170	2013-02-01 11:40:47 +00:00
Tom Stellard	4926921bd4	R600: Fold clamp, neg, abs Patch by: Vincent Lejeune Reviewed-by: Tom Stellard <thomas.stellard@amd.com> llvm-svn: 174099	2013-01-31 22:11:54 +00:00
Lang Hames	dd47804394	When lowering memcpys to loads and stores, make sure we don't promote alignments past the natural stack alignment. llvm-svn: 174085	2013-01-31 20:23:43 +00:00
Tim Northover	e0e3aefdd3	Add AArch64 as an experimental target. This patch adds support for AArch64 (ARM's 64-bit architecture) to LLVM in the "experimental" category. Currently, it won't be built unless requested explicitly. This initial commit should have support for: + Assembly of all scalar (i.e. non-NEON, non-Crypto) instructions (except the late addition CRC instructions). + CodeGen features required for C++03 and C99. + Compilation for the "small" memory model: code+static data < 4GB. + Absolute and position-independent code. + GNU-style (i.e. "__thread") TLS. + Debugging information. The principal omission, currently, is performance tuning. This patch excludes the NEON support also reviewed due to an outbreak of batshit insanity in our legal department. That will be committed soon bringing the changes to precisely what has been approved. Further reviews would be gratefully received. llvm-svn: 174054	2013-01-31 12:12:40 +00:00
Eric Christopher	4e3e94c13d	Check and allow floating point registers to select the size of the register for inline asm. This conforms to how gcc allows for effective casting of inputs into gprs (fprs is already handled). llvm-svn: 174008	2013-01-31 00:50:46 +00:00
Eli Bendersky	6c84b90b70	Replace some more greps with FileChecks in tests llvm-svn: 174006	2013-01-31 00:44:12 +00:00
Eli Bendersky	a320e00e74	Rewrite this test properly with a FileCheck instead of greps llvm-svn: 173997	2013-01-31 00:11:52 +00:00
Hal Finkel	e1df90958d	PPC QPX requires a 32-byte aligned stack On systems which support the QPX vector instructions, the stack must be 32-byte aligned. llvm-svn: 173993	2013-01-30 23:43:27 +00:00
Evan Cheng	9449ec956f	Forgot the test case before. llvm-svn: 173988	2013-01-30 22:57:00 +00:00
Hal Finkel	efb305e54c	Add definitions for the PPC a2q core marked as having QPX available This is the first commit of a large series which will add support for the QPX vector instruction set to the PowerPC backend. This instruction set is used on the IBM Blue Gene/Q supercomputers. llvm-svn: 173973	2013-01-30 21:17:42 +00:00
Eli Bendersky	2e2ce49e59	Add a special ARM trap encoding for NaCl. More details in this thread: http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20130128/163783.html Patch by JF Bastien llvm-svn: 173943	2013-01-30 16:30:19 +00:00
Logan Chien	a436e4c7e4	Add missing header and test cases for r173939. llvm-svn: 173941	2013-01-30 15:48:50 +00:00
Akira Hatanaka	4385564e97	[mips] Test case for r173862. Patch by Sasa Stankovic. llvm-svn: 173863	2013-01-30 00:28:15 +00:00
Tim Northover	a0edd3ee66	Fix 64-bit atomic operations in Thumb mode. The ARM and Thumb variants of LDREXD and STREXD have different constraints and take different operands. Previously the code expanding atomic operations didn't take this into account and asserted in Thumb mode. llvm-svn: 173780	2013-01-29 09:06:13 +00:00
Bill Schmidt	2e4ae4e154	This patch addresses bug 15031. The common code in the post-RA scheduler to break anti-dependencies on the critical path contained a flaw. In the reported case, an anti-dependency between the overlapping registers %X4 and %R4 exists: %X29<def> = OR8 %X4, %X4 %R4<def>, %X3<def,dead,tied3> = LBZU 1, %X3<kill,tied1> The unpatched code breaks the dependency by replacing %R4 and its uses with %R3, the first register on the available list. However, %R3 and %X3 overlap, so this creates two overlapping definitions on the same instruction. The fix is straightforward, preventing selection of a register that overlaps any other defined register on the same instruction. The test case is reduced from the bug report, and verifies that we no longer produce "lbzu 3, 1(3)" when breaking this anti-dependency. llvm-svn: 173706	2013-01-28 18:36:58 +00:00
Benjamin Kramer	05cc93964a	When the legalizer is splitting vector shifts, the result may not have the right shift amount type. Fix that by adding a cast to the shift expander. This came up with vector shifts on sse-less X86 CPUs. <2 x i64> = shl <2 x i64> <2 x i64> -> i64,i64 = shl i64 i64; shl i64 i64 -> i32,i32,i32,i32 = shl_parts i32 i32 i64; shl_parts i32 i32 i64 Now we cast the last two i64s to the right type. Fixes the crash in PR14668. llvm-svn: 173615	2013-01-27 11:19:11 +00:00
Benjamin Kramer	99c68dd964	X86: Do splat promotion later, so the optimizer can chew on it first. This catches many cases where we can emit a more efficient shuffle for a specific mask or when the mask contains undefs. Once the splat is lowered to unpacks we can't do that anymore. There is a possibility of moving the promotion after pshufb matching, but I'm not sure if pshufb with a mask loaded from memory is faster than 3 shuffles, so I avoided that for now. llvm-svn: 173569	2013-01-26 11:44:21 +00:00
Benjamin Kramer	7268a05178	FileCheckize and merge some tests. llvm-svn: 173568	2013-01-26 11:14:32 +00:00
Reid Kleckner	ab083f727b	FileCheck-ify some grep tests These tests in particular try to use escaped square brackets as an argument to grep, which is failing for me with native win32 python. It appears the backslash is being lost near the CreateProcess*() call. llvm-svn: 173506	2013-01-25 22:11:46 +00:00
Eli Bendersky	597fc1233a	In this patch, we teach X86_64TargetMachine that it has a ILP32 (defined by the x32 ABI) mode, in which case its pointers are 32-bits in size. This knowledge is also added to X86RegisterInfo that now returns the appropriate registers in getPointerRegClass. There are many outcomes to this change. In order to keep the patches separate and manageable, we start by focusing on some simple testable cases. The patch adds a test with passing a pointer to a function - focusing on the difference between the two data models for x86-64. Another test is added for handling of 'sret' arguments (and functionality is added in X86ISelLowering to make it work). A note on naming: the "x32 ABI" document refers to the AMD64 architecture (in LLVM it's distinguished by being is64Bits() in the x86 subtarget) with two variations: the LP64 (default) data model, and the ILP32 data model. This patch adds predicates to the subtarget which are consistent with this naming scheme. llvm-svn: 173503	2013-01-25 22:07:43 +00:00
Eli Bendersky	e6abe83258	Now that llvm-dwarfdump supports flags to specify which DWARF section to dump, use them in tests that run llvm-dwarfdump. This is in order to make tests as specific as possible. llvm-svn: 173498	2013-01-25 21:44:53 +00:00
Silviu Baranga	3eb45a03af	Fixed the condition codes for the atomic64 min/umin code generation on ARM. If the sutraction of the higher 32 bit parts gives a 0 result, we need to do the store operation. llvm-svn: 173437	2013-01-25 10:39:49 +00:00
Andrew Trick	e2c3f5c982	MIsched: Improve the interface to SchedDFS analysis (subtrees). Allow the strategy to select SchedDFS. Allow the results of SchedDFS to affect initialization of the scheduler state. llvm-svn: 173425	2013-01-25 06:33:57 +00:00
Andrew Trick	44f750a3e5	MISched: Add SchedDFSResult to ScheduleDAGMI to formalize the interface and allow other strategies to select it. llvm-svn: 173413	2013-01-25 04:01:04 +00:00
Akira Hatanaka	28aed9ca85	[mips] Set flag neverHasSideEffects flag on some of the floating point instructions. llvm-svn: 173401	2013-01-25 00:20:39 +00:00
Reed Kotler	a2d76bce1f	The next phase of Mips16 hard float implementation. Allow Mips16 routines to call Mips32 routines that have abi requirements that either arguments or return values are passed in floating point registers. This handles only the pic case. We have not done non pic for Mips16 yet in any form. The libm functions are Mips32, so with this addition we have a complete Mips16 hard float implementation. We still are not able to complete mix Mip16 and Mips32 with hard float. That will be the next phase which will have several steps. For Mips32 to freely call Mips16 some stub functions must be created. llvm-svn: 173320	2013-01-24 04:24:02 +00:00
Bill Wendling	7c8f96a91b	Add the heuristic to differentiate SSPStrong from SSPRequired. The requirements of the strong heuristic are: * A Protector is required for functions which contain an array, regardless of type or length. * A Protector is required for functions which contain a structure/union which contains an array, regardless of type or length. Note, there is no limit to the depth of nesting. * A protector is required when the address of a local variable (i.e., stack based variable) is exposed. (E.g., such as through a local whose address is taken as part of the RHS of an assignment or a local whose address is taken as part of a function argument.) llvm-svn: 173231	2013-01-23 06:43:53 +00:00
Bill Wendling	d154e283f2	Add the IR attribute 'sspstrong'. SSPStrong applies a heuristic to insert stack protectors in these situations: * A Protector is required for functions which contain an array, regardless of type or length. * A Protector is required for functions which contain a structure/union which contains an array, regardless of type or length. Note, there is no limit to the depth of nesting. * A protector is required when the address of a local variable (i.e., stack based variable) is exposed. (E.g., such as through a local whose address is taken as part of the RHS of an assignment or a local whose address is taken as part of a function argument.) This patch implements the SSPString attribute to be equivalent to SSPRequired. This will change in a subsequent patch. llvm-svn: 173230	2013-01-23 06:41:41 +00:00
Michael Liao	3dffc5e2b7	Fix an issue of pseudo atomic instruction DAG schedule - Add list of physical registers clobbered in pseudo atomic insts Physical registers are clobbered when pseudo atomic instructions are expanded. Add them in clobber list to prevent DAG scheduler to mis-schedule them after these insns are declared side-effect free. - Add test case from Michael Kuperstein <michael.m.kuperstein@intel.com> llvm-svn: 173200	2013-01-22 21:47:38 +00:00
Akira Hatanaka	88c0ec826c	[mips] Implement MipsRegisterInfo::getRegPressureLimit. llvm-svn: 173197	2013-01-22 21:34:25 +00:00
NAKAMURA Takumi	9439237063	llvm/test/CodeGen/X86/win_ftol2.ll: Add -cpu=generic to appease valgrind. On valgrind the processor is reported; Host CPU: athlon-fx llvm-svn: 172983	2013-01-20 15:40:02 +00:00
Nadav Rotem	9450fcfff1	Revert 172708. The optimization handles esoteric cases but adds a lot of complexity both to the X86 backend and to other backends. This optimization disables an important canonicalization of chains of SEXT nodes and makes SEXT and ZEXT asymmetrical. Disabling the canonicalization of consecutive SEXT nodes into a single node disables other DAG optimizations that assume that there is only one SEXT node. The AVX mask optimizations is one example. Additionally this optimization does not update the cost model. llvm-svn: 172968	2013-01-20 08:35:56 +00:00
Nadav Rotem	7b3120b9ae	On Sandybridge split unaligned 256bit stores into two xmm-sized stores. llvm-svn: 172894	2013-01-19 08:38:41 +00:00
Jakob Stoklund Olesen	ac6cfa41d6	Remove some register allocation order dependencies. llvm-svn: 172874	2013-01-19 00:03:32 +00:00
Nadav Rotem	7431211214	On Sandybridge loading unaligned 256bits using two XMM loads (vmovups and vinsertf128) is faster than using a single vmovups instruction. llvm-svn: 172868	2013-01-18 23:10:30 +00:00
NAKAMURA Takumi	b72e763325	llvm/test/CodeGen/X86/Atomics-64.ll: Tweak for 2nd RUN not to overwrite %t. It sometimes causes spurious failure on lit win32. Feel free to prune or suppress each output. llvm-svn: 172823	2013-01-18 14:52:02 +00:00
Bill Schmidt	94b8cdbf55	Restore reverted test case, this time with REQUIRES: asserts llvm-svn: 172747	2013-01-17 19:46:51 +00:00
Bill Schmidt	b400204fd8	Remove bad test case llvm-svn: 172746	2013-01-17 19:39:36 +00:00
Bill Schmidt	dee1ef8f53	This patch fixes PR13626 by providing i128 support in the return calling convention. 128-bit integers are now properly returned in GPR3 and GPR4 on PowerPC. llvm-svn: 172745	2013-01-17 19:34:57 +00:00
Jyotsna Verma	9b60c1d171	Add indexed load/store instructions for offset validation check. This patch fixes bug 14902 - http://llvm.org/bugs/show_bug.cgi?id=14902 llvm-svn: 172737	2013-01-17 18:42:37 +00:00
Bill Schmidt	6b2940b01e	This patch fixes the PPC calling convention to handle returns of _Complex float and _Complex long double, by simply increasing the number of floating point registers available for return values. The test case verifies that the correct registers are loaded. llvm-svn: 172733	2013-01-17 17:45:19 +00:00
Elena Demikhovsky	f6a30e05d5	Optimization for the following SIGN_EXTEND pairs: v8i8 -> v8i64, v8i8 -> v8i32, v4i8 -> v4i64, v4i16 -> v4i64 for AVX and AVX2. Bug 14865. llvm-svn: 172708	2013-01-17 09:59:53 +00:00
Bill Schmidt	d006c6938b	This patch addresses an incorrect transformation in the DAG combiner. The included test case is derived from one of the GCC compatibility tests. The problem arises after the selection DAG has been converted to type-legalized form. The combiner first sees a 64-bit load that can be converted into a pre-increment form. The original load feeds into a SRL that isolates the upper 32 bits of the loaded doubleword. This looks like an opportunity for DAGCombiner::ReduceLoadWidth() to replace the 64-bit load with a 32-bit load. However, this transformation is not valid, as the replacement load is not a pre-increment load. The pre-increment load produces an extra result, which feeds a subsequent add instruction. The replacement load only has one result value, and this value is propagated to all uses of the pre- increment load, including the add. Because the add is looking for the second result value as its operand, it ends up attempting to add a constant to a token chain, resulting in a crash. So the patch simply disables this transformation for any load with more than two result values. llvm-svn: 172480	2013-01-14 22:04:38 +00:00
Benjamin Kramer	bcd14a0f26	X86: Add patterns for X86ISD::VSEXT in registers. Those can occur when something between the sextload and the store is on the same chain and blocks isel. Fixes PR14887. llvm-svn: 172353	2013-01-13 11:37:04 +00:00
Benjamin Kramer	5ea0349ef5	When lowering an inreg sext first shift left, then right arithmetically. Shifting right two times will only yield zero. Should fix SingleSource/UnitTests/SignlessTypes/factor. llvm-svn: 172322	2013-01-12 19:06:44 +00:00
Nadav Rotem	dbe5c72d03	PPC: Implement efficient lowering of sign_extend_inreg. llvm-svn: 172269	2013-01-11 22:57:48 +00:00
Preston Gurd	99c6990457	Update patch for the pad short functions pass for Intel Atom (only). Adds a check for -Oz, changes the code to not re-visit BBs, and skips over DBG_VALUE instrs. Patch by Andy Zhang. llvm-svn: 172258	2013-01-11 22:06:56 +00:00
Eric Christopher	0cb6fd930e	For inline asm: - recognize string "{memory}" in the MI generation - mark as mayload/maystore when there's a memory clobber constraint. PR14859. Patch by Krzysztof Parzyszek llvm-svn: 172228	2013-01-11 18:12:39 +00:00
Tim Northover	3a51aab390	Simplify writing floating types to assembly. This removes previous special cases for each floating-point type in favour of a shared codepath. llvm-svn: 172189	2013-01-11 10:36:13 +00:00
NAKAMURA Takumi	e46e8225f4	llvm/test/CodeGen/X86/ms-inline-asm.ll: Fixup; Globals doesn't have leading underscore in symbol on linux. llvm-svn: 172139	2013-01-10 23:02:48 +00:00
Evan Cheng	c8444b159a	PR14896: Handle memcpy from constant string where the memcpy size is larger than the string size. llvm-svn: 172124	2013-01-10 22:13:27 +00:00
Chad Rosier	a4bc9437a2	[ms-inline asm] Add support for calling functions from inline assembly. Part of rdar://12991541 llvm-svn: 172121	2013-01-10 22:10:27 +00:00
Manman Ren	207bcbacca	Stack Alignment: throw error if we can't satisfy the minimal alignment requirement when creating stack objects in MachineFrameInfo. Add CreateStackObjectWithMinAlign to throw error when the minimal alignment can't be achieved and to clamp the alignment when the preferred alignment can't be achieved. Same is true for CreateVariableSizedObject. Will not emit error in CreateSpillStackObject or CreateStackObject. As long as callers of CreateStackObject do not assume the object will be aligned at the requested alignment, we should not have miscompile since later optimizations which look at the object's alignment will have the correct information. rdar://12713765 llvm-svn: 172027	2013-01-10 01:10:10 +00:00
Evan Cheng	5652a8df32	Fix a DAG combine bug visitBRCOND() is transforming br(xor(x, y)) to br(x != y). It cahced XOR's operands before calling visitXOR() but failed to update the operands when visitXOR changed the XOR node. rdar://12968664 llvm-svn: 171999	2013-01-09 20:56:40 +00:00
Nadav Rotem	3f5825c6c1	add -march to the test llvm-svn: 171956	2013-01-09 07:04:23 +00:00
Nadav Rotem	977e0be4a0	Efficient lowering of vector sdiv when the divisor is a splatted power of two constant. PR 14848. The lowered sequence is based on the existing sequence the target-independent DAG Combiner creates for the scalar case. Patch by Zvi Rackover. llvm-svn: 171953	2013-01-09 05:14:33 +00:00
Andrew Trick	9f0b95f260	MIsched: add an ILP window property to machine model. This was an experimental option, but needs to be defined per-target. e.g. PPC A2 needs to aggressively hide latency. I converted some in-order scheduling tests to A2. Hal is working on more test cases. llvm-svn: 171946	2013-01-09 03:36:49 +00:00
Tim Northover	90fb75d859	Specify complete triple for fp128 tests. This avoids FileCheck failing over different comment characters in assembly (notably powerpc64 on Linux vs Darwin) and should fix David's build-bot. llvm-svn: 171886	2013-01-08 19:36:33 +00:00
Preston Gurd	a01daace88	Pad Short Functions for Intel Atom The current Intel Atom microarchitecture has a feature whereby when a function returns early then it is slightly faster to execute a sequence of NOP instructions to wait until the return address is ready, as opposed to simply stalling on the ret instruction until the return address is ready. When compiling for X86 Atom only, this patch will run a pass, called "X86PadShortFunction" which will add NOP instructions where less than four cycles elapse between function entry and return. It includes tests. This patch has been updated to address Nadav's review comments - Optimize only at >= O1 and don't do optimization if -Os is set - Stores MachineBasicBlock* instead of BBNum - Uses DenseMap instead of std::map - Fixes placement of braces Patch by Andy Zhang. llvm-svn: 171879	2013-01-08 18:27:24 +00:00
Tim Northover	7bb9992cce	Allow the asm printer to print fp128 values properly. llvm-svn: 171866	2013-01-08 16:56:23 +00:00
Bill Schmidt	9b1e3e25dc	This patch addresses bug 14678 by fixing two problems in medium code model code generation. Variables addressed through a GlobalAlias were not being handled, and variables with available_externally linkage were treated incorrectly. The patch contains two new tests to verify the correct code generation for these cases. llvm-svn: 171778	2013-01-07 19:29:18 +00:00
Silviu Baranga	a055aab506	Make the MergeGlobals pass correctly handle the address space qualifiers of the global variables. We partition the set of globals by their address space, and apply the same the trasnformation as before to merge them. llvm-svn: 171730	2013-01-07 12:31:25 +00:00
Craig Topper	4f1c7256f9	Fix suffix handling for parsing and printing of cvtsi2ss, cvtsi2sd, cvtss2si, cvttss2si, cvtsd2si, and cvttsd2si to match gas behavior. cvtsi2* should parse with an 'l' or 'q' suffix or no suffix at all. No suffix should be treated the same as 'l' suffix. Printing should always print a suffix. Previously we didn't parse or print an 'l' suffix. cvtt2si/cvt2si should parse with an 'l' or 'q' suffix or not suffix at all. No suffix should use the destination register size to choose encoding. Printing should not print a suffix. Original 'l' suffix issue with cvtsi2* pointed out by Michael Kuperstein. llvm-svn: 171668	2013-01-06 20:39:29 +00:00
Evan Cheng	3fb03e23a4	Fix for PR14739. It's not safe to fold a load into a call across a store. Thanks to Nick Lewycky for the initial patch. llvm-svn: 171665	2013-01-06 19:00:15 +00:00
Craig Topper	92a70b1e65	Recommit r171461 which was incorrectly reverted. Mark DIV/IDIV instructions hasSideEffects=1 because they can trap when dividing by 0. This is needed to keep early if conversion from moving them across basic blocks. llvm-svn: 171608	2013-01-05 07:39:25 +00:00
Nadav Rotem	478b6a47ec	Revert revision 171524. Original message: URL: http://llvm.org/viewvc/llvm-project?rev=171524&view=rev Log: The current Intel Atom microarchitecture has a feature whereby when a function returns early then it is slightly faster to execute a sequence of NOP instructions to wait until the return address is ready, as opposed to simply stalling on the ret instruction until the return address is ready. When compiling for X86 Atom only, this patch will run a pass, called "X86PadShortFunction" which will add NOP instructions where less than four cycles elapse between function entry and return. It includes tests. Patch by Andy Zhang. llvm-svn: 171603	2013-01-05 05:42:48 +00:00
Preston Gurd	e36b685a94	The current Intel Atom microarchitecture has a feature whereby when a function returns early then it is slightly faster to execute a sequence of NOP instructions to wait until the return address is ready, as opposed to simply stalling on the ret instruction until the return address is ready. When compiling for X86 Atom only, this patch will run a pass, called "X86PadShortFunction" which will add NOP instructions where less than four cycles elapse between function entry and return. It includes tests. Patch by Andy Zhang. llvm-svn: 171524	2013-01-04 20:54:54 +00:00
Akira Hatanaka	b13b33359b	[mips] MipsTargetLowering::getSetCCResultType should return a vector type if vectors are being compared. llvm-svn: 171517	2013-01-04 20:06:01 +00:00
Nadav Rotem	c616a5408a	Revert revision: 171467. This transformation is incorrect and makes some tests fail. Original message: Simplified TRUNCATE operation that comes after SETCC. It is possible since SETCC result is 0 or -1. Added a test. llvm-svn: 171468	2013-01-04 17:35:21 +00:00
Elena Demikhovsky	5f2f06d2d9	Simplified TRUNCATE operation that comes after SETCC. It is possible since SETCC result is 0 or -1. Added a test. llvm-svn: 171467	2013-01-03 08:48:33 +00:00
Michael Gottesman	820aac1c78	Revert "Mark DIV/IDIV instructions hasSideEffects=1 because they can trap when dividing by 0. This is needed to keep early if conversion from moving them across basic blocks." This reverts commit r171461 since it breaks the following tests: Clang :: Analysis/outofbound-notwork.c Clang :: Analysis/string-fail.c Clang :: CXX/basic/basic.lookup/basic.lookup.qual/p6-0x.cpp Clang :: CXX/basic/basic.lookup/basic.lookup.unqual/p15.cpp Clang :: CXX/dcl.dcl/dcl.spec/dcl.fct.spec/p4.cpp Clang :: CXX/dcl.dcl/dcl.spec/dcl.stc/p10.cpp Clang :: CXX/temp/temp.param/p14.cpp Clang :: CXX/temp/temp.res/temp.dep.res/temp.point/p1.cpp Clang :: CodeGen/2009-02-13-zerosize-union-field-ppc.c Clang :: CodeGen/blocks-2.c Clang :: CodeGen/libcalls-d.c Clang :: CodeGen/libcalls-ld.c Clang :: CodeGenCXX/conversion-function.cpp Clang :: CodeGenCXX/debug-info-limit-type.cpp Clang :: CodeGenCXX/inheriting-constructor.cpp Clang :: FixIt/fixit-errors.c Clang :: FixIt/fixit-pmem.cpp Clang :: Modules/namespaces.cpp Clang :: PCH/changed-files.c Clang :: PCH/pr4489.c Clang :: PCH/source-manager-stack.c Clang :: Parser/cxx-ambig-decl-expr-xfail.cpp Clang :: SemaCXX/switch-implicit-fallthrough-cxx98.cpp Clang :: SemaTemplate/instantiate-function-1.mm llvm-svn: 171466	2013-01-03 08:18:30 +00:00
Craig Topper	7c27cc9fd0	Mark DIV/IDIV instructions hasSideEffects=1 because they can trap when dividing by 0. This is needed to keep early if conversion from moving them across basic blocks. llvm-svn: 171461	2013-01-03 06:40:20 +00:00
Jakob Stoklund Olesen	725d57682b	Fix PR14732 by handling all kinds of IMPLICIT_DEF live ranges. Most IMPLICIT_DEF instructions are removed by the ProcessImplicitDefs pass, and a few are reinserted by PHIElimination when a PHI argument is <undef>. RegisterCoalescer was assuming that all IMPLICIT_DEF live ranges look like those created by PHIElimination, and that their live range never leaves the basic block. The PR14732 test case does tricks with PHI nodes that causes a longer IMPLICIT_DEF live range to appear. This happens very rarely, but RegisterCoalescer should be able to handle it. llvm-svn: 171435	2013-01-03 00:47:51 +00:00
Tom Stellard	567f886eb0	DAGCombiner: Avoid generating illegal vector INT_TO_FP nodes DAGCombiner::reduceBuildVecConvertToConvertBuildVec() was making two mistakes: 1. It was checking the legality of scalar INT_TO_FP nodes and then generating vector nodes. 2. It was passing the result value type to TargetLoweringInfo::getOperationAction() when it should have been passing the value type of the first operand. llvm-svn: 171420	2013-01-02 22:13:01 +00:00
Nadav Rotem	c8d7047fa9	AVX: Fix a bug in WidenMaskArithmetic. llvm-svn: 171397	2013-01-02 17:40:39 +00:00
Hal Finkel	6dbdd4307b	Support ppcf128 in SelectionDAG::getConstantFP Fixes pr14751. Patch by Kai; Thanks! llvm-svn: 171261	2012-12-30 19:03:32 +00:00
Dmitri Gribenko	56bf2e1830	Tests: rewrite 'opt ... %s' to 'opt ... < %s' so that opt does not emit a ModuleID This is done to avoid odd test failures, like the one fixed in r171243. llvm-svn: 171250	2012-12-30 02:33:22 +00:00
Nadav Rotem	3da9ac72fa	AVX: Move the ZEXT/ANYEXT DAGCo optimizations to the lowering of these optimizations. The old test cases still cover all of these lowering/optimizations. The single change that we have is that now anyext does not need to zero a register, because it does not use the exact code path as the zero_extend. llvm-svn: 171178	2012-12-28 05:45:24 +00:00
Nadav Rotem	2a054b4475	On AVX/AVX2 the type v8i1 is legalized to v8i16, which is an XMM sized register. In most cases we actually compare or select YMM-sized registers and mixing the two types creates horrible code. This commit optimizes some of the transition sequences. PR14657. llvm-svn: 171148	2012-12-27 08:15:45 +00:00
NAKAMURA Takumi	40aa3285f4	llvm/test/CodeGen/X86: FileCheck-ize two tests in r171083. llvm-svn: 171084	2012-12-26 03:19:30 +00:00
NAKAMURA Takumi	334f685328	llvm/test/CodeGen/X86: Disable avx in two tests corresponding to r171082. llvm-svn: 171083	2012-12-26 03:08:55 +00:00
Hal Finkel	2ebe6d08cd	Loosen scheduling restrictions on the PPC dcbt intrinsic As with the prefetch intrinsic to which it maps, simply have dcbt marked as reading from and writing to its arguments instead of having unmodeled side effects. While this might cause unwanted code motion (because aliasing checks don't really capture cache-line sharing), it is more important that prefetches in unrolled loops don't block the scheduler from rearranging the unrolled loop body. llvm-svn: 171073	2012-12-25 18:51:18 +00:00
Hal Finkel	1b5ff08d43	Expand PPC64 atomic load and store Use of store or load with the atomic specifier on 64-bit types would cause instruction-selection failures. As with the 32-bit case, these can use the default expansion in terms of cmp-and-swap. llvm-svn: 171072	2012-12-25 17:22:53 +00:00
Benjamin Kramer	a9f265ee98	Harden test so it's not affected by changes to compare lowering. This only failed on hosts that don't have SSE41. llvm-svn: 171066	2012-12-25 13:23:23 +00:00
Benjamin Kramer	81b5a8fd2e	X86: Shave off one shuffle from the pcmpeqq sequence for SSE2 by making use of and commutativity. llvm-svn: 171064	2012-12-25 13:09:08 +00:00
Benjamin Kramer	df4af41b9b	X86: Custom lower <2 x i64> eq and ne when SSE41 is not available. pcmpeqd, pshufd, pshufd, pand is cheaper than unpack + cmpq, sbbq, cmpq, sbbq + pack. Small speedup on loop-vectorized viterbi (-march=core2). llvm-svn: 171063	2012-12-25 12:54:19 +00:00
NAKAMURA Takumi	1b18db7ea3	llvm/test/CodeGen/X86/fold-vex.ll: Add explicit triple. llvm-svn: 171029	2012-12-24 11:14:06 +00:00
Nadav Rotem	dc0ad92b64	Some x86 instructions can load/store one of the operands to memory. On SSE, this memory needs to be aligned. When these instructions are encoded in VEX (on AVX) there is no such requirement. This changes the folding tables and removes the alignment restrictions from VEX-encoded instructions. llvm-svn: 171024	2012-12-24 09:40:33 +00:00
Benjamin Kramer	76268ac682	X86: Turn mul of <4 x i32> into pmuludq when no SSE4.1 is available. pmuludq is slow, but it turns out that all the unpacking and packing of the scalarized mul is even slower. 10% speedup on loop-vectorized paq8p. llvm-svn: 170985	2012-12-22 16:07:56 +00:00
Benjamin Kramer	b2f0a2bd4b	X86: Emit vector sext as shuffle + sra if vpmovsx is not available. Also loosen the SSSE3 dependency a bit, expanded pshufb + psra is still better than scalarized loads. Fixes PR14590. llvm-svn: 170984	2012-12-22 11:34:28 +00:00
Nadav Rotem	d5aae980cb	In some cases, due to scheduling constraints we copy the EFLAGS. The only way to read the eflags is using push and pop. If we don't adjust the stack then we run over the first frame index. This is not something that we want to do, so we have to make sure that our machine function does not copy the flags. If it does then we have to emit the prolog that adjusts the stack. rdar://12896831 llvm-svn: 170961	2012-12-21 23:48:49 +00:00
Benjamin Kramer	b4688f84bd	try to unbreak ppc buildbots. llvm-svn: 170913	2012-12-21 18:11:45 +00:00
Benjamin Kramer	82d1c371e2	X86: Match pmin/pmax as a target specific dag combine. This occurs during vectorization. Part of PR14667. llvm-svn: 170908	2012-12-21 17:46:58 +00:00
Tom Stellard	a8b0351720	R600: Expand vec4 INT <-> FP conversions llvm-svn: 170901	2012-12-21 16:33:24 +00:00
Reed Kotler	93f778d2bd	Add test case for r170674 llvm-svn: 170823	2012-12-21 00:55:10 +00:00
Eric Christopher	6e47b725ff	Move these files over to the debug info directory. llvm-svn: 170810	2012-12-21 00:03:42 +00:00
Bob Wilson	7bba4f8957	Revert "Adding support for llvm.arm.neon.vaddl[su].* and" This reverts r170694. The operations can be represented in IR without adding any new intrinsics. llvm-svn: 170765	2012-12-20 21:09:38 +00:00
Evan Cheng	ddc0cb6dc5	On some ARM cpus, flags setting movs with shifter operand, i.e. lsl, lsr, asr, are more expensive than the non-flag setting variant. Teach thumb2 size reduction pass to avoid generating them unless we are optimizing for size. rdar://12892707 llvm-svn: 170728	2012-12-20 19:59:30 +00:00
Rafael Espindola	642c7cd56e	Simplify the testcase a bit. I checked that it would still crash llc before the corresponding fix. llvm-svn: 170709	2012-12-20 17:47:27 +00:00
Renato Golin	6b2ea4a48f	Adding support for llvm.arm.neon.vaddl[su].* and llvm.arm.neon.vsub[su].* intrinsics. Patch by Pete Couperus <pjcoup@gmail.com> llvm-svn: 170694	2012-12-20 13:52:11 +00:00
Reed Kotler	d019dbf75e	fix most of remaining issues with large frames. these patches are tested a lot by test-suite but make check tests are forthcoming once the next few patches that complete this are committed. with the next few patches the pass rate for mips16 is near 100% llvm-svn: 170656	2012-12-20 04:07:42 +00:00
Akira Hatanaka	f423672117	[mips] Use "or $r0, $r1, $zero" instead of "addu $r0, $zero, $r1" to copy physical register $r1 to $r0. GNU disassembler recognizes an "or" instruction as a "move", and this change makes the disassembled code easier to read. Original patch by Reed Kotler. llvm-svn: 170655	2012-12-20 04:06:06 +00:00
Bob Wilson	3365b80290	Do not introduce vector operations in functions marked with noimplicitfloat. <rdar://problem/12879313> llvm-svn: 170630	2012-12-20 01:36:20 +00:00
Evan Cheng	eae6d2ccea	LLVM sdisel normalize bit extraction of the form: ((x & 0xff00) >> 8) << 2 to (x >> 6) & 0x3fc This is general goodness since it folds a left shift into the mask. However, the trailing zeros in the mask prevents the ARM backend from using the bit extraction instructions. And worse since the mask materialization may require an addition instruction. This comes up fairly frequently when the result of the bit twiddling is used as memory address. e.g. = ptr[(x & 0xFF0000) >> 16] We want to generate: ubfx r3, r1, #16, #8 ldr.w r3, [r0, r3, lsl #2] vs. mov.w r9, #1020 and.w r2, r9, r1, lsr #14 ldr r2, [r0, r2] Add a late ARM specific isel optimization to ARMDAGToDAGISel::PreprocessISelDAG(). It folds the left shift to the 'base + offset' address computation; change the mask to one which doesn't have trailing zeros and enable the use of ubfx. Note the optimization has to be done late since it's target specific and we don't want to change the DAG normalization. It's also fairly restrictive as shifter operands are not always free. It's only done for lsh 1 / 2. It's known to be free on some cpus and they are most common for address computation. This is a slight win for blowfish, rijndael, etc. rdar://12870177 llvm-svn: 170581	2012-12-19 20:16:09 +00:00
Benjamin Kramer	c5071466d4	PowerPC: Expand VSELECT nodes. There's probably a better expansion for those nodes than the default for altivec, but this is better than crashing. VSELECTs occur in loop vectorizer output. llvm-svn: 170551	2012-12-19 15:49:14 +00:00
Elena Demikhovsky	14a4af0e66	Optimized load + SIGN_EXTEND patterns in the X86 backend. llvm-svn: 170506	2012-12-19 07:50:20 +00:00
Nadav Rotem	33360d8ae9	After reducing the size of an operation in the DAG we zero-extend the reduced bitwidth op back to the original size. If we reduce ANDs then this can cause an endless loop. This patch changes the ZEXT to ANY_EXTEND if the demanded bits are equal or smaller than the size of the reduced operation. llvm-svn: 170505	2012-12-19 07:39:08 +00:00
Craig Topper	63f5921776	Teach SimplifySetCC that comparing AssertZext i1 against a constant 1 can be rewritten as a compare against a constant 0 with the opposite condition. llvm-svn: 170495	2012-12-19 06:12:28 +00:00
Quentin Colombet	23b404d5ad	Disable ARM partial flag dependency optimization at -Oz To not over constrain the scheduler for ARM in thumb mode, some optimizations for code size reduction, specific to ARM thumb, are blocked when they add a dependency (like write after read dependency). Disables this check when code size is the priority, i.e., code is compiled with -Oz. llvm-svn: 170462	2012-12-18 22:47:16 +00:00
Andrew Trick	ec2564818c	MISched: add dependence to ExitSU to model live-out latency. llvm-svn: 170454	2012-12-18 20:53:01 +00:00
Hal Finkel	943f76d1b3	Check multiple register classes for inline asm tied registers A register can be associated with several distinct register classes. For example, on PPC, the floating point registers are each associated with both F4RC (which holds f32) and F8RC (which holds f64). As a result, this code would fail when provided with a floating point register and an f64 operand because it would happen to find the register in the F4RC class first and return that. From the F4RC class, SDAG would extract f32 as the register type and then assert because of the invalid implied conversion between the f64 value and the f32 register. Instead, search all register classes. If a register class containing the the requested register has the requested type, then return that register class. Otherwise, as before, return the first register class found that contains the requested register. llvm-svn: 170436	2012-12-18 17:50:58 +00:00
Craig Topper	f924a58af1	Add rest of BMI/BMI2 instructions to the folding tables as well as popcnt and lzcnt. llvm-svn: 170304	2012-12-17 05:02:29 +00:00
Reed Kotler	aee4d5d194	This patch is needed to make c++ exceptions work for mips16. Mips16 is really a processor decoding mode (ala thumb 1) and in the same program, mips16 and mips32 functions can exist and can call each other. If a jal type instruction encounters an address with the lower bit set, then the processor switches to mips16 mode (if it is not already in it). If the lower bit is not set, then it switches to mips32 mode. The linker knows which functions are mips16 and which are mips32. When relocation is performed on code labels, this lower order bit is set if the code label is a mips16 code label. In general this works just fine, however when creating exception handling tables and dwarf, there are cases where you don't want this lower order bit added in. This has been traditionally distinguished in gas assembly source by using a different syntax for the label. lab1: ; this will cause the lower order bit to be added lab2=. ; this will not cause the lower order bit to be added In some cases, it does not matter because in dwarf and debug tables the difference of two labels is used and in that case the lower order bits subtract each other out. To fix this, I have added to mcstreamer the notion of a debuglabel. The default is for label and debug label to be the same. So calling EmitLabel and EmitDebugLabel produce the same result. For various reasons, there is only one set of labels that needs to be modified for the mips exceptions to work. These are the "$eh_func_beginXXX" labels. Mips overrides the debug label suffix from ":" to "=." . This initial patch fixes exceptions. More changes most likely will be needed to DwarfCFException to make all of this work for actual debugging. These changes will be to emit debug labels in some places where a simple label is emitted now. Some historical discussion on this from gcc can be found at: http://gcc.gnu.org/ml/gcc-patches/2008-08/msg00623.html http://gcc.gnu.org/ml/gcc-patches/2008-11/msg01273.html llvm-svn: 170279	2012-12-16 04:00:45 +00:00
Benjamin Kramer	b16ccde7a4	X86: Add a couple of target-specific dag combines that turn VSELECTS into psubus if possible. We match the pattern "x >= y ? x-y : 0" into "subus x, y" and two special cases if y is a constant. DAGCombiner canonicalizes those so we first have to undo the canonicalization for those cases. The pattern occurs in gzip when the loop vectorizer is enabled. Part of PR14613. llvm-svn: 170273	2012-12-15 16:47:44 +00:00
Reed Kotler	5fdeb21249	This code implements most of mips16 hardfloat as it is done by gcc. In this case, essentially it is soft float with different library routines. The next step will be to make this fully interoperational with mips32 floating point and that requires creating stubs for functions with signatures that contain floating point types. I have a more sophisticated design for mips16 hardfloat which I hope to implement at a later time that directly does floating point without the need for function calls. The mips16 encoding has no floating point instructions so one needs to switch to mips32 mode to execute floating point instructions. llvm-svn: 170259	2012-12-15 00:20:05 +00:00
Nadav Rotem	8487537bdb	TypeLegalizer: Do not generate target specific nodes with illegal types, because we cant type-legalize them. llvm-svn: 170245	2012-12-14 21:20:37 +00:00
Bill Schmidt	a4f898448c	This patch removes some nondeterminism from direct object file output for TLS dynamic models on 64-bit PowerPC ELF. The default sort routine for relocations only sorts on the r_offset field; but with TLS, there can be two relocations with the same r_offset. For PowerPC, this patch sorts secondarily on descending r_type, which matches the behavior expected by the linker. llvm-svn: 170237	2012-12-14 20:28:38 +00:00
Bill Schmidt	9f0b4ec0f5	This patch improves the 64-bit PowerPC InitialExec TLS support by providing for a wider range of GOT entries that can hold thread-relative offsets. This matches the behavior of GCC, which was not documented in the PPC64 TLS ABI. The ABI will be updated with the new code sequence. Former sequence: ld 9,x@got@tprel(2) add 9,9,x@tls New sequence: addis 9,2,x@got@tprel@ha ld 9,x@got@tprel@l(9) add 9,9,x@tls Note that a linker optimization exists to transform the new sequence into the shorter sequence when appropriate, by replacing the addis with a nop and modifying the base register and relocation type of the ld. llvm-svn: 170209	2012-12-14 17:02:38 +00:00
Akira Hatanaka	cf9a61b6ee	[mips] Do not copy GOT address to register $gp if the function being called has internal linkage. llvm-svn: 170092	2012-12-13 03:17:29 +00:00
Evan Cheng	bf0baa9de7	Fix a bug in DAGCombiner::MatchBSwapHWord. Make sure the node has operands before referencing them. rdar://12868039 llvm-svn: 170078	2012-12-13 01:34:32 +00:00
Evan Cheng	b7d3d03bf9	Fix a logic bug in inline expansion of memcpy / memset with an overlapping load / store pair. It's not legal to use a wider load than the size of the remaining bytes if it's the first pair of load / store. llvm-svn: 170018	2012-12-12 20:43:23 +00:00
Bill Schmidt	25ffd19502	The ordering of two relocations on the same instruction is apparently not predictable when compiled on at least one non-PowerPC host. Source of nondeterminism not apparent. Restrict the test to build on PowerPC hosts for now while looking into the issue further. llvm-svn: 170016	2012-12-12 20:29:20 +00:00
Bill Schmidt	24b8dd6eb7	This patch implements local-dynamic TLS model support for the 64-bit PowerPC target. This is the last of the four models, so we now have full TLS support. This is mostly a straightforward extension of the general dynamic model. I had to use an additional Chain operand to tie ADDIS_DTPREL_HA to the register copy following ADDI_TLSLD_L; otherwise everything above the ADDIS_DTPREL_HA appeared dead and was removed. As before, there are new test cases to test the assembly generation, and the relocations output during integrated assembly. The expected code gen sequence can be read in test/CodeGen/PowerPC/tls-ld.ll. There are a couple of things I think can be done more efficiently in the overall TLS code, so there will likely be a clean-up patch forthcoming; but for now I want to be sure the functionality is in place. Bill llvm-svn: 170003	2012-12-12 19:29:35 +00:00
NAKAMURA Takumi	be230b8fdb	llvm/test/CodeGen/X86/atom-bypass-slow-division.ll: Fix possible typo(s) in CHECK-NOT lines. Found by Alexander Zinenko, thanks! llvm-svn: 169978	2012-12-12 13:34:20 +00:00
NAKAMURA Takumi	cae5321a3b	llvm/test/CodeGen/X86/atom-bypass-slow-division.ll: Rename symbols, s/test_/Test/g, not to mismatch "CHECK(-NOT): test". llvm-svn: 169977	2012-12-12 13:34:14 +00:00
NAKAMURA Takumi	69d1405e48	llvm/test/CodeGen/X86/store_op_load_fold.ll: Fix typo, s/CHECK_NEXT/CHECK-NEXT/ llvm-svn: 169957	2012-12-12 01:41:01 +00:00
NAKAMURA Takumi	01ac65af00	llvm/test/CodeGen/X86/store_op_load_fold.ll: Add explicit triple. llvm-svn: 169956	2012-12-12 01:40:56 +00:00
Manman Ren	82751a105c	DAGCombine: clamp hi bit in APInt::getBitsSet to avoid assertion rdar://12838504 llvm-svn: 169951	2012-12-12 01:13:50 +00:00
Evan Cheng	04e5518783	Avoid using lossy load / stores for memcpy / memset expansion. e.g. f64 load / store on non-SSE2 x86 targets. llvm-svn: 169944	2012-12-12 00:42:09 +00:00
Tom Stellard	75aadc2813	Add R600 backend A new backend supporting AMD GPUs: Radeon HD2XXX - HD7XXX llvm-svn: 169915	2012-12-11 21:25:42 +00:00
Bill Schmidt	c56f1d34bc	This patch implements the general dynamic TLS model for 64-bit PowerPC. Given a thread-local symbol x with global-dynamic access, the generated code to obtain x's address is: Instruction Relocation Symbol addis ra,r2,x@got@tlsgd@ha R_PPC64_GOT_TLSGD16_HA x addi r3,ra,x@got@tlsgd@l R_PPC64_GOT_TLSGD16_L x bl __tls_get_addr(x@tlsgd) R_PPC64_TLSGD x R_PPC64_REL24 __tls_get_addr nop <use address in r3> The implementation borrows from the medium code model work for introducing special forms of ADDIS and ADDI into the DAG representation. This is made slightly more complicated by having to introduce a call to the external function __tls_get_addr. Using the full call machinery is overkill and, more importantly, makes it difficult to add a special relocation. So I've introduced another opcode GET_TLS_ADDR to represent the function call, and surrounded it with register copies to set up the parameter and return value. Most of the code is pretty straightforward. I ran into one peculiarity when I introduced a new PPC opcode BL8_NOP_ELF_TLSGD, which is just like BL8_NOP_ELF except that it takes another parameter to represent the symbol ("x" above) that requires a relocation on the call. Something in the TblGen machinery causes BL8_NOP_ELF and BL8_NOP_ELF_TLSGD to be treated identically during the emit phase, so this second operand was never visited to generate relocations. This is the reason for the slightly messy workaround in PPCMCCodeEmitter.cpp:getDirectBrEncoding(). Two new tests are included to demonstrate correct external assembly and correct generation of relocations using the integrated assembler. Comments welcome! Thanks, Bill llvm-svn: 169910	2012-12-11 20:30:11 +00:00
Chad Rosier	d4c0c6cb22	Add a triple to this test. llvm-svn: 169803	2012-12-11 00:51:36 +00:00
Chandler Carruth	b27041c50b	Fix a miscompile in the DAG combiner. Previously, we would incorrectly try to reduce the width of this load, and would end up transforming: (truncate (lshr (sextload i48 <ptr> as i64), 32) to i32) to (truncate (zextload i32 <ptr+4> as i64) to i32) We lost the sext attached to the load while building the narrower i32 load, and replaced it with a zext because lshr always zext's the results. Instead, bail out of this combine when there is a conflict between a sextload and a zext narrowing. The rest of the DAG combiner still optimize the code down to the proper single instruction: movswl 6(...),%eax Which is exactly what we wanted. Previously we read past the end and missed the sign extension: movl 6(...), %eax llvm-svn: 169802	2012-12-11 00:36:57 +00:00
Paul Redmond	c4550d4967	move X86-specific test This test case uses -mcpu=corei7 so it belongs in CodeGen/X86 Reviewed by: Nadav llvm-svn: 169801	2012-12-11 00:36:43 +00:00
Chad Rosier	df42cf39ab	Fall back to the selection dag isel to select tail calls. This shouldn't affect codegen for -O0 compiles as tail call markers are not emitted in unoptimized compiles. Testing with the external/internal nightly test suite reveals no change in compile time performance. Testing with -O1, -O2 and -O3 with fast-isel enabled did not cause any compile-time or execution-time failures. All tests were performed on my x86 machine. I'll monitor our arm testers to ensure no regressions occur there. In an upcoming clang patch I will be marking the objc_autoreleaseReturnValue and objc_retainAutoreleaseReturnValue as tail calls unconditionally. While it's theoretically true that this is just an optimization, it's an optimization that we very much want to happen even at -O0, or else ARC applications become substantially harder to debug. Part of rdar://12553082 llvm-svn: 169796	2012-12-11 00:18:02 +00:00
Evan Cheng	79e2ca90bc	Some enhancements for memcpy / memset inline expansion. 1. Teach it to use overlapping unaligned load / store to copy / set the trailing bytes. e.g. On 86, use two pairs of movups / movaps for 17 - 31 byte copies. 2. Use f64 for memcpy / memset on targets where i64 is not legal but f64 is. e.g. x86 and ARM. 3. When memcpy from a constant string, do not replace the load with a constant if it's not possible to materialize an integer immediate with a single instruction (required a new target hook: TLI.isIntImmLegal()). 4. Use unaligned load / stores more aggressively if target hooks indicates they are "fast". 5. Update ARM target hooks to use unaligned load / stores. e.g. vld1.8 / vst1.8. Also increase the threshold to something reasonable (8 for memset, 4 pairs for memcpy). This significantly improves Dhrystone, up to 50% on ARM iOS devices. rdar://12760078 llvm-svn: 169791	2012-12-10 23:21:26 +00:00
Hal Finkel	66859ae0f6	Use GetUnderlyingObjects in misched misched used GetUnderlyingObject in order to break false load/store dependencies, and the -enable-aa-sched-mi feature similarly relied on GetUnderlyingObject in order to ensure it is safe to use the aliasing analysis. Unfortunately, GetUnderlyingObject does not recurse through phi nodes, and so (especially due to LSR) all of these mechanisms failed for induction-variable-dependent loads and stores inside loops. This change replaces uses of GetUnderlyingObject with GetUnderlyingObjects (which will recurse through phi and select instructions) in misched. Andy reviewed, tested and simplified this patch; Thanks! llvm-svn: 169744	2012-12-10 18:49:16 +00:00
Craig Topper	d8005db486	Teach DAG combine to handle vector add/sub with vectors of all 0s. llvm-svn: 169727	2012-12-10 08:12:29 +00:00
Craig Topper	a183ddb0fe	Teach DAG combine to handle vector logical operations with vectors of all 1s or all 0s. These cases can show up when vectors are split for legalizing. Fix some tests that were dependent on these cases not being combined. llvm-svn: 169684	2012-12-08 22:49:19 +00:00
Nadav Rotem	ad0b5fbe8c	When we use the BLEND instruction that uses the MSB as a mask, we can remove the VSRI instruction before it since it does not affect the MSB. Thanks Craig Topper for suggesting this. llvm-svn: 169638	2012-12-07 21:43:11 +00:00
Matthew Curtis	7a93811e8b	In hexagon convertToHardwareLoop, don't deref end() iterator In particular, check if MachineBasicBlock::iterator is end() before using it to call getDebugLoc(); See also this thread on llvm-commits: http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20121112/155914.html llvm-svn: 169634	2012-12-07 21:03:15 +00:00
Nadav Rotem	481e50efe0	X86: Prefer using VPSHUFD over VPERMIL because it has better throughput. llvm-svn: 169624	2012-12-07 19:01:13 +00:00
Tim Northover	5cc3dc86bb	Added Mapping Symbols for ARM ELF Before this patch, when you objdump an LLVM-compiled file, objdump tried to decode data-in-code sections as if they were code. This patch adds the missing Mapping Symbols, as defined by "ELF for the ARM Architecture" (ARM IHI 0044D). Patch based on work by Greg Fitzgerald. llvm-svn: 169609	2012-12-07 16:50:23 +00:00
Dmitri Gribenko	1c704355cf	Fix typos in CHECK lines. Patch by Alexander Zinenko. llvm-svn: 169547	2012-12-06 21:24:47 +00:00
Nadav Rotem	ac450eb59e	Fix a bug in the code that merges consecutive stores. Previously we did not check if loads that happen in between stores alias with the first store in the chain, only with the second store onwards. llvm-svn: 169516	2012-12-06 17:34:13 +00:00
Craig Topper	216bcd522b	Remove intrinsic specific instructions for (V)MOVQUmr with patterns pointing to the normal instructions. llvm-svn: 169482	2012-12-06 07:31:16 +00:00
Evan Cheng	16846051db	Properly fix the tes. llvm-svn: 169464	2012-12-06 02:29:29 +00:00
NAKAMURA Takumi	1eccd286fd	llvm/test/CodeGen/ARM/extload-knownzero.ll: Try to unbreak, to add -O0. I guess Chad expects fastisel here. llvm-svn: 169463	2012-12-06 02:22:58 +00:00
Chad Rosier	9f5c68af4c	[arm fast-isel] Make the fast-isel implementation of memcpy respect alignment. rdar://12821569 llvm-svn: 169460	2012-12-06 01:34:31 +00:00
Evan Cheng	5213139f48	Let targets provide hooks that compute known zero and ones for any_extend and extload's. If they are implemented as zero-extend, or implicitly zero-extend, then this can enable more demanded bits optimizations. e.g. define void @foo(i16* %ptr, i32 %a) nounwind { entry: %tmp1 = icmp ult i32 %a, 100 br i1 %tmp1, label %bb1, label %bb2 bb1: %tmp2 = load i16* %ptr, align 2 br label %bb2 bb2: %tmp3 = phi i16 [ 0, %entry ], [ %tmp2, %bb1 ] %cmp = icmp ult i16 %tmp3, 24 br i1 %cmp, label %bb3, label %exit bb3: call void @bar() nounwind br label %exit exit: ret void } This compiles to the followings before: push {lr} mov r2, #0 cmp r1, #99 bhi LBB0_2 @ BB#1: @ %bb1 ldrh r2, [r0] LBB0_2: @ %bb2 uxth r0, r2 cmp r0, #23 bhi LBB0_4 @ BB#3: @ %bb3 bl _bar LBB0_4: @ %exit pop {lr} bx lr The uxth is not needed since ldrh implicitly zero-extend the high bits. With this change it's eliminated. rdar://12771555 llvm-svn: 169459	2012-12-06 01:28:01 +00:00
Andrew Trick	fda7a8832d	RegisterPressureTracker: fix findUseBetween to handle DebugValue llvm-svn: 169427	2012-12-05 21:37:50 +00:00
Andrew Trick	7f7cee39ab	RegisterPresssureTracker: Track live physical register by unit. This is much simpler to reason about, more efficient, and fixes some corner cases involving implicit super-register defs. Fixed rdar://12797931. llvm-svn: 169425	2012-12-05 21:37:42 +00:00
Justin Holewinski	fb711156ae	[NVPTX] Fix crash with unnamed struct arguments Patch by Eric Holk llvm-svn: 169418	2012-12-05 20:50:28 +00:00
Jyotsna Verma	90295156d8	Use multiclass to define store instructions with base+immediate offset addressing mode and immediate stored value. llvm-svn: 169408	2012-12-05 19:32:03 +00:00
Elena Demikhovsky	cd3c1c4a16	Simplified BLEND pattern matching for shuffles. Generate VPBLENDD for AVX2 and VPBLENDW for v16i16 type on AVX2. llvm-svn: 169366	2012-12-05 09:24:57 +00:00
Evan Cheng	d31802c1f6	Add x86 isel lowering logic to form bit test with inverted condition. e.g. x ^ -1. Patch by David Majnemer. rdar://12755626 llvm-svn: 169339	2012-12-05 00:10:38 +00:00
Evan Cheng	b4eae1361c	ARM custom lower ctpop for vector types. Patch by Pete Couperus. llvm-svn: 169325	2012-12-04 22:41:50 +00:00
Bill Wendling	d7767125d5	Use the 'count' attribute to calculate the upper bound of an array. The count attribute is more accurate with regards to the size of an array. It also obviates the upper bound attribute in the subrange. We can also better handle an unbound array by setting the count to -1 instead of the lower bound to 1 and upper bound to 0. llvm-svn: 169312	2012-12-04 21:34:03 +00:00
Bill Schmidt	ca4a0c9dbd	This patch introduces initial-exec model support for thread-local storage on 64-bit PowerPC ELF. The patch includes code to handle external assembly and MC output with the integrated assembler. It intentionally does not support the "old" JIT. For the initial-exec TLS model, the ABI requires the following to calculate the address of external thread-local variable x: Code sequence Relocation Symbol ld 9,x@got@tprel(2) R_PPC64_GOT_TPREL16_DS x add 9,9,x@tls R_PPC64_TLS x The register 9 is arbitrary here. The linker will replace x@got@tprel with the offset relative to the thread pointer to the generated GOT entry for symbol x. It will replace x@tls with the thread-pointer register (13). The two test cases verify correct assembly output and relocation output as just described. PowerPC-specific selection node variants are added for the two instructions above: LD_GOT_TPREL and ADD_TLS. These are inserted when an initial-exec global variable is encountered by PPCTargetLowering::LowerGlobalTLSAddress(), and later lowered to machine instructions LDgotTPREL and ADD8TLS. LDgotTPREL is a pseudo that uses the same LDrs support added for medium code model's LDtocL, with a different relocation type. The rest of the processing is straightforward. llvm-svn: 169281	2012-12-04 16:18:08 +00:00
Bill Wendling	bfc0e5725f	Add a 'count' field to the DWARF subrange. The count field is necessary because there isn't a difference between the 'lo' and 'hi' attributes for a one-element array and a zero-element array. When the count is '0', we know that this is a zero-element array. When it's >=1, then it's a normal constant sized array. When it's -1, then the array is unbounded. llvm-svn: 169218	2012-12-04 06:20:49 +00:00
Manman Ren	f563941adc	Stack Alignment: when creating stack objects in MachineFrameInfo, make sure the alignment is clamped to TargetFrameLowering.getStackAlignment if the target does not support stack realignment or the option "realign-stack" is off. This will cause miscompile if the address is treated as aligned and add is replaced with or in DAGCombine. Added a bool StackRealignable to TargetFrameLowering to check whether stack realignment is implemented for the target. Also added a bool RealignOption to MachineFrameInfo to check whether the option "realign-stack" is on. rdar://12713765 llvm-svn: 169197	2012-12-04 00:52:33 +00:00
Nadav Rotem	1157e1410c	Allow merging multiple store sequences on the same chain. llvm-svn: 169111	2012-12-02 17:14:09 +00:00
Eli Bendersky	b7b1ffc8e7	Fix an invalid regex in the test llvm-svn: 169108	2012-12-02 15:46:02 +00:00
Andrew Trick	b767d1eba8	misched: Fix RegisterPressureTracker handling of DebugVals. Assertion failed: (TopRPTracker.getPos() == RegionBegin && "bad initial Top tracker"). rdar://12790302. llvm-svn: 169072	2012-12-01 01:22:49 +00:00
Andrew Trick	d5953622ce	misched: Fix the DAG builder to handle an undef operand at ExitSU. Assertion failed: (VNI && "No value to read by operand") rdar://12790267. llvm-svn: 169071	2012-12-01 01:22:44 +00:00
Andrew Trick	a01302182c	misched: Fix LiveInterval update to better handle DebugVal. Assertion failed: (itr != mi2iMap.end() && "Instruction not found in maps.") rdar://12777252. llvm-svn: 169070	2012-12-01 01:22:41 +00:00
Andrew Trick	e7ea8aa48a	misched: fix RegionBegin when DebugValues get shuffled to the top. assert (RemainingInstrs == 0 && "Instruction count mismatch!") rdar://12776937. llvm-svn: 169069	2012-12-01 01:22:38 +00:00
Jakob Stoklund Olesen	da2b6b381a	Simplify REG_SEQUENCE lowering. The TwoAddressInstructionPass takes the machine code out of SSA form by expanding REG_SEQUENCE instructions into copies. It is no longer necessary to rewrite the registers used by a REG_SEQUENCE instruction because the new coalescer algorithm can do it now. REG_SEQUENCE is just converted to a sequence of sub-register copies now. llvm-svn: 169067	2012-12-01 01:06:44 +00:00
Chad Rosier	31e7d2deb3	test/CodeGen/PowerPC/vec_mul.ll: Add a triple. Thanks, Hal. llvm-svn: 169026	2012-11-30 19:15:10 +00:00
Sebastian Pop	a204f72237	Codegen failure for vmull with small vectors Codegen was failing with an assertion because of unexpected vector operands when legalizing the selection DAG for a MUL instruction. The asserting code was legalizing multiplies for vectors of size 128 bits. It uses a custom lowering to try and detect cases where it can use a VMULL instruction instead of a VMOVL + VMUL. The code was looking for input operands to the MUL that had been sign or zero extended. If it found the extended operands it would drop the sign/zero extension and use the original vector size as input to a VMULL instruction. The code assumed that the original input vector was 64 bits so that after dropping the extension it would fit directly into a D register and could be used as an operand of a VMULL instruction. The input code that trigger the failure used a vector of <4 x i8> that was sign extended to <4 x i32>. It was not safe to drop the sign extension in this case because the original vector is only 32 bits wide. The fix is to insert a sign extension for the vector to reach the required 64 bit size. In this particular example, the vector would need to be sign extented to a <4 x i16>. llvm-svn: 169024	2012-11-30 19:08:04 +00:00
Chad Rosier	a820e7feff	test/CodeGen/PowerPC/vec_mul.ll: Fix register operands. llvm-svn: 169020	2012-11-30 18:29:01 +00:00
NAKAMURA Takumi	faaf131091	test/CodeGen/PowerPC: Add explicit -march=ppc32. FIXME: Please add another RUN line if you would like to check also on ppc64. llvm-svn: 168999	2012-11-30 13:28:31 +00:00
Adhemerval Zanella	812410f2d1	This patch fixes the Altivec addend construction for the fused multiply-add instruction (vmaddfp) to conform with IEEE to ensure the sign of a zero result when resulting product is -0.0. The -0.0 vector addend to vmaddfp is generated by a creating a vector with full bits sets and then shifting each elements by 31-bits to the left, resulting in a vector of 0x80000000 (or -0.0 as float). The 'buildvec_canonicalize.ll' was adjusted to reflect this change and the 'vec_mul.ll' was complemented with the float vector multiplication test. llvm-svn: 168998	2012-11-30 13:05:44 +00:00
Bill Wendling	a4a77edf2e	Handle the situation where CodeGenPrepare removes a reference to a BB that has the last invoke instruction in the function. This also removes the last landing pad in an function. This is fine, but with SjLj EH code, we've already placed a bunch of code in the 'entry' block, which expects the landing pad to stick around. When we get to the situation where CGP has removed the last landing pad, go ahead and nuke the SjLj instructions from the 'entry' block. <rdar://problem/12721258> llvm-svn: 168930	2012-11-29 19:38:06 +00:00
Silviu Baranga	93aefa5f2c	Added atomic 64 min/max/umin/umax instrinsics support in the ARM backend. llvm-svn: 168886	2012-11-29 14:41:25 +00:00
Justin Holewinski	0ac49bf846	Teach the legalizer how to handle operands for VSELECT nodes If we need to split the operand of a VSELECT, it must be the mask operand. We split the entire VSELECT operand with EXTRACT_SUBVECTOR. llvm-svn: 168883	2012-11-29 14:26:28 +00:00
Justin Holewinski	bc45119b44	Allow targets to prefer TypeSplitVector over TypePromoteInteger when computing the legalization method for vectors For some targets, it is desirable to prefer scalarizing <N x i1> instead of promoting to a larger legal type, such as <N x i32>. llvm-svn: 168882	2012-11-29 14:26:24 +00:00
Jakob Stoklund Olesen	546e9e85f1	Avoid rewriting instructions twice. This could cause miscompilations in targets where sub-register composition is not always idempotent (ARM). <rdar://problem/12758887> llvm-svn: 168837	2012-11-29 00:26:11 +00:00
Nadav Rotem	307d767177	When combining consecutive stores allow loads in between the stores, if the loads do not alias. llvm-svn: 168832	2012-11-29 00:00:08 +00:00
Benjamin Kramer	b1996da782	ARM: Implement CanLowerReturn so large vectors get expanded into sret. Fixes 14337. llvm-svn: 168809	2012-11-28 20:55:10 +00:00
Andrew Trick	48d392e81e	misched: Analysis that partitions the DAG into subtrees. This is a simple, cheap infrastructure for analyzing the shape of a DAG. It recognizes uniform DAGs that take the shape of bottom-up subtrees, such as the included matrix multiplication example. This is useful for heuristics that balance register pressure with ILP. Two canonical expressions of the heuristic are implemented in scheduling modes: -misched-ilpmin and -misched-ilpmax. llvm-svn: 168773	2012-11-28 05:13:28 +00:00
Andrew Trick	0be19363d1	misched: better alias analysis. This fixes a hole in the "cheap" alias analysis logic implemented within the DAG builder itself, regardless of whether proper alias analysis is enabled. It now handles this pattern produced by LSR+CodeGenPrepare. %sunkaddr1 = ptrtoint * %obj to i64 %sunkaddr2 = add i64 %sunkaddr1, %lsr.iv %sunkaddr3 = inttoptr i64 %sunkaddr2 to i32* store i32 %v, i32* %sunkaddr3 llvm-svn: 168768	2012-11-28 03:42:49 +00:00
Bill Schmidt	e0a68a562b	This patch makes medium code model the default for 64-bit PowerPC ELF. When the CodeGenInfo is to be created for the PPC64 target machine, a default code-model selection is converted to CodeModel::Medium provided we are not targeting the Darwin OS. Defaults for Darwin are unaffected. llvm-svn: 168747	2012-11-27 23:36:26 +00:00
Chad Rosier	45c33c985c	Add -verify-machineinstrs to these fast-isel test cases. llvm-svn: 168723	2012-11-27 20:49:56 +00:00
Manman Ren	f89406ac78	CSE: allow PerformTrivialCoalescing to check copies across basic block boundaries. Given the following case: BB0 %vreg1<def> = SUBrr %vreg0, %vreg7 %vreg2<def> = COPY %vreg7 BB1 %vreg10<def> = SUBrr %vreg0, %vreg2 We should be able to CSE between SUBrr in BB0 and SUBrr in BB1. rdar://12462006 llvm-svn: 168717	2012-11-27 18:58:41 +00:00
Manman Ren	5b4628201f	X86: do not fold load instructions such as [V]MOVS[S\|D] to other instructions when the destination register is wider than the memory load. These load instructions load from m32 or m64 and set the upper bits to zero, while the folded instructions may accept m128. rdar://12721174 llvm-svn: 168710	2012-11-27 18:09:26 +00:00
Bill Schmidt	34627e3434	This patch implements medium code model support for 64-bit PowerPC. The default for 64-bit PowerPC is small code model, in which TOC entries must be addressable using a 16-bit offset from the TOC pointer. Additionally, only TOC entries are addressed via the TOC pointer. With medium code model, TOC entries and data sections can all be addressed via the TOC pointer using a 32-bit offset. Cooperation with the linker allows 16-bit offsets to be used when these are sufficient, reducing the number of extra instructions that need to be executed. Medium code model also does not generate explicit TOC entries in ".section toc" for variables that are wholly internal to the compilation unit. Consider a load of an external 4-byte integer. With small code model, the compiler generates: ld 3, .LC1@toc(2) lwz 4, 0(3) .section .toc,"aw",@progbits .LC1: .tc ei[TC],ei With medium model, it instead generates: addis 3, 2, .LC1@toc@ha ld 3, .LC1@toc@l(3) lwz 4, 0(3) .section .toc,"aw",@progbits .LC1: .tc ei[TC],ei Here .LC1@toc@ha is a relocation requesting the upper 16 bits of the 32-bit offset of ei's TOC entry from the TOC base pointer. Similarly, .LC1@toc@l is a relocation requesting the lower 16 bits. Note that if the linker determines that ei's TOC entry is within a 16-bit offset of the TOC base pointer, it will replace the "addis" with a "nop", and replace the "ld" with the identical "ld" instruction from the small code model example. Consider next a load of a function-scope static integer. For small code model, the compiler generates: ld 3, .LC1@toc(2) lwz 4, 0(3) .section .toc,"aw",@progbits .LC1: .tc test_fn_static.si[TC],test_fn_static.si .type test_fn_static.si,@object .local test_fn_static.si .comm test_fn_static.si,4,4 For medium code model, the compiler generates: addis 3, 2, test_fn_static.si@toc@ha addi 3, 3, test_fn_static.si@toc@l lwz 4, 0(3) .type test_fn_static.si,@object .local test_fn_static.si .comm test_fn_static.si,4,4 Again, the linker may replace the "addis" with a "nop", calculating only a 16-bit offset when this is sufficient. Note that it would be more efficient for the compiler to generate: addis 3, 2, test_fn_static.si@toc@ha lwz 4, test_fn_static.si@toc@l(3) The current patch does not perform this optimization yet. This will be addressed as a peephole optimization in a later patch. For the moment, the default code model for 64-bit PowerPC will remain the small code model. We plan to eventually change the default to medium code model, which matches current upstream GCC behavior. Note that the different code models are ABI-compatible, so code compiled with different models will be linked and execute correctly. I've tested the regression suite and the application/benchmark test suite in two ways: Once with the patch as submitted here, and once with additional logic to force medium code model as the default. The tests all compile cleanly, with one exception. The mandel-2 application test fails due to an unrelated ABI compatibility with passing complex numbers. It just so happens that small code model was incredibly lucky, in that temporary values in floating-point registers held the expected values needed by the external library routine that was called incorrectly. My current thought is to correct the ABI problems with _Complex before making medium code model the default, to avoid introducing this "regression." Here are a few comments on how the patch works, since the selection code can be difficult to follow: The existing logic for small code model defines three pseudo-instructions: LDtoc for most uses, LDtocJTI for jump table addresses, and LDtocCPT for constant pool addresses. These are expanded by SelectCodeCommon(). The pseudo-instruction approach doesn't work for medium code model, because we need to generate two instructions when we match the same pattern. Instead, new logic in PPCDAGToDAGISel::Select() intercepts the TOC_ENTRY node for medium code model, and generates an ADDIStocHA followed by either a LDtocL or an ADDItocL. These new node types correspond naturally to the sequences described above. The addis/ld sequence is generated for the following cases: * Jump table addresses * Function addresses * External global variables * Tentative definitions of global variables (common linkage) The addis/addi sequence is generated for the following cases: * Constant pool entries * File-scope static global variables * Function-scope static variables Expanding to the two-instruction sequences at select time exposes the instructions to subsequent optimization, particularly scheduling. The rest of the processing occurs at assembly time, in PPCAsmPrinter::EmitInstruction. Each of the instructions is converted to a "real" PowerPC instruction. When a TOC entry needs to be created, this is done here in the same manner as for the existing LDtoc, LDtocJTI, and LDtocCPT pseudo-instructions (I factored out a new routine to handle this). I had originally thought that if a TOC entry was needed for LDtocL or ADDItocL, it would already have been generated for the previous ADDIStocHA. However, at higher optimization levels, the ADDIStocHA may appear in a different block, which may be assembled textually following the block containing the LDtocL or ADDItocL. So it is necessary to include the possibility of creating a new TOC entry for those two instructions. Note that for LDtocL, we generate a new form of LD called LDrs. This allows specifying the @toc@l relocation for the offset field of the LD instruction (i.e., the offset is replaced by a SymbolLo relocation). When the peephole optimization described above is added, we will need to do similar things for all immediate-form load and store operations. The seven "mcm-n.ll" test cases are kept separate because otherwise the intermingling of various TOC entries and so forth makes the tests fragile and hard to understand. The above assumes use of an external assembler. For use of the integrated assembler, new relocations are added and used by PPCELFObjectWriter. Testing is done with "mcm-obj.ll", which tests for proper generation of the various relocations for the same sequences tested with the external assembler. llvm-svn: 168708	2012-11-27 17:35:46 +00:00
Ulrich Weigand	e5f9405842	Never use .lcomm on platforms where it does not accept an alignment argument. Instead, use a pair of .local and .comm directives. This avoids spurious differences between binaries built by the integrated assembler vs. those built by the external assembler, since the external assembler may impose alignment requirements on .lcomm symbols where the integrated assembler does not. llvm-svn: 168704	2012-11-27 16:11:16 +00:00
Craig Topper	32313da57c	Revert accidental commit. llvm-svn: 168687	2012-11-27 08:17:04 +00:00
Craig Topper	b9773650ec	Make PrintReg constructor explicit to prevent weird implicit conversions from accidentally being triggered. llvm-svn: 168686	2012-11-27 08:14:24 +00:00
Craig Topper	7c0f9a6461	Add test cases for r168417. llvm-svn: 168681	2012-11-27 07:19:54 +00:00
Chad Rosier	69fb6ddc6e	Extend test case for r168657. llvm-svn: 168658	2012-11-27 01:10:48 +00:00
NAKAMURA Takumi	ec844dab8f	llvm/test/CodeGen/X86/2012-07-15-broadcastfold.ll: Loosen expression corresponding to r168627. Win32 and *bsd were affected. llvm-svn: 168651	2012-11-27 00:48:27 +00:00
Chad Rosier	4179e3f513	Remove the X86 Maximal Stack Alignment Check pass as it is no longer necessary. This pass was conservative in that it always reserved the FP to enable dynamic stack realignment, which allowed the RA to use aligned spills for vector registers. This happens even when spills were not necessary. The RA has since been improved to use unaligned spills when necessary. The new behavior is to realign the stack if the frame pointer was already reserved for some other reason, but don't reserve the frame pointer just because a function contains vector virtual registers. Part of rdar://12719844 llvm-svn: 168627	2012-11-26 22:55:05 +00:00
Jakub Staszak	a17f3f8c30	Normalize splat 256bit vectors with 8 elements. llvm-svn: 168600	2012-11-26 19:24:31 +00:00
Eli Bendersky	eaf1a28594	Rewrite test to not use a FileCheck variable and redefine it on the same line. In preparation for the FileCheck functionality change which will allow using a variable later on the same line. No functionality change. llvm-svn: 168588	2012-11-26 14:09:46 +00:00
Benjamin Kramer	dd76a93af5	PPC: MCize most of the darwin PIC emission. The last remaining bit is "bcl 20, 31, AnonSymbol", which I couldn't find the instruction definition for. Only whitespace changes in assembly output. llvm-svn: 168541	2012-11-24 13:18:25 +00:00
Akira Hatanaka	bb6e74a2f1	[mips] Generate big GOT code. llvm-svn: 168460	2012-11-21 20:40:38 +00:00
Anton Korobeynikov	568afebcb2	Add support for varargs functions for msp430. Patch by Job Noorman! llvm-svn: 168440	2012-11-21 17:28:27 +00:00
Anton Korobeynikov	3414872fc8	Add support for byval args. Patch by Job Noorman! llvm-svn: 168439	2012-11-21 17:23:03 +00:00
Tim Northover	dd219d06c2	Fix physical register liveness calculations: + Take account of clobbers + Give outputs priority over inputs since they happen later. llvm-svn: 168360	2012-11-20 09:56:11 +00:00
Elena Demikhovsky	fc4840fbed	Intel OCL built-ins calling conventions now support MacOS 32-bit. llvm-svn: 168359	2012-11-20 09:37:57 +00:00
Anton Korobeynikov	f65a638d94	Factor out type info emission into separate routine. It turned out that ARM wants different layout of type infos. This is yet another patch in attempt to fix PR7187 llvm-svn: 168325	2012-11-19 21:06:26 +00:00
Jakob Stoklund Olesen	31ebe55808	Handle mixed normal and early-clobber defs on inline asm. PR14376. llvm-svn: 168320	2012-11-19 19:31:10 +00:00
Andrew Trick	e615ad0063	Use a full triple for a PPC test case for asm syntax. llvm-svn: 168283	2012-11-18 06:21:03 +00:00
Andrew Trick	bb1b351860	Silence the buildbots for this test while I figure out the triple llvm-svn: 168249	2012-11-17 03:39:26 +00:00
Andrew Trick	28c000b234	Broaden isSchedulingBoundary to check aliases of SP. On PPC the stack pointer is X1, but ADJCALLSTACK writes R1. Fixes PR14315: Register regmask dependency problem with misched. llvm-svn: 168248	2012-11-17 03:35:11 +00:00
Eli Friedman	30834940ec	Mark FP_EXTEND form v2f32 to v2f64 as "expand" for ARM NEON. Patch by Pete Couperus. llvm-svn: 168240	2012-11-17 01:52:46 +00:00
Chad Rosier	8983158e9d	[fast-isel] Add the -verify-machineinstrs to these test cases. The remaining test cases require fixes to fast-isel before the verifier can be enabled. Part of rdar://12594152 llvm-svn: 168233	2012-11-17 00:42:06 +00:00
Akira Hatanaka	ef83919b4c	Initial implementation of MipsTargetLowering::isLegalAddressingMode. llvm-svn: 168230	2012-11-17 00:25:41 +00:00
Weiming Zhao	8f56f88661	Remove hard coded registers in ARM ldrexd and strexd instructions This patch replaces the hard coded GPR pair [R0, R1] of Intrinsic:arm_ldrexd and [R2, R3] of Intrinsic:arm_strexd with even/odd GPRPair reg class. Similar to the lowering of atomic_64 operation. llvm-svn: 168207	2012-11-16 21:55:34 +00:00
Anton Korobeynikov	7d94f3bd7f	Make sure FABS on v2f32 and v4f32 is legal on ARM NEON This fixes PR14359 llvm-svn: 168200	2012-11-16 21:15:20 +00:00
Richard Osborne	9a43772783	Fix handling of aliases to functions. An alias to a function should use pc relative addressing. llvm-svn: 168199	2012-11-16 21:12:38 +00:00
Justin Holewinski	2c5ac70dd9	[NVPTX] Order global variables in def-use order before emiting them in the final assembly llvm-svn: 168198	2012-11-16 21:03:51 +00:00
NAKAMURA Takumi	0ff86ac8a6	llvm/test/CodeGen/X86/hipe-cc*.ll: Add explicit -mcpu, or they don't expect to pass on Atom. llvm-svn: 168171	2012-11-16 16:07:37 +00:00
Duncan Sands	d71b4e4568	Add the Erlang/HiPE calling convention, patch by Yiannis Tsiouris. llvm-svn: 168166	2012-11-16 12:36:39 +00:00
Craig Topper	70601ba6f9	Use roundps/pd for llvm.ceil, llvm.trunc, llvm.rint, and llvm.nearbyint of vector types. llvm-svn: 168141	2012-11-16 06:37:56 +00:00
Akira Hatanaka	907f5f0ca7	[mips] Fix delay slot filler so that instructions with register operand $1 are allowed in branch delay slot. llvm-svn: 168131	2012-11-16 02:39:34 +00:00
Eli Friedman	e6385e61b5	Mark FP_ROUND for converting NEON v2f64 to v2f32 as expand. Add a missing case to vector legalization so this actually works. Patch by Pete Couperus. Fixes PR12540. llvm-svn: 168107	2012-11-15 22:44:27 +00:00
Adhemerval Zanella	bdface5699	PowerPC: Lowering floor intrinsic for Altivec This patch lowers the llvm.floor, llvm.ceil, llvm.trunc, and llvm.nearbyint to Altivec instruction when using 4 single-precision float vectors. llvm-svn: 168086	2012-11-15 20:56:03 +00:00
Bill Schmidt	451499f02d	This patch is in preparation for adding medium code model support to the PPC64 target. The five tests modified herein test code generation that is sensitive to the code model selected. So I've added -code-model=small to the RUN commands for each. Since small code model is the default, this has no effect for now; but this prepares us for eventually changing the default to medium code model for PPC64. Test changes verified with small and medium code model as default on powerpc64-unknown-linux-gnu. All tests continue to pass. llvm-svn: 167999	2012-11-14 23:23:27 +00:00
Jakub Staszak	62fc067518	Make sure to not get AVX code on an AVX-capable host. Revealed in r167967. llvm-svn: 167989	2012-11-14 22:24:01 +00:00
NAKAMURA Takumi	a54c14a922	test/CodeGen/Hexagon/postinc-load.ll: Suppress it for now. It triggered the failure on i686 hosts. llvm-svn: 167988	2012-11-14 22:22:37 +00:00
Eric Christopher	950d8703b1	Remove the CellSPU port. Approved by Chris Lattner. llvm-svn: 167984	2012-11-14 22:09:20 +00:00
NAKAMURA Takumi	902b137133	llvm/test/CodeGen/X86/memset.ll: FileCheck-ize, and add another case on +avx. llvm-svn: 167975	2012-11-14 21:01:40 +00:00
Jyotsna Verma	6649360860	Added multiclass for post-increment load instructions. llvm-svn: 167974	2012-11-14 20:38:48 +00:00
Benjamin Kramer	d71fb86aed	Force CPU in test so we don't accidentally get AVX code on an AVX-capable host. llvm-svn: 167973	2012-11-14 20:31:42 +00:00
Benjamin Kramer	6293429b51	X86: Enable SSE memory intrinsics even when stack alignment is less than 16 bytes. The stack realignment code was fixed to work when there is stack realignment and a dynamic alloca is present so this shouldn't cause correctness issues anymore. Note that this also enables generation of AVX instructions for memset under the assumptions: - Unaligned loads/stores are always fast on CPUs supporting AVX - AVX is not slower than SSE We may need some tweaked heuristics if one of those assumptions turns out not to be true. Effectively reverts r58317. Part of PR2962. llvm-svn: 167967	2012-11-14 20:08:40 +00:00
Nadav Rotem	9f567c62f2	The code pattern "imm0_255_neg" is used for checking if an immediate value is a small negative number. This patch changes the definition of negative from -0..-255 to -1..-255. I am changing this because of a bug that we had in some of the patterns that assumed that "subs" of zero does not set the carry flag. rdar://12028498 llvm-svn: 167963	2012-11-14 19:39:15 +00:00
Justin Holewinski	c6462aacd5	[NVPTX] Implement custom lowering of loads/stores for i1 Loads from i1 become loads from i8 followed by trunc Stores to i1 become zext to i8 followed by store to i8 Fixes PR13291 llvm-svn: 167948	2012-11-14 19:19:16 +00:00
Anton Korobeynikov	b619a4138d	Fix really stupid ARM EHABI info generation bug: we should not emit eh table and handler data if there are no landing pads in the function. Patch by Logan Chien with some cleanups from me. llvm-svn: 167945	2012-11-14 19:13:30 +00:00
Rafael Espindola	c79532d101	Handle DAG CSE adding new uses during ReplaceAllUsesWith. Fixes PR14333. llvm-svn: 167912	2012-11-14 05:08:56 +00:00
Anton Korobeynikov	e42af3699b	Use TARGET2 relocation for TType references on ARM. Do some cleanup of the code while here. Inspired by patch by Logan Chien! llvm-svn: 167904	2012-11-14 01:47:00 +00:00
Eric Christopher	0f23b82147	Revert "Use the 'count' attribute instead of the 'upper_bound' attribute." temporarily as it is breaking the gdb bots. This reverts commit r167806/e7ff4c14b157746b3e0228d2dce9f70712d1c126. llvm-svn: 167886	2012-11-13 23:30:43 +00:00
Manman Ren	0f3240d3a7	X86: when constructing VZEXT_LOAD from other loads, makes sure its output chain is correctly setup. As an example, if the original load must happen before later stores, we need to make sure the constructed VZEXT_LOAD is constrained to be before the stores. rdar://12684358 llvm-svn: 167859	2012-11-13 19:13:05 +00:00
Ulrich Weigand	3946877f88	Do not consider a machine instruction that uses and defines the same physical register as candidate for common subexpression elimination in MachineCSE. This fixes a bug on PowerPC in MultiSource/Applications/oggenc/oggenc caused by MachineCSE invalidly merging two separate DYNALLOC insns. llvm-svn: 167855	2012-11-13 18:40:58 +00:00
Duncan Sands	b8d3caf65a	Codegen support for arbitrary vector getelementptrs. llvm-svn: 167830	2012-11-13 13:01:58 +00:00
Bill Wendling	f454dfb6b5	Use the 'count' attribute instead of the 'upper_bound' attribute. If we have a type 'int a[1]' and a type 'int b[0]', the generated DWARF is the same for both of them because we use the 'upper_bound' attribute. Instead use the 'count' attrbute, which gives the correct number of elements in the array. <rdar://problem/12566646> llvm-svn: 167806	2012-11-13 02:31:47 +00:00
Andrew Trick	edac22a9f3	Cleanup the main RegisterCoalescer loop. Block priorities still apply outside loops. llvm-svn: 167793	2012-11-13 00:34:44 +00:00
Michael Liao	b193ed44ee	Fix test case added in patch fixing PR14314 llvm-svn: 167769	2012-11-12 22:33:18 +00:00
Andrew Trick	f1ff84c64e	misched: Infrastructure for weak DAG edges. This adds support for weak DAG edges to the general scheduling infrastructure in preparation for MachineScheduler support for heuristics based on weak edges. llvm-svn: 167738	2012-11-12 19:28:57 +00:00
Michael Liao	d39c0fb19f	Fix PR14314 - Fix operand order for atomic sub, where the minuend is the value loaded from memory and the subtrahend is the parameter specified. llvm-svn: 167718	2012-11-12 06:49:17 +00:00
Justin Holewinski	1812ee9a5b	[NVPTX] Add more precise PTX/SM target attributes Each SM and PTX version is modeled as a subtarget feature/CPU. Additionally, PTX 3.1 is added as the default PTX version to be out-of-the-box compatible with CUDA 5.0. Available CPUs for this target: sm_10 - Select the sm_10 processor. sm_11 - Select the sm_11 processor. sm_12 - Select the sm_12 processor. sm_13 - Select the sm_13 processor. sm_20 - Select the sm_20 processor. sm_21 - Select the sm_21 processor. sm_30 - Select the sm_30 processor. sm_35 - Select the sm_35 processor. Available features for this target: ptx30 - Use PTX version 3.0. ptx31 - Use PTX version 3.1. sm_10 - Target SM 1.0. sm_11 - Target SM 1.1. sm_12 - Target SM 1.2. sm_13 - Target SM 1.3. sm_20 - Target SM 2.0. sm_21 - Target SM 2.1. sm_30 - Target SM 3.0. sm_35 - Target SM 3.5. llvm-svn: 167699	2012-11-12 03:16:43 +00:00
Evan Cheng	a5d363ec24	Convert an improper CodeGen test to a MC test. llvm-svn: 167663	2012-11-10 04:30:40 +00:00
Evan Cheng	a17fea1967	xfail a bad test. This is a MC test but it's dependent on a codegen optimization which is now disabled. llvm-svn: 167658	2012-11-10 02:34:36 +00:00
Evan Cheng	21b0348199	Disable the Thumb no-return call optimization: mov lr, pc b.w _foo The "mov" instruction doesn't set bit zero to one, it's putting incorrect value in lr. It messes up backtraces. rdar://12663632 llvm-svn: 167657	2012-11-10 02:09:05 +00:00
Craig Topper	9268c94b15	Cleanup pcmp(e/i)str(m/i) instruction definitions and load folding support. llvm-svn: 167652	2012-11-10 01:23:36 +00:00
Justin Holewinski	2dc9d072e5	[NVPTX] Use ABI alignment for parameters when alignment is not specified. Affects SM 2.0+. Fixes bug 13324. llvm-svn: 167646	2012-11-09 23:50:24 +00:00
Jakob Stoklund Olesen	13d5562963	Fix assertions in updateRegMaskSlots(). The RegMaskSlots contains 'r' slots while NewIdx and OldIdx are 'B' slots. This broke the checks in the assertions. This fixes PR14302. llvm-svn: 167625	2012-11-09 19:18:49 +00:00
Amara Emerson	ec2cd56708	Recommit modified r167540. Improve ARM build attribute emission for architectures types. This also changes the default architecture emitted for a generic CPU to "v7". llvm-svn: 167574	2012-11-08 09:51:45 +00:00
Michael Liao	73cffddb95	Add support of RTM from TSX extension - Add RTM code generation support throught 3 X86 intrinsics: xbegin()/xend() to start/end a transaction region, and xabort() to abort a tranaction region llvm-svn: 167573	2012-11-08 07:28:54 +00:00
Akira Hatanaka	28e02ec8c1	[mips] Custom-lower ISD::FRAME_TO_ARGS_OFFSET node. Patch by Sasa Stankovic. llvm-svn: 167548	2012-11-07 19:10:58 +00:00
Andrew Trick	3ca33acb95	misched: Heuristics based on the machine model. misched is disabled by default. With -enable-misched, these heuristics balance the schedule to simultaneously avoid saturating processor resources, expose ILP, and minimize register pressure. I've been analyzing the performance of these heuristics on everything in the llvm test suite in addition to a few other benchmarks. I would like each heuristic check to be verified by a unit test, but I'm still trying to figure out the best way to do that. The heuristics are still in considerable flux, but as they are refined we should be rigorous about unit testing the improvements. llvm-svn: 167527	2012-11-07 07:05:09 +00:00
Ulrich Weigand	339d0597d3	On PowerPC64, integer return values (as well as arguments) are supposed to be extended to a full register. This is modeled in the IR by marking the return value (or argument) with a signext or zeroext attribute. However, while these attributes are respected for function arguments, they are currently ignored for function return values by the PowerPC back-end. This patch updates PPCCallingConv.td to ask for the promotion to i64, and fixes LowerReturn and LowerCallResult to implement it. The new test case verifies that both arguments and return values are properly extended when passing them; and also that the optimizers understand incoming argument and return values are in fact guaranteed by the ABI to be extended. The patch caused a spurious breakage in CodeGen/PowerPC/coalesce-ext.ll, since the test case used a "ret" instruction to create a use of an i32 value at the end of the function (to set up data flow as required for what the test is intended to test). Since there's now an implicit promotion to i64, that data flow no longer works as expected. To fix this, this patch now adds an extra "add" to ensure we have an appropriate use of the i32 value. llvm-svn: 167396	2012-11-05 19:39:45 +00:00
Hal Finkel	4f24c621d9	Add support for the PowerPC-specific inline asm Z constraint and y modifier. The Z constraint specifies an r+r memory address, and the y modifier expands to the "r, r" in the asm string. For this initial implementation, the base register is forced to r0 (which has the special meaning of 0 for r+r addressing on PowerPC) and the full address is taken in the second register. In the future, this should be improved. llvm-svn: 167388	2012-11-05 18:18:42 +00:00
Adhemerval Zanella	c4182d1890	[PATCH] PowerPC: Expand load extend vector operations This patch expands the SEXTLOAD, ZEXTLOAD, and EXTLOAD operations for vector types when altivec is enabled. llvm-svn: 167386	2012-11-05 17:15:56 +00:00
Akira Hatanaka	da1980f697	[mips] Set flag neverHasSideEffects flag on floating point conversion instructions. llvm-svn: 167348	2012-11-03 00:53:12 +00:00
Akira Hatanaka	7828331329	[mips] Set flag isAsCheapAsAMove flag on instruction LUi. llvm-svn: 167345	2012-11-03 00:26:02 +00:00
Akira Hatanaka	5852e3b800	[mips] Stop reserving register AT and use register scavenger when a scratch register is needed. llvm-svn: 167341	2012-11-03 00:05:43 +00:00
Akira Hatanaka	6dcf75897c	[mips] Fix bug in test case. Disable machine LICM to prevent instruction from being moved out of a basic block. llvm-svn: 167322	2012-11-02 21:46:42 +00:00
Quentin Colombet	8e1fe84c3c	Vext Lowering was missing opportunities llvm-svn: 167318	2012-11-02 21:32:17 +00:00
Akira Hatanaka	949f8d890d	[mips] Use register number instead of name to print register $AT. llvm-svn: 167315	2012-11-02 21:26:03 +00:00
Akira Hatanaka	0dfbf1262b	[mips] Delete MipsFunctionInfo::EmitNOAT. Unconditionally print directive "set .noat" so that the assembler doesn't issue warnings when register $AT is used. llvm-svn: 167310	2012-11-02 20:56:25 +00:00
NAKAMURA Takumi	68d1700eae	test/CodeGen/X86/fp-fast.ll: Add +avx. llvm-svn: 167207	2012-11-01 02:13:45 +00:00
Owen Anderson	b351c8d692	Add a few more simple fast-math constant propagations and cancellations. llvm-svn: 167200	2012-11-01 02:00:53 +00:00
Shuxin Yang	01efdd6c28	(For X86) Enhancement to add-carray/sub-borrow (adc/sbb) optimization. The adc/sbb optimization is to able to convert following expression into a single adc/sbb instruction: (ult) ... = x + 1 // where the ult is unsigned-less-than comparison (ult) ... = x - 1 This change is to flip the "x >u y" (i.e. ugt comparison) in order to expose the adc/sbb opportunity. llvm-svn: 167180	2012-10-31 23:11:48 +00:00
Akira Hatanaka	4f5ef21869	[mips] Set isAsCheapAsAMove flag on ADDiu and DADDiu, which enables re-materialization of immediate loads. llvm-svn: 167153	2012-10-31 18:37:55 +00:00
Akira Hatanaka	c096c88067	Test case for r167039. Check that tail-call optimization is disabled for mips16. llvm-svn: 167139	2012-10-31 17:25:23 +00:00
Reed Kotler	27a7229c47	Implement ADJCALLSTACKUP and ADJCALLSTACKDOWN llvm-svn: 167107	2012-10-31 05:21:10 +00:00
Bill Schmidt	9953cf294b	This patch addresses an ABI compatibility issue with empty aggregate parameters. Examples of these are: struct { } a; union { } b[256]; int a[0]; An empty aggregate has an address, although dereferencing that address is pointless. When passed as a parameter, an empty aggregate does not consume a protocol register, nor does it consume a doubleword in the parameter save area. Passing an empty aggregate by reference passes an address just as for any other aggregate. Returning an empty aggregate uses GPR3 as a hidden address of the return value location, just as for any other aggregate. The patch modifies PPCTargetLowering::LowerFormalArguments_64SVR4 and PPCTargetLowering::LowerCall_64SVR4 to properly skip empty aggregate parameters passed by value. The handling of return values and by-reference parameters was already correct. Built on powerpc64-unknown-linux-gnu and tested with no new regressions. A test case is included to test proper handling of empty aggregate parameters on both sides of the function call protocol. llvm-svn: 167090	2012-10-31 01:15:05 +00:00
Manman Ren	6b223a4f06	X86 SSE: update rsqrtss and rcpss to use two source operands and the first source operand is tied to the destination operand. This is to accurately model the corresponding instructions where the upper bits are unmodified. rdar://12558838 PR14221 llvm-svn: 167064	2012-10-30 23:53:59 +00:00
Manman Ren	acb8becc73	X86 MMX: optimize transfer from mmx to i32 We used to generate a store (movq) + a load. Now we use movd. rdar://9946746 llvm-svn: 167056	2012-10-30 22:15:38 +00:00
Akira Hatanaka	9c962c02e4	[mips] Allow tail-call optimization for vararg functions and functions which use the caller's stack. llvm-svn: 167048	2012-10-30 20:16:31 +00:00
Adhemerval Zanella	5c043aeb1b	PowerPC: Expand FSRQT for vector types This patch expands FSQRT for floating point vector types when altivec is used. llvm-svn: 167034	2012-10-30 18:29:42 +00:00
Quentin Colombet	5799e9f66c	Change ForceSizeOpt attribute into MinSize attribute llvm-svn: 167020	2012-10-30 16:32:52 +00:00
Adhemerval Zanella	56775e0f13	PowerPC: More support for Altivec compare operations This patch adds more support for vector type comparisons using altivec. It adds correct support for v16i8, v8i16, v4i32, and v4f32 vector types for comparison operators ==, !=, >, >=, <, and <=. llvm-svn: 167015	2012-10-30 13:50:19 +00:00
Hans Wennborg	f3254838e4	Use TargetTransformInfo to control switch-to-lookup table transformation When the switch-to-lookup tables transform landed in SimplifyCFG, it was pointed out that this could be inappropriate for some targets. Since there was no way at the time for the pass to know anything about the target, an awkward reverse-transform was added in CodeGenPrepare that turned lookup tables back into switches for some targets. This patch uses the new TargetTransformInfo to determine if a switch should be transformed, and removes CodeGenPrepare::ConvertLoadToSwitch. llvm-svn: 167011	2012-10-30 11:23:25 +00:00
Reed Kotler	a811753716	Change mips16 delay slot jumps to non delay slot forms by default. We will make them delay slot forms if there is something that can be placed in the delay slot during a separate pass. Mips16 extended instructions cannot be placed in delay slots. llvm-svn: 166990	2012-10-30 00:54:49 +00:00
Jakub Staszak	a3d8e9974a	Re-commit r166971. I reverted it to quickly, when buildbots didn't have a chance to test it with chapni's fix (-mattr=+avx). llvm-svn: 166985	2012-10-30 00:01:57 +00:00
Jakub Staszak	d74cb61d86	Revert r166971. It causes buildbot failure. To be investigated. llvm-svn: 166979	2012-10-29 23:13:50 +00:00
NAKAMURA Takumi	382df5eb18	llvm/test/CodeGen/X86/vec_shuffle-30.ll: Try to unbreak builds - assuming +avx. llvm-svn: 166974	2012-10-29 22:45:18 +00:00
Jakub Staszak	c8f4825ba6	Allow to fold vector load if there is more than one bitcast, so in the case: %0 = load <8 x i16>* %dest %1 = shufflevector <8 x i16> %0, <8 x i16> %in, <8 x i32> < i32 0, i32 1, i32 2, i32 3, i32 13, i32 undef, i32 14, i32 14> store <8 x i16> %1, <8 x i16>* %dest We get: vmovlpd (%eax), %xmm0, %xmm0 instead of: vmovaps (%eax), %xmm1 vmovsd %xmm1, %xmm0, %xmm0 No extra test-case is added. I just fixed the existing one (also it uses FileCheck now). llvm-svn: 166971	2012-10-29 21:56:35 +00:00
Bill Schmidt	bd4ac26973	This patch solves a problem with passing varargs parameters under the PPC64 ELF ABI. A varargs parameter consisting of a single-precision floating-point value, or of a single-element aggregate containing a single-precision floating-point value, must be passed in the low-order (rightmost) four bytes of the doubleword stack slot reserved for that parameter. If there are GPR protocol registers remaining, the parameter must also be mirrored in the low-order four bytes of the reserved GPR. Prior to this patch, such parameters were being passed in the high-order four bytes of the stack slot and the mirrored GPR. The patch adds a new test case to verify the correct code generation. llvm-svn: 166968	2012-10-29 21:18:16 +00:00
Reed Kotler	740981e35c	Implement patterns for extloadi8 and extloadi16 llvm-svn: 166960	2012-10-29 19:39:04 +00:00
Ulrich Weigand	3abb34389d	In various places throughout the code generator, there were special checks to avoid performing compile-time arithmetic on PPCDoubleDouble. Now that APFloat supports arithmetic on PPCDoubleDouble, those checks are no longer needed, and we can treat the type like any other. llvm-svn: 166958	2012-10-29 18:35:49 +00:00
Chad Rosier	466c1c6870	Remove redundant test case from r166949, per Eli's suggestion. llvm-svn: 166953	2012-10-29 18:18:26 +00:00
Chad Rosier	1bbaa449ad	[ms-inline asm] Add support for the [] operator. Essentially, [expr1][expr2] is equivalent to [expr1 + expr2]. See test cases for more examples. rdar://12470392 llvm-svn: 166949	2012-10-29 18:01:54 +00:00
Michael Liao	ad0b69fe3e	Fix PR14204 - Add missing pattern on X86ISD::VZEXT from VR256 to VR256 when AVX2 is enabled. llvm-svn: 166947	2012-10-29 17:57:12 +00:00
Jakob Stoklund Olesen	9a06696a77	Completely disallow partial copies in adjustCopiesBackFrom(). Partial copies can show up even when CoalescerPair.isPartial() returns false. For example: %vreg24:dsub_0<def> = COPY %vreg31:dsub_0; QPR:%vreg24,%vreg31 Such a partial-partial copy is not good enough for the transformation adjustCopiesBackFrom() needs to do. llvm-svn: 166944	2012-10-29 17:51:52 +00:00
Ulrich Weigand	0de4a1e4ae	Allow i32/i64 for 'f' constraint on PowerPC. This fixes PR12757. llvm-svn: 166943	2012-10-29 17:49:34 +00:00
Reed Kotler	aebb8b034c	Expand all atomic ops for mips16. llvm-svn: 166935	2012-10-29 16:16:54 +00:00
Preston Gurd	52dacca977	This patch addresses a problem with the Post RA scheduler generating an incorrect instruction sequence due to it not being aware that an inline assembly instruction may reference memory. This patch fixes the problem by causing the scheduler to always assume that any inline assembly code instruction could access memory. This is necessary because the internal representation of the inline instruction does not include any information about memory accesses. This should fix PR13504. llvm-svn: 166929	2012-10-29 15:01:23 +00:00
Bill Schmidt	bbc661e572	This patch adds alignment information for long double to the 64-bit PowerPC ELF subtarget. The existing logic is used as a fallback to avoid any changes to the Darwin ABI. PPC64 ELF now has two possible data layout strings: one for FreeBSD, which requires 8-byte alignment, and a default string that requires 16-byte alignment. I've added a test for PPC64 Linux to verify the 16-byte alignment. If somebody wants to add a separate test for FreeBSD, that would be great. Note that there is a companion patch to update the alignment information in Clang, which I am committing now as well. llvm-svn: 166928	2012-10-29 14:59:36 +00:00
Reed Kotler	e6c31579be	Implement brind operator for mips16. llvm-svn: 166903	2012-10-28 23:08:07 +00:00
Reed Kotler	3589dd74ac	This patch is for the implementation of mips16 complex pattern addr16. Previously mips16 was sharing the pattern addr which is used for mips32 and mips64. This had a number of problems: 1) Storing and loading byte and halfword quantities for mips16 has particular problems due to the primarily non mips16 nature of SP. When we must load/store byte/halfword stack objects in a function, we must create a mips16 alias register for SP. This functionality is tested in stchar.ll. 2) We need to have an FP register under certain conditions (such as dynamically sized alloca). We use mips16 register S0 for this purpose. In this case, we also use this register when accessing frame objects so this issue also affects the complex pattern addr16. This functionality is tested in alloca16.ll. The Mips16InstrInfo.td has been updated to use addr16 instead of addr. The complex pattern C++ function for addr has been copied to addr16 and updated to reflect the above issues. llvm-svn: 166897	2012-10-28 06:02:37 +00:00
Jakob Stoklund Olesen	57143f7e78	Never attempt to join an early-clobber def with a regular kill. This fixes PR14194. llvm-svn: 166880	2012-10-27 17:41:27 +00:00
Quentin Colombet	3ee56a3bf5	[code size][ARM] Emit regular call instructions instead of the move, branch sequence llvm-svn: 166854	2012-10-27 01:10:17 +00:00
Reed Kotler	7e4d9969cb	Implement MipsHi for mips16 llvm-svn: 166852	2012-10-27 00:57:14 +00:00
Akira Hatanaka	6a124a84dc	[mips] Do not tail-call optimize vararg functions or functions with byval arguments. This is rather conservative and should be fixed later to be more aggressive. llvm-svn: 166851	2012-10-27 00:56:56 +00:00
Akira Hatanaka	2c07f1f140	[mips] Make sure FuncArg doesn't advance when OrigArgIndex is the same as in the previous iteration. llvm-svn: 166850	2012-10-27 00:44:39 +00:00
Jakob Stoklund Olesen	1f06e7f00e	Revert r163298 "Optimize codegen for VSETLNi{8,16,32} operating on Q registers." Keep the integer_insertelement test case, the new coalescer can handle this kind of lane insertion without help from pseudo-instructions. llvm-svn: 166835	2012-10-26 23:39:46 +00:00
Reed Kotler	b650f6bbe7	implement mips16 tls global addr llvm-svn: 166827	2012-10-26 22:57:32 +00:00
Jakob Stoklund Olesen	e46a1046c0	Add GPRPair Register class to ARM. Some instructions in ARM require 2 even-odd paired GPRs. This patch adds support for such register class. Patch by Weiming Zhao! llvm-svn: 166816	2012-10-26 21:29:15 +00:00
Reed Kotler	287f0449a2	Implement carry for subtract/add for mips16 llvm-svn: 166755	2012-10-26 04:46:26 +00:00
Reed Kotler	e47873ab89	implement large (>16 bit) constant loading. llvm-svn: 166749	2012-10-26 03:09:34 +00:00
Reed Kotler	183ba5ef26	fix test setgek.ll so that it will not give false "make check" failure in some cases llvm-svn: 166747	2012-10-26 01:29:42 +00:00
Reed Kotler	097556d6bd	implement mips16 patterns for select nodes llvm-svn: 166721	2012-10-25 21:33:30 +00:00
Michael Liao	8fe3a6bda4	Add test for ATOM ISA SSSE3 - Remove SSE4.1 feature in other ATOM-based test cases llvm-svn: 166699	2012-10-25 17:50:05 +00:00
Bill Schmidt	6ed3b99f43	This patch addresses a PPC64 ELF issue with passing parameters consisting of structs having size 3, 5, 6, or 7. Such a struct must be passed and received as right-justified within its register or memory slot. The problem is only present for structs that are passed in registers. Previously, as part of a patch handling all structs of size less than 8, I added logic to rotate the incoming register so that the struct was left- justified prior to storing the whole register. This was incorrect because the address of the parameter had already been adjusted earlier to point to the right-adjusted value in the storage slot. Essentially I had accidentally accounted for the right-adjustment twice. In this patch, I removed the incorrect logic and reorganized the code to make the flow clearer. The removal of the rotates changes the expected code generation, so test case structsinregs.ll has been modified to reflect this. I also added a new test case, jaggedstructs.ll, to demonstrate that structs of these sizes can now be properly received and passed. I've built and tested the code on powerpc64-unknown-linux-gnu with no new regressions. I also ran the GCC compatibility test suite and verified that earlier problems with these structs are now resolved, with no new regressions. llvm-svn: 166680	2012-10-25 13:38:09 +00:00
Elena Demikhovsky	e0e69a33a0	The test avx-intel-ocl.ll failed. I can't reproduce on any of my machines. I added -mcpu flag, may be it will fix the problem llvm-svn: 166669	2012-10-25 08:38:42 +00:00
Chad Rosier	dd5eada241	[ms-inline asm] Add back-end test case for r166632. Make sure we emit the correct .s output as well as get the correct encoding by the integrated assembler. llvm-svn: 166638	2012-10-24 23:10:28 +00:00
Evan Cheng	59ed7d45a6	Fix a miscompilation caused by a typo. When turning a adde with negative value into a sbc with a positive number, the immediate should be complemented, not negated. Also added a missing pattern for ARM codegen. rdar://12559385 llvm-svn: 166613	2012-10-24 19:53:01 +00:00
Elena Demikhovsky	d6afb03bc9	Special calling conventions for Intel OpenCL built-in library. llvm-svn: 166566	2012-10-24 14:46:16 +00:00
Michael Liao	5922979e49	Teach DAG combine to fold (buildvec (Xint2fp x)) to (Xint2fp (buildvec x)) - If more than 1 elemennts are defined and target supports the vectorized conversion, use the vectorized one instead to reduce the strength on conversion operation. llvm-svn: 166546	2012-10-24 04:14:18 +00:00
Michael Liao	c5af149e70	Add custom conversion from v2u32 to v2f32 in 32-bit mode - As there's no 64-bit GPRs in 32-bit mode, a custom conversion from v2u32 to v2f32 is added to improve the efficiency of the code generated. llvm-svn: 166545	2012-10-24 04:09:32 +00:00
Akira Hatanaka	868b3a333b	[mips] Make sure sret argument is returned in register V0. llvm-svn: 166539	2012-10-24 02:10:54 +00:00
Rafael Espindola	4e6e537314	Change x86_fastcallcc to require inreg markers. This allows it to known the difference from "int x" (which should go in registers and "struct y {int x;}" (which should not). Clang will be updated in the next patches. llvm-svn: 166536	2012-10-24 01:58:48 +00:00
Michael Liao	2843625bb5	Fix PR14161 - Check index being extracted to be constant 0 before simplfiying. Otherwise, retain the original sequence. llvm-svn: 166504	2012-10-23 21:40:15 +00:00
Michael Liao	1be96bb5ce	Enable lowering ZERO_EXTEND/ANY_EXTEND to PMOVZX from SSE4.1 llvm-svn: 166486	2012-10-23 17:34:00 +00:00
Reed Kotler	164bb37c7b	implement setXX patterns llvm-svn: 166459	2012-10-23 01:35:48 +00:00
Bill Wendling	12cda50f1f	When a block ends in an indirect branch, add its successors to the machine basic block. The CFG of the machine function needs to know that the targets of the indirect branch are successors to the indirect branch. <rdar://problem/12529625> llvm-svn: 166448	2012-10-22 23:30:04 +00:00
Akira Hatanaka	0c7d131a7b	[mips] Use 64-bit registers to return an sret pointer if target ABI is N64. llvm-svn: 166344	2012-10-19 22:11:40 +00:00
Akira Hatanaka	90131ac26c	[mips] Add code to do tail call optimization. Currently, it is enabled only if option "enable-mips-tail-calls" is given and all of the callee's arguments are passed in registers. llvm-svn: 166342	2012-10-19 21:47:33 +00:00
Shuxin Yang	cdde059a34	This patch is to fix radar://8426430. It is about llvm support of __builtin_debugtrap() which is supposed to consistently raise SIGTRAP across all systems. In contrast, __builtin_trap() behave differently on different systems. e.g. it raises SIGTRAP on ARM, and SIGILL on X86. The purpose of __builtin_debugtrap() is to consistently provide "trap" functionality, in the mean time preserve the compatibility with on gcc on __builtin_trap(). The X86 backend is already able to handle debugtrap(). This patch is to: 1) make front-end recognize "__builtin_debugtrap()" (emboddied in the one-line change to Clang). 2) In DAG legalization phase, by default, "debugtrap" will be replaced with "trap", which make the __builtin_debugtrap() "available" to all existing ports without the hassle of changing their code. 3) If trap-function is specified (via -trap-func=xyz to llc), both __builtin_debugtrap() and __builtin_trap() will be expanded into the function call of the specified trap function. This behavior may need change in the future. The provided testing-case is to make sure 2) and 3) are working for ARM port, and we already have a testing case for x86. llvm-svn: 166300	2012-10-19 20:11:16 +00:00
Michael Liao	4b7ccfcaad	Lower BUILD_VECTOR to SHUFFLE + INSERT_VECTOR_ELT for X86 - If INSERT_VECTOR_ELT is supported (above SSE2, either by custom sequence of legal insn), transform BUILD_VECTOR into SHUFFLE + INSERT_VECTOR_ELT if most of elements could be built from SHUFFLE with few (so far 1) elements being inserted. llvm-svn: 166288	2012-10-19 17:15:18 +00:00
Stepan Dyatkovskiy	dab8043048	ARM: Removed extra stack frame object for fixed byval arguments, VarArgsStyleRegisters invocation was reworked due to some improper usage in past. PR14099 also demonstrates it. llvm-svn: 166273	2012-10-19 08:23:06 +00:00
Sebastian Pop	127777d686	Clear unknown mem ops when merging stack slots (pr14090) When merging stack slots, if StackColoring::remapInstructions gets a value back from GetUnderlyingObject that it does not know about or is not itself a stack slot, clear the memory operand in case it aliases the merged slot. This prevents the introduction of incorrect aliasing information. Author: Matthew Curtis <mcurtis@codeaurora.org> llvm-svn: 166216	2012-10-18 19:53:48 +00:00
Nadav Rotem	d5f8859672	In SimplifySelectOps we pulled two loads through a select node despite the fact that one was dependent on the other. rdar://12513091 llvm-svn: 166196	2012-10-18 18:06:48 +00:00
Ulrich Weigand	d34b5bd610	This patch fixes failures in the SingleSource/Regression/C/uint64_to_float test case on PowerPC caused by rounding errors when converting from a 64-bit integer to a single-precision floating point. The reason for this are double-rounding effects, since on PowerPC we have to convert to an intermediate double-precision value first, which gets rounded to the final single-precision result. The patch fixes the problem by preparing the 64-bit integer so that the first conversion step to double-precision will always be exact, and the final rounding step will result in the correctly-rounded single-precision result. The generated code sequence is equivalent to what GCC would generate. When -enable-unsafe-fp-math is in effect, that extra effort is omitted and we accept possible rounding errors (just like GCC does as well). llvm-svn: 166178	2012-10-18 13:16:11 +00:00
Michael Liao	3ac8201ea4	Revert part of r166049 back and enable test case in r166125. - Folding (trunc (concat ... X )) to (concat ... (trunc X) ...) is valid when '...' are all 'undef's. - r166125 relies on this transformation. llvm-svn: 166155	2012-10-17 23:45:54 +00:00
Michael Liao	f630f4cadc	Disable extract-concat test case temporarily llvm-svn: 166141	2012-10-17 23:08:19 +00:00
Michael Liao	c87d98dbc8	Revert r166049 - In general, it's unsafe for this transformation. llvm-svn: 166135	2012-10-17 22:41:15 +00:00
Reed Kotler	6743924a32	Add conditional branch instructions and their patterns. llvm-svn: 166134	2012-10-17 22:29:54 +00:00
Michael Liao	7a442c8031	Teach DAG combine to fold (extract_subvec (concat v1, ..) i) to v_i - If the extracted vector has the same type of all vectored being concatenated together, it should be simplified directly into v_i, where i is the index of the element being extracted. llvm-svn: 166125	2012-10-17 20:48:33 +00:00
Anton Korobeynikov	0a69176ce0	Fix fallout from RegInfo => FrameLowering refactoring on MSP430. Patch by Job Noorman! llvm-svn: 166108	2012-10-17 17:37:11 +00:00
Michael Liao	6f7206132f	Fix setjmp on models with non-Small code model nor non-Static relocation model - MBB address is only valid as an immediate value in Small & Static code/relocation models. On other models, LEA is needed to load IP address of the restore MBB. - A minor fix of MBB in MC lowering is added as well to enable target relocation flag being propagated into MC. llvm-svn: 166084	2012-10-17 02:22:27 +00:00
Jakob Stoklund Olesen	4df59a9ff8	Avoid rematerializing a redef immediately after the old def. PR14098 contains an example where we would rematerialize a MOV8ri immediately after the original instruction: %vreg7:sub_8bit<def> = MOV8ri 9; GR32_ABCD:%vreg7 %vreg22:sub_8bit<def> = MOV8ri 9; GR32_ABCD:%vreg7 Besides being pointless, it is also wrong since the original instruction only redefines part of the register, and the value read by the new instruction is wrong. The problem was the LiveRangeEdit::allUsesAvailableAt() didn't special-case OrigIdx == UseIdx and found the wrong SSA value. llvm-svn: 166068	2012-10-16 22:51:58 +00:00
Jakob Stoklund Olesen	2043329e67	Revert r166046 "Switch back to the old coalescer for now to fix the 32 bit bit" A fix for PR14098, including the test case is in the next commit. llvm-svn: 166067	2012-10-16 22:51:55 +00:00
Michael Liao	19006206a1	Teach DAG combine to fold (trunc (fptoXi x)) to (fptoXi x) llvm-svn: 166049	2012-10-16 19:38:35 +00:00
Rafael Espindola	b58be2c593	Switch back to the old coalescer for now to fix the 32 bit bit llvm+clang+compiler-rt bootstrap. llvm-svn: 166046	2012-10-16 19:34:06 +00:00
Bill Schmidt	48081cad0d	This patch addresses PR13949. For the PowerPC 64-bit ELF Linux ABI, aggregates of size less than 8 bytes are to be passed in the low-order bits ("right-adjusted") of the doubleword register or memory slot assigned to them. A previous patch addressed this for aggregates passed in registers. However, small aggregates passed in the overflow portion of the parameter save area are still being passed left-adjusted. The fix is made in PPCTargetLowering::LowerCall_Darwin_Or_64SVR4 on the caller side, and in PPCTargetLowering::LowerFormalArguments_64SVR4 on the callee side. The main fix on the callee side simply extends existing logic for 1- and 2-byte objects to 1- through 7-byte objects, and correcting a constant left over from 32-bit code. There is also a fix to a bogus calculation of the offset to the following argument in the parameter save area. On the caller side, again a constant left over from 32-bit code is fixed. Additionally, some code for 1, 2, and 4-byte objects is duplicated to handle the 3, 5, 6, and 7-byte objects for SVR4 only. The LowerCall_Darwin_Or_64SVR4 logic is getting fairly convoluted trying to handle both ABIs, and I propose to separate this into two functions in a future patch, at which time the duplication can be removed. The patch adds a new test (structsinmem.ll) to demonstrate correct passing of structures of all seven sizes. Eight dummy parameters are used to force these structures to be in the overflow portion of the parameter save area. As a side effect, this corrects the case when aggregates passed in registers are saved into the first eight doublewords of the parameter save area: Previously they were stored left-justified, and now are properly stored right-justified. This requires changing the expected output of existing test case structsinregs.ll. llvm-svn: 166022	2012-10-16 13:30:53 +00:00
Stepan Dyatkovskiy	e59a920b0c	Issue: Stack is formed improperly for long structures passed as byval arguments for EABI mode. If we took AAPCS reference, we can found the next statements: A: "If the argument requires double-word alignment (8-byte), the NCRN (Next Core Register Number) is rounded up to the next even register number." (5.5 Parameter Passing, Stage C, C.3). B: "The alignment of an aggregate shall be the alignment of its most-aligned component." (4.3 Composite Types, 4.3.1 Aggregates). So if we have structure with doubles (9 double fields) and 3 Core unused registers (r1, r2, r3): caller should use r2 and r3 registers only. Currently r1,r2,r3 set is used, but it is invalid. Callee VA routine should also use r2 and r3 regs only. All is ok here. This behaviour is guessed by rounding up SP address with ADD+BFC operations. Fix: Main fix is in ARMTargetLowering::HandleByVal. If we detected AAPCS mode and 8 byte alignment, we waste odd registers then. P.S.: I also improved LDRB_POST_IMM regression test. Since ldrb instruction will not generated by current regression test after this patch. llvm-svn: 166018	2012-10-16 07:16:47 +00:00
NAKAMURA Takumi	1705a999fa	Reapply r165661, Patch by Shuxin Yang <shuxin.llvm@gmail.com>. Original message: The attached is the fix to radar://11663049. The optimization can be outlined by following rules: (select (x != c), e, c) -> select (x != c), e, x), (select (x == c), c, e) -> select (x == c), x, e) where the <c> is an integer constant. The reason for this change is that : on x86, conditional-move-from-constant needs two instructions; however, conditional-move-from-register need only one instruction. While the LowerSELECT() sounds to be the most convenient place for this optimization, it turns out to be a bad place. The reason is that by replacing the constant <c> with a symbolic value, it obscure some instruction-combining opportunities which would otherwise be very easy to spot. For that reason, I have to postpone the change to last instruction-combining phase. The change passes the test of "make check-all -C <build-root/test" and "make -C project/test-suite/SingleSource". Original message since r165661: My previous change has a bug: I negated the condition code of a CMOV, and go ahead creating a new CMOV using the ORIGINAL condition code. llvm-svn: 166017	2012-10-16 06:28:34 +00:00
Rafael Espindola	7f4f79a5bc	Fix the cpu name and add -verify-machineinstrs. llvm-svn: 166003	2012-10-16 01:13:06 +00:00
Andrew Trick	d9d4be0d57	misched: Added handleMove support for updating all kill flags, not just for allocatable regs. This is a medium term workaround until we have a more robust solution in the form of a register liveness utility for postRA passes. llvm-svn: 166001	2012-10-16 00:22:51 +00:00
Michael Liao	97bf363a9e	Add __builtin_setjmp/_longjmp supprt in X86 backend - Besides used in SjLj exception handling, __builtin_setjmp/__longjmp is also used as a light-weight replacement of setjmp/longjmp which are used to implementation continuation, user-level threading, and etc. The support added in this patch ONLY addresses this usage and is NOT intended to support SjLj exception handling as zero-cost DWARF exception handling is used by default in X86. llvm-svn: 165989	2012-10-15 22:39:43 +00:00
Jim Grosbach	54c7432e22	ARM: v1i64 and v2i64 VBSL intrinsic support. rdar://12502028 llvm-svn: 165981	2012-10-15 21:23:40 +00:00
Andrew Trick	6e5f49d7b7	Check output of the misched unit tests llvm-svn: 165959	2012-10-15 20:33:14 +00:00
Rafael Espindola	aee00b5e72	Add a cpu to try to fix the atom builder. llvm-svn: 165956	2012-10-15 19:25:43 +00:00
Rafael Espindola	b41459a3c0	Add testcase for pr14088. llvm-svn: 165954	2012-10-15 19:00:10 +00:00
Andrew Trick	5a89e0ef07	misched tests: add a triple to speculatively fix windows builders. llvm-svn: 165952	2012-10-15 18:21:08 +00:00
Andrew Trick	90f711da9a	misched: ILP scheduler for experimental heuristics. llvm-svn: 165950	2012-10-15 18:02:27 +00:00
Silviu Baranga	b14097000b	Fixed PR13938: the ARM backend was crashing because it couldn't select a VDUPLANE node with the vector input size different from the output size. This was bacause the BUILD_VECTOR lowering code didn't check that the size of the input vector was correct for using VDUPLANE. llvm-svn: 165929	2012-10-15 09:41:32 +00:00
Jakob Stoklund Olesen	ea82bd7f0d	Drop <def,dead> flags when merging into an unused lane. The new coalescer can merge a dead def into an unused lane of an otherwise live vector register. Clear the <dead> flag when that happens since the flag refers to the full virtual register which is still live after the partial dead def. This fixes PR14079. llvm-svn: 165877	2012-10-13 17:26:47 +00:00
Jakob Stoklund Olesen	2f6dfc7d0b	Allow for loops in LiveIntervals::pruneValue(). It is possible that the live range of the value being pruned loops back into the kill MBB where the search started. When that happens, make sure that the beginning of KillMBB is also pruned. Instead of starting a DFS at KillMBB and skipping the root of the search, start a DFS at each KillMBB successor, and allow the search to loop back to KillMBB. This fixes PR14078. llvm-svn: 165872	2012-10-13 16:15:31 +00:00
Benjamin Kramer	ecd15d7f6c	X86: Fix accidentally swapped operands. llvm-svn: 165871	2012-10-13 12:50:19 +00:00
Benjamin Kramer	d6b9362fc2	X86: Promote i8 cmov when both operands are coming from truncates of the same width. X86 doesn't have i8 cmovs so isel would emit a branch. Emitting branches at this level is often not a good idea because it's too late for many optimizations to kick in. This solution doesn't add any extensions (truncs are free) and tries to avoid introducing partial register stalls by filtering direct copyfromregs. I'm seeing a ~10% speedup on reading a random .png file with libpng15 via graphicsmagick on x86_64/westmere, but YMMV depending on the microarchitecture. llvm-svn: 165868	2012-10-13 10:39:49 +00:00
Manman Ren	7e48b252e7	ARM: tail-call inside a function where part of a byval argument is on caller's local frame causes problem. For example: void f(StructToPass s) { g(&s, sizeof(s)); } will cause problem with tail-call since part of s is passed via registers and saved in f's local frame. When g tries to access s, part of s may be corrupted since f's local frame is popped out before the tail-call. The current fix is to disable tail-call if getVarArgsRegSaveSize is not 0 for the caller. This is a conservative approach, if we can prove the address of s or part of s is not taken and passed to g, it should be okay to perform tail-call. rdar://12442472 llvm-svn: 165853	2012-10-12 23:39:43 +00:00
Jakob Stoklund Olesen	5ac781c576	Fix buildbots: -misched=shuffle is only available in +Asserts builds. llvm-svn: 165846	2012-10-12 23:01:33 +00:00
Jim Grosbach	30af442a84	ARM: Mark VSELECT as 'expand'. The backend already pattern matches to form VBSL when it can. We may want to teach it to use the vbsl intrinsics at some point to prevent machine licm from mucking with this, but using the Expand is completely correct. http://llvm.org/bugs/show_bug.cgi?id=13831 http://llvm.org/bugs/show_bug.cgi?id=13961 Patch by Peter Couperus <peter.couperus@st.com>. llvm-svn: 165845	2012-10-12 22:59:21 +00:00
Jakob Stoklund Olesen	1a87a29d08	Use a transposed algorithm for handleMove(). Completely update one interval at a time instead of collecting live range fragments to be updated. This avoids building data structures, except for a single SmallPtrSet of updated intervals. Also share code between handleMove() and handleMoveIntoBundle(). Add support for moving dead defs across other live values in the interval. The MI scheduler can do that. llvm-svn: 165824	2012-10-12 21:31:57 +00:00
Jakob Stoklund Olesen	1a3eb878f6	Fix coalescing with IMPLICIT_DEF values. PHIElimination inserts IMPLICIT_DEF instructions to guarantee that all PHI predecessors have a live-out value. These IMPLICIT_DEF values are not considered to be real interference when coalescing virtual registers: %vreg1 = IMPLICIT_DEF %vreg2 = MOV32r0 When joining %vreg1 and %vreg2, the IMPLICIT_DEF instruction and its value number should simply be erased since the %vreg2 value number now provides a live-out value for the PHI predecesor block. llvm-svn: 165813	2012-10-12 18:03:04 +00:00
NAKAMURA Takumi	4d88f05fe0	llvm/test/CodeGen/PowerPC/2012-10-12-bitcast.ll: Try to fix failure on non-ppc hosts, to add -mattr=+altivec. llvm-svn: 165803	2012-10-12 16:01:08 +00:00
Ulrich Weigand	9aa51d1a2c	Fix big-endian codegen bug in DAGTypeLegalizer::ExpandRes_BITCAST On PowerPC, a bitcast of <16 x i8> to i128 may run through a code path in ExpandRes_BITCAST that attempts to do an intermediate bitcast to a <4 x i32> vector, and then construct the Hi and Lo parts of the resulting i128 by pairing up two of those i32 vector elements each. The code already recognizes that on a big-endian system, the first two vector elements form the Hi part, and the final two vector elements form the Lo part (vice-versa from the little-endian situation). However, we also need to take endianness into account when forming each of those separate pairs: on a big-endian system, vector element 0 is the high part of the pair making up the Hi part of the result, and vector element 1 is the low part of the pair. The code currently always uses vector element 0 as the low part and vector element 1 as the high part, as is appropriate for little-endian platforms only. This patch fixes this by swapping the vector elements as they are paired up as appropriate. llvm-svn: 165802	2012-10-12 15:42:58 +00:00
Reed Kotler	cf11c59e2f	Div, Rem int/unsigned int llvm-svn: 165783	2012-10-12 02:01:09 +00:00
Evan Cheng	21c4adcdd8	Legalizer optimize a pair of div / mod to a call to divrem libcall if they are not legal. However, it should use a div instruction + mul + sub if divide is legal. The rem legalization code was missing a check and incorrectly uses a divrem libcall even when div is legal. rdar://12481395 llvm-svn: 165778	2012-10-12 01:15:47 +00:00
Jakob Stoklund Olesen	d0d7860f40	Pass an explicit operand number to addLiveIns. Not all instructions define a virtual register in their first operand. Specifically, INLINEASM has a different format. <rdar://problem/12472811> llvm-svn: 165721	2012-10-11 16:46:07 +00:00
Bill Schmidt	22162470ba	This patch addresses PR13947. For function calls on the 64-bit PowerPC SVR4 target, each parameter is mapped to as many doublewords in the parameter save area as necessary to hold the parameter. The first 13 non-varargs floating-point values are passed in registers; any additional floating-point parameters are passed in the parameter save area. A single-precision floating-point parameter (32 bits) must be mapped to the second (rightmost, low-order) word of its assigned doubleword slot. Currently LLVM violates this ABI requirement by mapping such a parameter to the first (leftmost, high-order) word of its assigned doubleword slot. This is internally self-consistent but will not interoperate correctly with libraries compiled with an ABI-compliant compiler. This patch corrects the problem by adjusting the parameter addressing on both sides of the calling convention. llvm-svn: 165714	2012-10-11 15:38:20 +00:00
NAKAMURA Takumi	da0730c2d7	Revert r165661, "Patch by Shuxin Yang <shuxin.llvm@gmail.com>." It broke stage2 clang and test-suite/MultiSource/Benchmarks/mediabench/g721/g721encode. llvm-svn: 165692	2012-10-11 02:02:05 +00:00
Evan Cheng	0f2565fd35	Add isel patterns for v2f32 / v4f32 neon.vbsl intrinsics. rdar://12471808 llvm-svn: 165673	2012-10-10 23:06:34 +00:00
Bill Schmidt	e09d4fd769	Add -mattr=+altivec and remove XFAIL. llvm-svn: 165666	2012-10-10 22:25:11 +00:00
Bill Schmidt	6d110a5139	XFAIL for all targets pending investigation llvm-svn: 165664	2012-10-10 21:52:10 +00:00
Nadav Rotem	17418964f8	Patch by Shuxin Yang <shuxin.llvm@gmail.com>. Original message: The attached is the fix to radar://11663049. The optimization can be outlined by following rules: (select (x != c), e, c) -> select (x != c), e, x), (select (x == c), c, e) -> select (x == c), x, e) where the <c> is an integer constant. The reason for this change is that : on x86, conditional-move-from-constant needs two instructions; however, conditional-move-from-register need only one instruction. While the LowerSELECT() sounds to be the most convenient place for this optimization, it turns out to be a bad place. The reason is that by replacing the constant <c> with a symbolic value, it obscure some instruction-combining opportunities which would otherwise be very easy to spot. For that reason, I have to postpone the change to last instruction-combining phase. The change passes the test of "make check-all -C <build-root/test" and "make -C project/test-suite/SingleSource". llvm-svn: 165661	2012-10-10 21:31:55 +00:00
Bill Schmidt	b9bc47409d	When generating spill and reload code for vector registers on PowerPC, the compiler makes use of GPR0. However, there are two flavors of GPR0 defined by the target: the 32-bit GPR0 (R0) and the 64-bit GPR0 (X0). The spill/reload code makes use of R0 regardless of whether we are generating 32- or 64-bit code. This patch corrects the problem in the obvious manner, using X0 and ADDI8 for 64-bit and R0 and ADDI for 32-bit. llvm-svn: 165658	2012-10-10 21:25:01 +00:00
Bill Schmidt	38d9458720	The PowerPC VRSAVE register has been somewhat of an odd beast since the Altivec extensions were introduced. Its use is optional, and allows the compiler to communicate to the operating system which vector registers should be saved and restored during a context switch. In practice, this information is ignored by the various operating systems using the SVR4 ABI; the kernel saves and restores the entire register state. Setting the VRSAVE register is no longer performed by the AIX XL compilers, the IBM i compilers, or by GCC on Power Linux systems. It seems best to avoid this logic within LLVM as well. This patch avoids generating code to update and restore VRSAVE for the PowerPC SVR4 ABIs (32- and 64-bit). The code remains in place for the Darwin ABI. llvm-svn: 165656	2012-10-10 20:54:15 +00:00
Michael Liao	e26b0313de	Specify CPU model to avoid breaking ATOM builds llvm-svn: 165638	2012-10-10 18:04:52 +00:00
Michael Liao	e999b865dd	Add support for FP_ROUND from v2f64 to v2f32 - Due to the current matching vector elements constraints in ISD::FP_ROUND, rounding from v2f64 to v4f32 (after legalization from v2f32) is scalarized. Add a customized v2f32 widening to convert it into a target-specific X86ISD::VFPROUND to work around this constraints. llvm-svn: 165631	2012-10-10 16:53:28 +00:00
Stepan Dyatkovskiy	283baa0027	Fix for LDRB instruction: SDNode for LDRB_POST_IMM is invalid: number of registers added to SDNode fewer that described in .td. 7 ops is needed, but SDNode with only 6 is created. In more details: In ARMInstrInfo.td, in multiclass AI2_ldridx, in definition _POST_IMM, offset operand is defined as am2offset_imm. am2offset_imm is complex parameter type, and actually it consists from dummy register and imm itself. As I understood trick with dummy reg was made for AsmParser. In ARMISelLowering.cpp, this dummy register was not added to SDNode, and it cause crash in Peephole Optimizer pass. The problem fixed by setting up additional dummy reg when emitting LDRB_POST_IMM instruction. llvm-svn: 165617	2012-10-10 11:43:40 +00:00
Stepan Dyatkovskiy	f13dbb8e24	Issue description: SchedulerDAGInstrs::buildSchedGraph ignores dependencies between FixedStack objects and byval parameters. So loading byval parameters from stack may be inserted before it will be stored, since these operations are treated as independent. Fix: Currently ARMTargetLowering::LowerFormalArguments saves byval registers with FixedStack MachinePointerInfo. To fix the problem we need to store byval registers with MachinePointerInfo referenced to first the "byval" parameter. Also commit adds two new fields to the InputArg structure: Function's argument index and InputArg's part offset in bytes relative to the start position of Function's argument. E.g.: If function's argument is 128 bit width and it was splitted onto 32 bit regs, then we got 4 InputArg structs with same arg index, but different offset values. llvm-svn: 165616	2012-10-10 11:37:36 +00:00
Akira Hatanaka	9c8dcfc73a	Implement MipsTargetLowering::CanLowerReturn. Patch by Sasa Stankovic. llvm-svn: 165585	2012-10-10 01:27:09 +00:00
Evan Cheng	3903e1be01	When expanding atomic load arith instructions, do not lose target flags. rdar://12453106 llvm-svn: 165568	2012-10-09 23:48:33 +00:00
Jakob Stoklund Olesen	9d1173a86e	Don't crash on extra evil irreducible control flow. When the CFG contains a loop with multiple entry blocks, the traces computed by MachineTraceMetrics don't always have the same nice properties. Loop back-edges are normally excluded from traces, but MachineLoopInfo doesn't recognize loops with multiple entry blocks, so those back-edges may be included. Avoid asserting when that happens by adding an isEarlierInSameTrace() function that accurately determines if a dominating block is part of the same trace AND is above the currrent block in the trace. llvm-svn: 165434	2012-10-08 22:06:44 +00:00
Adhemerval Zanella	fe3f793cec	PR12716: PPC crashes on vector compare Vector compare using altivec 'vcmpxxx' instructions have as third argument a vector register instead of CR one, different from integer and float-point compares. This leads to a failure in code generation, where 'SelectSETCC' expects a DAG with a CR register and gets vector register instead. This patch changes the behavior by just returning a DAG with the vector compare instruction based on the type. The patch also adds a testcase for all vector types llvm defines. It also included a fix on signed 5-bits predicates printing, where signed values were not handled correctly as signed (char are unsigned by default for PowerPC). This generates 'vspltisw' (vector splat) instruction with SIM out of range. llvm-svn: 165419	2012-10-08 18:59:53 +00:00
Adhemerval Zanella	5c6e08435e	Add floating-point to and from integer conversion This patch add altivec support for v4i32 to v4f32 and for v4f32 to v4i32 vector rounding conversion. llvm-svn: 165409	2012-10-08 17:27:24 +00:00
Benjamin Kramer	302178bf13	X86: fcmov doesn't handle all possible EFLAGS, fall back to a branch for the others. Otherwise it will try to use SSE patterns and fail horribly if sse is disabled. Fixes PR14035. llvm-svn: 165377	2012-10-07 15:34:27 +00:00
Reed Kotler	240322140e	Patch for integer multiply, signed/unsigned, long/long long. llvm-svn: 165322	2012-10-05 18:27:54 +00:00
Rafael Espindola	144e271570	Convert to unix line endings. llvm-svn: 165308	2012-10-05 13:32:38 +00:00
Evan Cheng	847ad4460a	Follow up to r165072. Try a different approach: only move the load when it's going to be folded into the call. rdar://12437604 llvm-svn: 165287	2012-10-05 01:48:22 +00:00
Nadav Rotem	b27777ff02	When merging connsecutive stores, use vectors to store the constant zero. llvm-svn: 165267	2012-10-04 22:35:15 +00:00
Jim Grosbach	330840ffd9	ARM: locate user-defined text sections next to default text. Make sure functions located in user specified text sections (via the section attribute) are located together with the default text sections. Otherwise, for large object files, the relocations for call instructions are more likely to be out of range. This becomes even more likely in the presence of LTO. rdar://12402636 llvm-svn: 165254	2012-10-04 21:33:24 +00:00
Chad Rosier	271623f8ae	[ms-inline asm] Add support in the X86AsmPrinter for printing memory references in the Intel syntax. The MC layer supports emitting in the Intel syntax, but this would require the inline assembly MachineInstr to be lowered to an MCInst before emission. This is potential future work, but for now emitting directly from the MachineInstr suffices. llvm-svn: 165173	2012-10-03 22:06:44 +00:00
Nadav Rotem	ac92066b0c	Fix a cycle in the DAG. In this code we replace multiple loads with a single load and multiple stores with a single load. We create the wide loads and stores (and their chains) before we remove the scalar loads and stores and fix the DAG chain. We attempted to merge loads with a different chain. When that happened, the assumption that it is safe to RAUW broke and a cycle was introduced. llvm-svn: 165148	2012-10-03 19:30:31 +00:00
Nadav Rotem	7cbc12a41d	A DAGCombine optimization for mergeing consecutive stores to memory. The optimization is not profitable in many cases because modern processors perform multiple stores in parallel and merging stores prior to merging requires extra work. We handle two main cases: 1. Store of multiple consecutive constants: q->a = 3; q->4 = 5; In this case we store a single legal wide integer. 2. Store of multiple consecutive loads: int a = p->a; int b = p->b; q->a = a; q->b = b; In this case we load/store either ilegal vector registers or legal wide integer registers. llvm-svn: 165125	2012-10-03 16:11:15 +00:00
Silviu Baranga	3c314990e6	Fixed a bug in the ExecutionDependencyFix pass that caused dependencies to not propagate through implicit defs. llvm-svn: 165102	2012-10-03 08:29:36 +00:00
Jakob Stoklund Olesen	0f6e8bb5e0	The early if conversion pass is ready to be used as an opt-in. Enable the pass by default for targets that request it, and change the -enable-early-ifcvt to the opposite -disable-early-ifcvt. There are still some x86 regressions when enabling early if-conversion because of the missing machine models. Disable the pass for x86 until machine models are added. llvm-svn: 165075	2012-10-03 00:51:32 +00:00

... 7 8 9 10 11 ...

7283 Commits