llvm-project

Commit Graph

Author	SHA1	Message	Date
Matt Arsenault	4c24d73709	R600/SI: Relax some ordering in tests. This will help with enabling misched llvm-svn: 216971	2014-09-02 21:45:50 +00:00
Matt Arsenault	b78875e979	R600/SI: Fix hardcoded register numbers in test llvm-svn: 216944	2014-09-02 20:43:07 +00:00
Matt Arsenault	d1649db2fc	R600/SI: Add failing testcase. This is broken when 64-bit add is only partially moved to the VALU. llvm-svn: 216933	2014-09-02 19:12:31 +00:00
Matt Arsenault	c1a71217b3	Fix interference caused by fmul 2, x -> fadd x, x If an fmul was introduced by lowering, it wouldn't be folded into a multiply by a constant since the earlier combine would have replaced the fmul with the fadd. llvm-svn: 216932	2014-09-02 19:02:53 +00:00
Reid Kleckner	0b2bccc3cd	CodeGen: Handle va_start in the entry block Also fix a small copy-paste bug in X86ISelLowering where Chain should have been used in place of DAG.getEntryToken(). Fixes PR20828. llvm-svn: 216929	2014-09-02 18:42:44 +00:00
Hal Finkel	e19006ea22	Enable splitting indexing from loads with TargetConstants When I recommitted r208640 (in r216898) I added an exclusion for TargetConstant offsets, as there is no guarantee that a backend can handle them on generic ADDs (even if it generates them during address-mode matching) -- and, specifically, applying this transformation directly with TargetConstants caused a self-hosting failure on PPC64. Ignoring all TargetConstants, however, is less than ideal. Instead, for non-opaque constants, we can convert them into regular constants for use with the generated ADD (or SUB). llvm-svn: 216908	2014-09-02 16:05:23 +00:00
Rafael Espindola	4dd3677b5f	Replace -use-init-array with -use-ctors. We have been using .init-array for most systems for quiet some time, but tools like llc are still defaulting to .ctors because the old option was never changed. This patch makes llc default to .init-array and changes the option to be -use-ctors. Clang is not affected by this. It has its own fancier logic. llvm-svn: 216905	2014-09-02 13:54:53 +00:00
David Xu	052b9d9282	Merge Extend and Shift into a UBFX llvm-svn: 216899	2014-09-02 09:33:56 +00:00
Hal Finkel	51e6fa2201	Revert "Revert '[DAGCombiner] Split up an indexed load if only the base pointer value is live'" I reverted r208640 in r209747 because r208640 broke self-hosting on PPC64. The underlying cause of the failure is that pre-inc loads with increments represented by ISD::TargetConstants were being transformed into ISD:::ADDs with ISD::TargetConstant operands. PPC doesn't have a pattern for those, and so they were selected as invalid r+r adds. This recommits r208640, rebased and with an exclusion for ISD::TargetConstant increments. This behavior seems correct, although in the future we might want to ask the target to split out the indexing that uses ISD::TargetConstants. Unfortunately, I don't yet have small test case where the relevant invalid 'add' instruction is not itself dead (and thus eliminated by DeadMachineInstructionElim -- sometimes bugpoint is too good at removing things) Original commit message (by Adam Nemet): Right now the load may not get DCE'd because of the side-effect of updating the base pointer. This can happen if we lower a read-modify-write of an illegal larger type (e.g. i48) such that the modification only affects one of the subparts (the lower i32 part but not the higher i16 part). See the testcase. In order to spot the dead load we need to revisit it when SimplifyDemandedBits decided that the value of the load is masked off. This is the CommitTargetLoweringOpt piece. I checked compile time with ARM64 by sending SPEC bitcode files through llc. No measurable change. Fixes <rdar://problem/16031651> llvm-svn: 216898	2014-09-02 06:24:04 +00:00
Jingyue Wu	263bab01f8	Fix a typo in comments in r216862, NFC PR20766 -> PR20776. Thanks Roman Divacky for the catch! llvm-svn: 216883	2014-09-01 14:55:04 +00:00
Tilmann Scheller	659dfb4dc8	[ARM] Add Thumb-2 code size optimization regression test for EOR. llvm-svn: 216881	2014-09-01 12:59:34 +00:00
Tilmann Scheller	2fc1adcaff	ARM] Add Thumb-2 code size optimization regression test for BIC. llvm-svn: 216880	2014-09-01 12:53:29 +00:00
Jingyue Wu	5208cc5dbe	[MachineSink] Use the real post dominator tree Summary: Fixes a FIXME in MachineSinking. Instead of using the simple heuristics in isPostDominatedBy, use the real MachinePostDominatorTree. The old heuristics caused instructions to sink unnecessarily, and might create register pressure. Test Plan: Added a NVPTX codegen test to verify that our change is in effect. It also shows the unnecessary register pressure caused by over-sinking. Updated affected tests in AArch64 and X86. Reviewers: eliben, meheff, Jiangning Reviewed By: Jiangning Subscribers: jholewinski, aemerson, mcrosier, llvm-commits Differential Revision: http://reviews.llvm.org/D4814 llvm-svn: 216862	2014-09-01 03:47:25 +00:00
Juergen Ributzka	25816b0fdd	Revert r216805 "[MachineCombiner][AArch64] Use the correct register class for MADD, SUB, and OR." I think this broke the build bot. Reverting it for now until I have time to take a closer look. llvm-svn: 216813	2014-08-30 06:16:26 +00:00
Juergen Ributzka	3e7f88c169	[MachineCombiner][AArch64] Use the correct register class for MADD, SUB, and OR. Select the correct register class for the various instructions that are generated when combining instructions and constrain the registers to the appropriate register class. This fixes rdar://problem/18183707. llvm-svn: 216805	2014-08-29 23:48:09 +00:00
Juergen Ributzka	c5c1c6090f	[FastISel][AArch64] Use the correct register class for branches. Also constrain the register class for branches. This fixes rdar://problem/18181496. llvm-svn: 216804	2014-08-29 23:48:06 +00:00
Juergen Ributzka	00d78221ab	[MachineSinking] Clear kill flag of all operands at all their uses. When sinking an instruction it might be moved past the original last use of one of its operands. This last use has the kill flag set and the verifier will obviously complain about this. Before Machine Sinking (AArch64): %vreg3<def> = ASRVXr %vreg1, %vreg2<kill> %XZR<def> = SUBSXrs %vreg4, %vreg1<kill>, 160, %NZCV<imp-def> ... After Machine Sinking: %XZR<def> = SUBSXrs %vreg4, %vreg1<kill>, 160, %NZCV<imp-def> ... %vreg3<def> = ASRVXr %vreg1, %vreg2<kill> This fix clears all the kill flags in all instruction that use the same operands as the instruction that is being sunk. This fixes rdar://problem/18180996. llvm-svn: 216803	2014-08-29 23:48:03 +00:00
Robin Morisset	039781ef26	Fix typos in comments, NFC Summary: Just fixing comments, no functional change. Test Plan: N/A Reviewers: jfb Subscribers: mcrosier, llvm-commits Differential Revision: http://reviews.llvm.org/D5130 llvm-svn: 216784	2014-08-29 21:53:01 +00:00
Reid Kleckner	16e5541211	musttail: Forward regparms of variadic functions on x86_64 Summary: If a variadic function body contains a musttail call, then we copy all of the remaining register parameters into virtual registers in the function prologue. We track the virtual registers through the function body, and add them as additional registers to pass to the call. Because this is all done in virtual registers, the register allocator usually gives us good code. If the function does a call, however, it will have to spill and reload all argument registers (ew). Forwarding regparms on x86_32 is not implemented because most compilers don't support varargs in 32-bit with regparms. Reviewers: majnemer Subscribers: aemerson, llvm-commits Differential Revision: http://reviews.llvm.org/D5060 llvm-svn: 216780	2014-08-29 21:42:08 +00:00
Reid Kleckner	329d4a2b29	Verifier: Don't reject varargs callee cleanup functions We've rejected these kinds of functions since r28405 in 2006 because it's impossible to lower the return of a callee cleanup varargs function. However there are lots of legal ways to leave such a function without returning, such as aborting. Today we can leave a function with a musttail call to another function with the correct prototype, and everything works out. I'm removing the verifier check declaring that a normal return from such a function is UB. Reviewed By: nlewycky Differential Revision: http://reviews.llvm.org/D5059 llvm-svn: 216779	2014-08-29 21:25:28 +00:00
Louis Gerbarg	03c627e8a7	Remove spurious mask operations from AArch64 add->compares on 16 and 8 bit values This patch checks for DAG patterns that are an add or a sub followed by a compare on 16 and 8 bit inputs. Since AArch64 does not support those types natively they are legalized into 32 bit values, which means that mask operations are inserted into the DAG to emulate overflow behaviour. In many cases those masks do not change the result of the processing and just introduce a dependent operation, often in the middle of a hot loop. This patch detects the relevent DAG patterns and then tests to see if the transforms are equivalent with and without the mask, removing the mask if possible. The exact mechanism of this patch was discusses in http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-July/074444.html There is a reasonably good chance there are missed oppurtunities due to similiar (but not identical) DAG patterns that could be funneled into this test, adding them should be simple if we see test cases. Tests included. rdar://13754426 llvm-svn: 216776	2014-08-29 21:00:22 +00:00
Reid Kleckner	ab99e24e94	X86: Fix conflict over ESI between base register and rep;movsl The new solution is to not use this lowering if there are any dynamic allocas in the current function. We know up front if there are dynamic allocas, but we don't know if we'll need to create stack temporaries with large alignment during lowering. Conservatively assume that we will need such temporaries. Reviewed By: hans Differential Revision: http://reviews.llvm.org/D5128 llvm-svn: 216775	2014-08-29 20:50:31 +00:00
Juergen Ributzka	f6ee7a7cdd	[FastISel][AArch64] Fix an incorrect kill flag due to a bug in SelectTrunc. When we select a trunc instruction we don't emit any code if the type is already i32 or smaller. This is because the instruction that uses the truncated value will deal with it. This behavior can incorrectly transfer a kill flag, which was meant for the result of the truncate, onto the source register. %2 = trunc i32 %1 to i16 ... = ... %2 -> ... = ... vreg1 <kill> ... = ... %1 ... = ... vreg1 This commit fixes this by emitting a COPY instruction, so that the result and source register are distinct virtual registers. This fixes rdar://problem/18178188. llvm-svn: 216750	2014-08-29 17:58:16 +00:00
Tilmann Scheller	09366de7a3	[ARM] Add Thumb-2 code size optimization test for ASR (register). llvm-svn: 216746	2014-08-29 17:19:00 +00:00
Tilmann Scheller	b3854015db	[ARM] Add Thumb-2 code size optimization test for ASR (immediate). llvm-svn: 216744	2014-08-29 17:02:28 +00:00
Matt Arsenault	8675db15da	R600/SI: Use mad for fsub + fmul We can use a negate source modifier to match this for fsub. llvm-svn: 216735	2014-08-29 16:01:14 +00:00
Tim Northover	3c0915e858	AArch64: only try to get operand of a known node. A bug in r216725 meant we tried to discover the type of a SETCC before confirming the node actually was a SETCC. llvm-svn: 216734	2014-08-29 15:34:58 +00:00
Jingyue Wu	cb83a155c1	[NVPTX] Make the alignment an explicit argument to ldu/ldg Summary: Instead of specifying the alignment as metadata which may be destroyed by transformation passes, make the alignment the second argument to ldu/ldg intrinsic calls. Test Plan: ldu-ldg.ll ldu-i8.ll ldu-reg-plus-offset.ll Reviewers: eliben, meheff, jholewinski Reviewed By: meheff, jholewinski Subscribers: jholewinski, llvm-commits Differential Revision: http://reviews.llvm.org/D5093 llvm-svn: 216731	2014-08-29 15:30:20 +00:00
Tilmann Scheller	f28e7e7d34	[ARM] Make Thumb-2 code size optimization test more strict. llvm-svn: 216729	2014-08-29 15:13:35 +00:00
Tilmann Scheller	e8e577833a	[ARM] Add a first test for the Thumb-2 code size optimization pass. While working on a Thumb-2 code size optimization I just realized that we don't have any regression tests for it. So here's a first test case, I plan to increase the coverage over time. llvm-svn: 216728	2014-08-29 15:04:40 +00:00
Tim Northover	c1c05aeb5d	AArch64: skip select/setcc combine in complex case. In an llvm-stress generated test, we were trying to create a v0iN type and asserting when that failed. This case could probably be handled by the function, but not without added complexity and the situation it arises in is sufficiently odd that there's probably no benefit anyway. Should fix PR20775. llvm-svn: 216725	2014-08-29 13:05:18 +00:00
Robert Khasanov	a651a62340	[SKX] Enable lowering of integer CMP operations. Added new types to Legalizer. Fixed getSetCCResultType function Added lowering tests. Reviewed by Elena Demikhovsky. llvm-svn: 216717	2014-08-29 08:46:04 +00:00
Job Noorman	9b31bd6bb0	Do not assume the value passed to memset is an i32. The code in SelectionDAG::getMemset for some reason assumes the value passed to memset is an i32. This breaks the generated code for targets that only have registers smaller than 32 bits because the value might get split into multiple registers by the calling convention. See the test for the MSP430 target included in the patch for an example. This patch ensures that nothing is assumed about the type of the value. Instead, the type is taken from the selected overload of the llvm.memset intrinsic. llvm-svn: 216716	2014-08-29 08:23:53 +00:00
Jiangning Liu	08f4cda2ec	[AArch64] Fix some failures exposed by value type v4f16 and v8f16. 1) Add some missing bitcast patterns for v8f16. 2) Add type promotion for operand of ld/st operations. llvm-svn: 216706	2014-08-29 01:31:42 +00:00
Juergen Ributzka	77bc09f5ab	[FastISel][AArch64] Don't fold instructions that are not in the same basic block. This fix checks first if the instruction to be folded (e.g. sign-/zero-extend, or shift) is in the same machine basic block as the instruction we are folding into. Not doing so can result in incorrect code, because the value might not be live-out of the basic block, where the value is defined. This fixes rdar://problem/18169495. llvm-svn: 216700	2014-08-29 00:19:21 +00:00
Jim Grosbach	ec2b0d0b11	AArch64: More correctly constrain target vector extend lowering. The AArch64 target lowering for [zs]ext of vectors is set up to handle input simple types and expects the generic SDag path to do something reasonable with anything that's not a simple type. The code, however, was only checking that the result type was a simple type and assuming that implied that the source type would also be a simple type. That's not a valid assumption, as operations like "zext <1 x i1> %0 to <1 x i32>" demonstrate. The fix is to simply explicitly validate the source type as well as the result type. PR20791 llvm-svn: 216689	2014-08-28 22:08:28 +00:00
Rafael Espindola	b43d51de95	On MachO, don't put non-private constants in mergeable sections. On MachO, putting a symbol that doesn't start with a 'L' or 'l' in one of the __TEXT,__literal* sections prevents the linker from merging the context of the section. Since private GVs are the ones the get mangled to start with 'L' or 'l', we now only put those on the __TEXT,__literal* sections. llvm-svn: 216682	2014-08-28 20:13:31 +00:00
Sanjay Patel	81ecbb0737	Fix a logic bug in x86 vector codegen: sext (zext (x) ) != sext (x) (PR20472). Remove a block of code from LowerSIGN_EXTEND_INREG() that was added with: http://llvm.org/viewvc/llvm-project?view=revision&revision=177421 And caused: http://llvm.org/bugs/show_bug.cgi?id=20472 (more analysis here) http://llvm.org/bugs/show_bug.cgi?id=18054 The testcases confirm that we (1) don't remove a zext op that is necessary and (2) generate a pmovz instead of punpck if SSE4.1 is available. Although pmovz is 1 byte longer, it allows folding of the load, and so saves 3 bytes overall. Differential Revision: http://reviews.llvm.org/D4909 llvm-svn: 216679	2014-08-28 18:59:22 +00:00
David Xu	ee978203e6	Generate CMN when comparing a short int with minus llvm-svn: 216651	2014-08-28 04:59:53 +00:00
Chandler Carruth	5260c0894c	[x86] Clean up some tests to use FileCheck and combine two into a single file. Changing code that is covered by these tests is just too hard to debug currently, and now it will be clear the nature of the changes. llvm-svn: 216643	2014-08-28 03:41:28 +00:00
Juergen Ributzka	31328168bb	[FastISel] Undo phi node updates when falling-back to SelectionDAG. The included test case would fail, because the MI PHI node would have two operands from the same predecessor. This problem occurs when a switch instruction couldn't be selected. This happens always, because there is no default switch support for FastISel to begin with. The problem was that FastISel would first add the operand to the PHI nodes and then fall-back to SelectionDAG, which would then in turn add the same operands to the PHI nodes again. This fix removes these duplicate PHI node operands by reseting the PHINodesToUpdate to its original state before FastISel tried to select the instruction. This fixes <rdar://problem/18155224>. llvm-svn: 216640	2014-08-28 02:06:55 +00:00
Juergen Ributzka	4f1a54a41a	[FastISel] Currently instructions are folded very aggressively for AArch64 into the memory operation, which can lead to the use of killed operands: %vreg1<def> = ADDXri %vreg0<kill>, 2 %vreg2<def> = LDRBBui %vreg0, 2 ... = ... %vreg1 ... This usually happens when the result is also used by another non-memory instruction in the same basic block, or any instruction in another basic block. This fix teaches hasTrivialKill to not only check the LLVM IR that the value has a single use, but also to check if the register that represents that value has already been used. This can happen when the instruction with the use was folded into another instruction (in this particular case a load instruction). This fixes rdar://problem/18142857. llvm-svn: 216634	2014-08-28 00:09:46 +00:00
Juergen Ributzka	843f14f411	Revert "[FastISel][AArch64] Don't fold instructions too aggressively into the memory operation." Quentin pointed out that this is not the correct approach and there is a better and easier solution. llvm-svn: 216632	2014-08-27 23:09:40 +00:00
Juergen Ributzka	ad8beabe38	[FastISel][AArch64] Don't fold instructions too aggressively into the memory operation. Currently instructions are folded very aggressively into the memory operation, which can lead to the use of killed operands: %vreg1<def> = ADDXri %vreg0<kill>, 2 %vreg2<def> = LDRBBui %vreg0, 2 ... = ... %vreg1 ... This usually happens when the result is also used by another non-memory instruction in the same basic block, or any instruction in another basic block. If the computed address is used by only memory operations in the same basic block, then it is safe to fold them. This is because all memory operations will fold the address computation and the original computation will never be emitted. This fixes rdar://problem/18142857. llvm-svn: 216629	2014-08-27 22:52:33 +00:00
Juergen Ributzka	3c1b286152	[FastISel][AArch64] Fix simplify address when the address comes from a shift. When the address comes directly from a shift instruction then the address computation cannot be folded into the memory instruction, because the zero register is not available as a base register. Simplify addess needs to emit the shift instruction and use the result as base register. llvm-svn: 216621	2014-08-27 21:38:33 +00:00
Juergen Ributzka	100a9b7fda	[FastISel][AArch64] Use the zero register for stores. Use the zero register directly when possible to avoid an unnecessary register copy and a wasted register at -O0. This also uses integer stores to store a positive floating-point zero. This saves us from materializing the positive zero in a register and then storing it. llvm-svn: 216617	2014-08-27 21:04:52 +00:00
Oliver Stannard	89d1542840	Teach the AArch64 backend about v4f16 and v8f16 This teaches the AArch64 backend to deal with the operations required to deal with the operations on v4f16 and v8f16 which are exposed by NEON intrinsics, plus the add, sub, mul and div operations. llvm-svn: 216555	2014-08-27 16:16:04 +00:00
Chandler Carruth	a5a8a9adc8	[x86] Fix a regression introduced with r213897 for 32-bit targets where we stopped efficiently lowering sextload using the SSE41 instructions for that operation. This is a consequence of a bad predicate I used thinking of the memory access needs. The code actually handles the cases where the predicate doesn't apply, and handles them much better. =] Simple fix and a test case added. Fixes PR20767. llvm-svn: 216538	2014-08-27 11:39:47 +00:00
Chandler Carruth	74ec9e19ee	[SDAG] Re-instate r215611 with a fix to a pesky X86 DAG combine. This combine is essentially combining target-specific nodes back into target independent nodes that it "knows" will be combined yet again by a target independent DAG combine into a different set of target-independent nodes that are legal (not custom though!) and thus "ok". This seems... deeply flawed. The crux of the problem is that we don't combine un-legalized shuffles that are introduced by legalizing other operations, and thus we don't see a very profitable combine opportunity. So the backend just forces the input to that combine to re-appear. However, for this to work, the conditions detected to re-form the unlegalized nodes must be exactly right. Previously, failing this would have caused poor code (if you're lucky) or a crasher when we failed to select instructions. After r215611 we would fall back into the legalizer. In some cases, this just "fixed" the crasher by produces bad code. But in the test case added it caused the legalizer and the dag combiner to iterate forever. The fix is to make the alignment checking in the x86 side of things match the alignment checking in the generic DAG combine exactly. This isn't really a satisfying or principled fix, but it at least make the code work as intended. It also highlights that it would be nice to detect the availability of under aligned loads for a given type rather than bailing on this optimization. I've left a FIXME to document this. Original commit message for r215611 which covers the rest of the chang: [SDAG] Fix a case where we would iteratively legalize a node during combining by replacing it with something else but not re-process the node afterward to remove it. In a truly remarkable stroke of bad luck, this would (in the test case attached) end up getting some other node combined into it without ever getting re-processed. By adding it back on to the worklist, in addition to deleting the dead nodes more quickly we also ensure that if it stops being dead for any reason it makes it back through the legalizer. Without this, the test case will end up failing during instruction selection due to an and node with a type we don't have an instruction pattern for. It took many million runs of the shuffle fuzz tester to find this. llvm-svn: 216537	2014-08-27 11:22:16 +00:00
Elena Demikhovsky	ff620edd3c	AVX-512: Added intrinsic for VMOVSS store form with mask. llvm-svn: 216530	2014-08-27 07:38:43 +00:00

1 2 3 4 5 ...

10651 Commits