llvm-project

Commit Graph

Author	SHA1	Message	Date
Jakob Stoklund Olesen	73d96651ab	FileCheckize these tests. Add an extra test to ldr_post with an immediate increment. llvm-svn: 154859	2012-04-16 20:56:42 +00:00
Jakob Stoklund Olesen	e8ee9d1c8c	Disable code placement for this test. It makes it less sensitive to small changes in heuristics. llvm-svn: 154857	2012-04-16 20:49:06 +00:00
Richard Smith	12da79b859	Fix incorrect atomics codegen introduced in r154705, and extend test to catch it. llvm-svn: 154845	2012-04-16 18:43:53 +00:00
Bill Wendling	7e6be75e06	Move to X86 directory because this fails on non-X86 platforms. llvm-svn: 154825	2012-04-16 16:38:48 +00:00
Chandler Carruth	4190b507c5	Flip the new block-placement pass to be on by default. This is mostly to test the waters. I'd like to get results from FNT build bots and other bots running on non-x86 platforms. This feature has been pretty heavily tested over the last few months by me, and it fixes several of the execution time regressions caused by the inlining work by preventing inlining decisions from radically impacting block layout. I've seen very large improvements in yacr2 and ackermann benchmarks, along with the expected noise across all of the benchmark suite whenever code layout changes. I've analyzed all of the regressions and fixed them, or found them to be impossible to fix. See my email to llvmdev for more details. I'd like for this to be in 3.1 as it complements the inliner changes, but if any failures are showing up or anyone has concerns, it is just a flag flip and so can be easily turned off. I'm switching it on tonight to try and get at least one run through various folks' performance suites in case SPEC or something else has serious issues with it. I'll watch bots and revert if anything shows up. llvm-svn: 154816	2012-04-16 13:49:17 +00:00
Chandler Carruth	a355e7cf82	Remove an overly brittle test. This test will no longer be interesting once we start changing the block layout, so just nuke it. If anyone has ideas about how to craft a code layout agnostic form of the test please let me know. llvm-svn: 154815	2012-04-16 13:49:09 +00:00
Chandler Carruth	8c0b41d656	Add a somewhat hacky heuristic to do something different from whole-loop rotation. When there is a loop backedge which is an unconditional branch, we will end up with a branch somewhere no matter what. Try placing this backedge in a fallthrough position above the loop header as that will definitely remove at least one branch from the loop iteration, where whole loop rotation may not. I haven't seen any benchmarks where this is important but loop-blocks.ll tests for it, and so this will be covered when I flip the default. llvm-svn: 154812	2012-04-16 13:33:36 +00:00
Chandler Carruth	8c74c7b1c6	Tweak the loop rotation logic to check whether the loop is naturally laid out in a form with a fallthrough into the header and a fallthrough out of the bottom. In that case, leave the loop alone because any rotation will introduce unnecessary branches. If either side looks like it will require an explicit branch, then the rotation won't add any, do it to ensure the branch occurs outside of the loop (if possible) and maximize the benefit of the fallthrough in the bottom. llvm-svn: 154806	2012-04-16 09:31:23 +00:00
Hal Finkel	e0cf6397fd	Remove dead SD nodes after the combining pass. Fixes PR12201. llvm-svn: 154786	2012-04-16 03:33:22 +00:00
Chandler Carruth	ccc7e42b1f	Rewrite how machine block placement handles loop rotation. This is a complex change that resulted from a great deal of experimentation with several different benchmarks. The one which proved the most useful is included as a test case, but I don't know that it captures all of the relevant changes, as I didn't have specific regression tests for each, they were more the result of reasoning about what the old algorithm would possibly do wrong. I'm also failing at the moment to craft more targeted regression tests for these changes, if anyone has ideas, it would be welcome. The first big thing broken with the old algorithm is the idea that we can take a basic block which has a loop-exiting successor and a looping successor and use the looping successor as the layout top in order to get that particular block to be the bottom of the loop after layout. This happens to work in many cases, but not in all. The second big thing broken was that we didn't try to select the exit which fell into the nearest enclosing loop (to which we exit at all). As a consequence, even if the rotation worked perfectly, it would result in one of two bad layouts. Either the bottom of the loop would get fallthrough, skipping across a nearer enclosing loop and thereby making it discontiguous, or it would be forced to take an explicit jump over the nearest enclosing loop to earch its successor. The point of the rotation is to get fallthrough, so we need it to fallthrough to the nearest loop it can. The fix to the first issue is to actually layout the loop from the loop header, and then rotate the loop such that the correct exiting edge can be a fallthrough edge. This is actually much easier than I anticipated because we can handle all the hard parts of finding a viable rotation before we do the layout. We just store that, and then rotate after layout is finished. No inner loops get split across the post-rotation backedge because we check for them when selecting the rotation. That fix exposed a latent problem with our exitting block selection -- we should allow the backedge to point into the middle of some inner-loop chain as there is no real penalty to it, the whole point is that it won't be a fallthrough edge. This may have blocked the rotation at all in some cases, I have no idea and no test case as I've never seen it in practice, it was just noticed by inspection. Finally, all of these fixes, and studying the loops they produce, highlighted another problem: in rotating loops like this, we sometimes fail to align the destination of these backwards jumping edges. Fix this by actually walking the backwards edges rather than relying on loopinfo. This fixes regressions on heapsort if block placement is enabled as well as lots of other cases where the previous logic would introduce an abundance of unnecessary branches into the execution. llvm-svn: 154783	2012-04-16 01:12:56 +00:00
Craig Topper	bfc9a5f7d3	Remove AVX2 vpermq and vpermpd intrinsics. These can now be handled with normal shuffle vectors. llvm-svn: 154778	2012-04-15 22:43:31 +00:00
Nadav Rotem	42bcd04ee3	Fix PR12529. The Vxx family of instructions are only supported by AVX. Use non-vex instructions for SSE4. llvm-svn: 154770	2012-04-15 19:36:44 +00:00
Nadav Rotem	02ef0c3524	When emulating vselect using OR/AND/XOR make sure to bitcast the result back to the original type. llvm-svn: 154764	2012-04-15 15:08:09 +00:00
Elena Demikhovsky	779a72b49e	Added VPERM optimization for AVX2 shuffles llvm-svn: 154761	2012-04-15 11:18:59 +00:00
Richard Smith	3e8f1f6aea	Fix X86 codegen for 'atomicrmw nand' to generate x = ~(x & y), not x = ~x & y. llvm-svn: 154705	2012-04-13 22:47:00 +00:00
Evan Cheng	267a4ada52	On Darwin targets, only use vfma etc. if the source use fma() intrinsic explicitly. llvm-svn: 154689	2012-04-13 18:59:28 +00:00
Sirish Pande	1d195b9c25	Disable Hexagon test temporarily. There is an assert at line 558 in ScheduleDAGInstrs::buildSchedGraph(AliasAnalysis *AA). This assert needs to addressed for post RA scheduler. Until that assert is addressed, any passes that uses post ra scheduler will fail. So, I am temporarily disabling the hexagon tests until that fix is in. The assert is as follows: assert(!MI->isTerminator() && !MI->isLabel() && "Cannot schedule terminators or labels!"); llvm-svn: 154617	2012-04-12 21:06:54 +00:00
Craig Topper	d0271b27cb	Fix 128-bit ptest intrinsics to take v2i64 instead of v4f32 since these are integer instructions. llvm-svn: 154580	2012-04-12 07:23:00 +00:00
Akira Hatanaka	c80ae58a5e	Revert changes that were accidentally committed. llvm-svn: 154563	2012-04-11 23:19:55 +00:00
Akira Hatanaka	1e962f250b	Fix string that is being checked. llvm-svn: 154547	2012-04-11 23:11:33 +00:00
Akira Hatanaka	47ad674f67	Emit neg.s or neg.d only if -enable-no-nans-fp-math is supplied by user, otherwise expand FNEG during legalization. llvm-svn: 154546	2012-04-11 22:59:08 +00:00
Akira Hatanaka	7f4c9d1429	Emit abs.s or abs.d only if -enable-no-nans-fp-math is supplied by user. Invalid operation is signaled if the operand of these instructions is NaN. llvm-svn: 154545	2012-04-11 22:49:04 +00:00
Akira Hatanaka	4f5c8421b3	Fix bugs in lowering of FCOPYSIGN nodes. - FCOPYSIGN nodes that have operands of different types were not handled. - Different code was generated depending on the endianness of the target. Additionally, code is added that emits INS and EXT instructions, if they are supported by target (they are R2 instructions). llvm-svn: 154540	2012-04-11 22:13:04 +00:00
Evan Cheng	5efc442290	Add more fused mul+add/sub patterns. rdar://10139676 llvm-svn: 154484	2012-04-11 06:59:47 +00:00
Nadav Rotem	9bc178ac5c	Reapply 154396 after fixing a test. Original message: Modify the code that lowers shuffles to blends from using blendvXX to vblendXX. blendV uses a register for the selection while Vblend uses an immediate. On sandybridge they still have the same latency and execute on the same execution ports. llvm-svn: 154483	2012-04-11 06:40:27 +00:00
Evan Cheng	67a09fc397	Match (fneg (fma) to vfnma. rdar://10139676 llvm-svn: 154469	2012-04-11 01:21:25 +00:00
Evan Cheng	d0f61cbefe	Merge fma.ll into fusedMAC.ll llvm-svn: 154466	2012-04-11 01:03:11 +00:00
Jakob Stoklund Olesen	0bcf8f4bfb	Fix test to be register assignment invariant. llvm-svn: 154453	2012-04-11 00:00:24 +00:00
Owen Anderson	6f1ee1634d	Move the constant-folding support for FP_ROUND in SelectionDAG from the one-operand version of getNode() to the two-operand version, since it became a two-operand node at sound point. Zap a testcase that this allows us to completely fold away. llvm-svn: 154447	2012-04-10 22:46:53 +00:00
Evan Cheng	d0007f3c83	Handle llvm.fma.* intrinsics. rdar://10914096 llvm-svn: 154439	2012-04-10 21:40:28 +00:00
Duncan Sands	4f53074cca	Add a comment noting that the fdiv -> fmul conversion won't generate multiplication by a denormal, and some tests checking that. llvm-svn: 154431	2012-04-10 20:35:27 +00:00
Eric Christopher	65ada95b84	Temporarily revert this patch to see if it brings the buildbots back. llvm-svn: 154425	2012-04-10 19:33:16 +00:00
Eric Christopher	e9abba71fe	To ensure that we have more accurate line information for a block don't elide the branch instruction if it's the only one in the block, otherwise it's ok. PR9796 and rdar://11215207 llvm-svn: 154417	2012-04-10 18:18:10 +00:00
Nadav Rotem	f934f91709	Modify the code that lowers shuffles to blends from using blendvXX to vblendXX. blendv uses a register for the selection while vblend uses an immediate. On sandybridge they still have the same latency and execute on the same execution ports. llvm-svn: 154396	2012-04-10 14:33:13 +00:00
Anton Korobeynikov	4d1220de34	Transform div to mul with reciprocal only when fp imm is legal. This fixes PR12516 and uncovers one weird problem in legalize (workarounded) llvm-svn: 154394	2012-04-10 13:22:49 +00:00
Evan Cheng	0752624970	Add proper checks. llvm-svn: 154379	2012-04-10 03:15:42 +00:00
Evan Cheng	f8bad08001	Fix a long standing tail call optimization bug. When a libcall is emitted legalizer always use the DAG entry node. This is wrong when the libcall is emitted as a tail call since it effectively folds the return node. If the return node's input chain is not the entry (i.e. call, load, or store) use that as the tail call input chain. PR12419 rdar://9770785 rdar://11195178 llvm-svn: 154370	2012-04-10 01:51:00 +00:00
Rafael Espindola	1d9672bdce	Don't try to zExt just to check if an integer constant is zero, it might not fit in a i64. llvm-svn: 154364	2012-04-10 00:16:22 +00:00
Lang Hames	ec96cd0690	Test case for PR12495. llvm-svn: 154359	2012-04-09 23:58:59 +00:00
Akira Hatanaka	8483a6c47d	Have TargetLowering::getPICJumpTableRelocBase return a node that points to the GOT if jump table uses 64-bit gp-relative relocation. llvm-svn: 154341	2012-04-09 20:32:12 +00:00
Chad Rosier	e0e38f61a5	When performing a truncating store, it's possible to rearrange the data in-register, such that we can use a single vector store rather then a series of scalar stores. For func_4_8 the generated code vldr d16, LCPI0_0 vmov d17, r0, r1 vadd.i16 d16, d17, d16 vmov.u16 r0, d16[3] strb r0, [r2, #3] vmov.u16 r0, d16[2] strb r0, [r2, #2] vmov.u16 r0, d16[1] strb r0, [r2, #1] vmov.u16 r0, d16[0] strb r0, [r2] bx lr becomes vldr d16, LCPI0_0 vmov d17, r0, r1 vadd.i16 d16, d17, d16 vuzp.8 d16, d17 vst1.32 {d16[0]}, [r2, :32] bx lr I'm not fond of how this combine pessimizes 2012-03-13-DAGCombineBug.ll, but I couldn't think of a way to judiciously apply this combine. This ldrh r0, [r0, #4] strh r0, [r1] becomes vldr d16, [r0] vmov.u16 r0, d16[2] vmov.32 d16[0], r0 vuzp.16 d16, d17 vst1.32 {d16[0]}, [r1, :32] PR11158 rdar://10703339 llvm-svn: 154340	2012-04-09 20:32:02 +00:00
Rafael Espindola	8f62b3248e	Pattern match a setcc of boolean value with 0 as a truncate. llvm-svn: 154322	2012-04-09 16:06:03 +00:00
Nadav Rotem	fb7e2ae53c	Lower some x86 shuffle sequences to the vblend family of instructions. llvm-svn: 154313	2012-04-09 08:33:21 +00:00
Nadav Rotem	b801ca3976	Fix a bug in the lowering of broadcasts: ConstantPools need to use the target pointer type. Move NormalizeVectorShuffle and LowerVectorBroadcast into X86TargetLowering. llvm-svn: 154310	2012-04-09 07:45:58 +00:00
Chandler Carruth	3779ac10b4	Cleanup and relax a restriction on the matching of global offsets into x86 addressing modes. This allows PIE-based TLS offsets to fit directly into an addressing mode immediate offset, which is the last remaining code quality issue from PR12380. With this patch, that PR is completely fixed. To understand why this patch is correct to match these offsets into addressing mode immediates, break it down by cases: 1) 32-bit is trivially correct, and unmodified here. 2) 64-bit non-small mode is unchanged and never matches. 3) 64-bit small PIC code which is RIP-relative is handled specially in the match to try to fit RIP into the base register. If it fails, it now early exits. This behavior is unchanged by the patch. 4) 64-bit small non-PIC code which is not RIP-relative continues to work as it did before. The reason these immediates are safe is because the ABI ensures they fit in small mode. This behavior is unchanged. 5) 64-bit small PIC code which is not using RIP-relative addressing. This is the only case changed by the patch, and the primary place you see it is in TLS, either the win64 section offset TLS or Linux local-exec TLS model in a PIC compilation. Here the ABI again ensures that the immediates fit because we are in small mode, and any other operations required due to the PIC relocation model have been handled externally to the Wrapper node (extra loads etc are made around the wrapper node in ISelLowering). I've tested this as much as I can comparing it with GCC's output, and everything appears safe. I discussed this with Anton and it made sense to him at least at face value. That said, if there are issues with PIC code after this patch, yell and we can revert it. llvm-svn: 154304	2012-04-09 02:13:06 +00:00
Chandler Carruth	84b834267e	Fold 15 tiny test cases into a single file that implements the comprehensive testing of TLS codegen for x86. Convert all of the ones that were still using grep to use FileCheck. Remove some redundancies between them. Perhaps most interestingly expand the test cases so that they actually fully list the instruction snippet being tested. TLS operations are very narrowly defined, and so these seem reasonably stable. More importantly, the existing test cases already were crazy fine grained, expecting specific registers to be allocated. This just clarifies that no other instructions are expected, and fills in some crucial gaps that weren't being tested at all. This will make any subsequent changes to TLS much more clear during review. llvm-svn: 154303	2012-04-09 01:43:17 +00:00
Duncan Sands	2f1dc3814b	Only have codegen turn fdiv by a constant into fmul by the reciprocal when -ffast-math, i.e. don't just always do it if the reciprocal can be formed exactly. There is already an IR level transform that does that, and it does it more carefully. llvm-svn: 154296	2012-04-08 18:08:12 +00:00
Chandler Carruth	ede4a8aa2b	Teach LLVM about a PIE option which, when enabled on top of PIC, makes optimizations which are valid for position independent code being linked into a single executable, but not for such code being linked into a shared library. I discussed the design of this with Eric Christopher, and the decision was to support an optional bit rather than a completely separate relocation model. Fundamentally, this is still PIC relocation, its just that certain optimizations are only valid under a PIC relocation model when the resulting code won't be in a shared library. The simplest path to here is to expose a single bit option in the TargetOptions. If folks have different/better designs, I'm all ears. =] I've included the first optimization based upon this: changing TLS models to the *Exec models when PIE is enabled. This is the LLVM component of PR12380 and is all of the hard work. llvm-svn: 154294	2012-04-08 17:51:45 +00:00
Nadav Rotem	82609df647	AVX2: Build splat vectors by broadcasting a scalar from the constant pool. Previously we used three instructions to broadcast an immediate value into a vector register. On Sandybridge we continue to load the broadcasted value from the constant pool. llvm-svn: 154284	2012-04-08 12:54:54 +00:00
Nadav Rotem	71d07ae5cb	1. Remove the part of r153848 which optimizes shuffle-of-shuffle into a new shuffle node because it could introduce new shuffle nodes that were not supported efficiently by the target. 2. Add a more restrictive shuffle-of-shuffle optimization for cases where the second shuffle reverses the transformation of the first shuffle. llvm-svn: 154266	2012-04-07 21:19:08 +00:00

1 2 3 4 5 ...

5800 Commits