llvm-project

Commit Graph

Author	SHA1	Message	Date
Jim Grosbach	5803f6d5a2	ARM assembly parsing for optional datatype suffix on VFP VMOV GPR<->VFP insns. Yet more of rdar://10435076. llvm-svn: 144691	2011-11-15 20:29:42 +00:00
Jim Grosbach	c5b1bc561e	ARM assembly parsing for two-operand form of 'mul' instruction. rdar://10449856. llvm-svn: 144689	2011-11-15 20:14:51 +00:00
Jim Grosbach	72dfd20aba	ARM assembly parsing for two-operand form of 'mul' instruction. Ongoing rdar://10435114. llvm-svn: 144688	2011-11-15 20:02:06 +00:00
Jim Grosbach	68c899c211	Testcase for r144684. llvm-svn: 144685	2011-11-15 19:56:17 +00:00
Owen Anderson	0ac9058f89	Fix an ambiguous decoding where we failed to properly decode VMOVv2f32 and VMOVv4f32. llvm-svn: 144683	2011-11-15 19:55:00 +00:00
Jim Grosbach	6efa7b9852	Thumb2 assembly parsing for mul.w in IT block fix. When the 3rd operand is not a low-register, and the first two operands are the same low register, the parser was incorrectly trying to use the 16-bit instruction encoding. rdar://10449281 llvm-svn: 144679	2011-11-15 19:29:45 +00:00
Rafael Espindola	f11e7f1305	We currently use a callback to handle an IL pass deleting a BB that still has a reference to it. Unfortunately, that doesn't work for codegen passes since we don't get notified of MBB's being deleted (the original BB stays). Use that fact to our advantage and after printing a function, check if any of the IL BBs corresponds to a symbol that was not printed. This fixes pr11202. llvm-svn: 144674	2011-11-15 19:08:46 +00:00
Jakob Stoklund Olesen	4949b9a283	Revert r144611 and r144613. These tests are actually correct, clang was miscompiling ExeDepsFix::processUses. Evan fixed the miscompilation in r144628. llvm-svn: 144630	2011-11-15 07:13:03 +00:00
Chandler Carruth	9b548a7fcf	Rather than trying to use the loop block sequence or the function block sequence when recovering from unanalyzable control flow constructs, always use the function sequence. I'm not sure why I ever went down the path of trying to use the loop sequence, it is fundamentally not the correct sequence to use. We're trying to preserve the incoming layout in the cases of unreasonable control flow, and that is only encoded at the function level. We already have a filter to select exactly the sub-set of blocks within the function that we're trying to form into a chain. The resulting code layout is also significantly better because of this. In several places we were ending up with completely unreasonable control flow constructs due to the ordering chosen by the loop structure for its internal storage. This change removes a completely wasteful vector of basic blocks, saving memory allocation in the common case even though it costs us CPU in the fairly rare case of unnatural loops. Finally, it fixes the latest crasher reduced out of GCC's single source. Thanks again to Benjamin Kramer for the reduction, my bugpoint skills failed at it. llvm-svn: 144627	2011-11-15 06:26:43 +00:00
Craig Topper	05baa85f58	Properly qualify AVX2 specific parts of execution dependency table. Also enable converting between 256-bit PS/PD operations when AVX1 is enabled. Fixes PR11370. llvm-svn: 144622	2011-11-15 05:55:35 +00:00
Jakob Stoklund Olesen	9c0de9bb6b	Really fix test. llvm-svn: 144613	2011-11-15 03:17:01 +00:00
Jakob Stoklund Olesen	14b66375a9	Allow for dependency-breaking instructions before cvt*. This should unbreak clang-x86_64-darwin10-RA, but I can't actually reproduce the failure. llvm-svn: 144611	2011-11-15 02:29:48 +00:00
Evan Cheng	7ca4b6eb5c	Add vmov.f32 to materialize f32 immediate splats which cannot be handled by integer variants. rdar://10437054 llvm-svn: 144608	2011-11-15 02:12:34 +00:00
Jakob Stoklund Olesen	f8ad336bc4	Break false dependencies before partial register updates. Two new TargetInstrInfo hooks let the target tell ExecutionDepsFix about instructions with partial register updates causing false unwanted dependencies. The ExecutionDepsFix pass will break the false dependencies if the updated register was written in the previous N instructions. The small loop added to sse-domains.ll runs twice as fast with dependency-breaking instructions inserted. llvm-svn: 144602	2011-11-15 01:15:30 +00:00
Jim Grosbach	a498af2b1d	ARM parsing datatype suffix variants for non-writeback VST1 instructions. rdar://10435076 llvm-svn: 144593	2011-11-14 23:43:46 +00:00
Jim Grosbach	72838a0345	ARM parsing datatype suffix variants for non-writeback VLD1 instructions. rdar://10435076 llvm-svn: 144592	2011-11-14 23:32:59 +00:00
Jim Grosbach	3d6c0e0bb2	ARM parsing optional datatype suffix for VAND/VEOR/VORR instructions. rdar://10435076 llvm-svn: 144587	2011-11-14 23:11:19 +00:00
Jim Grosbach	3e2c6f380c	ARM VLDR/VSTR instructions don't need a size suffix. Canonicalize on the non-suffixed form, but continue to accept assembly that has any correctly sized type suffix. llvm-svn: 144583	2011-11-14 23:03:21 +00:00
Nick Lewycky	7013a19e8a	Refactor capture tracking (which already had a couple flags for whether returns and stores capture) to permit the caller to see each capture point and decide whether to continue looking. Use this inside memdep to do an analysis that basicaa won't do. This lets us solve another devirtualization case, fixing PR8908! llvm-svn: 144580	2011-11-14 22:49:42 +00:00
Chad Rosier	4e88fbebde	Add newline to end of file. Thanks, Eli. llvm-svn: 144579	2011-11-14 22:48:33 +00:00
Chad Rosier	ab7223e99a	Add support for inlining small memcpys. rdar://10412592 llvm-svn: 144578	2011-11-14 22:46:17 +00:00
Chad Rosier	45110fdf8d	Fix a performance regression from r144565. Positive offsets were being lowered into registers, rather than encoded directly in the load/store. llvm-svn: 144576	2011-11-14 22:34:48 +00:00
Evan Cheng	fb13d32b3f	Add a missing pattern for X86ISD::MOVLPD. rdar://10436044 llvm-svn: 144566	2011-11-14 20:35:52 +00:00
Chad Rosier	adfd200bcb	Add support for Thumb load/stores with negative offsets. rdar://10412592 llvm-svn: 144565	2011-11-14 20:22:27 +00:00
Evan Cheng	30f44ad785	Teach two-address pass to re-schedule two-address instructions (or the kill instructions of the two-address operands) in order to avoid inserting copies. This fixes the few regressions introduced when the two-address hack was disabled (without regressing the improvements). rdar://10422688 llvm-svn: 144559	2011-11-14 19:48:55 +00:00
Pete Cooper	890e02e854	Changed SSE4/AVX <2 x i64> extract and insert ops to be Custom lowered. Constant idx case is still done in tablegen but other cases are then expanded. Fixes <rdar://problem/10435460>. llvm-svn: 144557	2011-11-14 19:38:42 +00:00
Jakob Stoklund Olesen	7e6004a3c1	Fix early-clobber handling in shrinkToUses. I broke this in r144515, it affected most ARM testers. <rdar://problem/10441389> llvm-svn: 144547	2011-11-14 18:45:38 +00:00
Jakob Stoklund Olesen	7e07b388ac	Delete stale comment. llvm-svn: 144542	2011-11-14 18:03:05 +00:00
Chandler Carruth	ed5aa547bc	Fix an overflow bug in MachineBranchProbabilityInfo. This pass relied on the sum of the edge weights not overflowing uint32, and crashed when they did. This is generally safe as BranchProbabilityInfo tries to provide this guarantee. However, the CFG can get modified during codegen in a way that grows the sum of the edge weights. This doesn't seem unreasonable (imagine just adding more blocks all with the default weight of 16), but it is hard to come up with a case that actually triggers 32-bit overflow. Fortunately, the single-source GCC build is good at this. The solution isn't very pretty, but it's no worse than the previous code. We're already summing all of the edge weights on each query, we can sum them, check for an overflow, compute a scale, and sum them again. I've included a greatly reduced test case out of the GCC source that triggers it. It's a pretty lame test, as it clearly is just barely triggering the overflow. I'd like to have something that is much more definitive, but I don't understand the fundamental pattern that triggers an explosion in the edge weight sums. The buggy code is duplicated within this file. I'll collapse them into a single implementation in a subsequent commit. llvm-svn: 144526	2011-11-14 08:50:16 +00:00
Chad Rosier	2a1df883d0	Add support for ARM halfword load/stores and signed byte loads with negative offsets. rdar://10412592 llvm-svn: 144518	2011-11-14 04:09:28 +00:00
Chandler Carruth	1071cfa4ae	Teach machine block placement to cope with unnatural loops. These don't get loop info structures associated with them, and so we need some way to make forward progress selecting and placing basic blocks. The technique used here is pretty brutal -- it just scans the list of blocks looking for the first unplaced candidate. It keeps placing blocks like this until the CFG becomes tractable. The cost is somewhat unfortunate, it requires allocating a vector of all basic block pointers eagerly. I have some ideas about how to simplify and optimize this, but I'm trying to get the logic correct first. Thanks to Benjamin Kramer for the reduced test case out of GCC. Sadly there are other bugs that GCC is tickling that I'm reducing and working on now. llvm-svn: 144516	2011-11-14 00:00:35 +00:00
Chandler Carruth	8d15078927	Rewrite #3 of machine block placement. This is based somewhat on the second algorithm, but only loosely. It is more heavily based on the last discussion I had with Andy. It continues to walk from the inner-most loop outward, but there is a key difference. With this algorithm we ensure that as we visit each loop, the entire loop is merged into a single chain. At the end, the entire function is treated as a "loop", and merged into a single chain. This chain forms the desired sequence of blocks within the function. Switching to a single algorithm removes my biggest problem with the previous approaches -- they had different behavior depending on which system triggered the layout. Now there is exactly one algorithm and one basis for the decision making. The other key difference is how the chain is formed. This is based heavily on the idea Andy mentioned of keeping a worklist of blocks that are viable layout successors based on the CFG. Having this set allows us to consistently select the best layout successor for each block. It is expensive though. The code here remains very rough. There is a lot that needs to be done to clean up the code, and to make the runtime cost of this pass much lower. Very much WIP, but this was a giant chunk of code and I'd rather folks see it sooner than later. Everything remains behind a flag of course. I've added a couple of tests to exercise the issues that this iteration was motivated by: loop structure preservation. I've also fixed one test that was exhibiting the broken behavior of the previous version. llvm-svn: 144495	2011-11-13 11:20:44 +00:00
Chad Rosier	1198d894d0	The order in which the predicate is added differs between Thumb and ARM mode. Fix predicate when in ARM mode and restore SelectIntrinsicCall. llvm-svn: 144494	2011-11-13 09:44:21 +00:00
Chad Rosier	a476e391f1	Temporarily disable SelectIntrinsicCall when in ARM mode. This is causing failures. llvm-svn: 144492	2011-11-13 05:14:43 +00:00
Chad Rosier	c8cfd3a8fb	Add support for emitting both signed- and zero-extend loads. Fix SimplifyAddress to handle either a 12-bit unsigned offset or the ARM +/-imm8 offsets (addressing mode 3). This enables a load followed by an integer extend to be folded into a single load. For example, the sequence "ldrb r1, [r0]; uxtb r2, r1; mov r3, r2" becomes "ldrb r1, [r0]; mov r3, r1". llvm-svn: 144488	2011-11-13 02:23:59 +00:00
Jakob Stoklund Olesen	6ddb767fb5	Remove the -color-ss-with-regs option. It was off by default. The new register allocators don't have the problems that made it necessary to reallocate registers during stack slot coloring. llvm-svn: 144481	2011-11-13 00:31:23 +00:00
Jakob Stoklund Olesen	7ef502f6d1	Delete the 'standard' spiller which used the old spilling framework. The current register allocators all use the inline spiller. llvm-svn: 144477	2011-11-12 23:29:02 +00:00
Jakob Stoklund Olesen	ce4ef9f8d5	Remove histogram tests. Counting the number of occurrences of each opcode is not a useful test. llvm-svn: 144474	2011-11-12 22:39:40 +00:00
Jakob Stoklund Olesen	0eac531bc2	RAGreedy is better about hinting now. Or maybe we are just getting lucky. llvm-svn: 144473	2011-11-12 22:39:37 +00:00
Jakob Stoklund Olesen	8ec1a92afd	Linear scan is going away. llvm-svn: 144472	2011-11-12 22:39:34 +00:00
Jakob Stoklund Olesen	654d60888e	XFAIL test that depends on linear scan to remove dead code. Filed PR11364 to track the problem. Should the register allocator eliminate dead code? llvm-svn: 144471	2011-11-12 22:39:30 +00:00
Jakob Stoklund Olesen	fa3a8ee6e2	Remove obsolete test. This test was committed with a bugfix to RemoveCopyByCommutingDef, but that optimization is no longer triggered by this test. llvm-svn: 144470	2011-11-12 22:39:27 +00:00
Jakob Stoklund Olesen	80b3d299a9	Remove obsolete test. This test is for a very specific LocalRewriter bug. LocalRewriter is going away. llvm-svn: 144469	2011-11-12 22:39:24 +00:00
Jakob Stoklund Olesen	0c7d9d90ef	Remove obsolete test. I don't think this test does what it was supposed to do, and LocalRewriter is going away anyway. llvm-svn: 144463	2011-11-12 20:37:57 +00:00
Jakob Stoklund Olesen	126f9779c3	Eliminate more linear scan tests. llvm-svn: 144462	2011-11-12 20:35:26 +00:00
Jakob Stoklund Olesen	9d090daa33	Switch a couple -O0 tests to RABasic. llvm-svn: 144461	2011-11-12 20:11:04 +00:00
Jakob Stoklund Olesen	4deff7bc1d	Switch a few tests off linearscan. llvm-svn: 144460	2011-11-12 19:53:52 +00:00
Jakob Stoklund Olesen	6ac6aa782d	Delete old test of a VirtRegRewriter feature. This test doesn't expose the issue with RAGreedy. I filed PR11363 to track the missing InlineSpiller feature. llvm-svn: 144459	2011-11-12 19:53:48 +00:00
Jakob Stoklund Olesen	74d091b395	Remove old test that doesn't make sense. The test is checking that the output doesn't contain any 'mov ' strings. It does contain movl, though. llvm-svn: 144458	2011-11-12 19:53:45 +00:00
Craig Topper	3dc75f9e3b	Add more AVX2 shift lowering support. Move AVX2 variable shift to use patterns instead of custom lowering code. llvm-svn: 144457	2011-11-12 09:58:49 +00:00

14985 Commits