llvm-project

Commit Graph

Author	SHA1	Message	Date
Chandler Carruth	8a8cd2bab9	Re-sort all of the includes with ./utils/sort_includes.py so that subsequent changes are easier to review. About to fix some layering issues, and wanted to separate out the necessary churn. Also comment and sink the include of "Windows.h" in three .inc files to match the usage in Memory.inc. llvm-svn: 198685	2014-01-07 11:48:04 +00:00
Andrew Trick	e339828b90	Allow MachineCSE to coalesce trivial subregister copies the same way that it coalesces normal copies. Without this, MachineCSE is powerless to handle redundant operations with truncated source operands. This required fixing the 2-addr pass to handle tied subregisters. It isn't clear what combinations of subregisters can legally be tied, but the simple case of truncated source operands is now safely handled: %vreg11<def> = COPY %vreg1:sub_32bit; GR32:%vreg11 GR64:%vreg1 %vreg12<def> = COPY %vreg2:sub_32bit; GR32:%vreg12 GR64:%vreg2 %vreg13<def,tied1> = ADD32rr %vreg11<tied0>, %vreg12<kill>, %EFLAGS<imp-def> Test case: cse-add-with-overflow.ll. This exposed an existing bug in PPCInstrInfo::commuteInstruction. Thanks to Rafael for the test case: PowerPC/crash.ll. llvm-svn: 197465	2013-12-17 04:50:45 +00:00
Andrew Trick	9defbd882b	whitespace llvm-svn: 197464	2013-12-17 04:50:40 +00:00
Hal Finkel	ceb1f12d9a	Improve instruction scheduling for the PPC POWER7 Aside from a few minor latency corrections, the major change here is a new hazard recognizer which focuses on better dispatch-group formation on the POWER7. As with the PPC970's hazard recognizer, the most important thing it does is avoid load-after-store hazards within the same dispatch group. It uses the POWER7's special dispatch-group-terminating nop instruction (instead of inserting multiple regular nop instructions). This new hazard recognizer makes use of the scheduling dependency graph itself, built using AA information, to robustly detect the possibility of load-after-store hazards. significant test-suite performance changes (the error bars are 99.5% confidence intervals based on 5 test-suite runs both with and without the change -- speedups are negative): speedups: MultiSource/Benchmarks/FreeBench/pcompress2/pcompress2 -0.55171% +/- 0.333168% MultiSource/Benchmarks/TSVC/CrossingThresholds-dbl/CrossingThresholds-dbl -17.5576% +/- 14.598% MultiSource/Benchmarks/TSVC/Reductions-dbl/Reductions-dbl -29.5708% +/- 7.09058% MultiSource/Benchmarks/TSVC/Reductions-flt/Reductions-flt -34.9471% +/- 11.4391% SingleSource/Benchmarks/BenchmarkGame/puzzle -25.1347% +/- 11.0104% SingleSource/Benchmarks/Misc/flops-8 -17.7297% +/- 9.79061% SingleSource/Benchmarks/Shootout-C++/ary3 -35.5018% +/- 23.9458% SingleSource/Regression/C/uint64_to_float -56.3165% +/- 25.4234% SingleSource/UnitTests/Vectorizer/gcc-loops -18.5309% +/- 6.8496% regressions: MultiSource/Benchmarks/ASCI_Purple/SMG2000/smg2000 18.351% +/- 12.156% SingleSource/Benchmarks/Shootout-C++/methcall 27.3086% +/- 14.4733% llvm-svn: 197099	2013-12-12 00:19:11 +00:00
Hal Finkel	94a6f380bb	Fix the PPC subsumes-predicate check For one predicate to subsume another, they must both check the same condition register. Failure to check this prerequisite was causing miscompiles. Fixes PR18003. llvm-svn: 197089	2013-12-11 23:12:25 +00:00
Hal Finkel	563cc05cae	Remove PPCScoreboardHazardRecognizer PPCScoreboardHazardRecognizer was a subclass of ScoreboardHazardRecognizer which did only one thing: filtered out nodes in EmitInstruction for which DAG->getInstrDesc(SU) returned NULL. This used to be the case for PPC pseudo instructions. As far as I can tell, this is no longer true, and so we can use ScoreboardHazardRecognizer directly. llvm-svn: 196171	2013-12-02 23:52:46 +00:00
Juergen Ributzka	d12ccbd343	[weak vtables] Remove a bunch of weak vtables This patch removes most of the trivial cases of weak vtables by pinning them to a single object file. The memory leaks in this version have been fixed. Thanks Alexey for pointing them out. Differential Revision: http://llvm-reviews.chandlerc.com/D2068 Reviewed by Andy llvm-svn: 195064	2013-11-19 00:57:56 +00:00
Alexey Samsonov	49109a279c	Revert r194865 and r194874. This change is incorrect. If you delete virtual destructor of both a base class and a subclass, then the following code: Base *foo = new Child(); delete foo; will not cause the destructor for members of Child class. As a result, I observe plently of memory leaks. Notable examples I investigated are: ObjectBuffer and ObjectBufferStream, AttributeImpl and StringSAttributeImpl. llvm-svn: 194997	2013-11-18 09:31:53 +00:00
Juergen Ributzka	dbedae89b9	[weak vtables] Remove a bunch of weak vtables This patch removes most of the trivial cases of weak vtables by pinning them to a single object file. Differential Revision: http://llvm-reviews.chandlerc.com/D2068 Reviewed by Andy llvm-svn: 194865	2013-11-15 22:34:48 +00:00
Hal Finkel	8e8618ae5c	Fix register subclass handling in PPCInstrInfo::insertSelect PPCInstrInfo::insertSelect and PPCInstrInfo::canInsertSelect were computing the common subclass of the true and false inputs, and then selecting either the 32-bit or the 64-bit isel variant based on the result of calling PPC::GPRCRegClass.hasSubClassEq(RC) and PPC::G8RCRegClass.hasSubClassEq(RC) (where RC is the common subclass). Unfortunately, this is not quite right: if we have something like this: %vreg8<def> = SELECT_CC_I8 %vreg4<kill>, %vreg7<kill>, %vreg6<kill>, 76; G8RC_and_G8RC_NOX0:%vreg8 CRRC:%vreg4 G8RC_NOX0:%vreg7,%vreg6 then the common subclass of G8RC_and_G8RC_NOX0 and G8RC_NOX0 is G8RC_NOX0, and G8RC_NOX0 is not a subclass of G8RC (because it also contains the ZERO8 pseudo-register). As a result, we also need to check the common subclass against GPRC_NOR0 and G8RC_NOX0 explicitly. This had not been a problem for clients of insertSelect that called canInsertSelect first (because it had a compensating mistake), but insertSelect is also used by the PPC pseudo-instruction expander, and this error was causing a problem in that context. This problem was found by csmith. llvm-svn: 186343	2013-07-15 20:22:58 +00:00
David Blaikie	b735b4d6db	DebugInfo: remove target-specific Frame Index handling for DBG_VALUE MachineInstrs Frame index handling is now target-agnostic, so delete the target hooks for creation & asm printing of target-specific addressing in DBG_VALUEs and any related functions. llvm-svn: 184067	2013-06-16 20:34:27 +00:00
Benjamin Kramer	f0ec199448	Fold variable that's only used in assert into the assert. Avoids unused variable warnings in Release builds. llvm-svn: 183512	2013-06-07 11:23:35 +00:00
Bill Wendling	5e7656bf0c	Don't cache the instruction and register info from the TargetMachine, because the internals of TargetMachine could change. No functionality change intended. llvm-svn: 183494	2013-06-07 07:55:53 +00:00
Hal Finkel	08e53ee551	PPCInstrInfo::optimizeCompareInstr should not optimize FP compares The floating-point record forms on PPC don't set the condition register bits based on a comparison with zero (like the integer record forms do), but rather based on the exception status bits. llvm-svn: 181423	2013-05-08 12:16:14 +00:00
Hal Finkel	c363245ff2	Cleanup PPCInstrInfo::optimizeCompareInstr Implement suggestions by Bill Schmidt in post-commit review. No functionality change intended. llvm-svn: 181338	2013-05-07 17:49:55 +00:00
Hal Finkel	0f64e21bb9	Move PPC getSwappedPredicate for reuse The getSwappedPredicate function can be used in other places (such as in improvements to the PPCCTRLoops pass). Instead of trapping it as a static function in PPCInstrInfo, move it into PPCPredicates with other predicate-related things. No functionality change intended. llvm-svn: 179926	2013-04-20 05:16:26 +00:00
Hal Finkel	e632239d7b	Fix PPC optimizeCompareInstr swapped-sub argument handling When matching a compare with a subtract where the arguments of the compare are swapped w.r.t. the arguments of the subtract, we need to negate the predicates (or CR bit indices) of the users. This, however, is not the same as inverting the predicate (negating LT -> GT, but inverting LT -> GE, for example). The ARM backend seems to do this correctly, but when I adapted the code for the PPC backend, I introduced an error in this logic. Comparison optimization is now enabled again by default. llvm-svn: 179899	2013-04-19 22:08:38 +00:00
Hal Finkel	b12da6be75	Disable PPC comparison optimization by default This seems to cause a stage-2 LLVM compile failure (by crashing TableGen); do I'm disabling this for now. llvm-svn: 179807	2013-04-18 22:54:25 +00:00
Hal Finkel	82656cb200	Implement optimizeCompareInstr for PPC Many PPC instructions have a so-called 'record form' which stores to a specific condition register the result of comparing the result of the instruction with zero (always as a signed comparison). For integer operations on PPC64, this is always a 64-bit comparison. This implementation is derived from the implementation in the ARM backend; there are some differences because PPC condition registers are allocatable virtual registers (although the record forms always use a specific one), and we look for a matching subtraction instruction after the compare (but before the first use) in addition to before it. llvm-svn: 179802	2013-04-18 22:15:08 +00:00
Hal Finkel	654d43b41a	Add PPC instruction record forms and associated query functions This is prep. work for the implementation of optimizeCompare. Many PPC instructions have 'record' forms (in almost all cases, this means that the RC bit is set) that cause the result of the instruction to be compared with zero, and the result of that comparison saved in a predefined condition register. In order to add the record forms of the instructions without too much copy-and-paste, the relevant functions have been refactored into multiclasses which define both the record and normal forms. Also, two TableGen-generated mapping functions have been added which allow querying the instruction code for the record form given the normal form (and vice versa). No functionality change intended. llvm-svn: 179356	2013-04-12 02:18:09 +00:00
Hal Finkel	f29285a487	Make PPCInstrInfo::isPredicated always return false Because of how predication in implemented on PPC (only for branches), I think that this is the right thing to do. No functionality change intended. llvm-svn: 179252	2013-04-11 01:23:34 +00:00
Hal Finkel	30ae229141	PPC: Don't predicate a diamond with two counter decrements I've not seen this happen in practice, and probably can't until we start allowing decrement-counter-based conditional branches to be double predicated, but just in case, don't allow predication of a diamond in which both sides have ctr-defining branches. Even though the branching behavior of these can be predicated, the counter-decrementing behavior cannot be. llvm-svn: 179199	2013-04-10 18:30:16 +00:00
Hal Finkel	af822018aa	Cleanup PPCInstrInfo::DefinesPredicate Implement suggestions made by Bill Schmidt in post-commit review. Thanks! llvm-svn: 179162	2013-04-10 07:17:47 +00:00
Hal Finkel	500b004566	PPC: Prep for if conversion of bctr[l] This adds in-principle support for if-converting the bctr[l] instructions. These instructions are used for indirect branching. It seems, however, that the current if converter will never actually predicate these. To do so, it would need the ability to hoist a few setup insts. out of the conditionally-executed block. For example, code like this: void foo(int a, int (*bar)()) { if (a != 0) bar(); } becomes: ... beq 0, .LBB0_2 std 2, 40(1) mr 12, 4 ld 3, 0(4) ld 11, 16(4) ld 2, 8(4) mtctr 3 bctrl ld 2, 40(1) .LBB0_2: ... and it would be safe to do all of this unconditionally with a predicated beqctrl instruction. llvm-svn: 179156	2013-04-10 06:42:34 +00:00
Hal Finkel	5711eca19c	Allow PPC B and BLR to be if-converted into some predicated forms This enables us to form predicated branches (which are the same conditional branches we had before) and also a larger set of predicated returns (including instructions like bdnzlr which is a conditional return and loop-counter decrement all in one). At the moment, if conversion does not capture all possible opportunities. A simple example is provided in early-ret2.ll, where if conversion forms one predicated return, and then the PPCEarlyReturn pass picks up the other one. So, at least for now, we'll keep both mechanisms. llvm-svn: 179134	2013-04-09 22:58:37 +00:00
Hal Finkel	21aad9a8e8	Cleanup PPCEarlyReturn Some general cleanup and only scan the end of a BB for branches (once we're done with the terminators and debug values, then there should not be any other branches). These address post-commit review suggestions by Bill Schmidt. No functionality change intended. llvm-svn: 179112	2013-04-09 18:25:18 +00:00
Hal Finkel	b5aa7e54d9	Generate PPC early conditional returns PowerPC has a conditional branch to the link register (return) instruction: BCLR. This should be used any time when we'd otherwise have a conditional branch to a return. This adds a small pass, PPCEarlyReturn, which runs just prior to the branch selection pass (and, importantly, after block placement) to generate these conditional returns when possible. It will also eliminate unconditional branches to returns (these happen rarely; most of the time these have already been tail duplicated by the time PPCEarlyReturn is invoked). This is a nice optimization for small functions that do not maintain a stack frame. llvm-svn: 179026	2013-04-08 16:24:03 +00:00
Hal Finkel	d61d4f80e6	Implement PPCInstrInfo::FoldImmediate There are certain PPC instructions into which we can fold a zero immediate operand. We can detect such cases by looking at the register class required by the using operand (so long as it is not otherwise constrained). llvm-svn: 178961	2013-04-06 19:30:30 +00:00
Hal Finkel	ed6a28597b	Enable early if conversion on PPC On cores for which we know the misprediction penalty, and we have the isel instruction, we can profitably perform early if conversion. This enables us to replace some small branch sequences with selects and avoid the potential stalls from mispredicting the branches. Enabling this feature required implementing canInsertSelect and insertSelect in PPCInstrInfo; isel code in PPCISelLowering was refactored to use these functions as well. llvm-svn: 178926	2013-04-05 23:29:01 +00:00
Hal Finkel	37714b8a48	Resynchronize isLoadFromStackSlot with LoadRegFromStackSlot (and stores) in PPCInstrInfo These functions should have the same list of load/store instructions. Now that all load/store forms have been normalized (to single instructions or pseudos) they can be resynchronized. Found by inspection, although hopefully this will improve optimization. I've also added some comments. llvm-svn: 178180	2013-03-27 21:21:15 +00:00
Hal Finkel	5791f51449	Remove more dead LR-as-GPR PPC code I had removed similar code a few days ago, but somehow missed this. llvm-svn: 178169	2013-03-27 19:10:40 +00:00
Hal Finkel	a7b0630ba8	Don't spill PPC VRSAVE on non-Darwin (even in SjLj) As Bill Schmidt pointed out to me, only on Darwin do we need to spill/restore VRSAVE in the SjLj code. For non-Darwin, don't spill/restore VRSAVE (and I've added some asserts to make sure that we're not). As it turns out, we're not currently handling the Darwin case correctly (I've added a FIXME in the test case). I've tried adding various implied register definitions/uses to force the spill without success, so I'll need to address this later. llvm-svn: 178096	2013-03-27 00:02:20 +00:00
Hal Finkel	cc1eeda16d	Note in PPCFunctionInfo VRSAVE spills In preparation for using the new register scavenger capability for providing more than one register simultaneously, specifically note functions that have spilled VRSAVE (currently, this can happen only in functions that use the setjmp intrinsic). As with CR spilling, such functions will need to provide two emergency spill slots to the scavenger. No functionality change intended. llvm-svn: 177832	2013-03-23 22:06:03 +00:00
Hal Finkel	794e05b03b	Remove dead PPC LR spilling code The LR register is unconditionally reserved, and its spilling and restoration is handled by the prologue/epilogue code. As a result, it is never explicitly spilled by the register allocator. No functionality change intended. llvm-svn: 177823	2013-03-23 17:14:27 +00:00
Ulrich Weigand	f62e83f415	Remove ABI-duplicated call instruction patterns. We currently have a duplicated set of call instruction patterns depending on the ABI to be followed (Darwin vs. Linux). This is a bit odd; while the different ABIs will result in different instruction sequences, the actual instructions themselves ought to be independent of the ABI. And in fact it turns out that the only nontrivial difference between the two sets of patterns is that in the PPC64 Linux ABI, the instruction used for indirect calls is marked to take X11 as extra input register (which is indeed used only with that ABI to hold an incoming environment pointer for nested functions). However, this does not need to be hard-coded at the .td pattern level; instead, the C++ code expanding calls can simply add that use, just like it adds uses for argument registers anyway. No change in generated code expected. llvm-svn: 177735	2013-03-22 15:24:13 +00:00
Hal Finkel	891671afe5	Fix a register-class comparison bug in PPCCTRLoops Thanks to Jakob for isolating the underlying problem from the test case in r177423. The original commit had introduced asymmetric copy operations, but these turned out to be a work-around to the real problem (the use of == instead of hasSubClassEq in PPCCTRLoops). llvm-svn: 177679	2013-03-21 23:23:34 +00:00
Hal Finkel	a1431df540	Add support for spilling VRSAVE on PPC Although there is only one Altivec VRSAVE register, it is a member of a register class, and we need the ability to spill it. Because this register is normally callee-preserved and handled by special code this has never before been necessary. However, this capability will be required by a forthcoming commit adding SjLj support. llvm-svn: 177654	2013-03-21 19:03:21 +00:00
Hal Finkel	638a9fa43e	Prepare to make r0 an allocatable register on PPC Currently the PPC r0 register is unconditionally reserved. There are two reasons for this: 1. r0 is treated specially (as the constant 0) by certain instructions, and so cannot be used with those instructions as a regular register. 2. r0 is used as a temporary register in the CR-register spilling process (where, under some circumstances, we require two GPRs). This change addresses the first reason by introducing a restricted register class (without r0) for use by those instructions that treat r0 specially. These register classes have a new pseudo-register, ZERO, which represents the r0-as-0 use. This has the side benefit of making the existing target code simpler (and easier to understand), and will make it clear to the register allocator that uses of r0 as 0 don't conflict will real uses of the r0 register. Once the CR spilling code is improved, we'll be able to allocate r0. Adding these extra register classes, for some reason unclear to me, causes requests to the target to copy 32-bit registers to 64-bit registers. The resulting code seems correct (and causes no test-suite failures), and the new test case covers this new kind of asymmetric copy. As r0 is still reserved, no functionality change intended. llvm-svn: 177423	2013-03-19 18:51:05 +00:00
Hal Finkel	fcc51d4ff1	Improve PPC VR (Altivec) register spilling This change cleans up two issues with Altivec register spilling: 1. The spilling code was inefficient (using two instructions, and add and a load, when just one would do) 2. The code assumed that r0 would always be available (true for now, but this will change) The new code handles VR spilling just like GPR spills but forced into r+r mode. As a result, when any VR spills are present, we must now always allocate the register-scavenger spill slot. llvm-svn: 177231	2013-03-17 04:43:44 +00:00
Hal Finkel	bb420f10e9	Allocate the RS spill slot for any PPC function with spills and a large stack frame For spills into a large stack frame, the FI-elimination code uses the register scavenger to obtain a free GPR for use with an r+r-addressed load or store. When there are no available GPRs, the scavenger gets one by using its spill slot. Previously, we were not always allocating that spill slot and the RS would assert when the spill slot was needed. I don't currently have a small test that triggered the assert, but I've created a small regression test that verifies that the spill slot is now added when the stack frame is sufficiently large. llvm-svn: 177140	2013-03-15 05:06:04 +00:00
Hal Finkel	e154c8f23e	PPC should always use the register scavenger for CR spilling This removes the -disable-ppc[32\|64]-regscavenger options; the code that uses the register scavenger has been working well (and has been the default) for some time, and we don't need options to enable the old (broken) CR spilling code. llvm-svn: 176865	2013-03-12 14:12:16 +00:00
Chandler Carruth	ed0881b2a6	Use the new script to sort the includes of every file under lib. Sooooo many of these had incorrect or strange main module includes. I have manually inspected all of these, and fixed the main module include to be the nearest plausible thing I could find. If you own or care about any of these source files, I encourage you to take some time and check that these edits were sensible. I can't have broken anything (I strictly added headers, and reordered them, never removed), but they may not be the headers you'd really like to identify as containing the API being implemented. Many forward declarations and missing includes were added to a header files to allow them to parse cleanly when included first. The main module rule does in fact have its merits. =] llvm-svn: 169131	2012-12-03 16:50:05 +00:00
Jakob Stoklund Olesen	9de596e650	Remove all references to TargetInstrInfoImpl. This class has been merged into its super-class TargetInstrInfo. llvm-svn: 168760	2012-11-28 02:35:17 +00:00
Bill Schmidt	b9bc47409d	When generating spill and reload code for vector registers on PowerPC, the compiler makes use of GPR0. However, there are two flavors of GPR0 defined by the target: the 32-bit GPR0 (R0) and the 64-bit GPR0 (X0). The spill/reload code makes use of R0 regardless of whether we are generating 32- or 64-bit code. This patch corrects the problem in the obvious manner, using X0 and ADDI8 for 64-bit and R0 and ADDI for 32-bit. llvm-svn: 165658	2012-10-10 21:25:01 +00:00
Hal Finkel	742b535e40	Add PPC Freescale e500mc and e5500 subtargets. Add subtargets for Freescale e500mc (32-bit) and e5500 (64-bit) to the PowerPC backend. Patch by Tobias von Koch. llvm-svn: 162764	2012-08-28 16:12:39 +00:00
Jakob Stoklund Olesen	0f855e4263	Implement PPCInstrInfo::isCoalescableExtInstr(). The PPC::EXTSW instruction preserves the low 32 bits of its input, just like some of the x86 instructions. Use it to reduce register pressure when the low 32 bits have multiple uses. This requires a small change to PeepholeOptimizer since EXTSW takes a 64-bit input register. This is related to PR5997. llvm-svn: 158743	2012-06-19 21:14:34 +00:00
Hal Finkel	c6b5debb40	Enable PPC CTR loop formation by default. Thanks to Jakob's help, this now causes no new test suite failures! Over the entire test suite, this gives an average 1% speedup. The largest speedups are: SingleSource/Benchmarks/Misc/pi - 108% SingleSource/Benchmarks/CoyoteBench/lpbench - 54% MultiSource/Benchmarks/Prolangs-C/unix-smail/unix-smail - 50% SingleSource/Benchmarks/Shootout/ary3 - 32% SingleSource/Benchmarks/Shootout-C++/matrix - 30% The largest slowdowns are: MultiSource/Benchmarks/mediabench/gsm/toast/toast - -30% MultiSource/Benchmarks/Prolangs-C/bison/mybison - -25% MultiSource/Benchmarks/BitBench/uuencode/uuencode - -22% MultiSource/Applications/d/make_dparser - -14% SingleSource/Benchmarks/Shootout-C++/ary - -13% In light of these slowdowns, additional profiling work is obviously needed! llvm-svn: 158223	2012-06-08 19:19:53 +00:00
Hal Finkel	821e00121c	Disable the PPC CTR-Loops pass by default. The pass itself works well, but the something in the Machine* infrastructure does not understand terminators which define registers. Without the ability to use the block-placement pass, etc. this causes performance regressions (and so is turned off by default). Turning off the analysis turns off the problems with the Machine* infrastructure. llvm-svn: 158206	2012-06-08 15:38:25 +00:00
Hal Finkel	96c2d4d945	Add the PPCCTRLoops pass: a PPC machine-code-level optimization pass to form CTR-based loop branching code. This pass is derived from the Hexagon HardwareLoops pass. The only significant enhancement over the Hexagon pass is that PPCCTRLoops will also attempt to delete the replaced add and compare operations if they are no longer otherwise used. Also, invalid preheader DebugLoc is not used. llvm-svn: 158204	2012-06-08 15:38:21 +00:00
Craig Topper	abadc660e0	Convert some uses of XXXRegisterClass to &XXXRegClass. No functional change since they are equivalent. llvm-svn: 155186	2012-04-20 06:31:50 +00:00

1 2 3 4

178 Commits