llvm-project

Commit Graph

Author	SHA1	Message	Date
Hal Finkel	b5aa7e54d9	Generate PPC early conditional returns PowerPC has a conditional branch to the link register (return) instruction: BCLR. This should be used any time when we'd otherwise have a conditional branch to a return. This adds a small pass, PPCEarlyReturn, which runs just prior to the branch selection pass (and, importantly, after block placement) to generate these conditional returns when possible. It will also eliminate unconditional branches to returns (these happen rarely; most of the time these have already been tail duplicated by the time PPCEarlyReturn is invoked). This is a nice optimization for small functions that do not maintain a stack frame. llvm-svn: 179026	2013-04-08 16:24:03 +00:00
Hal Finkel	81f8799fe3	Cleanup and improve PPC fsel generation First, we should not cheat: fsel-based lowering of select_cc is a finite-math-only optimization (the ISA manual, section F.3 of v2.06, makes this clear, as does a note in our own README). This also adds fsel-based lowering of EQ and NE condition codes. As it turned out, fsel generation was covered by a grand total of zero regression test cases. I've added some test cases to cover the existing behavior (which is now finite-math only), as well as the new EQ cases. llvm-svn: 179000	2013-04-07 22:11:09 +00:00
Hal Finkel	d61d4f80e6	Implement PPCInstrInfo::FoldImmediate There are certain PPC instructions into which we can fold a zero immediate operand. We can detect such cases by looking at the register class required by the using operand (so long as it is not otherwise constrained). llvm-svn: 178961	2013-04-06 19:30:30 +00:00
Hal Finkel	ed6a28597b	Enable early if conversion on PPC On cores for which we know the misprediction penalty, and we have the isel instruction, we can profitably perform early if conversion. This enables us to replace some small branch sequences with selects and avoid the potential stalls from mispredicting the branches. Enabling this feature required implementing canInsertSelect and insertSelect in PPCInstrInfo; isel code in PPCISelLowering was refactored to use these functions as well. llvm-svn: 178926	2013-04-05 23:29:01 +00:00
Hal Finkel	f96c18e3bc	PPC: Improve code generation for mixed-precision reciprocal sqrt The DAGCombine logic that recognized a/sqrt(b) and transformed it into a multiplication by the reciprocal sqrt did not handle cases where the sqrt and the division were separated by an fpext or fptrunc. llvm-svn: 178801	2013-04-04 22:44:12 +00:00
Bill Schmidt	92e26646bc	Fix PR15632: No support for ppcf128 floating-point remainder on PowerPC. For this we need to use a libcall. Previously LLVM didn't implement libcall support for frem, so I've added it in the usual straightforward manner. A test case from the bug report is included. llvm-svn: 178639	2013-04-03 13:05:44 +00:00
Hal Finkel	2e10331057	Use PPC reciprocal estimates with Newton iteration in fast-math mode When unsafe FP math operations are enabled, we can use the fre[s] and frsqrte[s] instructions, which generate reciprocal (sqrt) estimates, together with some Newton iteration, in order to quickly generate floating-point division and sqrt results. All of these instructions are separately optional, and so each has its own feature flag (except for the Altivec instructions, which are covered under the existing Altivec flag). Doing this is not only faster than using the IEEE-compliant fdiv/fsqrt instructions, but allows these computations to be pipelined with other computations in order to hide their overall latency. I've also added a couple of missing fnmsub patterns which turned out to be missing (but are necessary for good code generation of the Newton iterations). Altivec needs a similar fix, but that will probably be more complicated because fneg is expanded for Altivec's v4f32. llvm-svn: 178617	2013-04-03 04:01:11 +00:00
Bill Schmidt	3581cd4b4c	Fix PR15630: Replace faulty stdcx. with stwcx. When doing a partword atomic operation, a lwarx was being paired with a stdcx. instead of a stwcx. when compiling for a 64-bit target. The target has nothing to do with it in this case; we always need a stwcx. Thanks to Kai Nacke for reporting the problem. llvm-svn: 178559	2013-04-02 18:37:08 +00:00
Hal Finkel	3f88d08974	Fix a bad assert in PPCTargetLowering llvm-svn: 178489	2013-04-01 18:42:58 +00:00
Hal Finkel	c2eddb0d02	Add triple to test/CodeGen/PowerPC/stfiwx-2 llvm-svn: 178486	2013-04-01 18:18:44 +00:00
Hal Finkel	f6d45f2379	Add more PPC floating-point conversion instructions The P7 and A2 have additional floating-point conversion instructions which allow a direct two-instruction sequence (plus load/store) to convert from all combinations (signed/unsigned i32/i64) <--> (float/double) (on previous cores, only some combinations were directly available). llvm-svn: 178480	2013-04-01 17:52:07 +00:00
Hal Finkel	c006295f3c	Fix PowerPC/cttz.ll to specify a cpu (and use FileCheck) llvm-svn: 178472	2013-04-01 16:31:56 +00:00
Hal Finkel	290376dd78	Add the PPC popcntw instruction The popcntw instruction is available whenever the popcntd instruction is available, and performs a separate popcnt on the lower and upper 32-bits. Ignoring the high-order count, this can be used for the 32-bit input case (saving on the explicit zero extension otherwise required to use popcntd). llvm-svn: 178470	2013-04-01 15:58:15 +00:00
Hal Finkel	beb296bea1	Add the PPC lfiwax instruction This instruction is available on modern PPC64 CPUs, and is now used to improve the SINT_TO_FP lowering (by eliminating the need for the separate sign extension instruction and decreasing the amount of needed stack space). llvm-svn: 178446	2013-03-31 10:12:51 +00:00
Hal Finkel	e53429a13e	Cleanup PPC(64) i32 -> float/double conversion The existing SINT_TO_FP code for i32 -> float/double conversion was disabled because it relied on broken EXTSW_32/STD_32 instruction definitions. The original intent had been to enable these 64-bit instructions to be used on CPUs that support them even in 32-bit mode. Unfortunately, this form of lying to the infrastructure was buggy (as explained in the FIXME comment) and had therefore been disabled. This re-enables this functionality, using regular DAG nodes, but only when compiling in 64-bit mode. The old STD_32/EXTSW_32 definitions (which were dead) are removed. llvm-svn: 178438	2013-03-31 01:58:02 +00:00
Hal Finkel	f8ac57e289	Implement FRINT lowering on PPC using frin Like nearbyint, rint can be implemented on PPC using the frin instruction. The complication comes from the fact that rint needs to set the FE_INEXACT flag when the result does not equal the input value (and frin does not do that). As a result, we use a custom inserter which, after the rounding, compares the rounded value with the original, and if they differ, explicitly sets the XX bit in the FPSCR register (which corresponds to FE_INEXACT). Once LLVM has better modeling of the floating-point environment we should be able to (often) eliminate this extra complexity. llvm-svn: 178362	2013-03-29 19:41:55 +00:00
Hal Finkel	c20a08d25b	Add PPC FP rounding instructions fri[mnpz] These instructions are available on the P5x (and later) and on the A2. They implement the standard floating-point rounding operations (floor, trunc, etc.). One caveat: frin (round to nearest) does not implement "ties to even", and so is only enabled in fast-math mode. llvm-svn: 178337	2013-03-29 08:57:48 +00:00
Hal Finkel	fd53ddd71c	Specify CPUs on the PPC bswap-load-store test Otherwise, the CHECK-NOT's might trigger depending on the host's CPU. llvm-svn: 178287	2013-03-28 20:35:18 +00:00
Hal Finkel	22e41c411e	Only enable 64-bit bswap DAG combines for PPC64 Compiling in 32-bit mode on a P7 would assert after 64-bit DAG combines were added for bswap with load/store. This is because these combines are really only valid in 64-bit mode, regardless of the CPU (and this was not being checked). llvm-svn: 178286	2013-03-28 20:23:46 +00:00
Hal Finkel	31d2956510	Add the PPC64 ldbrx/stdbrx instructions These are 64-bit load/store with byte-swap, and available on the P7 and the A2. Like the similar instructions for 16- and 32-bit words, these are matched in the target DAG-combine phase against load/store-bswap pairs. llvm-svn: 178276	2013-03-28 19:25:55 +00:00
Hal Finkel	a4d074863a	Add the PPC64 popcntd instruction PPC ISA 2.06 (P7, A2, etc.) has a popcntd instruction. Add this instruction and tell TTI about it so that popcount-loop recognition will know about it. llvm-svn: 178233	2013-03-28 13:29:47 +00:00
Hal Finkel	035b4825ce	Cleanup PPC CR-spill kill flags and 32- vs. 64-bit instructions There were a few places where kill flags were not being set correctly, and where 32-bit instruction variants were being used with 64-bit registers. After r178180, this code was being triggered causing llc to assert. llvm-svn: 178220	2013-03-28 03:38:16 +00:00
Hal Finkel	687143557d	Print PPC ZERO as 0 (not r0) even on Darwin It seems that the Darwin PPC assembler requires r0 to be written as 0 when it means 0 (at least in lwarx/stwcx.). Fixes PR15605. llvm-svn: 178142	2013-03-27 13:20:52 +00:00
Hal Finkel	0f77861d9f	Allocate r0 on PPC The R0 register can now be allocated because instructions that cannot use R0 as a GPR have been appropriately marked. llvm-svn: 178123	2013-03-27 06:52:27 +00:00
Bill Schmidt	a1b72d0f6a	Remove the link register from the GPR classes on PowerPC. Some implementation detail in the forgotten past required the link register to be placed in the GPRC and G8RC register classes. This is just wrong on the face of it, and causes several extra intersection register classes to be generated. I found this was having evil effects on instruction scheduling, by causing the wrong register class to be consulted for register pressure decisions. No code generation changes are expected, other than some minor changes in instruction order. Seven tests in the test bucket required minor tweaks to adjust to the new normal. llvm-svn: 178114	2013-03-27 02:40:14 +00:00
Hal Finkel	a7b0630ba8	Don't spill PPC VRSAVE on non-Darwin (even in SjLj) As Bill Schmidt pointed out to me, only on Darwin do we need to spill/restore VRSAVE in the SjLj code. For non-Darwin, don't spill/restore VRSAVE (and I've added some asserts to make sure that we're not). As it turns out, we're not currently handling the Darwin case correctly (I've added a FIXME in the test case). I've tried adding various implied register definitions/uses to force the spill without success, so I'll need to address this later. llvm-svn: 178096	2013-03-27 00:02:20 +00:00
Hal Finkel	0dfbb05aff	Use multiple virtual registers in PPC CR spilling Now that the register scavenger can support multiple spill slots, and PEI can use virtual-register-based scavenging for multiple simultaneous registers, we can use a virtual register for the transfer register in the CR spilling code. This should eliminate the last place (outside of the prologue/epilogue) where we depend on the unconditional availability of the r0 register. We will soon be able to allocate it (in a somewhat restricted sense) as a GPR. llvm-svn: 178060	2013-03-26 18:57:22 +00:00
Hal Finkel	891671afe5	Fix a register-class comparison bug in PPCCTRLoops Thanks to Jakob for isolating the underlying problem from the test case in r177423. The original commit had introduced asymmetric copy operations, but these turned out to be a work-around to the real problem (the use of == instead of hasSubClassEq in PPCCTRLoops). llvm-svn: 177679	2013-03-21 23:23:34 +00:00
Hal Finkel	756810fe36	Implement builtin_{setjmp/longjmp} on PPC This implements SJLJ lowering on PPC, making the Clang functions __builtin_{setjmp/longjmp} functional on PPC platforms. The implementation strategy is similar to that on X86, with the exception that a branch-and-link variant is used to get the right jump address. Credit goes to Bill Schmidt for suggesting the use of the unconditional bcl form (instead of the regular bl instruction) to limit return-address-cache pollution. Benchmarking the speed at -O3 of: static jmp_buf env_sigill; void foo() { __builtin_longjmp(env_sigill,1); } main() { ... for (int i = 0; i < c; ++i) { if (__builtin_setjmp(env_sigill)) { goto done; } else { foo(); } done:; } ... } vs. the same code using the libc setjmp/longjmp functions on a P7 shows that this builtin implementation is ~4x faster with Altivec enabled and ~7.25x faster with Altivec disabled. This comparison is somewhat unfair because the libc version must also save/restore the VSX registers which we don't yet support. llvm-svn: 177666	2013-03-21 21:37:52 +00:00
David Blaikie	cc8d090163	Remove unused field in DISubprogram llvm-svn: 177661	2013-03-21 20:28:52 +00:00
Hal Finkel	a1431df540	Add support for spilling VRSAVE on PPC Although there is only one Altivec VRSAVE register, it is a member of a register class, and we need the ability to spill it. Because this register is normally callee-preserved and handled by special code this has never before been necessary. However, this capability will be required by a forthcoming commit adding SjLj support. llvm-svn: 177654	2013-03-21 19:03:21 +00:00
Hal Finkel	aa03c03a2d	Correct PPC FRAMEADDR lowering using a pseudo-register The old code used to lower FRAMEADDR tried to replicate the logic in the real frame-lowering code that determines whether or not the frame pointer (r31) will be used. When it seemed as through the frame pointer would not be used, the stack pointer (r1) was used instead. Unfortunately, because the stack size is not yet known, this does not work. Instead, this change introduces new always-reserved pseudo-registers (FP and FP8) that are replaced during prologue insertion with the real frame-pointer register (either r1 or r31). It is important that this intrinsic always return a valid frame address because it is used by Clang to store the frame address as part of code generation for __builtin_setjmp. llvm-svn: 177653	2013-03-21 19:03:19 +00:00
David Blaikie	43a729d165	Remove unused field in DICompileUnit llvm-svn: 177590	2013-03-20 22:34:33 +00:00
Hal Finkel	559754a482	Add a comment to the CodeGen/PowerPC/asym-regclass-copy.ll test llvm-svn: 177434	2013-03-19 20:22:32 +00:00
Ulrich Weigand	d850167a19	Rewrite pre-increment store patterns to use standard memory operands. Currently, pre-increment store patterns are written to use two separate operands to represent address base and displacement: stwu $rS, $ptroff($ptrreg) This causes problems when implementing the assembler parser, so this commit changes the patterns to use standard (complex) memory operands like in all other memory access instruction patterns: stwu $rS, $dst To still match those instructions against the appropriate pre_store SelectionDAG nodes, the patch uses the new feature that allows a Pat to match multiple DAG operands against a single (complex) instruction operand. Approved by Hal Finkel. llvm-svn: 177429	2013-03-19 19:52:04 +00:00
Hal Finkel	638a9fa43e	Prepare to make r0 an allocatable register on PPC Currently the PPC r0 register is unconditionally reserved. There are two reasons for this: 1. r0 is treated specially (as the constant 0) by certain instructions, and so cannot be used with those instructions as a regular register. 2. r0 is used as a temporary register in the CR-register spilling process (where, under some circumstances, we require two GPRs). This change addresses the first reason by introducing a restricted register class (without r0) for use by those instructions that treat r0 specially. These register classes have a new pseudo-register, ZERO, which represents the r0-as-0 use. This has the side benefit of making the existing target code simpler (and easier to understand), and will make it clear to the register allocator that uses of r0 as 0 don't conflict will real uses of the r0 register. Once the CR spilling code is improved, we'll be able to allocate r0. Adding these extra register classes, for some reason unclear to me, causes requests to the target to copy 32-bit registers to 64-bit registers. The resulting code seems correct (and causes no test-suite failures), and the new test case covers this new kind of asymmetric copy. As r0 is still reserved, no functionality change intended. llvm-svn: 177423	2013-03-19 18:51:05 +00:00
Hal Finkel	6681486375	Cleanup PPC64 unaligned i64 load/store Remove an accidentally-added instruction definition and add a comment in the test case. This is in response to a post-commit review by Bill Schmidt. No functionality change intended. llvm-svn: 177404	2013-03-19 15:23:39 +00:00
Hal Finkel	d9e10d51fa	Don't reserve R31 on PPC64 unless the frame pointer is needed llvm-svn: 177379	2013-03-19 08:09:38 +00:00
Hal Finkel	fc9aad6436	Fix a sign-extension bug in PPCCTRLoops Don't sign extend the immediate value from the OR instruction in an LIS/OR pair. llvm-svn: 177361	2013-03-18 23:58:28 +00:00
Hal Finkel	b09680b0f7	Fix PPC unaligned 64-bit loads and stores PPC64 supports unaligned loads and stores of 64-bit values, but in order to use the r+i forms, the offset must be a multiple of 4. Unfortunately, this cannot always be determined by examining the immediate itself because it might be available only via a TOC entry. In order to get around this issue, we additionally predicate the selection of the r+i form on the alignment of the load or store (forcing it to be at least 4 in order to select the r+i form). llvm-svn: 177338	2013-03-18 23:00:58 +00:00
Bill Schmidt	45bdbe7ec0	Change test cases to handle unaligned references. Hal Finkel recently added code to allow unaligned memory references for PowerPC. Two tests were temporarily modified with -disable-ppc-unaligned to keep them from failing. This patch adjusts the expected code generation for the unaligned references. llvm-svn: 177328	2013-03-18 22:12:04 +00:00
David Blaikie	3053029310	Remove unnecessary leading comment characters in lit-only file llvm-svn: 177327	2013-03-18 22:08:16 +00:00
David Blaikie	99067af791	Include '.test' suffix in target specific lit configs that need it Apparently my final cleanup to use a relevant suffix for these tests before committing r176831 caused them to stop running since lit wasn't configured to run tests with that suffix in those directories (why don't we just have a global suffix list?). So, add the suffix to the relevant directories & fix the test that has bitrotted over the last week due to my debug info schema changes. llvm-svn: 177315	2013-03-18 20:31:44 +00:00
Hal Finkel	21f2a43ab4	Fix large count and negative constant count handling in PPCCTRLoops This commit fixes an assert that would occur on loops with large constant counts (like looping for ((uint32_t) -1) iterations on PPC64). The existing code did not handle counts that it computed to be negative (asserting instead), but these can be created with valid inputs. This bug was discovered by bugpoint while I was attempting to isolate a completely different problem. Also, in writing test cases for the negative-count problem, I discovered that the ori/lsi handling was broken (there was a typo which caused the logic that was supposed to detect these pairs and extract the iteration count to always fail). This has now also been corrected (and is covered by one of the new test cases). llvm-svn: 177295	2013-03-18 17:40:44 +00:00
Hal Finkel	12337e4e7d	Cleanup initial-value constants in PPCCTRLoops Because the initial-value constants had not been added to the list of instructions considered for DCE the resulting code had redundant constant-materialization instructions. llvm-svn: 177294	2013-03-18 17:40:27 +00:00
Hal Finkel	fcc51d4ff1	Improve PPC VR (Altivec) register spilling This change cleans up two issues with Altivec register spilling: 1. The spilling code was inefficient (using two instructions, and add and a load, when just one would do) 2. The code assumed that r0 would always be available (true for now, but this will change) The new code handles VR spilling just like GPR spills but forced into r+r mode. As a result, when any VR spills are present, we must now always allocate the register-scavenger spill slot. llvm-svn: 177231	2013-03-17 04:43:44 +00:00
Hal Finkel	57080382e6	Remove FIXMEs in PPC test cases related to unaligned loads/stores As pointed out by Bill in response to r177160, these two FIXMEs can also be removed. llvm-svn: 177229	2013-03-16 23:02:31 +00:00
Hal Finkel	8d7fbc9dad	Enable unaligned memory access on PPC for scalar types Unaligned access is supported on PPC for non-vector types, and is generally more efficient than manually expanding the loads and stores. A few of the existing test cases were using expanded unaligned loads and stores to test other features (like load/store with update), and for these test cases, unaligned access remains disabled. llvm-svn: 177160	2013-03-15 15:27:13 +00:00
Hal Finkel	b0fac42987	Protect PPC Altivec patterns with a predicate In preparation for the addition of other SIMD ISA extensions (such as QPX) we need to make sure that all Altivec patterns are properly predicated on having Altivec support. No functionality change intended (one test case needed to be updated b/c it assumed that Altivec intrinsics would be supported without enabling Altivec support). llvm-svn: 177152	2013-03-15 13:21:21 +00:00
Hal Finkel	bb420f10e9	Allocate the RS spill slot for any PPC function with spills and a large stack frame For spills into a large stack frame, the FI-elimination code uses the register scavenger to obtain a free GPR for use with an r+r-addressed load or store. When there are no available GPRs, the scavenger gets one by using its spill slot. Previously, we were not always allocating that spill slot and the RS would assert when the spill slot was needed. I don't currently have a small test that triggered the assert, but I've created a small regression test that verifies that the spill slot is now added when the stack frame is sufficiently large. llvm-svn: 177140	2013-03-15 05:06:04 +00:00

1 2 3 4 5 ...

565 Commits