Commit Graph

33 Commits

Author SHA1 Message Date
Nemanja Ivanovic ae22101c55 Fix for PR 26500
This patch corresponds to review:
http://reviews.llvm.org/D17294

It ensures that whatever block we are emitting the prologue/epilogue into, we
have the necessary scratch registers. It takes away the hard-coded register
numbers for use as scratch registers as registers that are guaranteed to be
available in the function prologue/epilogue are not guaranteed to be available
within the function body. Since we shrink-wrap, the prologue/epilogue may end
up in the function body.

llvm-svn: 261441
2016-02-20 18:16:25 +00:00
Kit Barton 9c432ae111 Find available scratch register to use in function prologue and epilogue as part of shrink wrapping.
Phabricator: http://reviews.llvm.org/D13955
llvm-svn: 253247
2015-11-16 20:22:15 +00:00
Kit Barton d3b904d440 Enable the shrink wrapping optimization for PPC64.
The changes in this patch are as follows:
  1. Modify the emitPrologue and emitEpilogue methods to work properly when the prologue and epilogue blocks are not the first/last blocks in the function
  2. Fix a bug in PPCEarlyReturn optimization caused by an empty entry block in the function
  3. Override the runShrinkWrap PredicateFtor (defined in TargetMachine) to check whether shrink wrapping should run:
      Shrink wrapping will run on PPC64 (Little Endian and Big Endian) unless -enable-shrink-wrap=false is specified on command line

A new test case, ppc-shrink-wrapping.ll was created based on the existing shrink wrapping tests for x86, arm, and arm64.

Phabricator review: http://reviews.llvm.org/D11817

llvm-svn: 247237
2015-09-10 01:55:44 +00:00
Matthias Braun 0256486532 PrologEpilogInserter: Rewrite API to determine callee save regsiters.
This changes TargetFrameLowering::processFunctionBeforeCalleeSavedScan():

- Rename the function to determineCalleeSaves()
- Pass a bitset of callee saved registers by reference, thus avoiding
  the function-global PhysRegUsed bitset in MachineRegisterInfo.
- Without PhysRegUsed the implementation is fine tuned to not save
  physcial registers which are only read but never modified.

Related to rdar://21539507

Differential Revision: http://reviews.llvm.org/D10909

llvm-svn: 242165
2015-07-14 17:17:13 +00:00
Alexander Kornienko f00654e31b Revert r240137 (Fixed/added namespace ending comments using clang-tidy. NFC)
Apparently, the style needs to be agreed upon first.

llvm-svn: 240390
2015-06-23 09:49:53 +00:00
Alexander Kornienko 70bc5f1398 Fixed/added namespace ending comments using clang-tidy. NFC
The patch is generated using this command:

tools/clang/tools/extra/clang-tidy/tool/run-clang-tidy.py -fix \
  -checks=-*,llvm-namespace-comment -header-filter='llvm/.*|clang/.*' \
  llvm/lib/


Thanks to Eugene Kosov for the original patch!

llvm-svn: 240137
2015-06-19 15:57:42 +00:00
Quentin Colombet 61b305edfd [ShrinkWrap] Add (a simplified version) of shrink-wrapping.
This patch introduces a new pass that computes the safe point to insert the
prologue and epilogue of the function.
The interest is to find safe points that are cheaper than the entry and exits
blocks.

As an example and to avoid regressions to be introduce, this patch also
implements the required bits to enable the shrink-wrapping pass for AArch64.


** Context **

Currently we insert the prologue and epilogue of the method/function in the
entry and exits blocks. Although this is correct, we can do a better job when
those are not immediately required and insert them at less frequently executed
places.
The job of the shrink-wrapping pass is to identify such places.


** Motivating example **

Let us consider the following function that perform a call only in one branch of
a if:
define i32 @f(i32 %a, i32 %b)  {
 %tmp = alloca i32, align 4
 %tmp2 = icmp slt i32 %a, %b
 br i1 %tmp2, label %true, label %false

true:
 store i32 %a, i32* %tmp, align 4
 %tmp4 = call i32 @doSomething(i32 0, i32* %tmp)
 br label %false

false:
 %tmp.0 = phi i32 [ %tmp4, %true ], [ %a, %0 ]
 ret i32 %tmp.0
}

On AArch64 this code generates (removing the cfi directives to ease
readabilities):
_f:                                     ; @f
; BB#0:
  stp x29, x30, [sp, #-16]!
  mov  x29, sp
  sub sp, sp, #16             ; =16
  cmp  w0, w1
  b.ge  LBB0_2
; BB#1:                                 ; %true
  stur  w0, [x29, #-4]
  sub x1, x29, #4             ; =4
  mov  w0, wzr
  bl  _doSomething
LBB0_2:                                 ; %false
  mov  sp, x29
  ldp x29, x30, [sp], #16
  ret

With shrink-wrapping we could generate:
_f:                                     ; @f
; BB#0:
  cmp  w0, w1
  b.ge  LBB0_2
; BB#1:                                 ; %true
  stp x29, x30, [sp, #-16]!
  mov  x29, sp
  sub sp, sp, #16             ; =16
  stur  w0, [x29, #-4]
  sub x1, x29, #4             ; =4
  mov  w0, wzr
  bl  _doSomething
  add sp, x29, #16            ; =16
  ldp x29, x30, [sp], #16
LBB0_2:                                 ; %false
  ret

Therefore, we would pay the overhead of setting up/destroying the frame only if
we actually do the call.


** Proposed Solution **

This patch introduces a new machine pass that perform the shrink-wrapping
analysis (See the comments at the beginning of ShrinkWrap.cpp for more details).
It then stores the safe save and restore point into the MachineFrameInfo
attached to the MachineFunction.
This information is then used by the PrologEpilogInserter (PEI) to place the
related code at the right place. This pass runs right before the PEI.

Unlike the original paper of Chow from PLDI’88, this implementation of
shrink-wrapping does not use expensive data-flow analysis and does not need hack
to properly avoid frequently executed point. Instead, it relies on dominance and
loop properties.

The pass is off by default and each target can opt-in by setting the
EnableShrinkWrap boolean to true in their derived class of TargetPassConfig.
This setting can also be overwritten on the command line by using
-enable-shrink-wrap.

Before you try out the pass for your target, make sure you properly fix your
emitProlog/emitEpilog/adjustForXXX method to cope with basic blocks that are not
necessarily the entry block.


** Design Decisions **

1. ShrinkWrap is its own pass right now. It could frankly be merged into PEI but
for debugging and clarity I thought it was best to have its own file.
2. Right now, we only support one save point and one restore point. At some
point we can expand this to several save point and restore point, the impacted
component would then be:
- The pass itself: New algorithm needed.
- MachineFrameInfo: Hold a list or set of Save/Restore point instead of one
  pointer.
- PEI: Should loop over the save point and restore point.
Anyhow, at least for this first iteration, I do not believe this is interesting
to support the complex cases. We should revisit that when we motivating
examples.

Differential Revision: http://reviews.llvm.org/D9210

<rdar://problem/3201744>

llvm-svn: 236507
2015-05-05 17:38:16 +00:00
Eric Christopher fcd3d87ad8 The base pointer save offset can be computed at initialization time,
do so and fix up the calls.

llvm-svn: 229169
2015-02-13 22:48:53 +00:00
Eric Christopher a4ae213193 PPC LinkageSize can be computed at initialization time, do so.
llvm-svn: 229163
2015-02-13 22:22:57 +00:00
Eric Christopher dc3a8a4a66 PPCFrameLowering's FramePointerOffset can be computed at initialization
time. Do so.

llvm-svn: 228998
2015-02-13 00:39:38 +00:00
Eric Christopher 736d39e189 The TOC save offset can be computed at compile time, do so and
propagate changes.

llvm-svn: 228997
2015-02-13 00:39:36 +00:00
Eric Christopher f71609b5dd The return save offset can be computed at initialization time - do
so and save the value.

llvm-svn: 228996
2015-02-13 00:39:27 +00:00
Benjamin Kramer a7c40ef022 Canonicalize header guards into a common format.
Add header guards to files that were missing guards. Remove #endif comments
as they don't seem common in LLVM (we can easily add them back if we decide
they're useful)

Changes made by clang-tidy with minor tweaks.

llvm-svn: 215558
2014-08-13 16:26:38 +00:00
Ulrich Weigand 8658f17eef [PowerPC] ELFv2 stack space reduction
The ELFv2 ABI reduces the amount of stack required to implement an
ABI-compliant function call in two ways:
* the "linkage area" is reduced from 48 bytes to 32 bytes by
  eliminating two unused doublewords
* the 64-byte "parameter save area" is now optional and need not be
  present in certain cases (it remains mandatory in functions with
  variable arguments, and functions that have any parameter that is
  passed on the stack)

The following patch implements this required changes:
- reducing the linkage area, and associated relocation of the TOC save
  slot, in getLinkageSize / getTOCSaveOffset (this requires updating all
  callers of these routines to pass in the isELFv2ABI flag).
- (partially) handling the case where the parameter save are is optional

This latter part requires some extra explanation:  Currently, we still
always allocate the parameter save area when *calling* a function.
That is certainly always compliant with the ABI, but may cause code to
allocate stack unnecessarily.  This can be addressed by a follow-on
optimization patch.

On the *callee* side, in LowerFormalArguments, we *must* track
correctly whether the ABI guarantees that the caller has allocated
the parameter save area for our use, and the patch does so. However,
there is one complication: the code that handles incoming "byval"
arguments will currently *always* write to the parameter save area,
because it has to force incoming register arguments to the stack since
it must return an *address* to implement the byval semantics.

To fix this, the patch changes the LowerFormalArguments code to write
arguments to a freshly allocated stack slot on the function's own stack
frame instead of the argument save area in those cases where that area
is not present.

Reviewed by Hal Finkel.

llvm-svn: 213490
2014-07-20 23:43:15 +00:00
Hal Finkel 3ee2af7d1c [PowerPC] 32-bit ELF PIC support
This adds initial support for PPC32 ELF PIC (Position Independent Code; the
-fPIC variety), thus rectifying a long-standing deficiency in the PowerPC
backend.

Patch by Justin Hibbits!

llvm-svn: 213427
2014-07-18 23:29:49 +00:00
Ulrich Weigand 8ca988f31a [PowerPC] Refactor getMinCallFrameSize / getMinCallArgumentsSize
As of r211495, the only remaining users of getMinCallFrameSize are in
core ABI code (LowerFormalParameter / LowerCall).  This is actually a
good thing, since the details of the parameter save area are ABI specific.

With the new ELFv2 ABI in particular, the rules defining the size of the
save area will become significantly more complex, so it wouldn't make
sense to implement those outside ABI code that has all required
information.

In preparation, this patch eliminates the getMinCallFrameSize (and
associated getMinCallArgumentsSize) routines, and inlines them into all
callers.  Note that since nearly all call arguments are constant, this
allows simplifying the inlined copies to a single line everywhere.

No change in generate code expected.

llvm-svn: 211497
2014-06-23 14:15:53 +00:00
Ulrich Weigand ad0cb91ed9 [PowerPC] Simplify and improve loading into TOC register
During an indirect function call sequence on the 64-bit SVR4 ABI,
generate code must load and then restore the TOC register.

This does not use a regular LOAD instruction since the TOC
register r2 is marked as reserved.  Instead, the are two
special instruction patterns:

 let RST = 2, DS = 2 in
 def LDinto_toc: DSForm_1a<58, 0, (outs), (ins g8rc:$reg),
                     "ld 2, 8($reg)", IIC_LdStLD,
                     [(PPCload_toc i64:$reg)]>, isPPC64;
 
 let RST = 2, DS = 10, RA = 1 in
 def LDtoc_restore : DSForm_1a<58, 0, (outs), (ins),
                     "ld 2, 40(1)", IIC_LdStLD,
                     [(PPCtoc_restore)]>, isPPC64;

Note that these not only restrict the destination of the
load to r2, but they also restrict the *source* of the
load to particular address combinations.  The latter is
a problem when we want to support the ELFv2 ABI, since
there the TOC save slot is no longer at 40(1).

This patch replaces those two instructions with a single
instruction pattern that only hard-codes r2 as destination,
but supports generic addresses as source.  This will allow
supporting the ELFv2 ABI, and also helps generate more
efficient code for calls to absolute addresses (allowing
simplification of the ppc64-calls.ll test case).

llvm-svn: 211193
2014-06-18 17:52:49 +00:00
Eric Christopher d104c31fc0 Move PPCFrameLowering into PPCSubtarget from PPCTargetMachine. Use
the initializeSubtargetDependencies code to obtain an initialized
subtarget and migrate a couple of subtarget using functions to the
.cpp file to avoid circular includes.

llvm-svn: 210822
2014-06-12 20:54:11 +00:00
Craig Topper 0d3fa92514 [C++11] Add 'override' keywords and remove 'virtual'. Additionally add 'final' and leave 'virtual' on some methods that are marked virtual without overriding anything and have no obvious overrides themselves. PowerPC edition
llvm-svn: 207504
2014-04-29 07:57:37 +00:00
Craig Topper e73658ddbb [C++] Use 'nullptr'.
llvm-svn: 207394
2014-04-28 04:05:08 +00:00
Hal Finkel f05d6c7843 PPC: Add base-pointer support to builtin setjmp/longjmp
First, this changes the base-pointer implementation to remove an unnecessary
complication (and one that is incompatible with how builtin SjLj is
implemented): instead of using r31 as the base pointer when it is not needed as
a frame pointer, now the base pointer will always be r30 when needed.

Second, we introduce another pseudo register, BP, which is used just like the FP
pseudo register to refer to the base register before we know for certain what
register it will be.

Third, we now save BP into the jmp_buf, and restore r30 from that slot in
longjmp.  If the function that called setjmp did not use a base pointer, then
r30 will be overwritten by the setjmp-calling-function's restore code. FP
restoration (which is restored into r31) works the same way.

llvm-svn: 186545
2013-07-17 23:50:51 +00:00
Hal Finkel a7c54e8cf4 PPC: Implement base pointer and stack realignment
This builds on some frame-lowering code that has existed since 2005 (r24224)
but was disabled in 2008 (r48188) because it needed base pointer support to
function correctly. This implementation follows the strategy suggested by Dale
Johannesen in r48188 where the following comment was added:

  This does not currently work, because the delta between old and new stack
  pointers is added to offsets that reference incoming parameters after the
  prolog is generated, and the code that does that doesn't handle a variable
  delta.  You don't want to do that anyway; a better approach is to reserve
  another register that retains to the incoming stack pointer, and reference
  parameters relative to that.

And now we do exactly that. If we don't need a frame pointer, then we use r31
as a base pointer. If we do need a frame pointer, then we use r30 as a base
pointer. The base pointer retains the value of the stack pointer before it was
decremented in the prologue. We then use the base pointer to resolve all
negative frame indicies. The basic scheme follows that for base pointers in the
X86 backend.

We use a base pointer when we need to dynamically realign the incoming stack
pointer. This currently applies only to static objects (dynamic allocas with
large alignments, and base-pointer support in SjLj lowering will come in future
commits).

llvm-svn: 186478
2013-07-17 00:45:52 +00:00
Hal Finkel aa03c03a2d Correct PPC FRAMEADDR lowering using a pseudo-register
The old code used to lower FRAMEADDR tried to replicate the logic in the real
frame-lowering code that determines whether or not the frame pointer (r31) will
be used. When it seemed as through the frame pointer would not be used, the
stack pointer (r1) was used instead. Unfortunately, because the stack size is
not yet known, this does not work. Instead, this change introduces new
always-reserved pseudo-registers (FP and FP8) that are replaced during prologue
insertion with the real frame-pointer register (either r1 or r31).

It is important that this intrinsic always return a valid frame address because
it is used by Clang to store the frame address as part of code generation for
__builtin_setjmp.

llvm-svn: 177653
2013-03-21 19:03:19 +00:00
Hal Finkel bb420f10e9 Allocate the RS spill slot for any PPC function with spills and a large stack frame
For spills into a large stack frame, the FI-elimination code uses the register
scavenger to obtain a free GPR for use with an r+r-addressed load or store.
When there are no available GPRs, the scavenger gets one by using its spill
slot. Previously, we were not always allocating that spill slot and the RS
would assert when the spill slot was needed.

I don't currently have a small test that triggered the assert, but I've
created a small regression test that verifies that the spill slot is now
added when the stack frame is sufficiently large.

llvm-svn: 177140
2013-03-15 05:06:04 +00:00
Hal Finkel 5a765fddb0 Provide the register scavenger to processFunctionBeforeFrameFinalized
Add the current PEI register scavenger as a parameter to the
processFunctionBeforeFrameFinalized callback.

This change is necessary in order to allow the PowerPC target code to
set the register scavenger frame index after the save-area offset
adjustments performed by processFunctionBeforeFrameFinalized. Only
after these adjustments have been made is it possible to estimate
the size of the stack frame.

llvm-svn: 177108
2013-03-14 20:33:40 +00:00
Hal Finkel ad92b46505 Add a comment about overlapping PPC frame offsets
I don't think that it is otherwise clear how the overlapping offsets
are processed into distinct spill slots. Comment that this is done
in processFunctionBeforeFrameFinalized.

llvm-svn: 177094
2013-03-14 18:38:31 +00:00
Eli Bendersky 8da87163ca Move the eliminateCallFramePseudoInstr method from TargetRegisterInfo
to TargetFrameLowering, where it belongs. Incidentally, this allows us
to delete some duplicated (and slightly different!) code in TRI.

There are potentially other layering problems that can be cleaned up
as a result, or in a similar manner.

The refactoring was OK'd by Anton Korobeynikov on llvmdev.

Note: this touches the target interfaces, so out-of-tree targets may
be affected.

llvm-svn: 175788
2013-02-21 20:05:00 +00:00
Hal Finkel e1df90958d PPC QPX requires a 32-byte aligned stack
On systems which support the QPX vector instructions, the stack must be
32-byte aligned.

llvm-svn: 173993
2013-01-30 23:43:27 +00:00
Chandler Carruth 802d755533 Sort includes for all of the .h files under the 'lib' tree. These were
missed in the first pass because the script didn't yet handle include
guards.

Note that the script is now able to handle all of these headers without
manual edits. =]

llvm-svn: 169224
2012-12-04 07:12:27 +00:00
Roman Divacky c9e23d93ae This patch corrects logic in PPCFrameLowering for save and restore of
nonvolatile condition register fields across calls under the SVR4 ABIs.                                            
                                                                                                                   
 * With the 64-bit ABI, the save location is at a fixed offset of 8 from                                           
the stack pointer.  The frame pointer cannot be used to access this                                                
portion of the stack frame since the distance from the frame pointer may                                           
change with alloca calls.                                                                                          
                                                                                                                   
 * With the 32-bit ABI, the save location is just below the general
register save area, and is accessed via the frame pointer like the rest
of the save areas.  This is an optional slot, so it must only be created                                           
if any of CR2, CR3, and CR4 were modified.                                                                      
                                                                                                                   
 * For both ABIs, save/restore logic is generated only if one of the     
nonvolatile CR fields were modified.                                   

I also took this opportunity to clean up an extra FIXME in
PPCFrameLowering.h.  Save area offsets for 32-bit GPRs are meaningless
for the 64-bit ABI, so I removed them for correctness and efficiency.


Fixes PR13708 and partially also PR13623. It lets us enable exception handling
on PPC64.

Patch by William J. Schmidt!

llvm-svn: 163713
2012-09-12 14:47:47 +00:00
Jia Liu b22310fda6 Emacs-tag and some comment fix for all ARM, CellSPU, Hexagon, MBlaze, MSP430, PPC, PTX, Sparc, X86, XCore.
llvm-svn: 150878
2012-02-18 12:03:15 +00:00
Evan Cheng 67c033e6b8 Move getInitialFrameState from TargetFrameInfo to MCAsmInfo (suggestions for
better location welcome).

llvm-svn: 135438
2011-07-18 22:29:13 +00:00
Anton Korobeynikov 2f93128109 Rename TargetFrameInfo into TargetFrameLowering. Also, put couple of FIXMEs and fixes here and there.
llvm-svn: 123170
2011-01-10 12:39:04 +00:00