Commit Graph

13 Commits

Author SHA1 Message Date
Christudasan Devadasan 375cec4b6c [AMDGPU] Introduce more scratch registers in the ABI.
The AMDGPU target has a convention that defined all VGPRs
(execept the initial 32 argument registers) as callee-saved.
This convention is not efficient always, esp. when the callee
requiring more registers, ended up emitting a large number of
spills, even though its caller requires only a few.

This patch revises the ABI by introducing more scratch registers
that a callee can freely use.
The 256 vgpr registers now become:
  32 argument registers
  112 scratch registers and
  112 callee saved registers.
The scratch registers and the CSRs are intermixed at regular
intervals (a split boundary of 8) to obtain a better occupancy.

Reviewers: arsenm, t-tye, rampitec, b-sumner, mjbedy, tpr

Reviewed By: arsenm, t-tye

Differential Revision: https://reviews.llvm.org/D76356
2020-05-05 23:02:58 +05:30
Scott Linder 0e9368cc8c [AMDGPU] Move frame pointer from s34 to s33
Remove the gap left between the stack pointer (s32) and frame pointer
(s34) now that the scratch wave offset is no longer a part of the
calling convention ABI.

Update llvm/docs/AMDGPUUsage.rst to reflect the change.

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D75657
2020-03-19 15:35:16 -04:00
Christudasan Devadasan b2d24bd540 [AMDGPU] Created a sub-register class for the return address operand in the return instruction.
Function return instruction lowering, currently uses the fixed register pair s[30:31] for holding
the return address. It can be any SGPR pair other than the CSRs. Created an SGPR pair sub-register class
exclusive of the CSRs, and used this regclass while lowering the return instruction.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D63924

llvm-svn: 365512
2019-07-09 16:48:42 +00:00
Matt Arsenault 71dfb7ec5c AMDGPU: Make s34 the FP register
Make the FP register callee saved.

This is tricky because now the FP needs to be spilled in the prolog
relative to the incoming SP register, rather than the frame register
used throughout the rest of the function. I don't like how this
bypassess the standard mechanism for CSR spills just to get the
correct insert point. I may look for a better solution, since all CSR
VGPRs may also need to have all lanes activated. Another option might
be to make getFrameIndexReference change the base register if the
frame index is a CSR, and then try to figure out the right insertion
point in emitProlog.

If there is a free VGPR lane available for SGPR spilling, try to use
it for the FP. If that would require intrtoducing a new VGPR spill,
try to use a free call clobbered SGPR. Only fallback to introducing a
new VGPR spill as a last resort.

This also doesn't attempt to handle SGPR spilling with scalar stores.

llvm-svn: 365372
2019-07-08 19:03:38 +00:00
Matt Arsenault d88db6d7fc AMDGPU: Always use s33 for global scratch wave offset
Every called function could possibly need this to calculate the
absolute address of stack objectst, and this avoids inserting a copy
around every call site in the kernel. It's also somewhat cleaner to
keep this in a callee saved SGPR.

llvm-svn: 363990
2019-06-20 21:58:24 +00:00
Matt Arsenault 34c8b835b1 AMDGPU: Don't fix emergency stack slot at offset 0
This forced the caller to be aware of this, which is an ugly ABI
feature.

Partially reverts r295877. The original reasons for doing this are
mostly fixed. Alloca is now in a non-0 address space, so it should be
OK to have 0 as a valid pointer. Since we treat the absolute address
as the pointer value, this part only really needed to apply to
kernels.

Since r357093, we avoid the need to increment/decrement the offset
register in more cases, and since r354816 the scavenger can fail
without spilling, so it's less critical that we try to avoid an offset
that fits in the MUBUF offset.

Restrict to callable functions for now to split this into 2 steps to
limit thte number of test updates and in case anything breaks.

llvm-svn: 362665
2019-06-05 22:37:50 +00:00
Matt Arsenault 3d59e388ca AMDGPU: Activate all lanes when spilling CSR VGPR for SGPR spills
If some lanes weren't active on entry to the function, this could
clobber their VGPR values.

llvm-svn: 361655
2019-05-24 18:18:51 +00:00
Jonas Paulsson 611b533f1d [SchedModel] Fix for read advance cycles with implicit pseudo operands.
The SchedModel allows the addition of ReadAdvances to express that certain
operands of the instructions are needed at a later point than the others.

RegAlloc may add pseudo operands that are not part of the instruction
descriptor, and therefore cannot have any read advance entries. This meant
that in some cases the desired read advance was nullified by such a pseudo
operand, which still had the original latency.

This patch fixes this by making sure that such pseudo operands get a zero
latency during DAG construction.

Review: Matthias Braun, Ulrich Weigand.
https://reviews.llvm.org/D49671

llvm-svn: 345606
2018-10-30 15:04:40 +00:00
Matt Arsenault ffb132e74b AMDGPU: Increase default stack alignment
8 and 16-byte values are common, so increase the default
alignment to avoid realigning the stack in most functions.

llvm-svn: 328821
2018-03-29 20:22:04 +00:00
Yaxun Liu 2a22c5deff [AMDGPU] Switch to the new addr space mapping by default
This requires corresponding clang change.

Differential Revision: https://reviews.llvm.org/D40955

llvm-svn: 324101
2018-02-02 16:07:16 +00:00
Matt Arsenault a202538bfa AMDGPU: Remove error on calls for amdgcn
Repurpose the -amdgpu-function-calls flag. Rather
than require it to emit a call, only use it to
run the always inline path or not.

llvm-svn: 310003
2017-08-03 23:24:05 +00:00
Matt Arsenault 8e8f8f43b0 AMDGPU: Fix clobbering CSR VGPRs when spilling SGPR to it
llvm-svn: 309783
2017-08-02 01:52:45 +00:00
Matt Arsenault b62a4eb524 AMDGPU: Initial implementation of calls
Includes a hack to fix the type selected for
the GlobalAddress of the function, which will be
fixed by changing the default datalayout to use
generic pointers for 0.

llvm-svn: 309732
2017-08-01 19:54:18 +00:00