[amdgpu] Default to code object v3
v4 is not yet readily available, and doesn't appear
to be implemented in the back end
Reviewed By: t-tye
Differential Revision: https://reviews.llvm.org/D93258
- Document which processors are supported by which runtimes.
- Add missing mappings for code object V2 note records
Differential Revision: https://reviews.llvm.org/D93016
- Document that the kernel descriptor defined is for code object V3.
Document that it also applies to earlier code object formats for CP.
- Document the deprecated bits in kernel descriptor.
Differential Revision: https://reviews.llvm.org/D91458
- In certain cases, a generic pointer can be assumed to point to
the global memory space or another specific space. With a dedicated
target hook to query that address space from a given value, the
infer-address-space pass can infer it and propagate it to all users
of the value.
Differential Revision: https://reviews.llvm.org/D91121
Add a calling convention called amdgpu_gfx for real function calls
within graphics shaders. For the moment, this uses the same calling
convention as other calls in amdgpu, with registers excluded for return
address, stack pointer and stack buffer descriptor.
Differential Revision: https://reviews.llvm.org/D88540
This differentiates the Ryzen 4000/4300/4500/4700 series APUs that were
previously included in gfx909.
Differential Revision: https://reviews.llvm.org/D90419
Change-Id: Ia901a7157eb2f73ccd9f25dbacec38427312377d
Make all of the "AMDGPU Machine Code GFX*" columns in the Memory Model
table a consistent width of 32 characters.
Best viewed with something like --word-diff
Differential Revision: https://reviews.llvm.org/D89977
Mostly NFC, but some changes are "bug fixes" rather than just e.g.
formatting changes or typo corrections.
- Fix typo "competing" -> "completing".
- Document why waitcnt is added to stores and not loads for
sequentially consistent ordering.
- Lowercase some mentions of `buffer_gl{0,1}_inv`.
- Make mentions of `*cnt(0)` consistently include the `(0)` count.
- Remove some mentions of instructions for incorrect address spaces. For
example, remove mention of `flat_load` from
`load atomic acquire workgroup global`.
- Re-flow some text to get all the target columns to fit in a
32-character wide column. This makes a future NFC patch that sets all
of these columns to 32 characters wide more straightforward.
Modified cherry-pick of patch by Tony Tye
Reviewed By: t-tye
Differential Revision: https://reviews.llvm.org/D89596
- AMDGPUUsage.rst: Correct AMD GPU DWARF address space table address
sizes which are in bits and not bytes.
- clang/.../Options.td: Improve description of AMD GPU options.
- Re-generate ClangCommandLineReference.rst from clang/.../Options.td.
Differential Revision: https://reviews.llvm.org/D90364
At AMD, in an internal audit of our code, we found some corner cases
where we were not quite differentiating targets enough for some old
hardware. This commit is part of fixing that by adding three new
targets:
* The "Oland" and "Hainan" variants of gfx601 are now split out into
gfx602. LLPC (in the GPUOpen driver) and other front-ends could use
that to avoid using the shaderZExport workaround on gfx602.
* One variant of gfx703 is now split out into gfx705. LLPC and other
front-ends could use that to avoid using the
shaderSpiCsRegAllocFragmentation workaround on gfx705.
* The "TongaPro" variant of gfx802 is now split out into gfx805.
TongaPro has a faster 64-bit shift than its former friends in gfx802,
and a subtarget feature could be set up for that to take advantage of
it. This commit does not make that change; it just adds the target.
V2: Add clang changes. Put TargetParser list in order.
V3: AMDGCNGPUs table in TargetParser.cpp needs to be in GPUKind order,
so fix the GPUKind order.
Differential Revision: https://reviews.llvm.org/D88916
Change-Id: Ia901a7157eb2f73ccd9f25dbacec38427312377d
- Rename AMDGPU SCC DWARF register to STATUS since the scalar
condition code is a bit within the STATUS register.
- Correct bit size of the VCC_64 register to 64 which is the size in
wave64 mode.
Differential Revision: https://reviews.llvm.org/D86259
- Clarify that these are extensions to DWARF 5 and not as yet a
proposal.
Reviewed By: scott.linder
Differential Revision: https://reviews.llvm.org/D70523
- Clarify what context is used in DWARF expression evaluation.
- Define location descriptions to fully resolve the context and so
include the context in their result.
- As a consequence of location descriptions being fully resolved,
change address spaces so only a swizzled and unswizzled private
address space is defined. The lane is now part of the location
description context.
- Clarify how call frame information is used to fully resolve
expressions that specify registers.
Reviewed By: scott.linder
Differential Revision: https://reviews.llvm.org/D70523
This doesn't appear used for anything, and is emitted incorrectly
based on the description. This also depends on the IR type, and
pointee element type.
Private pointers used to work around IR semantics by artificially
reserving an object at offset 0 so no user object would be allocated
there. Since alloca now uses a non-0 address space, that workaround is
unnecessary and 0 can be treated as a valid pointer.
Summary:
- Correct missing space in some "note" and "TODO" directives in
AMDGPUUsage.rst
- Correct warning for heading underline being too short in
BitCodeFormat.rst
Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D80407
When the callee requires a dynamic stack realignment,
it is not possible to correctly access the incoming
stack arguments using the stack pointer. We reserve a
base pointer in such cases to access the function arguments
inside the callee. The base pointer holds the incoming
stack pointer value before any delta is added to it.
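
A minimal sketch of why the base pointer is needed, written in Python for
illustration only. The helper names `align_up` and `enter_callee` are
hypothetical, and the sketch assumes the stack grows toward higher
addresses, so realignment rounds the stack pointer up. The realignment
delta depends on the run-time value of the incoming stack pointer, so
incoming arguments cannot be found at a fixed offset from the realigned
stack pointer.

```python
def align_up(value: int, alignment: int) -> int:
    """Round value up to the next multiple of alignment (a power of two)."""
    return (value + alignment - 1) & ~(alignment - 1)


def enter_callee(incoming_sp: int, alignment: int):
    """Return (base_pointer, realigned_sp) for a callee that realigns.

    Hypothetical helper; assumes the stack grows upward.
    """
    base_pointer = incoming_sp  # saved before any delta is added
    realigned_sp = align_up(incoming_sp, alignment)
    return base_pointer, realigned_sp


# Two calls with different incoming stack pointers give different deltas,
# while the base pointer always equals the incoming stack pointer.
bp_a, sp_a = enter_callee(incoming_sp=0x1234, alignment=0x100)
bp_b, sp_b = enter_callee(incoming_sp=0x1200, alignment=0x100)
delta_a = sp_a - bp_a
delta_b = sp_b - bp_b
```

Because `delta_a` and `delta_b` differ, only the saved base pointer gives
a stable reference for the incoming stack arguments.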
Reviewed By: arsenm, scott.linder
Differential Revision: https://reviews.llvm.org/D78811
The AMDGPU target has a convention that defines all VGPRs
(except the initial 32 argument registers) as callee-saved.
This convention is not always efficient, especially when a callee
that requires more registers ends up emitting a large number of
spills, even though its caller requires only a few.
This patch revises the ABI by introducing more scratch registers
that a callee can freely use.
The 256 vgpr registers now become:
32 argument registers
112 scratch registers and
112 callee saved registers.
The scratch registers and the CSRs are intermixed at regular
intervals (a split boundary of 8) to obtain a better occupancy.
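
The split above can be sketched in Python as a simple classifier. This is
an illustration, not the backend's definition: the name `classify_vgpr`
is hypothetical, and the sketch assumes that the first block of 8
registers after the argument registers is scratch rather than callee
saved, which this message does not state.

```python
ARG_REGS = 32      # v0..v31 are argument registers
TOTAL_VGPRS = 256
SPLIT = 8          # scratch and callee-saved blocks alternate every 8 registers


def classify_vgpr(n: int) -> str:
    """Classify VGPR n under the revised ABI (hypothetical helper)."""
    if not 0 <= n < TOTAL_VGPRS:
        raise ValueError(f"no such VGPR: v{n}")
    if n < ARG_REGS:
        return "argument"
    block = (n - ARG_REGS) // SPLIT
    # Assumption: even-numbered blocks are scratch, odd-numbered blocks
    # are callee saved.
    return "scratch" if block % 2 == 0 else "callee-saved"


# Count how many registers fall into each class.
counts = {}
for n in range(TOTAL_VGPRS):
    kind = classify_vgpr(n)
    counts[kind] = counts.get(kind, 0) + 1
```

Whichever class comes first, alternating 28 blocks of 8 registers yields
the 112/112 split described above.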
Reviewers: arsenm, t-tye, rampitec, b-sumner, mjbedy, tpr
Reviewed By: arsenm, t-tye
Differential Revision: https://reviews.llvm.org/D76356
- Unify the sections on DWARF expression and location lists.
- Allow a location description to have one or more single location
descriptions.
- Define context of DWARF expression that includes an initial
stack. Allow initial stack to be used when evaluating location list
expression with overlapping PC ranges.
- Reorganize the DWARF proposal in AMDGPUUsage so it is suitable for
submission to the DWARF site.
- Replace CFI instruction DW_CFA_LLVM_def_cfa_aspace with
DW_CFA_def_aspace_cfa and DW_CFA_def_aspace_cfa_sf. This is to avoid
the problem that DW_CFA_def_cfa and DW_CFA_def_cfa_sf cannot use a
register that is not the size of an address in the CFA address
space.
- Clarify DWARF address class and DWARF address space. Define language
values for DWARF address classes and specify how they are used by
some common source languages.
- Define rules for accessing registers and dereferencing memory when the
type size and register size or byte size operand do not match.
- Numerous cleanups for consistency.
Differential Revision: https://reviews.llvm.org/D70523
Remove the gap left between the stack pointer (s32) and frame pointer
(s34) now that the scratch wave offset is no longer a part of the
calling convention ABI.
Update llvm/docs/AMDGPUUsage.rst to reflect the change.
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75657
Add the scratch wave offset to the scratch buffer descriptor (SRSrc) in
the entry function prologue. This allows us to remove the scratch wave
offset register from the calling convention ABI.
As part of this change, allow the use of an inline constant zero for the
SOffset of MUBUF instructions accessing the stack in entry functions
when a frame pointer is not requested/required. Entry functions with
calls still need to set up the calling convention ABI stack pointer
register, and reference it in order to address arguments of called
functions. The ABI stack pointer register remains unswizzled, but is now
wave-relative instead of queue-relative.
Non-entry functions also use an inline constant zero SOffset for
wave-relative scratch access, but continue to use the stack and frame
pointers as before. When the stack or frame pointer is converted to a
swizzled offset it is now scaled directly, as the scratch wave offset no
longer needs to be subtracted first.
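
The change to the swizzled offset computation can be sketched as follows.
This is an illustrative Python model, not the code the backend emits: the
function names are hypothetical, and the sketch assumes a wave64 target
where "scaling" means dividing the unswizzled byte offset by the
wavefront size.

```python
WAVEFRONT_SIZE = 64  # assumption: wave64


def swizzled_offset_old(pointer: int, scratch_wave_offset: int) -> int:
    """Before: the pointer was queue-relative, so the scratch wave offset
    had to be subtracted before scaling."""
    return (pointer - scratch_wave_offset) // WAVEFRONT_SIZE


def swizzled_offset_new(pointer: int) -> int:
    """After: the pointer is already wave-relative and is scaled directly."""
    return pointer // WAVEFRONT_SIZE


# The same stack location expressed both ways (values are made up).
scratch_wave_offset = 0x4000
wave_relative_sp = 0x800
queue_relative_sp = scratch_wave_offset + wave_relative_sp

old = swizzled_offset_old(queue_relative_sp, scratch_wave_offset)
new = swizzled_offset_new(wave_relative_sp)
```

Both forms yield the same per-lane offset; the new form simply avoids
the subtraction.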
Update llvm/docs/AMDGPUUsage.rst to reflect these changes to the calling
convention.
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75138