[AMDGPU] Update AMDGPU PAL usage documentation

Change-Id: I65f3edcfe5063551cad5aab0da1374c3a6ccd3a2
Tim Renouf 2021-03-30 08:33:07 +01:00
parent c352a2b829
commit 083b0f1b40
1 changed files with 371 additions and 127 deletions

@ -10959,140 +10959,384 @@ AMDPAL
------
This section provides code conventions used when the target triple OS is
``amdpal`` (see :ref:`amdgpu-target-triples`) for passing runtime parameters ``amdpal`` (see :ref:`amdgpu-target-triples`).
from the application/runtime to each invocation of a hardware shader. These
parameters include both generic, application-controlled parameters called
*user data* as well as system-generated parameters that are a product of the
draw or dispatch execution.
User Data .. _amdgpu-amdpal-code-object-metadata-section:
~~~~~~~~~
Each hardware stage has a set of 32-bit *user data registers* which can be Code Object Metadata
written from a command buffer and then loaded into SGPRs when waves are launched ~~~~~~~~~~~~~~~~~~~~
via a subsequent dispatch or draw operation. This is the way most arguments are
passed from the application/runtime to a hardware shader.
Compute User Data
~~~~~~~~~~~~~~~~~
Compute shader user data mappings are simpler than graphics shaders and have a
fixed mapping.
Note that there are always 10 available *user data entries* in registers -
entries beyond that limit must be fetched from memory (via the spill table
pointer) by the shader.
.. table:: PAL Compute Shader User Data Registers
:name: pal-compute-user-data-registers
============= ================================
User Register Description
============= ================================
0 Global Internal Table (32-bit pointer)
1 Per-Shader Internal Table (32-bit pointer)
2 - 11 Application-Controlled User Data (10 32-bit values)
12 Spill Table (32-bit pointer)
13 - 14 Thread Group Count (64-bit pointer)
15 GDS Range
============= ================================
Graphics User Data
~~~~~~~~~~~~~~~~~~
Graphics pipelines support a much more flexible user data mapping:
.. table:: PAL Graphics Shader User Data Registers
:name: pal-graphics-user-data-registers
============= ================================
User Register Description
============= ================================
0 Global Internal Table (32-bit pointer)
+ Per-Shader Internal Table (32-bit pointer)
+ 1-15 Application Controlled User Data
(1-15 Contiguous 32-bit Values in Registers)
+ Spill Table (32-bit pointer)
+ Draw Index (First Stage Only)
+ Vertex Offset (First Stage Only)
+ Instance Offset (First Stage Only)
============= ================================
The placement of the global internal table remains fixed in the first *user
data SGPR register*. Otherwise all parameters are optional, and can be mapped
to any desired *user data SGPR register*, with the following restrictions:
* Draw Index, Vertex Offset, and Instance Offset can only be used by the first
active hardware stage in a graphics pipeline (i.e. where the API vertex
shader runs).
* Application-controlled user data must be mapped into a contiguous range of
user data registers.
* The application-controlled user data range supports compaction remapping, so
only *entries* that are actually consumed by the shader must be assigned to
corresponding *registers*. Note that in order to support an efficient runtime
implementation, the remapping must pack *registers* in the same order as
*entries*, with unused *entries* removed.
.. _pal_global_internal_table:
Global Internal Table
~~~~~~~~~~~~~~~~~~~~~
The global internal table is a table of *shader resource descriptors* (SRDs)
that define how certain engine-wide, runtime-managed resources should be
accessed from a shader. The majority of these resources have HW-defined formats,
and it is up to the compiler to write/read data as required by the target
hardware.
The following table illustrates the required format:
.. table:: PAL Global Internal Table
:name: pal-git-table
============= ================================
Offset Description
============= ================================
0-3 Graphics Scratch SRD
4-7 Compute Scratch SRD
8-11 ES/GS Ring Output SRD
12-15 ES/GS Ring Input SRD
16-19 GS/VS Ring Output #0
20-23 GS/VS Ring Output #1
24-27 GS/VS Ring Output #2
28-31 GS/VS Ring Output #3
32-35 GS/VS Ring Input SRD
36-39 Tessellation Factor Buffer SRD
40-43 Off-Chip LDS Buffer SRD
44-47 Off-Chip Param Cache Buffer SRD
48-51 Sample Position Buffer SRD
52 vaRange::ShadowDescriptorTable High Bits
============= ================================
The pointer to the global internal table passed to the shader as user data
is a 32-bit pointer. The top 32 bits should be assumed to be the same as
the top 32 bits of the pipeline, so the shader may use the program
counter's top 32 bits.
.. _pal_call-convention:
Call Convention
~~~~~~~~~~~~~~~
For graphics use cases, the calling convention is `amdgpu_gfx`.
.. note:: .. note::
`amdgpu_gfx` Function calls are currently in development and are The metadata is currently in development and is subject to major
subject to major changes. changes. Only the current version is supported. *When this document
was generated the version was 2.6.*
This calling convention shares most properties with calling non-kernel Code object metadata is specified by the ``NT_AMDGPU_METADATA`` note
functions (see record (see :ref:`amdgpu-note-records-v3-v4`).
:ref:`amdgpu-amdhsa-function-call-convention-non-kernel-functions`).
Differences are:
- Currently there are none, differences will be listed here The metadata is represented as Message Pack formatted binary data (see
[MsgPack]_). The top level is a Message Pack map that includes the keys
defined in table :ref:`amdgpu-amdpal-code-object-metadata-map-table`
and referenced tables.
Additional information can be added to the maps. To avoid conflicts, any
key names should be prefixed by "*vendor-name*." where ``vendor-name``
can be the name of the vendor and specific vendor tool that generates the
information. The prefix is abbreviated to simply "." when it appears
within a map that has been added by the same *vendor-name*.
.. table:: AMDPAL Code Object Metadata Map
:name: amdgpu-amdpal-code-object-metadata-map-table
=================== ============== ========= ======================================================================
String Key Value Type Required? Description
=================== ============== ========= ======================================================================
"amdpal.version" sequence of Required PAL code object metadata (major, minor) version. The current values
2 integers are defined by *Util::Abi::PipelineMetadata(Major|Minor)Version*.
"amdpal.pipelines" sequence of Required Per-pipeline metadata. See
map :ref:`amdgpu-amdpal-code-object-pipeline-metadata-map-table` for the
definition of the keys included in that map.
=================== ============== ========= ======================================================================
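The top-level map described above can be pictured as a small decoded document. The following is a minimal sketch, assuming the Message Pack data has already been decoded into ordinary Python dictionaries and lists; the helper names (``missing_top_level_keys``, ``version_tuple``) and the pipeline contents are hypothetical and only illustrate the shape of the map.

```python
# Hypothetical, hand-written stand-in for a decoded metadata document.
# Only the key names come from the tables; all values are illustrative.
metadata = {
    "amdpal.version": [2, 6],          # (major, minor)
    "amdpal.pipelines": [
        {
            ".type": "Cs",
            ".internal_pipeline_hash": [0x1234, 0x5678],
            ".registers": {},
        }
    ],
}

# Keys marked "Required" in the AMDPAL Code Object Metadata Map table.
REQUIRED_TOP_LEVEL_KEYS = ("amdpal.version", "amdpal.pipelines")


def missing_top_level_keys(doc):
    """Return the required top-level keys that are absent from ``doc``."""
    return [key for key in REQUIRED_TOP_LEVEL_KEYS if key not in doc]


def version_tuple(doc):
    """Return the (major, minor) pair stored under "amdpal.version"."""
    major, minor = doc["amdpal.version"]
    return (major, minor)
```

A consumer might call ``missing_top_level_keys`` first and reject the code object if the returned list is not empty.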
..
.. table:: AMDPAL Code Object Pipeline Metadata Map
:name: amdgpu-amdpal-code-object-pipeline-metadata-map-table
====================================== ============== ========= ===================================================
String Key Value Type Required? Description
====================================== ============== ========= ===================================================
".name" string Source name of the pipeline.
".type" string Pipeline type, e.g. VsPs. Values include:
- "VsPs"
- "Gs"
- "Cs"
- "Ngg"
- "Tess"
- "GsTess"
- "NggTess"
".internal_pipeline_hash" sequence of Required Internal compiler hash for this pipeline. Lower
2 integers 64 bits is the "stable" portion of the hash, used
for e.g. shader replacement lookup. Upper 64 bits
is the "unique" portion of the hash, used for
e.g. pipeline cache lookup. The value is
implementation defined, and cannot be relied on
between different builds of the compiler.
".shaders" map Per-API shader metadata. See
:ref:`amdgpu-amdpal-code-object-shader-map-table`
for the definition of the keys included in that
map.
".hardware_stages" map Per-hardware stage metadata. See
:ref:`amdgpu-amdpal-code-object-hardware-stage-map-table`
for the definition of the keys included in that
map.
".shader_functions" map Per-shader function metadata. See
:ref:`amdgpu-amdpal-code-object-shader-function-map-table`
for the definition of the keys included in that
map.
".registers" map Required Hardware register configuration. See
:ref:`amdgpu-amdpal-code-object-register-map-table`
for the definition of the keys included in that
map.
".user_data_limit" integer Number of user data entries accessed by this
pipeline.
".spill_threshold" integer The user data spill threshold. 0xFFFF for
NoUserDataSpilling.
".uses_viewport_array_index" boolean Indicates whether or not the pipeline uses the
viewport array index feature. Pipelines which use
this feature can render into all 16 viewports,
whereas pipelines which do not use it are
restricted to viewport #0.
".es_gs_lds_size" integer Size in bytes of LDS space used internally for
handling data-passing between the ES and GS
shader stages. This can be zero if the data is
passed using off-chip buffers. This value should
be used to program all user-SGPRs which have been
marked with "UserDataMapping::EsGsLdsSize"
(typically only the GS and VS HW stages will ever
have a user-SGPR so marked).
".nggSubgroupSize" integer Explicit maximum subgroup size for NGG shaders
(maximum number of threads in a subgroup).
".num_interpolants" integer Graphics only. Number of PS interpolants.
".mesh_scratch_memory_size" integer Max mesh shader scratch memory used.
".api" string Name of the client graphics API.
".api_create_info" binary Graphics API shader create info binary blob. Can
be defined by the driver using the compiler if
they want to be able to correlate API-specific
information used during creation at a later time.
====================================== ============== ========= ===================================================
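As an illustration of the pipeline map, the sketch below checks the two keys the table marks as required and splits ``.internal_pipeline_hash`` into its two halves. It assumes an already-decoded Python dictionary, and it assumes that the first integer of the sequence holds the lower ("stable") 64 bits and the second the upper ("unique") 64 bits; the table does not state the element order, so treat that ordering, and both helper names, as hypothetical.

```python
# Keys marked "Required" in the AMDPAL Code Object Pipeline Metadata Map.
REQUIRED_PIPELINE_KEYS = (".internal_pipeline_hash", ".registers")


def missing_pipeline_keys(pipeline):
    """Return the required pipeline keys that are absent from ``pipeline``."""
    return [key for key in REQUIRED_PIPELINE_KEYS if key not in pipeline]


def split_pipeline_hash(pipeline):
    """Split the 128-bit pipeline hash into its two 64-bit portions.

    ASSUMPTION: element 0 is the lower ("stable") half and element 1 is
    the upper ("unique") half. The element order is not stated in the table.
    """
    stable, unique = pipeline[".internal_pipeline_hash"]
    return {"stable": stable, "unique": unique}


# Illustrative pipeline entry; the hash values are made up.
example_pipeline = {
    ".type": "VsPs",
    ".internal_pipeline_hash": [0xAAAA, 0xBBBB],
    ".registers": {},
}
```

Because the hash is implementation defined, the two halves are only meaningful when compared against values produced by the same compiler build.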
..
.. table:: AMDPAL Code Object Shader Map
:name: amdgpu-amdpal-code-object-shader-map-table
+-------------+--------------+-------------------------------------------------------------------+
|String Key |Value Type |Description |
+=============+==============+===================================================================+
|- ".compute" |map |See :ref:`amdgpu-amdpal-code-object-api-shader-metadata-map-table` |
|- ".vertex" | |for the definition of the keys included in that map. |
|- ".hull" | | |
|- ".domain" | | |
|- ".geometry"| | |
|- ".pixel" | | |
+-------------+--------------+-------------------------------------------------------------------+
..
.. table:: AMDPAL Code Object API Shader Metadata Map
:name: amdgpu-amdpal-code-object-api-shader-metadata-map-table
==================== ============== ========= =====================================================================
String Key Value Type Required? Description
==================== ============== ========= =====================================================================
".api_shader_hash" sequence of Required Input shader hash, typically passed in from the client. The value
2 integers is implementation defined, and cannot be relied on between
different builds of the compiler.
".hardware_mapping" sequence of Required Flags indicating the HW stages this API shader maps to. Values
string include:
- ".ls"
- ".hs"
- ".es"
- ".gs"
- ".vs"
- ".ps"
- ".cs"
==================== ============== ========= =====================================================================
..
.. table:: AMDPAL Code Object Hardware Stage Map
:name: amdgpu-amdpal-code-object-hardware-stage-map-table
+-------------+--------------+-----------------------------------------------------------------------+
|String Key |Value Type |Description |
+=============+==============+=======================================================================+
|- ".ls" |map |See :ref:`amdgpu-amdpal-code-object-hardware-stage-metadata-map-table` |
|- ".hs" | |for the definition of the keys included in that map. |
|- ".es" | | |
|- ".gs" | | |
|- ".vs" | | |
|- ".ps" | | |
|- ".cs" | | |
+-------------+--------------+-----------------------------------------------------------------------+
..
.. table:: AMDPAL Code Object Hardware Stage Metadata Map
:name: amdgpu-amdpal-code-object-hardware-stage-metadata-map-table
========================== ============== ========= ===============================================================
String Key Value Type Required? Description
========================== ============== ========= ===============================================================
".entry_point" string The ELF symbol pointing to this pipeline's stage entry point.
".scratch_memory_size" integer Scratch memory size in bytes.
".lds_size" integer Local Data Share size in bytes.
".perf_data_buffer_size" integer Performance data buffer size in bytes.
".vgpr_count" integer Number of VGPRs used.
".sgpr_count" integer Number of SGPRs used.
".vgpr_limit" integer If non-zero, indicates the shader was compiled with a
directive to instruct the compiler to limit the VGPR usage to
be less than or equal to the specified value (only set if
different from HW default).
".sgpr_limit" integer SGPR count upper limit (only set if different from HW
default).
".threadgroup_dimensions" sequence of Thread-group X/Y/Z dimensions (Compute only).
3 integers
".wavefront_size" integer Wavefront size (only set if different from HW default).
".uses_uavs" boolean The shader reads or writes UAVs.
".uses_rovs" boolean The shader reads or writes ROVs.
".writes_uavs" boolean The shader writes to one or more UAVs.
".writes_depth" boolean The shader writes out a depth value.
".uses_append_consume" boolean The shader uses append and/or consume operations, either
memory or GDS.
".uses_prim_id" boolean The shader uses PrimID.
========================== ============== ========= ===============================================================
..
.. table:: AMDPAL Code Object Shader Function Map
:name: amdgpu-amdpal-code-object-shader-function-map-table
=============== ============== ====================================================================
String Key Value Type Description
=============== ============== ====================================================================
*symbol name* map *symbol name* is the ELF symbol name of the shader function code
entry address. The value is the function's metadata. See
:ref:`amdgpu-amdpal-code-object-shader-function-metadata-map-table`.
=============== ============== ====================================================================
..
.. table:: AMDPAL Code Object Shader Function Metadata Map
:name: amdgpu-amdpal-code-object-shader-function-metadata-map-table
============================= ============== =================================================================
String Key Value Type Description
============================= ============== =================================================================
".api_shader_hash" sequence of Input shader hash, typically passed in from the client. The value
2 integers is implementation defined, and cannot be relied on between
different builds of the compiler.
".scratch_memory_size" sequence of Size in bytes of scratch memory used by the shader.
2 integers
".lds_size" sequence of Size in bytes of LDS memory.
2 integers
".vgpr_count" integer Number of VGPRs used by the shader.
".sgpr_count" integer Number of SGPRs used by the shader.
".stack_frame_size_in_bytes" integer Amount of stack size used by the shader.
".shader_subtype" string Shader subtype/kind. Values include:
- "Unknown"
============================= ============== =================================================================
..
.. table:: AMDPAL Code Object Register Map
:name: amdgpu-amdpal-code-object-register-map-table
========================== ============== ====================================================================
32-bit Integer Key Value Type Description
========================== ============== ====================================================================
``reg offset`` 32-bit integer ``reg offset`` is the dword offset into the GFXIP register space of
a GRBM register (i.e., driver accessible GPU register number, not
shader GPR register number). The driver is required to program each
specified register to the corresponding specified value when
executing this pipeline. Typically, the ``reg offsets`` are the
``uint16_t`` offsets to each register as defined by the hardware
chip headers. The register is set to the provided value. However, a
``reg offset`` that specifies a user data register (e.g.,
COMPUTE_USER_DATA_0) needs special treatment. See
:ref:`amdgpu-amdpal-code-object-user-data-section` section for more
information.
========================== ============== ====================================================================
.. _amdgpu-amdpal-code-object-user-data-section:
User Data
+++++++++
Each hardware stage has a set of 32-bit physical SPI *user data registers*
(either 16 or 32 based on graphics IP and the stage) which can be
written from a command buffer and then loaded into SGPRs when waves are
launched via a subsequent dispatch or draw operation. This is the way
most arguments are passed from the application/runtime to a hardware
shader.
PAL abstracts this functionality by exposing a set of 128 *user data
entries* per pipeline a client can use to pass arguments from a command
buffer to one or more shaders in that pipeline. The ELF code object must
specify a mapping from virtualized *user data entries* to physical *user
data registers*, and PAL is responsible for implementing that mapping,
including spilling overflow *user data entries* to memory if needed.
Since the *user data registers* are GRBM-accessible SPI registers, this
mapping is actually embedded in the ``.registers`` metadata entry. For
most registers, the value in that map is a literal 32-bit value that
should be written to the register by the driver. However, when the
register is a *user data register* (any USER_DATA register, e.g.,
SPI_SHADER_USER_DATA_PS_5), the value is instead an encoding that tells
the driver to write either a *user data entry* value or one of several
driver-internal values to the register. This encoding is described in
the following table:
.. note::
Currently, *user data registers* 0 and 1 (e.g., SPI_SHADER_USER_DATA_PS_0,
and SPI_SHADER_USER_DATA_PS_1) are reserved. *User data register* 0 must
always be programmed to the address of the GlobalTable, and *user data
register* 1 must always be programmed to the address of the PerShaderTable.
..
.. table:: AMDPAL User Data Mapping
:name: amdgpu-amdpal-code-object-metadata-user-data-mapping-table
========== ================= ===============================================================================
Value Name Description
========== ================= ===============================================================================
0..127 *User Data Entry* 32-bit value of user_data_entry[N] as specified via *CmdSetUserData()*
0x10000000 GlobalTable 32-bit pointer to GPU memory containing the global internal table (should
always point to *user data register* 0).
0x10000001 PerShaderTable 32-bit pointer to GPU memory containing the per-shader internal table. See
:ref:`amdgpu-amdpal-code-object-metadata-user-data-per-shader-table-section`
for more detail (should always point to *user data register* 1).
0x10000002 SpillTable 32-bit pointer to GPU memory containing the user data spill table. See
:ref:`amdgpu-amdpal-code-object-metadata-user-data-spill-table-section` for
more detail.
0x10000003 BaseVertex Vertex offset (32-bit unsigned integer). Not needed if the pipeline doesn't
reference the draw index in the vertex shader. Only supported by the first
stage in a graphics pipeline.
0x10000004 BaseInstance Instance offset (32-bit unsigned integer). Only supported by the first stage in
a graphics pipeline.
0x10000005 DrawIndex Draw index (32-bit unsigned integer). Only supported by the first stage in a
graphics pipeline.
0x10000006 Workgroup Thread group count (32-bit unsigned integer). Low half of a 64-bit address of
a buffer containing the grid dimensions for a Compute dispatch operation. The
high half of the address is stored in the next sequential user-SGPR. Only
supported by compute pipelines.
0x1000000A EsGsLdsSize Indicates that PAL will program this user-SGPR to contain the amount of LDS
space used for the ES/GS pseudo-ring-buffer for passing data between shader
stages.
0x1000000B ViewId View ID (32-bit unsigned integer). Identifies a view of graphics
pipeline instancing.
0x1000000C StreamOutTable 32-bit pointer to GPU memory containing the stream out target SRD table. This
can only appear for one shader stage per pipeline.
0x1000000D PerShaderPerfData 32-bit pointer to GPU memory containing the per-shader performance data buffer.
0x1000000F VertexBufferTable 32-bit pointer to GPU memory containing the vertex buffer SRD table. This can
only appear for one shader stage per pipeline.
0x10000010 UavExportTable 32-bit pointer to GPU memory containing the UAV export SRD table. This can
only appear for one shader stage per pipeline (PS). These replace color targets
and are completely separate from any UAVs used by the shader. This is optional,
and only used by the PS when UAV exports are used to replace color-target
exports to optimize specific shaders.
0x10000011 NggCullingData 64-bit pointer to GPU memory containing the hardware register data needed by
some NGG pipelines to perform culling. This value contains the address of the
first of two consecutive registers which provide the full GPU address.
0x10000015 FetchShaderPtr 64-bit pointer to GPU memory containing the fetch shader subroutine.
========== ================= ===============================================================================
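The encoding in the table above can be read as a simple classification of the value stored for a *user data register*. The sketch below is one possible way to express it, not PAL's implementation: the constant values and names are copied from the table, while ``describe_mapping`` and ``MAX_USER_DATA_ENTRIES`` are hypothetical names introduced for illustration.

```python
# Special values copied from the AMDPAL User Data Mapping table.
SPECIAL_MAPPINGS = {
    0x10000000: "GlobalTable",
    0x10000001: "PerShaderTable",
    0x10000002: "SpillTable",
    0x10000003: "BaseVertex",
    0x10000004: "BaseInstance",
    0x10000005: "DrawIndex",
    0x10000006: "Workgroup",
    0x1000000A: "EsGsLdsSize",
    0x1000000B: "ViewId",
    0x1000000C: "StreamOutTable",
    0x1000000D: "PerShaderPerfData",
    0x1000000F: "VertexBufferTable",
    0x10000010: "UavExportTable",
    0x10000011: "NggCullingData",
    0x10000015: "FetchShaderPtr",
}

# PAL exposes 128 user data entries per pipeline (values 0..127).
MAX_USER_DATA_ENTRIES = 128


def describe_mapping(value):
    """Describe what the driver writes for one user data register value."""
    if 0 <= value < MAX_USER_DATA_ENTRIES:
        return f"user_data_entry[{value}]"
    name = SPECIAL_MAPPINGS.get(value)
    if name is None:
        raise ValueError(f"unknown user data mapping 0x{value:08X}")
    return name
```

For example, a register whose value is ``5`` receives *user data entry* 5, while a register whose value is ``0x10000002`` receives the spill table pointer.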
.. _amdgpu-amdpal-code-object-metadata-user-data-per-shader-table-section:
Per-Shader Table
################
Low 32 bits of the GPU address for an optional buffer in the ``.data``
section of the ELF. The high 32 bits of the address match the high 32 bits
of the shader's program counter.
The buffer can be anything the shader compiler needs it for, and
allows each shader to have its own region of the ``.data`` section.
Typically, this could be a table of buffer SRDs and the data pointed to
by the buffer SRDs, but it could be a flat-address region of memory as
well. Its layout and usage are defined by the shader compiler.
Each shader's table in the ``.data`` section is referenced by the symbol
``_amdgpu_``\ *xs*\ ``_shdr_intrl_data`` where *xs* corresponds with the
hardware shader stage the data is for. E.g.,
``_amdgpu_cs_shdr_intrl_data`` for the compute shader hardware stage.
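The naming rule and the address reconstruction described in this section can be sketched as follows. Both function names are hypothetical; the sketch only assumes that the stage is given as a short name such as ``cs`` (optionally with the leading ``.`` used in the metadata keys) and that the program counter is available as a 64-bit integer.

```python
def per_shader_table_symbol(stage):
    """Build the per-shader table symbol name for a hardware stage.

    ``stage`` is a hardware stage name such as "cs" or ".cs".
    """
    return f"_amdgpu_{stage.lstrip('.')}_shdr_intrl_data"


def full_table_address(low32, program_counter):
    """Combine the 32-bit user data value with the program counter's high half.

    The high 32 bits of the table address match the high 32 bits of the
    shader's program counter, so only the low 32 bits are passed.
    """
    high = program_counter & 0xFFFFFFFF00000000
    return high | (low32 & 0xFFFFFFFF)
```

With a program counter of ``0x00007FFFDEADBEEF`` and a user data value of ``0x1000``, the reconstructed address is ``0x00007FFF00001000``.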
.. _amdgpu-amdpal-code-object-metadata-user-data-spill-table-section:
Spill Table
###########
It is possible for a hardware shader to need access to more *user data
entries* than there are slots available in user data registers for one
or more hardware shader stages. In that case, the PAL runtime expects
the necessary *user data entries* to be spilled to GPU memory and one
user data register to be used to point to the spilled user data memory.
The value of the *user data entry* must then represent the location where
a shader expects to read the low 32 bits of the table's GPU virtual
address. The *spill table* itself represents a set of 32-bit values
managed by the PAL runtime in GPU-accessible memory that can be made
indirectly accessible to a hardware shader.
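One way to picture spilling is to split the entries a shader uses into those read from registers and those read from the spill table. The sketch below assumes that ``.spill_threshold`` is the lowest *user data entry* index that is read from memory, which is an interpretation and is not spelled out in this section; the function name and constant name are hypothetical. Only the ``0xFFFF`` "no spilling" value comes from the pipeline metadata table.

```python
# ".spill_threshold" value meaning that no user data is spilled.
NO_USER_DATA_SPILLING = 0xFFFF


def split_entries(used_entries, spill_threshold):
    """Partition used entry indices into (in_registers, spilled).

    ASSUMPTION: entries with an index at or above ``spill_threshold`` are
    read from the spill table; all others are mapped to registers.
    """
    if spill_threshold == NO_USER_DATA_SPILLING:
        return sorted(used_entries), []
    in_registers = sorted(e for e in used_entries if e < spill_threshold)
    spilled = sorted(e for e in used_entries if e >= spill_threshold)
    return in_registers, spilled
```

Under this assumption, a pipeline that uses entries 0, 1, 12 and 40 with a threshold of 12 would keep entries 0 and 1 in registers and read entries 12 and 40 through the spill table pointer.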
Unspecified OS
--------------