[AMDGPU] Cleanup AMDGPUUsage.rst

- Layout and typo improvements.
- Add memory spaces section.
- reStructure syntax fixes.

Differential Revision: https://reviews.llvm.org/D90002
2020-10-22 07:15:41 +00:00 · 2020-10-22 07:15:41 +00:00 · bf6518a806
parent d590c85430
commit bf6518a806
1 changed file with 316 additions and 199 deletions
--- a/llvm/docs/AMDGPUUsage.rst
+++ b/llvm/docs/AMDGPUUsage.rst
@ -96,45 +96,45 @@ names from both the *Processor* and *Alternative Processor* can be used.
  .. table:: AMDGPU Processors
     :name: amdgpu-processor-table

-     =========== =============== ============ ===== ================= ======= ======================
+     =========== =============== ============ ===== ============================= ======= ======================
     Processor   Alternative     Target       dGPU/ Target                        ROCm    Example
                 Processor       Triple       APU   Features                      Support Products
                                 Architecture       Supported
                                                    [Default]
-     =========== =============== ============ ===== ================= ======= ======================
+     =========== =============== ============ ===== ============================= ======= ======================
     **Radeon HD 2000/3000 Series (R600)** [AMD-RADEON-HD-2000-3000]_
-     -----------------------------------------------------------------------------------------------
+     -----------------------------------------------------------------------------------------------------------
     ``r600``                    ``r600``     dGPU
     ``r630``                    ``r600``     dGPU
     ``rs880``                   ``r600``     dGPU
     ``rv670``                   ``r600``     dGPU
     **Radeon HD 4000 Series (R700)** [AMD-RADEON-HD-4000]_
-     -----------------------------------------------------------------------------------------------
+     -----------------------------------------------------------------------------------------------------------
     ``rv710``                   ``r600``     dGPU
     ``rv730``                   ``r600``     dGPU
     ``rv770``                   ``r600``     dGPU
     **Radeon HD 5000 Series (Evergreen)** [AMD-RADEON-HD-5000]_
-     -----------------------------------------------------------------------------------------------
+     -----------------------------------------------------------------------------------------------------------
     ``cedar``                   ``r600``     dGPU
     ``cypress``                 ``r600``     dGPU
     ``juniper``                 ``r600``     dGPU
     ``redwood``                 ``r600``     dGPU
     ``sumo``                    ``r600``     dGPU
     **Radeon HD 6000 Series (Northern Islands)** [AMD-RADEON-HD-6000]_
-     -----------------------------------------------------------------------------------------------
+     -----------------------------------------------------------------------------------------------------------
     ``barts``                   ``r600``     dGPU
     ``caicos``                  ``r600``     dGPU
     ``cayman``                  ``r600``     dGPU
     ``turks``                   ``r600``     dGPU
     **GCN GFX6 (Southern Islands (SI))** [AMD-GCN-GFX6]_
-     -----------------------------------------------------------------------------------------------
+     -----------------------------------------------------------------------------------------------------------
     ``gfx600``  - ``tahiti``    ``amdgcn``   dGPU
     ``gfx601``  - ``pitcairn``  ``amdgcn``   dGPU
                 - ``verde``
     ``gfx602``  - ``hainan``    ``amdgcn``   dGPU
                 - ``oland``
     **GCN GFX7 (Sea Islands (CI))** [AMD-GCN-GFX7]_
-     -----------------------------------------------------------------------------------------------
+     -----------------------------------------------------------------------------------------------------------
     ``gfx700``  - ``kaveri``    ``amdgcn``   APU                                         - A6-7000
                                                                                          - A6 Pro-7050B
                                                                                          - A8-7100
@ -166,9 +166,15 @@ names from both the *Processor* and *Alternative Processor* can be used.
                                                                                          - Radeon HD 8770
                                                                                          - R7 260
                                                                                          - R7 260X
-     ``gfx705``                  ``amdgcn``   APU
+     ``gfx705``                  ``amdgcn``   APU                                         *TBA*
+
+                                                                                          .. TODO::
+
+                                                                                             Add product
+                                                                                             names.
+
     **GCN GFX8 (Volcanic Islands (VI))** [AMD-GCN-GFX8]_
-     -----------------------------------------------------------------------------------------------
+     -----------------------------------------------------------------------------------------------------------
     ``gfx801``  - ``carrizo``   ``amdgcn``   APU   - xnack                               - A6-8500P
                                                      [on]                                - Pro A6-8500B
                                                                                          - A8-8600P
@ -206,10 +212,15 @@ names from both the *Processor* and *Alternative Processor* can be used.
                                                                                          - FirePro W7100
                                                                                          - Mobile FirePro
                                                                                            M7170
-     ``gfx810``  - ``stoney``    ``amdgcn``   APU   - xnack
+     ``gfx810``  - ``stoney``    ``amdgcn``   APU   - xnack                               *TBA*
                                                      [on]
+                                                                                          .. TODO::
+
+                                                                                             Add product
+                                                                                             names.
+
     **GCN GFX9** [AMD-GCN-GFX9]_
-     -----------------------------------------------------------------------------------------------
+     -----------------------------------------------------------------------------------------------------------
     ``gfx900``                  ``amdgcn``   dGPU  - xnack                       ROCm    - Radeon Vega
                                                      [off]                                 Frontier Edition
                                                                                          - Radeon RX Vega 56
@ -222,8 +233,10 @@ names from both the *Processor* and *Alternative Processor* can be used.
     ``gfx904``                  ``amdgcn``   dGPU  - xnack                               *TBA*
                                                      [off]
                                                                                          .. TODO::
+
                                                                                             Add product
                                                                                             names.
+
     ``gfx906``                  ``amdgcn``   dGPU  - xnack                               - Radeon Instinct MI50
                                                      [off]                               - Radeon Instinct MI60
                                                    - sram-ecc                            - Radeon VII
@ -233,15 +246,19 @@ names from both the *Processor* and *Alternative Processor* can be used.
                                                    - sram-ecc
                                                      [on]
                                                                                          .. TODO::
+
                                                                                             Add product
                                                                                             names.
+
     ``gfx909``                  ``amdgcn``   APU   - xnack                               *TBA*
-                                                      [on]
+                                                      [off]
                                                                                          .. TODO::
+
                                                                                             Add product
                                                                                             names.
+
     **GCN GFX10** [AMD-GCN-GFX10]_
-     -----------------------------------------------------------------------------------------------
+     -----------------------------------------------------------------------------------------------------------
     ``gfx1010``                 ``amdgcn``   dGPU  - xnack                               - Radeon RX 5700
                                                      [off]                               - Radeon RX 5700 XT
                                                    - wavefrontsize64                     - Radeon Pro 5600 XT
@ -254,9 +271,11 @@ names from both the *Processor* and *Alternative Processor* can be used.
                                                      [off]
                                                    - cumode
                                                      [off]
-                                                                              .. TODO
+                                                                                          .. TODO::
+
                                                                                             Add product
                                                                                             names.
+
     ``gfx1012``                 ``amdgcn``   dGPU  - xnack                               - Radeon RX 5500
                                                      [off]                               - Radeon RX 5500 XT
                                                    - wavefrontsize64
@ -267,24 +286,30 @@ names from both the *Processor* and *Alternative Processor* can be used.
                                                      [off]
                                                    - cumode
                                                      [off]
-                                                                              .. TODO
+                                                                                          .. TODO::
+
                                                                                             Add product
                                                                                             names.
+
     ``gfx1031``                 ``amdgcn``   dGPU  - wavefrontsize64                     *TBA*
                                                      [off]
                                                    - cumode
                                                      [off]
-                                                                              .. TODO
+                                                                                          .. TODO::
+
                                                                                             Add product
                                                                                             names.
+
     ``gfx1032``                 ``amdgcn``   dGPU  - wavefrontsize64                     *TBA*
                                                      [off]
                                                    - cumode
                                                      [off]
-                                                                              .. TODO
+                                                                                          .. TODO::
+
                                                                                             Add product
                                                                                             names.
-     =========== =============== ============ ===== ================= ======= ======================
+
+     =========== =============== ============ ===== ============================= ======= ======================

 .. _amdgpu-target-features:

@ -782,10 +807,10 @@ The AMDGPU backend uses the following ELF header:
  .. table:: AMDGPU ``EF_AMDGPU_MACH`` Values
     :name: amdgpu-ef-amdgpu-mach-table

-     ================================= ========== =============================
+     ==================================== ========== =============================
     Name                                 Value      Description (see
                                                     :ref:`amdgpu-processor-table`)
-     ================================= ========== =============================
+     ==================================== ========== =============================
     ``EF_AMDGPU_MACH_NONE``              0x000      *not specified*
     ``EF_AMDGPU_MACH_R600_R600``         0x001      ``r600``
     ``EF_AMDGPU_MACH_R600_R630``         0x002      ``r630``
@ -834,7 +859,7 @@ The AMDGPU backend uses the following ELF header:
     ``EF_AMDGPU_MACH_AMDGCN_GFX602``     0x03a      ``gfx602``
     ``EF_AMDGPU_MACH_AMDGCN_GFX705``     0x03b      ``gfx705``
     ``EF_AMDGPU_MACH_AMDGCN_GFX805``     0x03c      ``gfx805``
-     ================================= ========== =============================
+     ==================================== ========== =============================

 Sections
 --------
@ -922,8 +947,8 @@ Code Object V2 Note Records (--amdhsa-code-object-version=2)
  default configuration (Code Object V3) see :ref:`amdgpu-note-records-v3`.

 The AMDGPU backend code object uses the following ELF note record in the
-``.note`` section when compiling for Code Object
-V2 (--amdhsa-code-object-version=2).
+``.note`` section when compiling for Code Object V2
+(--amdhsa-code-object-version=2).

 Additional note records may be present, but any which are not documented here
 are deprecated and should not be used.
@ -2359,12 +2384,14 @@ non-AMD key names should be prefixed by "*vendor-name*.".
                                                - "Region"

                                                .. TODO::
+
                                                   Is GlobalBuffer only Global
                                                   or Constant? Is
                                                   DynamicSharedPointer always
                                                   Local? Can HCC allow Generic?
                                                   How can Private or Region
                                                   ever happen?
+
     "AccQual"         string                   Kernel argument access
                                                qualifier. Only present if
                                                "ValueKind" is "Image" or
@ -2376,8 +2403,10 @@ non-AMD key names should be prefixed by "*vendor-name*.".
                                                - "ReadWrite"

                                                .. TODO::
+
                                                   Does this apply to
                                                   GlobalBuffer?
+
     "ActualAccQual"   string                   The actual memory accesses
                                                performed by the kernel on the
                                                kernel argument. Only present if
@ -2415,8 +2444,10 @@ non-AMD key names should be prefixed by "*vendor-name*.".
                                                if "ValueKind" is "Pipe".

                                                .. TODO::
+
                                                   Can GlobalBuffer be pipe
                                                   qualified?
+
     ================= ============== ========= ================================

 ..
@ -2838,12 +2869,14 @@ same *vendor-name*.
                                                     - "region"

                                                     .. TODO::
+
                                                        Is "global_buffer" only "global"
                                                        or "constant"? Is
                                                        "dynamic_shared_pointer" always
                                                        "local"? Can HCC allow "generic"?
                                                        How can "private" or "region"
                                                        ever happen?
+
     ".access"              string                   Kernel argument access
                                                     qualifier. Only present if
                                                     ".value_kind" is "image" or
@ -2855,8 +2888,10 @@ same *vendor-name*.
                                                     - "read_write"

                                                     .. TODO::
+
                                                        Does this apply to
                                                        "global_buffer"?
+
     ".actual_access"       string                   The actual memory accesses
                                                     performed by the kernel on the
                                                     kernel argument. Only present if
@ -2894,8 +2929,10 @@ same *vendor-name*.
                                                     if ".value_kind" is "pipe".

                                                     .. TODO::
+
                                                        Can "global_buffer" be pipe
                                                        qualified?
+
     ====================== ============== ========= ================================

 ..
@ -2903,12 +2940,12 @@ same *vendor-name*.
 Kernel Dispatch
 ~~~~~~~~~~~~~~~

-The HSA architected queuing language (AQL) defines a user space memory
-interface that can be used to control the dispatch of kernels, in an agent
-independent way. An agent can have zero or more AQL queues created for it using
-the ROCm runtime, in which AQL packets (all of which are 64 bytes) can be
-placed. See the *HSA Platform System Architecture Specification* [HSA]_ for the
-AQL queue mechanics and packet layouts.
+The HSA architected queuing language (AQL) defines a user space memory interface
+that can be used to control the dispatch of kernels, in an agent independent
+way. An agent can have zero or more AQL queues created for it using the ROCm
+runtime, in which AQL packets (all of which are 64 bytes) can be placed. See the
+*HSA Platform System Architecture Specification* [HSA]_ for the AQL queue
+mechanics and packet layouts.

 The packet processor of a kernel agent is responsible for detecting and
 dispatching HSA kernels from the AQL queues associated with it. For AMD GPUs the
@ -2965,6 +3002,86 @@ CPU host program, or from an HSA kernel executing on a GPU.
 10. When the kernel dispatch has completed execution, CP signals the completion
    signal specified in the kernel dispatch packet if not 0.

+.. _amdgpu-amdhsa-memory-spaces:
+
+Memory Spaces
+~~~~~~~~~~~~~
+
+The memory space properties are:
+
+  .. table:: AMDHSA Memory Spaces
+     :name: amdgpu-amdhsa-memory-spaces-table
+
+     ================= =========== ======== ======= ==================
+     Memory Space Name HSA Segment Hardware Address NULL Value
+                       Name        Name     Size
+     ================= =========== ======== ======= ==================
+     Private           private     scratch  32      0x00000000
+     Local             group       LDS      32      0xFFFFFFFF
+     Global            global      global   64      0x0000000000000000
+     Constant          constant    *same as 64      0x0000000000000000
+                                   global*
+     Generic           flat        flat     64      0x0000000000000000
+     Region            N/A         GDS      32      *not implemented
+                                                    for AMDHSA*
+     ================= =========== ======== ======= ==================
+
+The global and constant memory spaces both use global virtual addresses, which
+are the same virtual address space used by the CPU. However, some virtual
+addresses may only be accessible to the CPU, some only accessible by the GPU,
+and some by both.
+
+Using the constant memory space indicates that the data will not change during
+the execution of the kernel. This allows scalar read instructions to be
+used. The vector and scalar L1 caches are invalidated of volatile data before
+each kernel dispatch execution to allow constant memory to change values between
+kernel dispatches.
+
+The local memory space uses the hardware Local Data Store (LDS) which is
+automatically allocated when the hardware creates work-groups of wavefronts, and
+freed when all the wavefronts of a work-group have terminated. The data store
+(DS) instructions can be used to access it.
+
+The private memory space uses the hardware scratch memory support. If the kernel
+uses scratch, then the hardware allocates memory that is accessed using
+wavefront lane dword (4 byte) interleaving. The mapping used from private
+address to physical address is:
+
+  ``wavefront-scratch-base +
+  (private-address * wavefront-size * 4) +
+  (wavefront-lane-id * 4)``
+
+There are different ways that the wavefront scratch base address is determined
+by a wavefront (see :ref:`amdgpu-amdhsa-initial-kernel-execution-state`). This
+memory can be accessed in an interleaved manner using buffer instructions with
+the scratch buffer descriptor and per wavefront scratch offset, by the scratch
+instructions, or by flat instructions. If each lane of a wavefront accesses the
+same private address, the interleaving results in adjacent dwords being accessed
+and hence requires fewer cache lines to be fetched. Multi-dword access is not
+supported except by flat and scratch instructions in GFX9-GFX10.
+
+The generic address space uses the hardware flat address support available in
+GFX7-GFX10. This uses two fixed ranges of virtual addresses (the private and
+local apertures), that are outside the range of addressable global memory, to
+map from a flat address to a private or local address.
+
+FLAT instructions can take a flat address and access global, private (scratch)
+and group (LDS) memory depending on whether the address is within one of the
+aperture ranges. Flat access to scratch requires hardware aperture setup and
+setup in the kernel prologue (see
+:ref:`amdgpu-amdhsa-kernel-prolog-flat-scratch`). Flat access to LDS requires
+hardware aperture setup and M0 (GFX7-GFX8) register setup (see
+:ref:`amdgpu-amdhsa-kernel-prolog-m0`).
+
+To convert between a segment address and a flat address, the base addresses of
+the apertures can be used. For GFX7-GFX8 these are available in the
+:ref:`amdgpu-amdhsa-hsa-aql-queue` the address of which can be obtained with
+Queue Ptr SGPR (see :ref:`amdgpu-amdhsa-initial-kernel-execution-state`). For
+GFX9-GFX10 the aperture base addresses are directly available as inline constant
+registers ``SRC_SHARED_BASE/LIMIT`` and ``SRC_PRIVATE_BASE/LIMIT``. In 64 bit
+address mode the aperture sizes are 2^32 bytes and the base is aligned to 2^32
+which makes it easier to convert from flat to segment or segment to flat.
+
 Image and Samplers
 ~~~~~~~~~~~~~~~~~~

@ -3635,7 +3752,7 @@ SGPR register initial state is defined in
     First      Private Segment Buffer     4      V# that can be used, together
                (enable_sgpr_private              with Scratch Wavefront Offset
                _segment_buffer)                  as an offset, to access the
-                                                  private address space using a
+                                                  private memory space using a
                                                  segment address.

                                                  CP uses the value provided by
@ -3835,13 +3952,13 @@ VGPR register initial state is defined in
                (kernel descriptor enable  of
                field)                     VGPRs
     ========== ========================== ====== ==============================
-     First      Work-Item Id X             1      32-bit work item id in X
+     First      Work-Item Id X             1      32-bit work-item id in X
                (Always initialized)              dimension of work-group for
                                                  wavefront lane.
-     then       Work-Item Id Y             1      32-bit work item id in Y
+     then       Work-Item Id Y             1      32-bit work-item id in Y
                (enable_vgpr_workitem_id          dimension of work-group for
                > 0)                              wavefront lane.
-     then       Work-Item Id Z             1      32-bit work item id in Z
+     then       Work-Item Id Z             1      32-bit work-item id in Z
                (enable_vgpr_workitem_id          dimension of work-group for
                > 1)                              wavefront lane.
     ========== ========================== ====== ==============================
@ -4100,7 +4217,7 @@ For GFX6-GFX9:
 * The scalar memory operations access a scalar L1 cache shared by all wavefronts
  on a group of CUs. The scalar and vector L1 caches are not coherent. However,
  scalar operations are used in a restricted way so do not impact the memory
-  model. See :ref:`amdgpu-address-spaces`.
+  model. See :ref:`amdgpu-amdhsa-memory-spaces`.
 * The vector and scalar memory operations use an L2 cache shared by all CUs on
  the same agent.
 * The L2 cache has independent channels to service disjoint ranges of virtual
@ -4155,7 +4272,7 @@ For GFX10:
 * The scalar memory operations access a scalar L0 cache shared by all wavefronts
  on a WGP. The scalar and vector L0 caches are not coherent. However, scalar
  operations are used in a restricted way so do not impact the memory model. See
-  :ref:`amdgpu-address-spaces`.
+  :ref:`amdgpu-amdhsa-memory-spaces`.
 * The vector and scalar memory L0 caches use an L1 cache shared by all WGPs on
  the same SA. Therefore, no special action is required for coherence between
  the wavefronts of a single work-group. However, a ``BUFFER_GL1_INV`` is
@ -4220,7 +4337,7 @@ variables. Therefore, the kernel machine code does not have to maintain the
 scalar L1 cache to ensure it is coherent with the vector L1 cache. The scalar
 and vector L1 caches are invalidated between kernel dispatches by CP since
 constant address space data may change between kernel dispatch executions. See
-:ref:`amdgpu-address-spaces`.
+:ref:`amdgpu-amdhsa-memory-spaces`.

 The one exception is if scalar writes are used to spill SGPR registers. In this
 case the AMDGPU backend ensures the memory location used to spill is never