[AMDGPU] Add an additional Code Object V3 assembler example

Document the intended use of the `.amdgcn.next_free_{s,v}gpr` symbols in the
context of multiple kernels and functions.

Differential Revision: https://reviews.llvm.org/D59949

llvm-svn: 357289
Scott Linder 2019-03-29 17:49:51 +00:00
parent 2bff8b4272
commit 0bc9f15ddd
1 changed files with 78 additions and 0 deletions


@ -5019,6 +5019,8 @@ For example, when assembling for a "GFX704" target this will be set to the
integer value "4". The possible GFX stepping generation numbers are presented
in :ref:`amdgpu-processors`.
.. _amdgpu-amdhsa-assembler-symbol-next_free_vgpr:
.amdgcn.next_free_vgpr
++++++++++++++++++++++
@ -5032,6 +5034,8 @@ May be used to set the `.amdhsa_next_free_vgpr` directive in
May be set at any time, e.g. manually set to zero at the start of each kernel.
.. _amdgpu-amdhsa-assembler-symbol-next_free_sgpr:
.amdgcn.next_free_sgpr
++++++++++++++++++++++
@ -5241,6 +5245,80 @@ Here is an example of a minimal assembly source file, defining one HSA kernel:
...
.end_amdgpu_metadata

If an assembly source file contains multiple kernels and/or functions, the
:ref:`amdgpu-amdhsa-assembler-symbol-next_free_vgpr` and
:ref:`amdgpu-amdhsa-assembler-symbol-next_free_sgpr` symbols may be reset using
the ``.set <symbol>, <expression>`` directive. For example, in the case of two
kernels, where ``func1`` is only called from ``kern1``, it is sufficient to
group the function with the kernel that calls it and reset the symbols between
the two connected components:

.. code-block:: none

  .amdgcn_target "amdgcn-amd-amdhsa--gfx900+xnack" // optional

  // gpr tracking symbols are implicitly set to zero

  .text
  .globl kern0
  .p2align 8
  .type kern0,@function
  kern0:
    // ...
    s_endpgm
  .Lkern0_end:
  .size kern0, .Lkern0_end-kern0

  .rodata
  .p2align 6
  .amdhsa_kernel kern0
    // ...
    .amdhsa_next_free_vgpr .amdgcn.next_free_vgpr
    .amdhsa_next_free_sgpr .amdgcn.next_free_sgpr
  .end_amdhsa_kernel

  // reset symbols to begin tracking usage in func1 and kern1
  .set .amdgcn.next_free_vgpr, 0
  .set .amdgcn.next_free_sgpr, 0

  .text
  .hidden func1
  .global func1
  .p2align 2
  .type func1,@function
  func1:
    // ...
    s_setpc_b64 s[30:31]
  .Lfunc1_end:
  .size func1, .Lfunc1_end-func1

  .globl kern1
  .p2align 8
  .type kern1,@function
  kern1:
    // ...
    // compute the pc-relative 64-bit address of func1, then call it
    s_getpc_b64 s[4:5]
    s_add_u32 s4, s4, func1@rel32@lo+4
    s_addc_u32 s5, s5, func1@rel32@hi+12
    s_swappc_b64 s[30:31], s[4:5]
    // ...
    s_endpgm
  .Lkern1_end:
  .size kern1, .Lkern1_end-kern1

  .rodata
  .p2align 6
  .amdhsa_kernel kern1
    // ...
    .amdhsa_next_free_vgpr .amdgcn.next_free_vgpr
    .amdhsa_next_free_sgpr .amdgcn.next_free_sgpr
  .end_amdhsa_kernel

The assembler does not identify connected components of the call graph, so
these symbols cannot automatically track register usage for each kernel.
However, when the kernels and functions are organized carefully in the source
file, little additional effort is required to calculate GPR usage accurately.

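
The effect of resetting the symbols between connected components can be
illustrated with a small model. The sketch below is not LLVM code: the
``GprTracker`` class, its regular expression, and the instruction lists are
hypothetical, and it assumes only the documented behavior that each symbol is
raised to one past the highest register number referenced so far. It ignores
special registers such as ``vcc``.

```python
import re

# Hypothetical model, not part of LLVM: matches s5, v12, s[4:5], v[0:3].
REG = re.compile(r"\b([sv])(?:(\d+)|\[(\d+):(\d+)\])")


class GprTracker:
    """Models .amdgcn.next_free_sgpr / .amdgcn.next_free_vgpr."""

    def __init__(self):
        # Both symbols are implicitly zero before assembly begins.
        self.next_free = {"s": 0, "v": 0}

    def reset(self):
        # Models `.set .amdgcn.next_free_{s,v}gpr, 0`.
        self.next_free = {"s": 0, "v": 0}

    def assemble(self, line):
        # Raise each symbol to (highest referenced register number + 1).
        for kind, single, _lo, hi in REG.findall(line):
            highest = int(single) if single else int(hi)
            self.next_free[kind] = max(self.next_free[kind], highest + 1)


tracker = GprTracker()

# First connected component: kern0 (illustrative instructions only).
for line in ["v_mov_b32 v7, v0", "s_mov_b32 s9, s2", "s_endpgm"]:
    tracker.assemble(line)
kern0_usage = dict(tracker.next_free)

# Reset, then track func1 and kern1 together as the second component.
tracker.reset()
for line in [
    "v_mov_b32 v2, v1",               # func1 body
    "s_setpc_b64 s[30:31]",           # func1 return
    "s_getpc_b64 s[4:5]",             # kern1 computes the callee address
    "s_swappc_b64 s[30:31], s[4:5]",  # kern1 calls func1
    "s_endpgm",
]:
    tracker.assemble(line)
kern1_usage = dict(tracker.next_free)

print(kern0_usage, kern1_usage)
```

Without the reset, the values recorded for ``kern1`` in this model would also
include the registers referenced by ``kern0``, which overstates its usage.
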
Additional Documentation
========================