forked from OSchip/llvm-project
[AMDGPU] Add an additional Code Object V3 assembler example
Document the intended use of the `.amdgcn.next_free_{s,v}gpr` in the context of multiple kernels and functions. Differential Revision: https://reviews.llvm.org/D59949 llvm-svn: 357289
This commit is contained in:
parent
2bff8b4272
commit
0bc9f15ddd
|
@ -5019,6 +5019,8 @@ For example, when assembling for a "GFX704" target this will be set to the
|
|||
integer value "4". The possible GFX stepping generation numbers are presented
|
||||
in :ref:`amdgpu-processors`.
|
||||
|
||||
.. _amdgpu-amdhsa-assembler-symbol-next_free_vgpr:
|
||||
|
||||
.amdgcn.next_free_vgpr
|
||||
++++++++++++++++++++++
|
||||
|
||||
|
@ -5032,6 +5034,8 @@ May be used to set the `.amdhsa_next_free_vpgr` directive in
|
|||
|
||||
May be set at any time, e.g. manually set to zero at the start of each kernel.
|
||||
|
||||
.. _amdgpu-amdhsa-assembler-symbol-next_free_sgpr:
|
||||
|
||||
.amdgcn.next_free_sgpr
|
||||
++++++++++++++++++++++
|
||||
|
||||
|
@ -5241,6 +5245,80 @@ Here is an example of a minimal assembly source file, defining one HSA kernel:
|
|||
...
|
||||
.end_amdgpu_metadata
|
||||
|
||||
If an assembly source file contains multiple kernels and/or functions, the
|
||||
:ref:`amdgpu-amdhsa-assembler-symbol-next_free_vgpr` and
|
||||
:ref:`amdgpu-amdhsa-assembler-symbol-next_free_sgpr` symbols may be reset using
|
||||
the ``.set <symbol>, <expression>`` directive. For example, in the case of two
|
||||
kernels, where ``function1`` is only called from ``kernel1`` it is sufficient
|
||||
to group the function with the kernel that calls it and reset the symbols
|
||||
between the two connected components:
|
||||
|
||||
.. code-block:: none
|
||||
|
||||
.amdgcn_target "amdgcn-amd-amdhsa--gfx900+xnack" // optional
|
||||
|
||||
// gpr tracking symbols are implicitly set to zero
|
||||
|
||||
.text
|
||||
.globl kern0
|
||||
.p2align 8
|
||||
.type kern0,@function
|
||||
kern0:
|
||||
// ...
|
||||
s_endpgm
|
||||
.Lkern0_end:
|
||||
.size kern0, .Lkern0_end-kern0
|
||||
|
||||
.rodata
|
||||
.p2align 6
|
||||
.amdhsa_kernel kern0
|
||||
// ...
|
||||
.amdhsa_next_free_vgpr .amdgcn.next_free_vgpr
|
||||
.amdhsa_next_free_sgpr .amdgcn.next_free_sgpr
|
||||
.end_amdhsa_kernel
|
||||
|
||||
// reset symbols to begin tracking usage in func1 and kern1
|
||||
.set .amdgcn.next_free_vgpr, 0
|
||||
.set .amdgcn.next_free_sgpr, 0
|
||||
|
||||
.text
|
||||
.hidden func1
|
||||
.global func1
|
||||
.p2align 2
|
||||
.type func1,@function
|
||||
func1:
|
||||
// ...
|
||||
s_setpc_b64 s[30:31]
|
||||
.Lfunc1_end:
|
||||
.size func1, .Lfunc1_end-func1
|
||||
|
||||
.globl kern1
|
||||
.p2align 8
|
||||
.type kern1,@function
|
||||
kern1:
|
||||
// ...
|
||||
s_getpc_b64 s[4:5]
|
||||
s_add_u32 s4, s4, func1@rel32@lo+4
|
||||
s_addc_u32 s5, s5, func1@rel32@lo+4
|
||||
s_swappc_b64 s[30:31], s[4:5]
|
||||
// ...
|
||||
s_endpgm
|
||||
.Lkern1_end:
|
||||
.size kern1, .Lkern1_end-kern1
|
||||
|
||||
.rodata
|
||||
.p2align 6
|
||||
.amdhsa_kernel kern1
|
||||
// ...
|
||||
.amdhsa_next_free_vgpr .amdgcn.next_free_vgpr
|
||||
.amdhsa_next_free_sgpr .amdgcn.next_free_sgpr
|
||||
.end_amdhsa_kernel
|
||||
|
||||
These symbols cannot identify connected components in order to automatically
|
||||
track the usage for each kernel. However, in some cases careful organization of
|
||||
the kernels and functions in the source file means there is minimal additional
|
||||
effort required to accurately calculate GPR usage.
|
||||
|
||||
Additional Documentation
|
||||
========================
|
||||
|
||||
|
|
Loading…
Reference in New Issue