==============================
User Guide for AMDGPU Back-end
==============================

Introduction
============

The AMDGPU back-end provides ISA code generation for AMD GPUs, starting with
the R600 family up until the current Volcanic Islands (GCN Gen 3).


Conventions
===========

Address Spaces
--------------

The AMDGPU back-end uses the following address space mapping:

============= ============================================
Address Space Memory Space
============= ============================================
0             Private
1             Global
2             Constant
3             Local
4             Generic (Flat)
5             Region
============= ============================================

The terminology in the table, aside from the region memory space, is from the
OpenCL standard.
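
In LLVM IR, the address space of a pointer is written as an addrspace(N)
qualifier on the pointer type, and a pointer type without one is in address
space 0. The Python sketch below is not part of LLVM. It looks up the memory
space for a pointer type string using the table above. The regular expression
and the helper name memory_space_of are hypothetical, and the sketch only
handles simple textual types such as float addrspace(1) pointers.

```python
import re

# Address space number -> memory space name, copied from the table above.
AMDGPU_MEMORY_SPACES = {
    0: "Private",
    1: "Global",
    2: "Constant",
    3: "Local",
    4: "Generic (Flat)",
    5: "Region",
}

# Matches the addrspace(N) qualifier on an LLVM IR pointer type.
_ADDRSPACE = re.compile(r"addrspace\((\d+)\)")


def memory_space_of(pointer_type: str) -> str:
    """Return the AMDGPU memory space for an LLVM IR pointer type string."""
    match = _ADDRSPACE.search(pointer_type)
    # A pointer type without an addrspace qualifier is in address space 0.
    number = int(match.group(1)) if match else 0
    if number not in AMDGPU_MEMORY_SPACES:
        raise ValueError(f"unknown AMDGPU address space: {number}")
    return AMDGPU_MEMORY_SPACES[number]


print(memory_space_of("float addrspace(1)*"))  # Global
print(memory_space_of("i32 addrspace(3)*"))    # Local
print(memory_space_of("i8*"))                  # Private
```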


Assembler
=========

The assembler is currently considered experimental.

For syntax examples, look in test/MC/AMDGPU.

Below are some of the currently supported features (modulo bugs). These
all apply to the Southern Islands ISA; Sea Islands and Volcanic Islands
are also supported but may be missing some instructions and have more bugs:

DS Instructions
---------------

All DS instructions are supported.

FLAT Instructions
------------------

These instructions are only present in the Sea Islands and Volcanic Islands
instruction sets. All FLAT instructions are supported for these architectures.

MUBUF Instructions
------------------

All non-atomic MUBUF instructions are supported.

SMRD Instructions
-----------------

Only the s_load_dword* SMRD instructions are supported.

SOP1 Instructions
-----------------

All SOP1 instructions are supported.

SOP2 Instructions
-----------------

All SOP2 instructions are supported.

SOPC Instructions
-----------------

All SOPC instructions are supported.

SOPP Instructions
-----------------

Unless otherwise mentioned, all SOPP instructions that have one or more
operands accept integer operands only. No verification is performed
on the operands, so it is up to the programmer to be familiar with the
range of acceptable values.

s_waitcnt
^^^^^^^^^

s_waitcnt accepts named arguments to specify which memory counter(s) to
wait for.

.. code-block:: nasm

   ; Wait for all counters to be 0
   s_waitcnt 0

   ; Equivalent to s_waitcnt 0. Counter names can also be delimited by
   ; '&' or ','.
   s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)

   ; Wait for vmcnt counter to be 1.
   s_waitcnt vmcnt(1)
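
The named form is shorthand for the single integer operand that SOPP
instructions take. The Python sketch below shows one way the two forms relate.
It assumes the Southern, Sea and Volcanic Islands layout of the s_waitcnt
immediate (vmcnt in bits 3:0, expcnt in bits 6:4, lgkmcnt in bits 11:8), which
comes from AMD's ISA documentation rather than from this guide. It also assumes
that a counter left out of the named form is set to its maximum value, meaning
"do not wait on this counter". The helper names encode_waitcnt and
parse_waitcnt_operand are hypothetical.

```python
import re

# Assumed SI/CI/VI layout of the s_waitcnt immediate: name -> (shift, width).
WAITCNT_FIELDS = {
    "vmcnt": (0, 4),    # bits 3:0
    "expcnt": (4, 3),   # bits 6:4
    "lgkmcnt": (8, 4),  # bits 11:8
}

_TERM = re.compile(r"(vmcnt|expcnt|lgkmcnt)\((\d+)\)")


def encode_waitcnt(**counters: int) -> int:
    """Pack named counter values into the s_waitcnt integer operand."""
    imm = 0
    for name, (shift, width) in WAITCNT_FIELDS.items():
        max_value = (1 << width) - 1
        # Assumption: an omitted counter gets its maximum value (no wait).
        value = counters.pop(name, max_value)
        if not 0 <= value <= max_value:
            raise ValueError(f"{name}({value}) does not fit in {width} bits")
        imm |= value << shift
    if counters:
        raise ValueError(f"unknown counter(s): {sorted(counters)}")
    return imm


def parse_waitcnt_operand(text: str) -> int:
    """Accept either a plain integer or named counters such as vmcnt(0)."""
    text = text.strip()
    if text.isdigit():
        return int(text)
    counters = {}
    # Named counters may be separated by whitespace, '&' or ','.
    for term in re.split(r"[\s&,]+", text):
        if not term:
            continue
        match = _TERM.fullmatch(term)
        if match is None:
            raise ValueError(f"unrecognized s_waitcnt argument: {term!r}")
        counters[match.group(1)] = int(match.group(2))
    return encode_waitcnt(**counters)


print(hex(parse_waitcnt_operand("vmcnt(0) expcnt(0) lgkmcnt(0)")))  # 0x0
print(hex(parse_waitcnt_operand("vmcnt(1)")))                       # 0xf71
```

Under these assumptions, "s_waitcnt 0" and the fully named form encode to the
same value, which matches the equivalence stated in the example above.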

VOP1, VOP2, VOP3, VOPC Instructions
-----------------------------------

All 32-bit and 64-bit encodings should work.

The assembler will automatically detect which encoding size to use for
VOP1, VOP2, and VOPC instructions based on the operands. If you want to force
a specific encoding size, you can add an _e32 (for 32-bit encoding) or
_e64 (for 64-bit encoding) suffix to the instruction. Most, but not all,
instructions support an explicit suffix. These are all valid assembly
strings:

.. code-block:: nasm

   v_mul_i32_i24 v1, v2, v3
   v_mul_i32_i24_e32 v1, v2, v3
   v_mul_i32_i24_e64 v1, v2, v3
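
The suffix rule can be summarized with a small Python sketch. The function
split_encoding_suffix is hypothetical and only mirrors the naming convention
described above. It does not model which instructions accept a suffix, nor how
the assembler chooses a size when no suffix is given.

```python
from typing import Optional, Tuple

# Explicit encoding suffixes and the encoding size (in bits) they force.
_SUFFIXES = (("_e32", 32), ("_e64", 64))


def split_encoding_suffix(mnemonic: str) -> Tuple[str, Optional[int]]:
    """Split a VOP mnemonic into (base name, forced encoding size or None)."""
    for suffix, size in _SUFFIXES:
        if mnemonic.endswith(suffix):
            return mnemonic[: -len(suffix)], size
    # No suffix: the assembler picks the encoding size from the operands.
    return mnemonic, None


for name in ("v_mul_i32_i24", "v_mul_i32_i24_e32", "v_mul_i32_i24_e64"):
    print(name, "->", split_encoding_suffix(name))
```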

Assembler Directives
--------------------

.hsa_code_object_version major, minor
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

*major* and *minor* are integers that specify the version of the HSA code
object that will be generated by the assembler. This value will be stored
in an entry of the .note section.

.hsa_code_object_isa [major, minor, stepping, vendor, arch]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

*major*, *minor*, and *stepping* are all integers that describe the instruction
set architecture (ISA) version of the assembly program.

*vendor* and *arch* are quoted strings. *vendor* should always be equal to
"AMD" and *arch* should always be equal to "AMDGPU".

If no arguments are specified, then the assembler will derive the ISA version,
*vendor*, and *arch* from the value of the -mcpu option that is passed to the
assembler.

The ISA version, *vendor*, and *arch* will all be stored in a single entry of
the .note section.
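
To illustrate the argument order, the hypothetical Python helper below formats
both directives. It assumes the comma-separated form without spaces that the
example later in this guide uses; the exact whitespace the assembler accepts is
not specified here. It always writes "AMD" and "AMDGPU" because those are the
only values this guide allows for *vendor* and *arch*. The ISA version 7,0,0
used below is an arbitrary example value.

```python
from typing import List, Optional, Tuple


def hsa_header_directives(
    code_object_version: Tuple[int, int],
    isa_version: Optional[Tuple[int, int, int]] = None,
) -> List[str]:
    """Format the .hsa_code_object_version and .hsa_code_object_isa lines."""
    major, minor = code_object_version
    lines = [f".hsa_code_object_version {major},{minor}"]
    if isa_version is None:
        # No arguments: the assembler derives the ISA version, vendor and
        # arch from the -mcpu option.
        lines.append(".hsa_code_object_isa")
    else:
        isa_major, isa_minor, stepping = isa_version
        # vendor and arch are quoted strings with fixed values.
        lines.append(
            f'.hsa_code_object_isa {isa_major},{isa_minor},{stepping},"AMD","AMDGPU"'
        )
    return lines


print("\n".join(hsa_header_directives((1, 0))))
print("\n".join(hsa_header_directives((1, 0), (7, 0, 0))))
```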

.amd_kernel_code_t
^^^^^^^^^^^^^^^^^^

This directive marks the beginning of a list of key/value pairs that are used
to specify the amd_kernel_code_t object that will be emitted by the assembler.
The list must be terminated by the *.end_amd_kernel_code_t* directive. For
any amd_kernel_code_t values that are unspecified, a default value will be
used. The default value for all keys is 0, with the following exceptions:

- *kernel_code_version_major* defaults to 1.
- *machine_kind* defaults to 1.
- *machine_version_major*, *machine_version_minor*, and
  *machine_version_stepping* are derived from the value of the -mcpu option
  that is passed to the assembler.
- *kernel_code_entry_byte_offset* defaults to 256.
- *wavefront_size* defaults to 6. Like the alignments below, it is specified
  as a power of two, so the default means a wavefront size of 64.
- *kernarg_segment_alignment*, *group_segment_alignment*, and
  *private_segment_alignment* default to 4. Note that alignments are specified
  as a power of two, so a value of **n** means an alignment of 2^ **n**.
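
The defaulting rules can be sketched in Python as follows. The parser and the
helper names are hypothetical. The sketch assumes one key = value pair per
line with integer values, as in the example later in this guide. It leaves out
the machine_version_major, machine_version_minor and machine_version_stepping
keys because the assembler derives those from -mcpu. It also shows how fields
stored as a power of two (the alignments, and likewise wavefront_size) decode
into byte and lane counts: a stored value of n means 2 to the power n.

```python
from typing import Dict, Iterable

# Documented non-zero defaults; every other key defaults to 0. The
# machine_version keys are omitted because they come from -mcpu.
KERNEL_CODE_DEFAULTS = {
    "kernel_code_version_major": 1,
    "machine_kind": 1,
    "kernel_code_entry_byte_offset": 256,
    "wavefront_size": 6,
    "kernarg_segment_alignment": 4,
    "group_segment_alignment": 4,
    "private_segment_alignment": 4,
}


def parse_kernel_code(lines: Iterable[str]) -> Dict[str, int]:
    """Collect key = value pairs between the start and end directives."""
    values: Dict[str, int] = {}
    inside = False
    for raw in lines:
        line = raw.strip()
        if line == ".amd_kernel_code_t":
            inside = True
        elif inside and line == ".end_amd_kernel_code_t":
            return values
        elif inside and line:
            key, sep, value = line.partition("=")
            if not sep:
                raise ValueError(f"expected 'key = value', got {line!r}")
            values[key.strip()] = int(value.strip(), 0)
    raise ValueError("missing .end_amd_kernel_code_t")


def effective_value(values: Dict[str, int], key: str) -> int:
    """Explicit value if present, else the documented default (0 otherwise)."""
    return values.get(key, KERNEL_CODE_DEFAULTS.get(key, 0))


def power_of_two(values: Dict[str, int], key: str) -> int:
    """Decode a field stored as a power of two: a value of n means 2**n."""
    return 1 << effective_value(values, key)


body = """
.amd_kernel_code_t
    enable_sgpr_kernarg_segment_ptr = 1
    kernarg_segment_byte_size = 8
.end_amd_kernel_code_t
""".splitlines()

values = parse_kernel_code(body)
print(effective_value(values, "kernarg_segment_byte_size"))      # 8
print(effective_value(values, "kernel_code_entry_byte_offset"))  # 256
print(power_of_two(values, "kernarg_segment_alignment"))         # 16
print(power_of_two(values, "wavefront_size"))                    # 64
```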

The *.amd_kernel_code_t* directive must be placed immediately after the
function label and before any instructions.

For a full list of amd_kernel_code_t keys, see the examples in
test/CodeGen/AMDGPU/hsa.s. For an explanation of the meanings of the different
keys, see the comments in lib/Target/AMDGPU/AmdKernelCodeT.h.

Here is an example of a minimal amd_kernel_code_t specification:

.. code-block:: none

   .hsa_code_object_version 1,0
   .hsa_code_object_isa

   .hsatext
   .globl hello_world
   .p2align 8
   .amdgpu_hsa_kernel hello_world

   hello_world:

      .amd_kernel_code_t
         enable_sgpr_kernarg_segment_ptr = 1
         is_ptr64 = 1
         compute_pgm_rsrc1_vgprs = 0
         compute_pgm_rsrc1_sgprs = 0
         compute_pgm_rsrc2_user_sgpr = 2
         kernarg_segment_byte_size = 8
         wavefront_sgpr_count = 2
         workitem_vgpr_count = 3
      .end_amd_kernel_code_t

      ; s[0:1] holds the kernarg segment pointer; load the 64-bit pointer
      ; argument stored at offset 0 of the kernarg segment.
      s_load_dwordx2 s[0:1], s[0:1], 0x0
      v_mov_b32 v0, 3.14159
      ; Wait for the scalar load before reading s0 and s1.
      s_waitcnt lgkmcnt(0)
      v_mov_b32 v1, s0
      v_mov_b32 v2, s1
      ; Store 3.14159 to the address passed as the kernel argument.
      flat_store_dword v[1:2], v0
      s_endpgm
   .Lfunc_end0:
      .size hello_world, .Lfunc_end0-hello_world
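
The *wavefront_sgpr_count* and *workitem_vgpr_count* values in this example
match the registers the kernel uses: s0 and s1 (also written as s[0:1]) and v0
through v2. The Python sketch below is a rough, hypothetical helper that derives
such counts from the highest register index appearing in the instructions. It
only understands the sN, vN, s[a:b] and v[a:b] forms. It also ignores any
additional registers that the hardware or the assembler may reserve, so treat
it as an approximation rather than the rule the tools apply.

```python
import re
from typing import Iterable, Tuple

# Single registers such as v0 or s1, and ranges such as s[0:1] or v[1:2].
_SINGLE = re.compile(r"\b([sv])(\d+)\b")
_RANGE = re.compile(r"\b([sv])\[(\d+):(\d+)\]")


def register_counts(instructions: Iterable[str]) -> Tuple[int, int]:
    """Return (sgpr_count, vgpr_count) as highest index used + 1."""
    highest = {"s": -1, "v": -1}
    for line in instructions:
        code = line.split(";", 1)[0]  # ignore trailing comments
        for kind, index in _SINGLE.findall(code):
            highest[kind] = max(highest[kind], int(index))
        for kind, _first, last in _RANGE.findall(code):
            highest[kind] = max(highest[kind], int(last))
    return highest["s"] + 1, highest["v"] + 1


HELLO_WORLD = [
    "s_load_dwordx2 s[0:1], s[0:1], 0x0",
    "v_mov_b32 v0, 3.14159",
    "s_waitcnt lgkmcnt(0)",
    "v_mov_b32 v1, s0",
    "v_mov_b32 v2, s1",
    "flat_store_dword v[1:2], v0",
    "s_endpgm",
]

print(register_counts(HELLO_WORLD))  # (2, 3)
```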