[AMDGPU][MC][DOC] Updated AMD GPU assembler description.

Summary of changes:
- Updated to reflect recent changes in assembler;
- Minor bugfixing and improvements.

llvm-svn: 372857
Dmitry Preobrazhensky 2019-09-25 12:38:35 +00:00
parent 20f4afc5a7
commit b9683d3c53
58 changed files with 1058 additions and 709 deletions

@ -566,7 +566,7 @@ SOPC
s_cmp_lg_u64 :ref:`ssrc0<amdgpu_synid8_ssrc64_0>`, :ref:`ssrc1<amdgpu_synid8_ssrc64_0>`
s_cmp_lt_i32 :ref:`ssrc0<amdgpu_synid8_ssrc32_0>`, :ref:`ssrc1<amdgpu_synid8_ssrc32_0>`
s_cmp_lt_u32 :ref:`ssrc0<amdgpu_synid8_ssrc32_0>`, :ref:`ssrc1<amdgpu_synid8_ssrc32_0>`
s_set_gpr_idx_on :ref:`ssrc<amdgpu_synid8_ssrc32_0>`, :ref:`imm4<amdgpu_synid8_imm4>`
s_set_gpr_idx_on :ref:`ssrc<amdgpu_synid8_ssrc32_0>`, :ref:`imask<amdgpu_synid8_imask>`
s_setvskip :ref:`ssrc0<amdgpu_synid8_ssrc32_0>`, :ref:`ssrc1<amdgpu_synid8_ssrc32_0>`
SOPK
@ -624,7 +624,7 @@ SOPP
s_nop :ref:`imm16<amdgpu_synid8_bimm16>`
s_sendmsg :ref:`msg<amdgpu_synid8_msg>`
s_sendmsghalt :ref:`msg<amdgpu_synid8_msg>`
s_set_gpr_idx_mode :ref:`imm4<amdgpu_synid8_imm4>`
s_set_gpr_idx_mode :ref:`imask<amdgpu_synid8_imask>`
s_set_gpr_idx_off
s_sethalt :ref:`imm16<amdgpu_synid8_bimm16>`
s_setkill :ref:`imm16<amdgpu_synid8_bimm16>`
@ -1756,7 +1756,7 @@ VOPC
gfx8_fimm16
gfx8_fimm32
gfx8_hwreg
gfx8_imm4
gfx8_imask
gfx8_label
gfx8_msg
gfx8_param

@ -736,7 +736,7 @@ SOPC
s_cmp_lg_u64 :ref:`ssrc0<amdgpu_synid9_ssrc64_0>`, :ref:`ssrc1<amdgpu_synid9_ssrc64_0>`
s_cmp_lt_i32 :ref:`ssrc0<amdgpu_synid9_ssrc32_0>`, :ref:`ssrc1<amdgpu_synid9_ssrc32_0>`
s_cmp_lt_u32 :ref:`ssrc0<amdgpu_synid9_ssrc32_0>`, :ref:`ssrc1<amdgpu_synid9_ssrc32_0>`
s_set_gpr_idx_on :ref:`ssrc<amdgpu_synid9_ssrc32_0>`, :ref:`imm4<amdgpu_synid9_imm4>`
s_set_gpr_idx_on :ref:`ssrc<amdgpu_synid9_ssrc32_0>`, :ref:`imask<amdgpu_synid9_imask>`
s_setvskip :ref:`ssrc0<amdgpu_synid9_ssrc32_0>`, :ref:`ssrc1<amdgpu_synid9_ssrc32_0>`
SOPK
@ -796,7 +796,7 @@ SOPP
s_nop :ref:`imm16<amdgpu_synid9_bimm16>`
s_sendmsg :ref:`msg<amdgpu_synid9_msg>`
s_sendmsghalt :ref:`msg<amdgpu_synid9_msg>`
s_set_gpr_idx_mode :ref:`imm4<amdgpu_synid9_imm4>`
s_set_gpr_idx_mode :ref:`imask<amdgpu_synid9_imask>`
s_set_gpr_idx_off
s_sethalt :ref:`imm16<amdgpu_synid9_bimm16>`
s_setkill :ref:`imm16<amdgpu_synid9_bimm16>`
@ -2010,7 +2010,7 @@ VOPC
gfx9_fimm16
gfx9_fimm32
gfx9_hwreg
gfx9_imm4
gfx9_imask
gfx9_label
gfx9_msg
gfx9_param

@ -10,5 +10,5 @@
imm16
===========================
An :ref:`integer_number<amdgpu_synid_integer_number>`. The value is truncated to 16 bits.
A 16-bit :ref:`integer_number<amdgpu_synid_integer_number>` or an :ref:`absolute_expression<amdgpu_synid_absolute_expression>`. The value must be in the range -32768..65535.
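
The range -32768..65535 is the union of the signed and unsigned 16-bit ranges. A minimal Python sketch of the check follows. The helper name ``imm16_bits`` is hypothetical, and the sketch assumes an in-range value is encoded as its low 16 bits, which the text above does not state explicitly.

```python
def imm16_bits(value: int) -> int:
    """Return the 16-bit pattern for a value in the documented range.

    Assumption: an in-range value is encoded as its low 16 bits.
    """
    if not -32768 <= value <= 65535:
        raise ValueError(f"{value} is outside -32768..65535")
    return value & 0xFFFF


print(hex(imm16_bits(-1)))      # 0xffff
print(hex(imm16_bits(65535)))   # 0xffff
print(hex(imm16_bits(0x1234)))  # 0x1234
```

Under this assumption, -1 and 65535 produce the same bit pattern, which is why both spellings fall within the accepted range.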

@ -10,5 +10,5 @@
imm32
===========================
An :ref:`integer_number<amdgpu_synid_integer_number>`. The value is truncated to 32 bits.
An :ref:`integer_number<amdgpu_synid_integer_number>` or an :ref:`absolute_expression<amdgpu_synid_absolute_expression>`. The value is truncated to 32 bits.

@ -21,7 +21,7 @@ Optionally may serve as an output data:
* :ref:`dmask<amdgpu_synid_dmask>` may specify 2 data elements for 32-bit-per-pixel surfaces or 4 data elements for 64-bit-per-pixel surfaces. Each data element occupies 1 dword.
* :ref:`tfe<amdgpu_synid_tfe>` adds 1 dword if specified.
Note. The surface data format is indicated in the image resource constant but not in the instruction.
Note: the surface data format is indicated in the image resource constant but not in the instruction.
*Operands:* :ref:`v<amdgpu_synid_v>`

@ -21,6 +21,6 @@ Optionally may serve as an output data:
* :ref:`dmask<amdgpu_synid_dmask>` may specify 1 data element for 32-bit-per-pixel surfaces or 2 data elements for 64-bit-per-pixel surfaces. Each data element occupies 1 dword.
* :ref:`tfe<amdgpu_synid_tfe>` adds 1 dword if specified.
Note. The surface data format is indicated in the image resource constant but not in the instruction.
Note: the surface data format is indicated in the image resource constant but not in the instruction.
*Operands:* :ref:`v<amdgpu_synid_v>`

@ -10,5 +10,6 @@
imm32
===========================
An :ref:`integer_number<amdgpu_synid_integer_number>` or a :ref:`floating-point_number<amdgpu_synid_floating-point_number>`. The number is converted to *f16* as described :ref:`here<amdgpu_synid_lit_conv>`.
A :ref:`floating-point_number<amdgpu_synid_floating-point_number>`, an :ref:`integer_number<amdgpu_synid_integer_number>`, or an :ref:`absolute_expression<amdgpu_synid_absolute_expression>`.
The value is converted to *f16* as described :ref:`here<amdgpu_synid_fp_conv>`.

@ -10,5 +10,6 @@
imm32
===========================
An :ref:`integer_number<amdgpu_synid_integer_number>` or a :ref:`floating-point_number<amdgpu_synid_floating-point_number>`. The value is converted to *f32* as described :ref:`here<amdgpu_synid_lit_conv>`.
A :ref:`floating-point_number<amdgpu_synid_floating-point_number>`, an :ref:`integer_number<amdgpu_synid_integer_number>`, or an :ref:`absolute_expression<amdgpu_synid_absolute_expression>`.
The value is converted to *f32* as described :ref:`here<amdgpu_synid_fp_conv>`.

@ -14,18 +14,21 @@ Bits of a hardware register being accessed.
The bits of this operand have the following meaning:
============ ===================================
Bits Description
============ ===================================
5:0 Register *id*.
10:6 First bit *offset* (0..31).
15:11 *Size* in bits (1..32).
============ ===================================
======= ===================== ============
Bits Description Value Range
======= ===================== ============
5:0 Register *id*. 0..63
10:6 First bit *offset*. 0..31
15:11 *Size* in bits. 1..32
======= ===================== ============
This operand may be specified as a positive 16-bit :ref:`integer_number<amdgpu_synid_integer_number>` or using the syntax described below.
This operand may be specified as one of the following:
* An :ref:`integer_number<amdgpu_synid_integer_number>` or an :ref:`absolute_expression<amdgpu_synid_absolute_expression>`. The value must be in the range 0..0xFFFF.
* An *hwreg* value described below.
==================================== ============================================================================
Syntax Description
Hwreg Value Syntax Description
==================================== ============================================================================
hwreg({0..63}) All bits of a register indicated by its *id*.
hwreg(<*name*>) All bits of a register indicated by its *name*.
@ -33,7 +36,8 @@ This operand may be specified as a positive 16-bit :ref:`integer_number<amdgpu_s
hwreg(<*name*>, {0..31}, {1..32}) Register bits indicated by register *name*, first bit *offset* and *size*.
==================================== ============================================================================
Register *id*, *offset* and *size* must be specified as positive :ref:`integer numbers<amdgpu_synid_integer_number>`.
Numeric values may be specified as positive :ref:`integer numbers<amdgpu_synid_integer_number>`
or :ref:`absolute expressions<amdgpu_synid_absolute_expression>`.
Defined register *names* include:
@ -62,7 +66,16 @@ Examples:
.. parsed-literal::
s_getreg_b32 s2, 0x6
reg = 1
offset = 2
size = 4
hwreg_enc = reg | (offset << 6) | ((size - 1) << 11)
s_getreg_b32 s2, 0x1881
s_getreg_b32 s2, hwreg_enc // the same as above
s_getreg_b32 s2, hwreg(1, 2, 4) // the same as above
s_getreg_b32 s2, hwreg(reg, offset, size) // the same as above
s_getreg_b32 s2, hwreg(15)
s_getreg_b32 s2, hwreg(51, 1, 31)
s_getreg_b32 s2, hwreg(HW_REG_LDS_ALLOC, 0, 1)
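
The bit layout in the table and the ``hwreg_enc`` expression in the examples can be cross-checked with a short Python sketch. The function names are hypothetical. The default arguments (offset 0, size 32) are an assumption about what "all bits of a register" means for the one-argument form.

```python
from typing import Tuple


def encode_hwreg(reg_id: int, offset: int = 0, size: int = 32) -> int:
    """Pack id into bits 5:0, offset into bits 10:6 and size-1 into bits 15:11."""
    if not 0 <= reg_id <= 63:
        raise ValueError("register id must be in 0..63")
    if not 0 <= offset <= 31:
        raise ValueError("offset must be in 0..31")
    if not 1 <= size <= 32:
        raise ValueError("size must be in 1..32")
    return reg_id | (offset << 6) | ((size - 1) << 11)


def decode_hwreg(code: int) -> Tuple[int, int, int]:
    """Split a 16-bit operand back into (id, offset, size)."""
    if not 0 <= code <= 0xFFFF:
        raise ValueError("operand must be in 0..0xFFFF")
    return code & 0x3F, (code >> 6) & 0x1F, ((code >> 11) & 0x1F) + 1


print(hex(encode_hwreg(1, 2, 4)))  # 0x1881
print(decode_hwreg(0x1881))        # (1, 2, 4)
```

The size field stores ``size - 1``, which is how the 1..32 range fits into five bits.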

@ -12,19 +12,26 @@ label
A branch target which is a 16-bit signed integer treated as a PC-relative dword offset.
This operand may be specified as:
This operand may be specified as one of the following:
* An :ref:`integer_number<amdgpu_synid_integer_number>`. The number is truncated to 16 bits.
* An :ref:`absolute_expression<amdgpu_synid_absolute_expression>` which must start with an :ref:`integer_number<amdgpu_synid_integer_number>`. The value of the expression is truncated to 16 bits.
* A :ref:`symbol<amdgpu_synid_symbol>` (for example, a label). The value is handled as a 16-bit PC-relative dword offset to be resolved by a linker.
* An :ref:`integer_number<amdgpu_synid_integer_number>` or an :ref:`absolute_expression<amdgpu_synid_absolute_expression>`. The value must be in the range -32768..65535.
* A :ref:`symbol<amdgpu_synid_symbol>` (for example, a label) representing a relocatable address in the same compilation unit in which it is referenced. The value is handled as a 16-bit PC-relative dword offset to be resolved by a linker.
Examples:
.. parsed-literal::
offset = 30
s_branch loop_end
s_branch 2 + offset
s_branch 32
loop_end:
label_1:
label_2 = . + 4
s_branch 32
s_branch offset + 2
s_branch label_1
s_branch label_2
s_branch label_3
s_branch label_4
label_3 = label_2 + 4
label_4:

@ -12,24 +12,29 @@ msg
A 16-bit message code. The bits of this operand have the following meaning:
============ ======================================================
Bits Description
============ ======================================================
3:0 Message *type*.
6:4 Optional *operation*.
9:7 Optional *parameters*.
15:10 Unused.
============ ======================================================
============ =============================== ===============
Bits Description Value Range
============ =============================== ===============
3:0 Message *type*. 0..15
6:4 Optional *operation*. 0..7
7:7 Unused. \-
9:8 Optional *stream*. 0..3
15:10 Unused. \-
============ =============================== ===============
This operand may be specified as a positive 16-bit :ref:`integer_number<amdgpu_synid_integer_number>` or using the syntax described below:
This operand may be specified as one of the following:
======================================== ========================================================================
Syntax Description
======================================== ========================================================================
sendmsg(<*type*>) A message identified by its *type*.
sendmsg(<*type*>, <*op*>) A message identified by its *type* and *operation*.
sendmsg(<*type*>, <*op*>, <*stream*>) A message identified by its *type* and *operation* with a stream *id*.
======================================== ========================================================================
* An :ref:`integer_number<amdgpu_synid_integer_number>` or an :ref:`absolute_expression<amdgpu_synid_absolute_expression>`. The value must be in the range 0..0xFFFF.
* A *sendmsg* value described below.
==================================== ====================================================
Sendmsg Value Syntax Description
==================================== ====================================================
sendmsg(<*type*>) A message identified by its *type*.
sendmsg(<*type*>,<*op*>) A message identified by its *type* and *operation*.
sendmsg(<*type*>,<*op*>,<*stream*>) A message identified by its *type* and *operation*
with a stream *id*.
==================================== ====================================================
*Type* may be specified using message *name* or message *id*.
@ -37,7 +42,8 @@ This operand may be specified as a positive 16-bit :ref:`integer_number<amdgpu_s
Stream *id* is an integer in the range 0..3.
Message *id*, operation *id* and stream *id* must be specified as positive :ref:`integer numbers<amdgpu_synid_integer_number>`.
Numeric values may be specified as positive :ref:`integer numbers<amdgpu_synid_integer_number>`
or :ref:`absolute expressions<amdgpu_synid_absolute_expression>`.
Each message type supports specific operations:
@ -60,16 +66,32 @@ Each message type supports specific operations:
\ SYSMSG_OP_TTRACE_PC 4 \-
================= ========== ============================== ============ ==========
*Sendmsg* arguments are validated depending on how the *type* value is specified:
* If the message *type* is specified by name, argument values must satisfy the limitations detailed in the table above.
* If the message *type* is specified as a number, each argument must not exceed the corresponding value range (see the first table).
Examples:
.. parsed-literal::
// numeric message code
msg = 0x10
s_sendmsg 0x12
s_sendmsg msg + 2
// sendmsg with strict arguments validation
s_sendmsg sendmsg(MSG_INTERRUPT)
s_sendmsg sendmsg(MSG_GET_DOORBELL)
s_sendmsg sendmsg(2, GS_OP_CUT)
s_sendmsg sendmsg(MSG_GS, GS_OP_EMIT)
s_sendmsg sendmsg(MSG_GS, 2)
s_sendmsg sendmsg(MSG_GS_DONE, GS_OP_EMIT_CUT, 1)
s_sendmsg sendmsg(MSG_SYSMSG, SYSMSG_OP_TTRACE_PC)
s_sendmsg sendmsg(MSG_GET_DOORBELL)
// sendmsg with validation of value range only
msg = 2
op = 3
stream = 1
s_sendmsg sendmsg(msg, op, stream)
s_sendmsg sendmsg(2, GS_OP_CUT)
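
The "value range only" validation can be sketched in Python from the first table: *type* in bits 3:0, *operation* in bits 6:4, *stream* in bits 9:8. The function name is hypothetical, and treating an omitted operation or stream as zero is an assumption. The per-message restrictions that apply when the type is given by name are not modeled.

```python
def encode_sendmsg(msg_type: int, op: int = 0, stream: int = 0) -> int:
    """Pack a message code, checking only the per-field value ranges."""
    if not 0 <= msg_type <= 15:
        raise ValueError("message type must be in 0..15")
    if not 0 <= op <= 7:
        raise ValueError("operation must be in 0..7")
    if not 0 <= stream <= 3:
        raise ValueError("stream must be in 0..3")
    # Bit 7 and bits 15:10 are unused and stay zero.
    return msg_type | (op << 4) | (stream << 8)


print(hex(encode_sendmsg(2, 3, 1)))  # 0x132
print(hex(encode_sendmsg(2, 1)))     # 0x12
```

The second call gives the same numeric code as ``s_sendmsg 0x12`` in the examples above.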

@ -12,7 +12,8 @@ imm3
A bit mask which indicates request permissions.
This operand must be specified as an :ref:`integer_number<amdgpu_synid_integer_number>`. The value is truncated to 7 bits, but only 3 low bits are significant.
This operand must be specified as an :ref:`integer_number<amdgpu_synid_integer_number>` or an :ref:`absolute_expression<amdgpu_synid_absolute_expression>`.
The value is truncated to 7 bits, but only 3 low bits are significant.
============ ==============================
Bit Number Description

@ -10,5 +10,5 @@
imm16
===========================
An :ref:`integer_number<amdgpu_synid_integer_number>`. The value is truncated to 16 bits and then sign-extended to 32 bits.
An :ref:`integer_number<amdgpu_synid_integer_number>` or an :ref:`absolute_expression<amdgpu_synid_absolute_expression>`. The value must be in the range -32768..65535.

@ -10,5 +10,5 @@
imm16
===========================
An :ref:`integer_number<amdgpu_synid_integer_number>`. The value is truncated to 16 bits and then zero-extended to 32 bits.
An :ref:`integer_number<amdgpu_synid_integer_number>` or an :ref:`absolute_expression<amdgpu_synid_absolute_expression>`. The value must be in the range 0..65535.

@ -14,30 +14,31 @@ Counts of outstanding instructions to wait for.
The bits of this operand have the following meaning:
============ ======================================================
Bits Description
============ ======================================================
3:0 VM_CNT: vector memory operations count, lower bits.
6:4 EXP_CNT: export count.
11:8 LGKM_CNT: LDS, GDS, Constant and Message count.
15:14 VM_CNT: vector memory operations count, upper bits.
============ ======================================================
========== ========= ================================================ ============
High Bits Low Bits Description Value Range
========== ========= ================================================ ============
15:14 3:0 VM_CNT: vector memory operations count. 0..63
\- 6:4 EXP_CNT: export count. 0..7
\- 11:8 LGKM_CNT: LDS, GDS, Constant and Message count. 0..15
========== ========= ================================================ ============
This operand may be specified as a positive 16-bit :ref:`integer_number<amdgpu_synid_integer_number>`
or as a combination of the following symbolic helpers:
This operand may be specified as one of the following:
* An :ref:`integer_number<amdgpu_synid_integer_number>` or an :ref:`absolute_expression<amdgpu_synid_absolute_expression>`. The value must be in the range 0..0xFFFF.
* A combination of *vmcnt*, *expcnt*, *lgkmcnt* and other values described below.
====================== ======================================================================
Syntax Description
====================== ======================================================================
vmcnt(<*N*>) VM_CNT value. *N* must not exceed the largest VM_CNT value.
expcnt(<*N*>) EXP_CNT value. *N* must not exceed the largest EXP_CNT value.
lgkmcnt(<*N*>) LGKM_CNT value. *N* must not exceed the largest LGKM_CNT value.
vmcnt_sat(<*N*>) VM_CNT value computed as min(*N*, the largest VM_CNT value).
expcnt_sat(<*N*>) EXP_CNT value computed as min(*N*, the largest EXP_CNT value).
lgkmcnt_sat(<*N*>) LGKM_CNT value computed as min(*N*, the largest LGKM_CNT value).
vmcnt(<*N*>) A VM_CNT value. *N* must not exceed the largest VM_CNT value.
expcnt(<*N*>) An EXP_CNT value. *N* must not exceed the largest EXP_CNT value.
lgkmcnt(<*N*>) An LGKM_CNT value. *N* must not exceed the largest LGKM_CNT value.
vmcnt_sat(<*N*>) A VM_CNT value computed as min(*N*, the largest VM_CNT value).
expcnt_sat(<*N*>) An EXP_CNT value computed as min(*N*, the largest EXP_CNT value).
lgkmcnt_sat(<*N*>) An LGKM_CNT value computed as min(*N*, the largest LGKM_CNT value).
====================== ======================================================================
These helpers may be specified in any order. Ampersands and commas may be used as optional separators.
These values may be specified in any order. Spaces, ampersands and commas may be used as optional separators.
*N* is either an
:ref:`integer number<amdgpu_synid_integer_number>` or an
@ -47,10 +48,18 @@ Examples:
.. parsed-literal::
s_waitcnt 0
vm_cnt = 1
exp_cnt = 2
lgkm_cnt = 3
cnt = vm_cnt | (exp_cnt << 4) | (lgkm_cnt << 8)
s_waitcnt cnt
s_waitcnt 1 | (2 << 4) | (3 << 8) // the same as above
s_waitcnt vmcnt(1) expcnt(2) lgkmcnt(3) // the same as above
s_waitcnt vmcnt(vm_cnt) expcnt(exp_cnt) lgkmcnt(lgkm_cnt) // the same as above
s_waitcnt vmcnt(1)
s_waitcnt expcnt(2) lgkmcnt(3)
s_waitcnt vmcnt(1) expcnt(2) lgkmcnt(3)
s_waitcnt vmcnt(1), expcnt(2), lgkmcnt(3)
s_waitcnt vmcnt(1) & lgkmcnt_sat(100) & expcnt(2)
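
In this layout VM_CNT is split: its low four bits occupy bits 3:0 and its high two bits occupy bits 15:14. The Python sketch below packs all three counters under that reading of the table. The function name is hypothetical, and all three counts are passed explicitly because the text above does not say what an omitted counter encodes to.

```python
def encode_waitcnt(vm_cnt: int, exp_cnt: int, lgkm_cnt: int) -> int:
    """Pack the three counters, splitting VM_CNT across bits 3:0 and 15:14."""
    if not 0 <= vm_cnt <= 63:
        raise ValueError("VM_CNT must be in 0..63")
    if not 0 <= exp_cnt <= 7:
        raise ValueError("EXP_CNT must be in 0..7")
    if not 0 <= lgkm_cnt <= 15:
        raise ValueError("LGKM_CNT must be in 0..15")
    vm_low = vm_cnt & 0xF   # goes to bits 3:0
    vm_high = vm_cnt >> 4   # goes to bits 15:14
    return vm_low | (exp_cnt << 4) | (lgkm_cnt << 8) | (vm_high << 14)


print(hex(encode_waitcnt(1, 2, 3)))    # 0x321
print(hex(encode_waitcnt(63, 7, 15)))  # 0xcf7f
```

For VM_CNT values up to 15 the result matches the simple expression ``vm_cnt | (exp_cnt << 4) | (lgkm_cnt << 8)`` used in the examples. Larger values need the split.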

@ -10,5 +10,5 @@
imm16
===========================
An :ref:`integer_number<amdgpu_synid_integer_number>`. The value is truncated to 16 bits.
A 16-bit :ref:`integer_number<amdgpu_synid_integer_number>` or an :ref:`absolute_expression<amdgpu_synid_absolute_expression>`. The value must be in the range -32768..65535.

@ -10,5 +10,5 @@
imm32
===========================
An :ref:`integer_number<amdgpu_synid_integer_number>`. The value is truncated to 32 bits.
An :ref:`integer_number<amdgpu_synid_integer_number>` or an :ref:`absolute_expression<amdgpu_synid_absolute_expression>`. The value is truncated to 32 bits.

@ -21,7 +21,7 @@ Optionally may serve as an output data:
* :ref:`dmask<amdgpu_synid_dmask>` may specify 2 data elements for 32-bit-per-pixel surfaces or 4 data elements for 64-bit-per-pixel surfaces. Each data element occupies 1 dword.
* :ref:`tfe<amdgpu_synid_tfe>` adds 1 dword if specified.
Note. The surface data format is indicated in the image resource constant but not in the instruction.
Note: the surface data format is indicated in the image resource constant but not in the instruction.
*Operands:* :ref:`v<amdgpu_synid_v>`

@ -21,6 +21,6 @@ Optionally may serve as an output data:
* :ref:`dmask<amdgpu_synid_dmask>` may specify 1 data element for 32-bit-per-pixel surfaces or 2 data elements for 64-bit-per-pixel surfaces. Each data element occupies 1 dword.
* :ref:`tfe<amdgpu_synid_tfe>` adds 1 dword if specified.
Note. The surface data format is indicated in the image resource constant but not in the instruction.
Note: the surface data format is indicated in the image resource constant but not in the instruction.
*Operands:* :ref:`v<amdgpu_synid_v>`

@ -10,5 +10,6 @@
imm32
===========================
An :ref:`integer_number<amdgpu_synid_integer_number>` or a :ref:`floating-point_number<amdgpu_synid_floating-point_number>`. The value is converted to *f32* as described :ref:`here<amdgpu_synid_lit_conv>`.
A :ref:`floating-point_number<amdgpu_synid_floating-point_number>`, an :ref:`integer_number<amdgpu_synid_integer_number>`, or an :ref:`absolute_expression<amdgpu_synid_absolute_expression>`.
The value is converted to *f32* as described :ref:`here<amdgpu_synid_fp_conv>`.

@ -14,18 +14,21 @@ Bits of a hardware register being accessed.
The bits of this operand have the following meaning:
============ ===================================
Bits Description
============ ===================================
5:0 Register *id*.
10:6 First bit *offset* (0..31).
15:11 *Size* in bits (1..32).
============ ===================================
======= ===================== ============
Bits Description Value Range
======= ===================== ============
5:0 Register *id*. 0..63
10:6 First bit *offset*. 0..31
15:11 *Size* in bits. 1..32
======= ===================== ============
This operand may be specified as a positive 16-bit :ref:`integer_number<amdgpu_synid_integer_number>` or using the syntax described below.
This operand may be specified as one of the following:
* An :ref:`integer_number<amdgpu_synid_integer_number>` or an :ref:`absolute_expression<amdgpu_synid_absolute_expression>`. The value must be in the range 0..0xFFFF.
* An *hwreg* value described below.
==================================== ============================================================================
Syntax Description
Hwreg Value Syntax Description
==================================== ============================================================================
hwreg({0..63}) All bits of a register indicated by its *id*.
hwreg(<*name*>) All bits of a register indicated by its *name*.
@ -33,7 +36,8 @@ This operand may be specified as a positive 16-bit :ref:`integer_number<amdgpu_s
hwreg(<*name*>, {0..31}, {1..32}) Register bits indicated by register *name*, first bit *offset* and *size*.
==================================== ============================================================================
Register *id*, *offset* and *size* must be specified as positive :ref:`integer numbers<amdgpu_synid_integer_number>`.
Numeric values may be specified as positive :ref:`integer numbers<amdgpu_synid_integer_number>`
or :ref:`absolute expressions<amdgpu_synid_absolute_expression>`.
Defined register *names* include:
@ -53,7 +57,16 @@ Examples:
.. parsed-literal::
s_getreg_b32 s2, 0x6
reg = 1
offset = 2
size = 4
hwreg_enc = reg | (offset << 6) | ((size - 1) << 11)
s_getreg_b32 s2, 0x1881
s_getreg_b32 s2, hwreg_enc // the same as above
s_getreg_b32 s2, hwreg(1, 2, 4) // the same as above
s_getreg_b32 s2, hwreg(reg, offset, size) // the same as above
s_getreg_b32 s2, hwreg(15)
s_getreg_b32 s2, hwreg(51, 1, 31)
s_getreg_b32 s2, hwreg(HW_REG_LDS_ALLOC, 0, 1)

@ -12,19 +12,26 @@ label
A branch target which is a 16-bit signed integer treated as a PC-relative dword offset.
This operand may be specified as:
This operand may be specified as one of the following:
* An :ref:`integer_number<amdgpu_synid_integer_number>`. The number is truncated to 16 bits.
* An :ref:`absolute_expression<amdgpu_synid_absolute_expression>` which must start with an :ref:`integer_number<amdgpu_synid_integer_number>`. The value of the expression is truncated to 16 bits.
* A :ref:`symbol<amdgpu_synid_symbol>` (for example, a label). The value is handled as a 16-bit PC-relative dword offset to be resolved by a linker.
* An :ref:`integer_number<amdgpu_synid_integer_number>` or an :ref:`absolute_expression<amdgpu_synid_absolute_expression>`. The value must be in the range -32768..65535.
* A :ref:`symbol<amdgpu_synid_symbol>` (for example, a label) representing a relocatable address in the same compilation unit in which it is referenced. The value is handled as a 16-bit PC-relative dword offset to be resolved by a linker.
Examples:
.. parsed-literal::
offset = 30
s_branch loop_end
s_branch 2 + offset
s_branch 32
loop_end:
label_1:
label_2 = . + 4
s_branch 32
s_branch offset + 2
s_branch label_1
s_branch label_2
s_branch label_3
s_branch label_4
label_3 = label_2 + 4
label_4:

@ -12,24 +12,29 @@ msg
A 16-bit message code. The bits of this operand have the following meaning:
============ ======================================================
Bits Description
============ ======================================================
3:0 Message *type*.
6:4 Optional *operation*.
9:7 Optional *parameters*.
15:10 Unused.
============ ======================================================
============ =============================== ===============
Bits Description Value Range
============ =============================== ===============
3:0 Message *type*. 0..15
6:4 Optional *operation*. 0..7
7:7 Unused. \-
9:8 Optional *stream*. 0..3
15:10 Unused. \-
============ =============================== ===============
This operand may be specified as a positive 16-bit :ref:`integer_number<amdgpu_synid_integer_number>` or using the syntax described below:
This operand may be specified as one of the following:
======================================== ========================================================================
Syntax Description
======================================== ========================================================================
sendmsg(<*type*>) A message identified by its *type*.
sendmsg(<*type*>, <*op*>) A message identified by its *type* and *operation*.
sendmsg(<*type*>, <*op*>, <*stream*>) A message identified by its *type* and *operation* with a stream *id*.
======================================== ========================================================================
* An :ref:`integer_number<amdgpu_synid_integer_number>` or an :ref:`absolute_expression<amdgpu_synid_absolute_expression>`. The value must be in the range 0..0xFFFF.
* A *sendmsg* value described below.
==================================== ====================================================
Sendmsg Value Syntax Description
==================================== ====================================================
sendmsg(<*type*>) A message identified by its *type*.
sendmsg(<*type*>,<*op*>) A message identified by its *type* and *operation*.
sendmsg(<*type*>,<*op*>,<*stream*>) A message identified by its *type* and *operation*
with a stream *id*.
==================================== ====================================================
*Type* may be specified using message *name* or message *id*.
@ -37,7 +42,8 @@ This operand may be specified as a positive 16-bit :ref:`integer_number<amdgpu_s
Stream *id* is an integer in the range 0..3.
Message *id*, operation *id* and stream *id* must be specified as positive :ref:`integer numbers<amdgpu_synid_integer_number>`.
Numeric values may be specified as positive :ref:`integer numbers<amdgpu_synid_integer_number>`
or :ref:`absolute expressions<amdgpu_synid_absolute_expression>`.
Each message type supports specific operations:
@ -58,15 +64,31 @@ Each message type supports specific operations:
\ SYSMSG_OP_TTRACE_PC 4 \-
================= ========== ============================== ============ ==========
*Sendmsg* arguments are validated depending on how the *type* value is specified:
* If the message *type* is specified by name, argument values must satisfy the limitations detailed in the table above.
* If the message *type* is specified as a number, each argument must not exceed the corresponding value range (see the first table).
Examples:
.. parsed-literal::
// numeric message code
msg = 0x10
s_sendmsg 0x12
s_sendmsg msg + 2
// sendmsg with strict arguments validation
s_sendmsg sendmsg(MSG_INTERRUPT)
s_sendmsg sendmsg(2, GS_OP_CUT)
s_sendmsg sendmsg(MSG_GS, GS_OP_EMIT)
s_sendmsg sendmsg(MSG_GS, 2)
s_sendmsg sendmsg(MSG_GS_DONE, GS_OP_EMIT_CUT, 1)
s_sendmsg sendmsg(MSG_SYSMSG, SYSMSG_OP_TTRACE_PC)
// sendmsg with validation of value range only
msg = 2
op = 3
stream = 1
s_sendmsg sendmsg(msg, op, stream)
s_sendmsg sendmsg(2, GS_OP_CUT)

@ -10,5 +10,5 @@
imm16
===========================
An :ref:`integer_number<amdgpu_synid_integer_number>`. The value is truncated to 16 bits and then sign-extended to 32 bits.
An :ref:`integer_number<amdgpu_synid_integer_number>` or an :ref:`absolute_expression<amdgpu_synid_absolute_expression>`. The value must be in the range -32768..65535.

@ -10,5 +10,5 @@
imm16
===========================
An :ref:`integer_number<amdgpu_synid_integer_number>`. The value is truncated to 16 bits and then zero-extended to 32 bits.
An :ref:`integer_number<amdgpu_synid_integer_number>` or an :ref:`absolute_expression<amdgpu_synid_absolute_expression>`. The value must be in the range 0..65535.

@ -14,29 +14,31 @@ Counts of outstanding instructions to wait for.
The bits of this operand have the following meaning:
============ ======================================================
Bits Description
============ ======================================================
3:0 VM_CNT: vector memory operations count.
6:4 EXP_CNT: export count.
12:8 LGKM_CNT: LDS, GDS, Constant and Message count.
============ ======================================================
===== ================================================ ============
Bits Description Value Range
===== ================================================ ============
3:0 VM_CNT: vector memory operations count. 0..15
6:4 EXP_CNT: export count. 0..7
12:8 LGKM_CNT: LDS, GDS, Constant and Message count. 0..31
===== ================================================ ============
This operand may be specified as a positive 16-bit :ref:`integer_number<amdgpu_synid_integer_number>`
or as a combination of the following symbolic helpers:
This operand may be specified as one of the following:
* An :ref:`integer_number<amdgpu_synid_integer_number>` or an :ref:`absolute_expression<amdgpu_synid_absolute_expression>`. The value must be in the range 0..0xFFFF.
* A combination of *vmcnt*, *expcnt*, *lgkmcnt* and other values described below.
====================== ======================================================================
Syntax Description
====================== ======================================================================
vmcnt(<*N*>) VM_CNT value. *N* must not exceed the largest VM_CNT value.
expcnt(<*N*>) EXP_CNT value. *N* must not exceed the largest EXP_CNT value.
lgkmcnt(<*N*>) LGKM_CNT value. *N* must not exceed the largest LGKM_CNT value.
vmcnt_sat(<*N*>) VM_CNT value computed as min(*N*, the largest VM_CNT value).
expcnt_sat(<*N*>) EXP_CNT value computed as min(*N*, the largest EXP_CNT value).
lgkmcnt_sat(<*N*>) LGKM_CNT value computed as min(*N*, the largest LGKM_CNT value).
vmcnt(<*N*>) A VM_CNT value. *N* must not exceed the largest VM_CNT value.
expcnt(<*N*>) An EXP_CNT value. *N* must not exceed the largest EXP_CNT value.
lgkmcnt(<*N*>) An LGKM_CNT value. *N* must not exceed the largest LGKM_CNT value.
vmcnt_sat(<*N*>) A VM_CNT value computed as min(*N*, the largest VM_CNT value).
expcnt_sat(<*N*>) An EXP_CNT value computed as min(*N*, the largest EXP_CNT value).
lgkmcnt_sat(<*N*>) An LGKM_CNT value computed as min(*N*, the largest LGKM_CNT value).
====================== ======================================================================
These helpers may be specified in any order. Ampersands and commas may be used as optional separators.
These values may be specified in any order. Spaces, ampersands and commas may be used as optional separators.
*N* is either an
:ref:`integer number<amdgpu_synid_integer_number>` or an
@ -46,10 +48,18 @@ Examples:
.. parsed-literal::
s_waitcnt 0
vm_cnt = 1
exp_cnt = 2
lgkm_cnt = 3
cnt = vm_cnt | (exp_cnt << 4) | (lgkm_cnt << 8)
s_waitcnt cnt
s_waitcnt 1 | (2 << 4) | (3 << 8) // the same as above
s_waitcnt vmcnt(1) expcnt(2) lgkmcnt(3) // the same as above
s_waitcnt vmcnt(vm_cnt) expcnt(exp_cnt) lgkmcnt(lgkm_cnt) // the same as above
s_waitcnt vmcnt(1)
s_waitcnt expcnt(2) lgkmcnt(3)
s_waitcnt vmcnt(1) expcnt(2) lgkmcnt(3)
s_waitcnt vmcnt(1), expcnt(2), lgkmcnt(3)
s_waitcnt vmcnt(1) & lgkmcnt_sat(100) & expcnt(2)
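
The difference between the plain and the ``_sat`` forms can be sketched in Python. The maxima are taken from the value ranges in the table for this layout (VM_CNT 0..15, EXP_CNT 0..7, LGKM_CNT 0..31). The names ``MAX_COUNT`` and ``counter_value`` are hypothetical.

```python
# Largest value of each counter, taken from the "Value Range" column above.
MAX_COUNT = {"vmcnt": 15, "expcnt": 7, "lgkmcnt": 31}


def counter_value(name: str, n: int, saturate: bool = False) -> int:
    """Return the counter value, clamping only for the *_sat forms."""
    largest = MAX_COUNT[name]
    if n < 0:
        raise ValueError("N must not be negative")
    if saturate:
        return min(n, largest)  # vmcnt_sat / expcnt_sat / lgkmcnt_sat
    if n > largest:
        raise ValueError(f"{name}({n}) exceeds the largest value {largest}")
    return n


print(counter_value("lgkmcnt", 100, saturate=True))  # 31
print(counter_value("expcnt", 2))                    # 2
```

With these maxima, ``lgkmcnt_sat(100)`` from the last example resolves to 31, whereas ``lgkmcnt(100)`` would be rejected.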

@ -10,5 +10,5 @@
imm16
===========================
An :ref:`integer_number<amdgpu_synid_integer_number>`. The value is truncated to 16 bits.
A 16-bit :ref:`integer_number<amdgpu_synid_integer_number>` or an :ref:`absolute_expression<amdgpu_synid_absolute_expression>`. The value must be in the range -32768..65535.


@ -10,5 +10,5 @@
imm32
===========================
An :ref:`integer_number<amdgpu_synid_integer_number>`. The value is truncated to 32 bits.
An :ref:`integer_number<amdgpu_synid_integer_number>` or an :ref:`absolute_expression<amdgpu_synid_absolute_expression>`. The value is truncated to 32 bits.


@ -21,7 +21,7 @@ Optionally may serve as an output data:
* :ref:`dmask<amdgpu_synid_dmask>` may specify 2 data elements for 32-bit-per-pixel surfaces or 4 data elements for 64-bit-per-pixel surfaces. Each data element occupies 1 dword.
* :ref:`tfe<amdgpu_synid_tfe>` adds 1 dword if specified.
Note. The surface data format is indicated in the image resource constant but not in the instruction.
Note: the surface data format is indicated in the image resource constant but not in the instruction.
*Operands:* :ref:`v<amdgpu_synid_v>`


@ -21,6 +21,6 @@ Optionally may serve as an output data:
* :ref:`dmask<amdgpu_synid_dmask>` may specify 1 data element for 32-bit-per-pixel surfaces or 2 data elements for 64-bit-per-pixel surfaces. Each data element occupies 1 dword.
* :ref:`tfe<amdgpu_synid_tfe>` adds 1 dword if specified.
Note. The surface data format is indicated in the image resource constant but not in the instruction.
Note: the surface data format is indicated in the image resource constant but not in the instruction.
*Operands:* :ref:`v<amdgpu_synid_v>`


@ -10,5 +10,6 @@
imm32
===========================
An :ref:`integer_number<amdgpu_synid_integer_number>` or a :ref:`floating-point_number<amdgpu_synid_floating-point_number>`. The number is converted to *f16* as described :ref:`here<amdgpu_synid_lit_conv>`.
A :ref:`floating-point_number<amdgpu_synid_floating-point_number>`, an :ref:`integer_number<amdgpu_synid_integer_number>`, or an :ref:`absolute_expression<amdgpu_synid_absolute_expression>`.
The value is converted to *f16* as described :ref:`here<amdgpu_synid_fp_conv>`.


@ -10,5 +10,6 @@
imm32
===========================
An :ref:`integer_number<amdgpu_synid_integer_number>` or a :ref:`floating-point_number<amdgpu_synid_floating-point_number>`. The value is converted to *f32* as described :ref:`here<amdgpu_synid_lit_conv>`.
A :ref:`floating-point_number<amdgpu_synid_floating-point_number>`, an :ref:`integer_number<amdgpu_synid_integer_number>`, or an :ref:`absolute_expression<amdgpu_synid_absolute_expression>`.
The value is converted to *f32* as described :ref:`here<amdgpu_synid_fp_conv>`.


@ -14,18 +14,21 @@ Bits of a hardware register being accessed.
The bits of this operand have the following meaning:
============ ===================================
Bits Description
============ ===================================
5:0 Register *id*.
10:6 First bit *offset* (0..31).
15:11 *Size* in bits (1..32).
============ ===================================
======= ===================== ============
Bits Description Value Range
======= ===================== ============
5:0 Register *id*. 0..63
10:6 First bit *offset*. 0..31
15:11 *Size* in bits. 1..32
======= ===================== ============
This operand may be specified as a positive 16-bit :ref:`integer_number<amdgpu_synid_integer_number>` or using the syntax described below.
This operand may be specified as one of the following:
* An :ref:`integer_number<amdgpu_synid_integer_number>` or an :ref:`absolute_expression<amdgpu_synid_absolute_expression>`. The value must be in the range 0..0xFFFF.
* An *hwreg* value described below.
==================================== ============================================================================
Syntax Description
Hwreg Value Syntax Description
==================================== ============================================================================
hwreg({0..63}) All bits of a register indicated by its *id*.
hwreg(<*name*>) All bits of a register indicated by its *name*.
@ -33,7 +36,8 @@ This operand may be specified as a positive 16-bit :ref:`integer_number<amdgpu_s
hwreg(<*name*>, {0..31}, {1..32}) Register bits indicated by register *name*, first bit *offset* and *size*.
==================================== ============================================================================
Register *id*, *offset* and *size* must be specified as positive :ref:`integer numbers<amdgpu_synid_integer_number>`.
Numeric values may be specified as positive :ref:`integer numbers<amdgpu_synid_integer_number>`
or :ref:`absolute expressions<amdgpu_synid_absolute_expression>`.
Defined register *names* include:
@ -53,7 +57,16 @@ Examples:
.. parsed-literal::
s_getreg_b32 s2, 0x6
reg = 1
offset = 2
size = 4
hwreg_enc = reg | (offset << 6) | ((size - 1) << 11)
s_getreg_b32 s2, 0x1881
s_getreg_b32 s2, hwreg_enc // the same as above
s_getreg_b32 s2, hwreg(1, 2, 4) // the same as above
s_getreg_b32 s2, hwreg(reg, offset, size) // the same as above
s_getreg_b32 s2, hwreg(15)
s_getreg_b32 s2, hwreg(51, 1, 31)
s_getreg_b32 s2, hwreg(HW_REG_LDS_ALLOC, 0, 1)
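The packing used by the ``hwreg_enc`` expression in the examples above can be sketched in Python as follows. This is only an illustration of the bit layout from the table, not the assembler's implementation; the function names are hypothetical, and the defaults (offset 0, size 32 for the one-argument form) are an assumption about what "all bits of a register" means.

```python
def encode_hwreg(reg_id, offset=0, size=32):
    """Pack hwreg(id, offset, size) into the 16-bit operand value.

    Hypothetical helper; the defaults are an assumption, see the lead-in.
    """
    if not 0 <= reg_id <= 63:
        raise ValueError("register id must be in the range 0..63")
    if not 0 <= offset <= 31:
        raise ValueError("offset must be in the range 0..31")
    if not 1 <= size <= 32:
        raise ValueError("size must be in the range 1..32")
    # Bits 15:11 hold size - 1, which is how 1..32 fits into 5 bits.
    return reg_id | (offset << 6) | ((size - 1) << 11)


def decode_hwreg(value):
    """Split a 16-bit operand value back into (id, offset, size)."""
    return value & 0x3F, (value >> 6) & 0x1F, ((value >> 11) & 0x1F) + 1


print(hex(encode_hwreg(1, 2, 4)))  # 0x1881, matches the example above
print(decode_hwreg(0x1881))        # (1, 2, 4)
```

The decoder is included only to make the ``size - 1`` bias visible in both directions.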


@ -0,0 +1,66 @@
..
**************************************************
* *
* Automatically generated file, do not edit! *
* *
**************************************************
.. _amdgpu_synid8_imask:
imask
===========================
This operand is a mask which controls indexing mode for operands of subsequent instructions.
Bits 0, 1 and 2 control indexing of *src0*, *src1* and *src2*, while bit 3 controls indexing of *dst*.
Value 1 enables indexing and value 0 disables it.
===== ========================================
Bit Meaning
===== ========================================
0 Enables or disables *src0* indexing.
1 Enables or disables *src1* indexing.
2 Enables or disables *src2* indexing.
3 Enables or disables *dst* indexing.
===== ========================================
This operand may be specified as one of the following:
* An :ref:`integer_number<amdgpu_synid_integer_number>` or an :ref:`absolute_expression<amdgpu_synid_absolute_expression>`. The value must be in the range 0..15.
* A *gpr_idx* value described below.
==================================== ===========================================
Gpr_idx Value Syntax Description
==================================== ===========================================
gpr_idx(*<operands>*) Enable indexing for specified *operands*
and disable it for the rest.
*Operands* is a comma-separated list of
values which may include:
* "SRC0" - enable *src0* indexing.
* "SRC1" - enable *src1* indexing.
* "SRC2" - enable *src2* indexing.
* "DST" - enable *dst* indexing.
Each of these values may be specified only
once.
*Operands* list may be empty; this syntax
disables indexing for all operands.
==================================== ===========================================
Examples:
.. parsed-literal::
s_set_gpr_idx_mode 0
s_set_gpr_idx_mode gpr_idx() // the same as above
s_set_gpr_idx_mode 15
s_set_gpr_idx_mode gpr_idx(DST,SRC0,SRC1,SRC2) // the same as above
s_set_gpr_idx_mode gpr_idx(SRC0,SRC1,SRC2,DST) // the same as above
s_set_gpr_idx_mode gpr_idx(DST,SRC1)
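The mapping from a ``gpr_idx(...)`` operand list to the 4-bit mask can be sketched in Python as follows. The helper name and the error handling are hypothetical; only the bit positions and the "each value only once" rule come from the description above.

```python
# Bit positions from the "Bit / Meaning" table above.
GPR_IDX_BITS = {"SRC0": 0, "SRC1": 1, "SRC2": 2, "DST": 3}


def gpr_idx(*operands):
    """Return the imask value for a list of operand names (hypothetical helper)."""
    mask = 0
    for name in operands:
        if name not in GPR_IDX_BITS:
            raise ValueError(f"unknown operand name: {name}")
        bit = 1 << GPR_IDX_BITS[name]
        if mask & bit:
            # Each value may be specified only once.
            raise ValueError(f"{name} specified more than once")
        mask |= bit
    return mask


print(gpr_idx())                              # 0
print(gpr_idx("DST", "SRC0", "SRC1", "SRC2")) # 15
print(gpr_idx("DST", "SRC1"))                 # 10
```

The order of the names does not affect the result, which is why the two four-operand examples above are equivalent.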


@ -1,25 +0,0 @@
..
**************************************************
* *
* Automatically generated file, do not edit! *
* *
**************************************************
.. _amdgpu_synid8_imm4:
imm4
===========================
A positive :ref:`integer_number<amdgpu_synid_integer_number>`. The value is truncated to 4 bits.
This operand is a mask which controls indexing mode for operands of subsequent instructions. Value 1 enables indexing and value 0 disables it.
============ ========================================
Bit Meaning
============ ========================================
0 Enables or disables *src0* indexing.
1 Enables or disables *src1* indexing.
2 Enables or disables *src2* indexing.
3 Enables or disables *dst* indexing.
============ ========================================


@ -12,19 +12,26 @@ label
A branch target which is a 16-bit signed integer treated as a PC-relative dword offset.
This operand may be specified as:
This operand may be specified as one of the following:
* An :ref:`integer_number<amdgpu_synid_integer_number>`. The number is truncated to 16 bits.
* An :ref:`absolute_expression<amdgpu_synid_absolute_expression>` which must start with an :ref:`integer_number<amdgpu_synid_integer_number>`. The value of the expression is truncated to 16 bits.
* A :ref:`symbol<amdgpu_synid_symbol>` (for example, a label). The value is handled as a 16-bit PC-relative dword offset to be resolved by a linker.
* An :ref:`integer_number<amdgpu_synid_integer_number>` or an :ref:`absolute_expression<amdgpu_synid_absolute_expression>`. The value must be in the range -32768..65535.
* A :ref:`symbol<amdgpu_synid_symbol>` (for example, a label) representing a relocatable address in the same compilation unit where it is referred from. The value is handled as a 16-bit PC-relative dword offset to be resolved by a linker.
Examples:
.. parsed-literal::
offset = 30
s_branch loop_end
s_branch 2 + offset
s_branch 32
loop_end:
label_1:
label_2 = . + 4
s_branch 32
s_branch offset + 2
s_branch label_1
s_branch label_2
s_branch label_3
s_branch label_4
label_3 = label_2 + 4
label_4:


@ -12,24 +12,29 @@ msg
A 16-bit message code. The bits of this operand have the following meaning:
============ ======================================================
Bits Description
============ ======================================================
3:0 Message *type*.
6:4 Optional *operation*.
9:7 Optional *parameters*.
15:10 Unused.
============ ======================================================
============ =============================== ===============
Bits Description Value Range
============ =============================== ===============
3:0 Message *type*. 0..15
6:4 Optional *operation*. 0..7
7:7 Unused. \-
9:8 Optional *stream*. 0..3
15:10 Unused. \-
============ =============================== ===============
This operand may be specified as a positive 16-bit :ref:`integer_number<amdgpu_synid_integer_number>` or using the syntax described below:
This operand may be specified as one of the following:
======================================== ========================================================================
Syntax Description
======================================== ========================================================================
sendmsg(<*type*>) A message identified by its *type*.
sendmsg(<*type*>, <*op*>) A message identified by its *type* and *operation*.
sendmsg(<*type*>, <*op*>, <*stream*>) A message identified by its *type* and *operation* with a stream *id*.
======================================== ========================================================================
* An :ref:`integer_number<amdgpu_synid_integer_number>` or an :ref:`absolute_expression<amdgpu_synid_absolute_expression>`. The value must be in the range 0..0xFFFF.
* A *sendmsg* value described below.
==================================== ====================================================
Sendmsg Value Syntax Description
==================================== ====================================================
sendmsg(<*type*>) A message identified by its *type*.
sendmsg(<*type*>,<*op*>) A message identified by its *type* and *operation*.
sendmsg(<*type*>,<*op*>,<*stream*>) A message identified by its *type* and *operation*
with a stream *id*.
==================================== ====================================================
*Type* may be specified using message *name* or message *id*.
@ -37,7 +42,8 @@ This operand may be specified as a positive 16-bit :ref:`integer_number<amdgpu_s
Stream *id* is an integer in the range 0..3.
Message *id*, operation *id* and stream *id* must be specified as positive :ref:`integer numbers<amdgpu_synid_integer_number>`.
Numeric values may be specified as positive :ref:`integer numbers<amdgpu_synid_integer_number>`
or :ref:`absolute expressions<amdgpu_synid_absolute_expression>`.
Each message type supports specific operations:
@ -58,15 +64,31 @@ Each message type supports specific operations:
\ SYSMSG_OP_TTRACE_PC 4 \-
================= ========== ============================== ============ ==========
*Sendmsg* arguments are validated depending on how the *type* value is specified:
* If message *type* is specified by name, argument values must satisfy the limitations detailed in the table above.
* If message *type* is specified as a number, each argument must not exceed the corresponding value range (see the first table).
Examples:
.. parsed-literal::
// numeric message code
msg = 0x10
s_sendmsg 0x12
s_sendmsg msg + 2
// sendmsg with strict arguments validation
s_sendmsg sendmsg(MSG_INTERRUPT)
s_sendmsg sendmsg(2, GS_OP_CUT)
s_sendmsg sendmsg(MSG_GS, GS_OP_EMIT)
s_sendmsg sendmsg(MSG_GS, 2)
s_sendmsg sendmsg(MSG_GS_DONE, GS_OP_EMIT_CUT, 1)
s_sendmsg sendmsg(MSG_SYSMSG, SYSMSG_OP_TTRACE_PC)
// sendmsg with validation of value range only
msg = 2
op = 3
stream = 1
s_sendmsg sendmsg(msg, op, stream)
s_sendmsg sendmsg(2, GS_OP_CUT)
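The numeric path ("validation of value range only") can be sketched in Python as follows. The function names are hypothetical; the field positions and ranges are those of the first table, and the per-message limitations that apply when *type* is given by name are deliberately not modelled.

```python
def encode_sendmsg(msg_type, op=0, stream=0):
    """Pack sendmsg(type, op, stream) given as numbers (hypothetical helper)."""
    if not 0 <= msg_type <= 15:
        raise ValueError("type must be in the range 0..15")
    if not 0 <= op <= 7:
        raise ValueError("operation must be in the range 0..7")
    if not 0 <= stream <= 3:
        raise ValueError("stream must be in the range 0..3")
    # Bits 3:0 type, bits 6:4 operation, bits 9:8 stream; bit 7 is unused.
    return msg_type | (op << 4) | (stream << 8)


def decode_sendmsg(value):
    """Split a message code into (type, operation, stream)."""
    return value & 0xF, (value >> 4) & 0x7, (value >> 8) & 0x3


print(hex(encode_sendmsg(2, 3, 1)))  # 0x132, i.e. sendmsg(msg, op, stream) above
print(decode_sendmsg(0x12))          # (2, 1, 0), the numeric code from the examples
```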


@ -12,7 +12,8 @@ imm3
A bit mask which indicates request permissions.
This operand must be specified as an :ref:`integer_number<amdgpu_synid_integer_number>`. The value is truncated to 7 bits, but only 3 low bits are significant.
This operand must be specified as an :ref:`integer_number<amdgpu_synid_integer_number>` or an :ref:`absolute_expression<amdgpu_synid_absolute_expression>`.
The value is truncated to 7 bits, but only 3 low bits are significant.
============ ==============================
Bit Number Description


@ -10,5 +10,5 @@
imm16
===========================
An :ref:`integer_number<amdgpu_synid_integer_number>`. The value is truncated to 16 bits and then sign-extended to 32 bits.
An :ref:`integer_number<amdgpu_synid_integer_number>` or an :ref:`absolute_expression<amdgpu_synid_absolute_expression>`. The value must be in the range -32768..65535.


@ -10,5 +10,5 @@
imm16
===========================
An :ref:`integer_number<amdgpu_synid_integer_number>`. The value is truncated to 16 bits and then zero-extended to 32 bits.
An :ref:`integer_number<amdgpu_synid_integer_number>` or an :ref:`absolute_expression<amdgpu_synid_absolute_expression>`. The value must be in the range 0..65535.


@ -14,29 +14,31 @@ Counts of outstanding instructions to wait for.
The bits of this operand have the following meaning:
============ ======================================================
Bits Description
============ ======================================================
3:0 VM_CNT: vector memory operations count.
6:4 EXP_CNT: export count.
11:8 LGKM_CNT: LDS, GDS, Constant and Message count.
============ ======================================================
===== ================================================ ============
Bits Description Value Range
===== ================================================ ============
3:0 VM_CNT: vector memory operations count. 0..15
6:4 EXP_CNT: export count. 0..7
11:8 LGKM_CNT: LDS, GDS, Constant and Message count. 0..15
===== ================================================ ============
This operand may be specified as a positive 16-bit :ref:`integer_number<amdgpu_synid_integer_number>`
or as a combination of the following symbolic helpers:
This operand may be specified as one of the following:
* An :ref:`integer_number<amdgpu_synid_integer_number>` or an :ref:`absolute_expression<amdgpu_synid_absolute_expression>`. The value must be in the range 0..0xFFFF.
* A combination of *vmcnt*, *expcnt*, *lgkmcnt* and other values described below.
====================== ======================================================================
Syntax Description
====================== ======================================================================
vmcnt(<*N*>) VM_CNT value. *N* must not exceed the largest VM_CNT value.
expcnt(<*N*>) EXP_CNT value. *N* must not exceed the largest EXP_CNT value.
lgkmcnt(<*N*>) LGKM_CNT value. *N* must not exceed the largest LGKM_CNT value.
vmcnt_sat(<*N*>) VM_CNT value computed as min(*N*, the largest VM_CNT value).
expcnt_sat(<*N*>) EXP_CNT value computed as min(*N*, the largest EXP_CNT value).
lgkmcnt_sat(<*N*>) LGKM_CNT value computed as min(*N*, the largest LGKM_CNT value).
vmcnt(<*N*>) A VM_CNT value. *N* must not exceed the largest VM_CNT value.
expcnt(<*N*>) An EXP_CNT value. *N* must not exceed the largest EXP_CNT value.
lgkmcnt(<*N*>) An LGKM_CNT value. *N* must not exceed the largest LGKM_CNT value.
vmcnt_sat(<*N*>) A VM_CNT value computed as min(*N*, the largest VM_CNT value).
expcnt_sat(<*N*>) An EXP_CNT value computed as min(*N*, the largest EXP_CNT value).
lgkmcnt_sat(<*N*>) An LGKM_CNT value computed as min(*N*, the largest LGKM_CNT value).
====================== ======================================================================
These helpers may be specified in any order. Ampersands and commas may be used as optional separators.
These values may be specified in any order. Spaces, ampersands and commas may be used as optional separators.
*N* is either an
:ref:`integer number<amdgpu_synid_integer_number>` or an
@ -46,10 +48,18 @@ Examples:
.. parsed-literal::
s_waitcnt 0
vm_cnt = 1
exp_cnt = 2
lgkm_cnt = 3
cnt = vm_cnt | (exp_cnt << 4) | (lgkm_cnt << 8)
s_waitcnt cnt
s_waitcnt 1 | (2 << 4) | (3 << 8) // the same as above
s_waitcnt vmcnt(1) expcnt(2) lgkmcnt(3) // the same as above
s_waitcnt vmcnt(vm_cnt) expcnt(exp_cnt) lgkmcnt(lgkm_cnt) // the same as above
s_waitcnt vmcnt(1)
s_waitcnt expcnt(2) lgkmcnt(3)
s_waitcnt vmcnt(1) expcnt(2) lgkmcnt(3)
s_waitcnt vmcnt(1), expcnt(2), lgkmcnt(3)
s_waitcnt vmcnt(1) & lgkmcnt_sat(100) & expcnt(2)
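The difference between the plain and the ``_sat`` forms can be sketched in Python as follows. This is a simplified model with hypothetical names: it requires all three counters and uses a single ``saturate`` flag for the whole call, whereas the syntax above chooses saturation per counter.

```python
# (shift, width) for each counter, from the bit table above.
WAITCNT_FIELDS = {"vmcnt": (0, 4), "expcnt": (4, 3), "lgkmcnt": (8, 4)}


def encode_waitcnt(vmcnt, expcnt, lgkmcnt, saturate=False):
    """Pack the three counters into an s_waitcnt operand (hypothetical helper)."""
    value = 0
    counters = {"vmcnt": vmcnt, "expcnt": expcnt, "lgkmcnt": lgkmcnt}
    for name, n in counters.items():
        shift, width = WAITCNT_FIELDS[name]
        largest = (1 << width) - 1
        if n < 0:
            raise ValueError(f"{name} must not be negative")
        if n > largest:
            if not saturate:
                raise ValueError(f"{name} must not exceed {largest}")
            n = largest  # the *_sat forms compute min(N, largest value)
        value |= n << shift
    return value


print(encode_waitcnt(1, 2, 3))                         # 801, i.e. 1 | (2 << 4) | (3 << 8)
print(hex(encode_waitcnt(1, 2, 100, saturate=True)))   # 0xf21, LGKM_CNT clamped to 15
```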


@ -10,5 +10,5 @@
imm16
===========================
An :ref:`integer_number<amdgpu_synid_integer_number>`. The value is truncated to 16 bits.
A 16-bit :ref:`integer_number<amdgpu_synid_integer_number>` or an :ref:`absolute_expression<amdgpu_synid_absolute_expression>`. The value must be in the range -32768..65535.


@ -10,5 +10,5 @@
imm32
===========================
An :ref:`integer_number<amdgpu_synid_integer_number>`. The value is truncated to 32 bits.
An :ref:`integer_number<amdgpu_synid_integer_number>` or an :ref:`absolute_expression<amdgpu_synid_absolute_expression>`. The value is truncated to 32 bits.


@ -21,7 +21,7 @@ Optionally may serve as an output data:
* :ref:`dmask<amdgpu_synid_dmask>` may specify 2 data elements for 32-bit-per-pixel surfaces or 4 data elements for 64-bit-per-pixel surfaces. Each data element occupies 1 dword.
* :ref:`tfe<amdgpu_synid_tfe>` adds 1 dword if specified.
Note. The surface data format is indicated in the image resource constant but not in the instruction.
Note: the surface data format is indicated in the image resource constant but not in the instruction.
*Operands:* :ref:`v<amdgpu_synid_v>`


@ -21,6 +21,6 @@ Optionally may serve as an output data:
* :ref:`dmask<amdgpu_synid_dmask>` may specify 1 data element for 32-bit-per-pixel surfaces or 2 data elements for 64-bit-per-pixel surfaces. Each data element occupies 1 dword.
* :ref:`tfe<amdgpu_synid_tfe>` adds 1 dword if specified.
Note. The surface data format is indicated in the image resource constant but not in the instruction.
Note: the surface data format is indicated in the image resource constant but not in the instruction.
*Operands:* :ref:`v<amdgpu_synid_v>`


@ -10,5 +10,6 @@
imm32
===========================
An :ref:`integer_number<amdgpu_synid_integer_number>` or a :ref:`floating-point_number<amdgpu_synid_floating-point_number>`. The number is converted to *f16* as described :ref:`here<amdgpu_synid_lit_conv>`.
A :ref:`floating-point_number<amdgpu_synid_floating-point_number>`, an :ref:`integer_number<amdgpu_synid_integer_number>`, or an :ref:`absolute_expression<amdgpu_synid_absolute_expression>`.
The value is converted to *f16* as described :ref:`here<amdgpu_synid_fp_conv>`.


@ -10,5 +10,6 @@
imm32
===========================
An :ref:`integer_number<amdgpu_synid_integer_number>` or a :ref:`floating-point_number<amdgpu_synid_floating-point_number>`. The value is converted to *f32* as described :ref:`here<amdgpu_synid_lit_conv>`.
A :ref:`floating-point_number<amdgpu_synid_floating-point_number>`, an :ref:`integer_number<amdgpu_synid_integer_number>`, or an :ref:`absolute_expression<amdgpu_synid_absolute_expression>`.
The value is converted to *f32* as described :ref:`here<amdgpu_synid_fp_conv>`.


@ -14,18 +14,21 @@ Bits of a hardware register being accessed.
The bits of this operand have the following meaning:
============ ===================================
Bits Description
============ ===================================
5:0 Register *id*.
10:6 First bit *offset* (0..31).
15:11 *Size* in bits (1..32).
============ ===================================
======= ===================== ============
Bits Description Value Range
======= ===================== ============
5:0 Register *id*. 0..63
10:6 First bit *offset*. 0..31
15:11 *Size* in bits. 1..32
======= ===================== ============
This operand may be specified as a positive 16-bit :ref:`integer_number<amdgpu_synid_integer_number>` or using the syntax described below.
This operand may be specified as one of the following:
* An :ref:`integer_number<amdgpu_synid_integer_number>` or an :ref:`absolute_expression<amdgpu_synid_absolute_expression>`. The value must be in the range 0..0xFFFF.
* An *hwreg* value described below.
==================================== ============================================================================
Syntax Description
Hwreg Value Syntax Description
==================================== ============================================================================
hwreg({0..63}) All bits of a register indicated by its *id*.
hwreg(<*name*>) All bits of a register indicated by its *name*.
@ -33,7 +36,8 @@ This operand may be specified as a positive 16-bit :ref:`integer_number<amdgpu_s
hwreg(<*name*>, {0..31}, {1..32}) Register bits indicated by register *name*, first bit *offset* and *size*.
==================================== ============================================================================
Register *id*, *offset* and *size* must be specified as positive :ref:`integer numbers<amdgpu_synid_integer_number>`.
Numeric values may be specified as positive :ref:`integer numbers<amdgpu_synid_integer_number>`
or :ref:`absolute expressions<amdgpu_synid_absolute_expression>`.
Defined register *names* include:
@ -54,7 +58,16 @@ Examples:
.. parsed-literal::
s_getreg_b32 s2, 0x6
reg = 1
offset = 2
size = 4
hwreg_enc = reg | (offset << 6) | ((size - 1) << 11)
s_getreg_b32 s2, 0x1881
s_getreg_b32 s2, hwreg_enc // the same as above
s_getreg_b32 s2, hwreg(1, 2, 4) // the same as above
s_getreg_b32 s2, hwreg(reg, offset, size) // the same as above
s_getreg_b32 s2, hwreg(15)
s_getreg_b32 s2, hwreg(51, 1, 31)
s_getreg_b32 s2, hwreg(HW_REG_LDS_ALLOC, 0, 1)


@ -0,0 +1,66 @@
..
**************************************************
* *
* Automatically generated file, do not edit! *
* *
**************************************************
.. _amdgpu_synid9_imask:
imask
===========================
This operand is a mask which controls indexing mode for operands of subsequent instructions.
Bits 0, 1 and 2 control indexing of *src0*, *src1* and *src2*, while bit 3 controls indexing of *dst*.
Value 1 enables indexing and value 0 disables it.
===== ========================================
Bit Meaning
===== ========================================
0 Enables or disables *src0* indexing.
1 Enables or disables *src1* indexing.
2 Enables or disables *src2* indexing.
3 Enables or disables *dst* indexing.
===== ========================================
This operand may be specified as one of the following:
* An :ref:`integer_number<amdgpu_synid_integer_number>` or an :ref:`absolute_expression<amdgpu_synid_absolute_expression>`. The value must be in the range 0..15.
* A *gpr_idx* value described below.
==================================== ===========================================
Gpr_idx Value Syntax Description
==================================== ===========================================
gpr_idx(*<operands>*) Enable indexing for specified *operands*
and disable it for the rest.
*Operands* is a comma-separated list of
values which may include:
* "SRC0" - enable *src0* indexing.
* "SRC1" - enable *src1* indexing.
* "SRC2" - enable *src2* indexing.
* "DST" - enable *dst* indexing.
Each of these values may be specified only
once.
*Operands* list may be empty; this syntax
disables indexing for all operands.
==================================== ===========================================
Examples:
.. parsed-literal::
s_set_gpr_idx_mode 0
s_set_gpr_idx_mode gpr_idx() // the same as above
s_set_gpr_idx_mode 15
s_set_gpr_idx_mode gpr_idx(DST,SRC0,SRC1,SRC2) // the same as above
s_set_gpr_idx_mode gpr_idx(SRC0,SRC1,SRC2,DST) // the same as above
s_set_gpr_idx_mode gpr_idx(DST,SRC1)


@ -1,25 +0,0 @@
..
**************************************************
* *
* Automatically generated file, do not edit! *
* *
**************************************************
.. _amdgpu_synid9_imm4:
imm4
===========================
A positive :ref:`integer_number<amdgpu_synid_integer_number>`. The value is truncated to 4 bits.
This operand is a mask which controls indexing mode for operands of subsequent instructions. Value 1 enables indexing and value 0 disables it.
============ ========================================
Bit Meaning
============ ========================================
0 Enables or disables *src0* indexing.
1 Enables or disables *src1* indexing.
2 Enables or disables *src2* indexing.
3 Enables or disables *dst* indexing.
============ ========================================


@ -12,19 +12,26 @@ label
A branch target which is a 16-bit signed integer treated as a PC-relative dword offset.
This operand may be specified as:
This operand may be specified as one of the following:
* An :ref:`integer_number<amdgpu_synid_integer_number>`. The number is truncated to 16 bits.
* An :ref:`absolute_expression<amdgpu_synid_absolute_expression>` which must start with an :ref:`integer_number<amdgpu_synid_integer_number>`. The value of the expression is truncated to 16 bits.
* A :ref:`symbol<amdgpu_synid_symbol>` (for example, a label). The value is handled as a 16-bit PC-relative dword offset to be resolved by a linker.
* An :ref:`integer_number<amdgpu_synid_integer_number>` or an :ref:`absolute_expression<amdgpu_synid_absolute_expression>`. The value must be in the range -32768..65535.
* A :ref:`symbol<amdgpu_synid_symbol>` (for example, a label) representing a relocatable address in the same compilation unit where it is referred from. The value is handled as a 16-bit PC-relative dword offset to be resolved by a linker.
Examples:
.. parsed-literal::
offset = 30
s_branch loop_end
s_branch 2 + offset
s_branch 32
loop_end:
label_1:
label_2 = . + 4
s_branch 32
s_branch offset + 2
s_branch label_1
s_branch label_2
s_branch label_3
s_branch label_4
label_3 = label_2 + 4
label_4:


@ -12,24 +12,29 @@ msg
A 16-bit message code. The bits of this operand have the following meaning:
============ ======================================================
Bits Description
============ ======================================================
3:0 Message *type*.
6:4 Optional *operation*.
9:7 Optional *parameters*.
15:10 Unused.
============ ======================================================
============ =============================== ===============
Bits Description Value Range
============ =============================== ===============
3:0 Message *type*. 0..15
6:4 Optional *operation*. 0..7
7:7 Unused. \-
9:8 Optional *stream*. 0..3
15:10 Unused. \-
============ =============================== ===============
This operand may be specified as a positive 16-bit :ref:`integer_number<amdgpu_synid_integer_number>` or using the syntax described below:
This operand may be specified as one of the following:
======================================== ========================================================================
Syntax Description
======================================== ========================================================================
sendmsg(<*type*>) A message identified by its *type*.
sendmsg(<*type*>, <*op*>) A message identified by its *type* and *operation*.
sendmsg(<*type*>, <*op*>, <*stream*>) A message identified by its *type* and *operation* with a stream *id*.
======================================== ========================================================================
* An :ref:`integer_number<amdgpu_synid_integer_number>` or an :ref:`absolute_expression<amdgpu_synid_absolute_expression>`. The value must be in the range 0..0xFFFF.
* A *sendmsg* value described below.
==================================== ====================================================
Sendmsg Value Syntax Description
==================================== ====================================================
sendmsg(<*type*>) A message identified by its *type*.
sendmsg(<*type*>,<*op*>) A message identified by its *type* and *operation*.
sendmsg(<*type*>,<*op*>,<*stream*>) A message identified by its *type* and *operation*
with a stream *id*.
==================================== ====================================================
*Type* may be specified using message *name* or message *id*.
@ -37,7 +42,8 @@ This operand may be specified as a positive 16-bit :ref:`integer_number<amdgpu_s
Stream *id* is an integer in the range 0..3.
Message *id*, operation *id* and stream *id* must be specified as positive :ref:`integer numbers<amdgpu_synid_integer_number>`.
Numeric values may be specified as positive :ref:`integer numbers<amdgpu_synid_integer_number>`
or :ref:`absolute expressions<amdgpu_synid_absolute_expression>`.
Each message type supports specific operations:
@ -60,16 +66,32 @@ Each message type supports specific operations:
\ SYSMSG_OP_TTRACE_PC 4 \-
================= ========== ============================== ============ ==========
*Sendmsg* arguments are validated depending on how the *type* value is specified:
* If message *type* is specified by name, argument values must satisfy the limitations detailed in the table above.
* If message *type* is specified as a number, each argument must not exceed the corresponding value range (see the first table).
Examples:
.. parsed-literal::
// numeric message code
msg = 0x10
s_sendmsg 0x12
s_sendmsg msg + 2
// sendmsg with strict arguments validation
s_sendmsg sendmsg(MSG_INTERRUPT)
s_sendmsg sendmsg(MSG_GET_DOORBELL)
s_sendmsg sendmsg(2, GS_OP_CUT)
s_sendmsg sendmsg(MSG_GS, GS_OP_EMIT)
s_sendmsg sendmsg(MSG_GS, 2)
s_sendmsg sendmsg(MSG_GS_DONE, GS_OP_EMIT_CUT, 1)
s_sendmsg sendmsg(MSG_SYSMSG, SYSMSG_OP_TTRACE_PC)
s_sendmsg sendmsg(MSG_GET_DOORBELL)
// sendmsg with validation of value range only
msg = 2
op = 3
stream = 1
s_sendmsg sendmsg(msg, op, stream)
s_sendmsg sendmsg(2, GS_OP_CUT)


@ -12,7 +12,8 @@ imm3
A bit mask which indicates request permissions.
This operand must be specified as an :ref:`integer_number<amdgpu_synid_integer_number>`. The value is truncated to 7 bits, but only 3 low bits are significant.
This operand must be specified as an :ref:`integer_number<amdgpu_synid_integer_number>` or an :ref:`absolute_expression<amdgpu_synid_absolute_expression>`.
The value is truncated to 7 bits, but only 3 low bits are significant.
============ ==============================
Bit Number Description


@ -10,5 +10,5 @@
imm16
===========================
An :ref:`integer_number<amdgpu_synid_integer_number>`. The value is truncated to 16 bits and then sign-extended to 32 bits.
An :ref:`integer_number<amdgpu_synid_integer_number>` or an :ref:`absolute_expression<amdgpu_synid_absolute_expression>`. The value must be in the range -32768..65535.


@ -10,5 +10,5 @@
imm16
===========================
An :ref:`integer_number<amdgpu_synid_integer_number>`. The value is truncated to 16 bits and then zero-extended to 32 bits.
An :ref:`integer_number<amdgpu_synid_integer_number>` or an :ref:`absolute_expression<amdgpu_synid_absolute_expression>`. The value must be in the range 0..65535.
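The two *imm16* value ranges above (-32768..65535 and 0..65535) can be illustrated with a small Python sketch. The helper name ``encode_imm16`` is hypothetical, and taking the low 16 bits as the encoded field is an assumption based on the operand being a 16-bit immediate; it is not the assembler's actual code.

```python
def encode_imm16(value, signed_range):
    """Range-check an imm16 operand and return its 16-bit field.

    signed_range=True models the variant that accepts -32768..65535,
    signed_range=False models the variant that accepts 0..65535.
    Returning the low 16 bits is an assumption made for illustration.
    """
    low = -32768 if signed_range else 0
    if not low <= value <= 65535:
        raise ValueError(f"imm16 value {value} is out of range {low}..65535")
    return value & 0xFFFF
```

Under this sketch, ``encode_imm16(-1, True)`` and ``encode_imm16(65535, True)`` yield the same 16-bit field, while ``encode_imm16(-1, False)`` is rejected.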


@ -14,30 +14,31 @@ Counts of outstanding instructions to wait for.
The bits of this operand have the following meaning:
============ ======================================================
Bits Description
============ ======================================================
3:0 VM_CNT: vector memory operations count, lower bits.
6:4 EXP_CNT: export count.
11:8 LGKM_CNT: LDS, GDS, Constant and Message count.
15:14 VM_CNT: vector memory operations count, upper bits.
============ ======================================================
========== ========= ================================================ ============
High Bits Low Bits Description Value Range
========== ========= ================================================ ============
15:14 3:0 VM_CNT: vector memory operations count. 0..63
\- 6:4 EXP_CNT: export count. 0..7
\- 11:8 LGKM_CNT: LDS, GDS, Constant and Message count. 0..15
========== ========= ================================================ ============
This operand may be specified as a positive 16-bit :ref:`integer_number<amdgpu_synid_integer_number>`
or as a combination of the following symbolic helpers:
This operand may be specified as one of the following:
* An :ref:`integer_number<amdgpu_synid_integer_number>` or an :ref:`absolute_expression<amdgpu_synid_absolute_expression>`. The value must be in the range 0..0xFFFF.
* A combination of *vmcnt*, *expcnt*, *lgkmcnt* and other values described below.
====================== ======================================================================
Syntax Description
====================== ======================================================================
vmcnt(<*N*>) VM_CNT value. *N* must not exceed the largest VM_CNT value.
expcnt(<*N*>) EXP_CNT value. *N* must not exceed the largest EXP_CNT value.
lgkmcnt(<*N*>) LGKM_CNT value. *N* must not exceed the largest LGKM_CNT value.
vmcnt_sat(<*N*>) VM_CNT value computed as min(*N*, the largest VM_CNT value).
expcnt_sat(<*N*>) EXP_CNT value computed as min(*N*, the largest EXP_CNT value).
lgkmcnt_sat(<*N*>) LGKM_CNT value computed as min(*N*, the largest LGKM_CNT value).
vmcnt(<*N*>) A VM_CNT value. *N* must not exceed the largest VM_CNT value.
expcnt(<*N*>) An EXP_CNT value. *N* must not exceed the largest EXP_CNT value.
lgkmcnt(<*N*>) An LGKM_CNT value. *N* must not exceed the largest LGKM_CNT value.
vmcnt_sat(<*N*>) A VM_CNT value computed as min(*N*, the largest VM_CNT value).
expcnt_sat(<*N*>) An EXP_CNT value computed as min(*N*, the largest EXP_CNT value).
lgkmcnt_sat(<*N*>) An LGKM_CNT value computed as min(*N*, the largest LGKM_CNT value).
====================== ======================================================================
These helpers may be specified in any order. Ampersands and commas may be used as optional separators.
These values may be specified in any order. Spaces, ampersands and commas may be used as optional separators.
*N* is either an
:ref:`integer number<amdgpu_synid_integer_number>` or an
@ -47,10 +48,18 @@ Examples:
.. parsed-literal::
s_waitcnt 0
vm_cnt = 1
exp_cnt = 2
lgkm_cnt = 3
cnt = vm_cnt | (exp_cnt << 4) | (lgkm_cnt << 8)
s_waitcnt cnt
s_waitcnt 1 | (2 << 4) | (3 << 8) // the same as above
s_waitcnt vmcnt(1) expcnt(2) lgkmcnt(3) // the same as above
s_waitcnt vmcnt(vm_cnt) expcnt(exp_cnt) lgkmcnt(lgkm_cnt) // the same as above
s_waitcnt vmcnt(1)
s_waitcnt expcnt(2) lgkmcnt(3)
s_waitcnt vmcnt(1) expcnt(2) lgkmcnt(3)
s_waitcnt vmcnt(1), expcnt(2), lgkmcnt(3)
s_waitcnt vmcnt(1) & lgkmcnt_sat(100) & expcnt(2)
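The bit layout in the table above (VM_CNT in bits 3:0 and 15:14, EXP_CNT in bits 6:4, LGKM_CNT in bits 11:8) can be modelled in a few lines of Python. This is only a sketch of the packing; the names ``LARGEST``, ``saturated`` and ``encode_waitcnt`` are hypothetical and do not correspond to assembler internals.

```python
# Largest value of each counter, taken from the "Value Range" column.
LARGEST = {"vmcnt": 63, "expcnt": 7, "lgkmcnt": 15}


def saturated(name, n):
    """Model the *_sat helpers: min(N, the largest counter value)."""
    return min(n, LARGEST[name])


def encode_waitcnt(vmcnt, expcnt, lgkmcnt):
    """Pack the three counters into a 16-bit s_waitcnt operand."""
    for name, n in (("vmcnt", vmcnt), ("expcnt", expcnt), ("lgkmcnt", lgkmcnt)):
        if not 0 <= n <= LARGEST[name]:
            raise ValueError(f"{name}({n}) exceeds the largest value {LARGEST[name]}")
    low_vm = vmcnt & 0xF   # VM_CNT bits 3:0
    high_vm = vmcnt >> 4   # VM_CNT bits 15:14
    return low_vm | (expcnt << 4) | (lgkmcnt << 8) | (high_vm << 14)
```

With these definitions ``encode_waitcnt(1, 2, 3)`` equals ``1 | (2 << 4) | (3 << 8)``, matching the equivalent forms shown in the examples, and ``encode_waitcnt(1, 2, saturated("lgkmcnt", 100))`` models ``vmcnt(1) & lgkmcnt_sat(100) & expcnt(2)``.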


@ -34,19 +34,21 @@ Specifies an immediate unsigned 8-bit offset, in bytes. The default value is 0.
Used with DS instructions which have 2 addresses.
=================== =====================================================
=================== ====================================================================
Syntax Description
=================== =====================================================
=================== ====================================================================
offset:{0..0xFF} Specifies an unsigned 8-bit offset as a positive
:ref:`integer number <amdgpu_synid_integer_number>`.
=================== =====================================================
:ref:`integer number <amdgpu_synid_integer_number>`
or an :ref:`absolute expression<amdgpu_synid_absolute_expression>`.
=================== ====================================================================
Examples:
.. parsed-literal::
offset:255
offset:0xff
offset:2-x
offset:-x-y
.. _amdgpu_synid_ds_offset16:
@ -57,12 +59,13 @@ Specifies an immediate unsigned 16-bit offset, in bytes. The default value is 0.
Used with DS instructions which have 1 address.
==================== ======================================================
==================== ====================================================================
Syntax Description
==================== ======================================================
==================== ====================================================================
offset:{0..0xFFFF} Specifies an unsigned 16-bit offset as a positive
:ref:`integer number <amdgpu_synid_integer_number>`.
==================== ======================================================
:ref:`integer number <amdgpu_synid_integer_number>`
or an :ref:`absolute expression<amdgpu_synid_absolute_expression>`.
==================== ====================================================================
Examples:
@ -70,6 +73,7 @@ Examples:
offset:65535
offset:0xffff
offset:-x-y
.. _amdgpu_synid_sw_offset16:
@ -95,7 +99,7 @@ See AMD documentation for more information.
*mask* is a 5 character sequence which
specifies how to transform the bits of the
lane *id*.
lane *id*.
The following characters are allowed:
@ -116,7 +120,7 @@ See AMD documentation for more information.
size and must be equal to 2, 4, 8, 16 or 32.
The second numeric parameter is an index of the
lane being broadcast.
lane being broadcast.
The index must not exceed group size.
offset:swizzle(SWAP,{1..16}) Specifies a swap mode.
@ -128,7 +132,7 @@ See AMD documentation for more information.
Reverses the lanes for groups of 2, 4, 8, 16 or 32 lanes.
======================================================= ===========================================================
Numeric parameters may be specified as either :ref:`integer numbers<amdgpu_synid_integer_number>` or
Note: numeric values may be specified as either :ref:`integer numbers<amdgpu_synid_integer_number>` or
:ref:`absolute expressions<amdgpu_synid_absolute_expression>`.
Examples:
@ -137,7 +141,7 @@ Examples:
offset:255
offset:0xffff
offset:swizzle(QUAD_PERM, 0, 1, 2 ,3)
offset:swizzle(QUAD_PERM, 0, 1, 2, 3)
offset:swizzle(BITMASK_PERM, "01pi0")
offset:swizzle(BROADCAST, 2, 0)
offset:swizzle(SWAP, 8)
@ -212,19 +216,20 @@ Specifies an immediate unsigned 12-bit offset, in bytes. The default value is 0.
Cannot be used with *global/scratch* opcodes. GFX9 only.
================= ======================================================
================= ====================================================================
Syntax Description
================= ======================================================
================= ====================================================================
offset:{0..4095} Specifies a 12-bit unsigned offset as a positive
:ref:`integer number <amdgpu_synid_integer_number>`.
================= ======================================================
:ref:`integer number <amdgpu_synid_integer_number>`
or an :ref:`absolute expression<amdgpu_synid_absolute_expression>`.
================= ====================================================================
Examples:
.. parsed-literal::
offset:4095
offset:0xff
offset:x-0xff
.. _amdgpu_synid_flat_offset13s:
@ -235,12 +240,13 @@ Specifies an immediate signed 13-bit offset, in bytes. The default value is 0.
Can be used with *global/scratch* opcodes only. GFX9 only.
============================ =======================================================
Syntax Description
============================ =======================================================
offset:{-4096..4095} Specifies a 13-bit signed offset as an
:ref:`integer number <amdgpu_synid_integer_number>`.
============================ =======================================================
===================== ====================================================================
Syntax Description
===================== ====================================================================
offset:{-4096..4095} Specifies a 13-bit signed offset as an
:ref:`integer number <amdgpu_synid_integer_number>`
or an :ref:`absolute expression<amdgpu_synid_absolute_expression>`.
===================== ====================================================================
Examples:
@ -248,6 +254,7 @@ Examples:
offset:-4000
offset:0x10
offset:-x
.. _amdgpu_synid_flat_offset12s:
@ -260,12 +267,13 @@ Can be used with *global/scratch* opcodes only.
GFX10 only.
============================ =======================================================
Syntax Description
============================ =======================================================
offset:{-2048..2047} Specifies a 12-bit signed offset as an
:ref:`integer number <amdgpu_synid_integer_number>`.
============================ =======================================================
===================== ====================================================================
Syntax Description
===================== ====================================================================
offset:{-2048..2047} Specifies a 12-bit signed offset as an
:ref:`integer number <amdgpu_synid_integer_number>`
or an :ref:`absolute expression<amdgpu_synid_absolute_expression>`.
===================== ====================================================================
Examples:
@ -273,6 +281,7 @@ Examples:
offset:-2000
offset:0x10
offset:-x+y
.. _amdgpu_synid_flat_offset11:
@ -285,19 +294,20 @@ Cannot be used with *global/scratch* opcodes.
GFX10 only.
================= ======================================================
================= ====================================================================
Syntax Description
================= ======================================================
================= ====================================================================
offset:{0..2047} Specifies an 11-bit unsigned offset as a positive
:ref:`integer number <amdgpu_synid_integer_number>`.
================= ======================================================
:ref:`integer number <amdgpu_synid_integer_number>`
or an :ref:`absolute expression<amdgpu_synid_absolute_expression>`.
================= ====================================================================
Examples:
.. parsed-literal::
offset:2047
offset:0xff
offset:x+0xff
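The ranges quoted for the *flat* offsets above follow directly from the field width and signedness: an unsigned N-bit field holds 0..2^N-1, and a signed N-bit field holds -2^(N-1)..2^(N-1)-1. The Python sketch below reproduces them; ``offset_range`` and ``check_offset`` are hypothetical helper names used only for illustration.

```python
def offset_range(bits, signed):
    """Return the (lowest, highest) value representable in the field."""
    if signed:
        return -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return 0, (1 << bits) - 1


def check_offset(value, bits, signed):
    """Raise ValueError if an already evaluated offset does not fit."""
    low, high = offset_range(bits, signed)
    if not low <= value <= high:
        raise ValueError(f"offset {value} is out of range {low}..{high}")
    return value
```

For example, ``offset_range(13, True)`` gives the GFX9 *global/scratch* range and ``offset_range(11, False)`` gives the GFX10 range for other *flat* opcodes. An absolute expression such as ``-x+y`` would be evaluated first and the result checked in the same way.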
dlc
~~~
@ -340,19 +350,18 @@ dmask
Specifies which channels (image components) are used by the operation. By default, no channels
are used.
=============== =====================================================
=============== ====================================================================
Syntax Description
=============== =====================================================
=============== ====================================================================
dmask:{0..15} Specifies image channels as a positive
:ref:`integer number <amdgpu_synid_integer_number>`.
:ref:`integer number <amdgpu_synid_integer_number>`
or an :ref:`absolute expression<amdgpu_synid_absolute_expression>`.
Each bit corresponds to one of 4 image
components (RGBA).
Each bit corresponds to one of 4 image components (RGBA).
If the specified bit value
is 0, the component is not used, value 1 means
that the component is used.
=============== =====================================================
If the specified bit value is 0, the component is not used,
value 1 means that the component is used.
=============== ====================================================================
This modifier has some limitations depending on instruction kind:
@ -373,7 +382,7 @@ Examples:
dmask:0xf
dmask:0b1111
dmask:3
dmask:x|y|z
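As an illustration of how a *dmask* value maps to image components, the Python sketch below decodes the 4-bit mask. The function name ``dmask_channels`` is hypothetical, and the assumption that bit 0 selects R, bit 1 selects G, bit 2 selects B and bit 3 selects A is made for this example only; the text above states only that each bit corresponds to one RGBA component.

```python
def dmask_channels(dmask):
    """Return the list of components enabled by a dmask value.

    Assumes bit 0 = R, bit 1 = G, bit 2 = B, bit 3 = A (illustrative).
    """
    if not 0 <= dmask <= 15:
        raise ValueError(f"dmask {dmask} is out of range 0..15")
    return [name for bit, name in enumerate("RGBA") if dmask & (1 << bit)]
```

Under that assumption ``dmask:0xf`` enables all four components and ``dmask:3`` enables the first two.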
.. _amdgpu_synid_unorm:
@ -468,7 +477,7 @@ Specifies data size: 16 or 32 bits (32 bits by default). Not supported by GFX7.
Each 16-bit data element occupies 1 VGPR.
GFX8.1, GFX9 and GFX10 support data packing.
Each pair of 16-bit data elements
Each pair of 16-bit data elements
occupies 1 VGPR.
======================================== ================================================
@ -684,18 +693,19 @@ offset12
Specifies an immediate unsigned 12-bit offset, in bytes. The default value is 0.
=============================== ======================================================
Syntax Description
=============================== ======================================================
offset:{0..0xFFF} Specifies a 12-bit unsigned offset as a positive
:ref:`integer number <amdgpu_synid_integer_number>`.
=============================== ======================================================
================== ====================================================================
Syntax Description
================== ====================================================================
offset:{0..0xFFF} Specifies a 12-bit unsigned offset as a positive
:ref:`integer number <amdgpu_synid_integer_number>`
or an :ref:`absolute expression<amdgpu_synid_absolute_expression>`.
================== ====================================================================
Examples:
.. parsed-literal::
offset:0
offset:x+y
offset:0x10
glc
@ -782,14 +792,18 @@ GFX10 only.
dpp8_sel
~~~~~~~~
Selects which lane to pull data from, within a group of 8 lanes. This is a mandatory modifier.
Selects which lanes to pull data from, within a group of 8 lanes. This is a mandatory modifier.
There is no default value.
GFX10 only.
The *dpp8_sel* modifier must specify exactly 8 values, each ranging from 0 to 7.
The *dpp8_sel* modifier must specify exactly 8 values.
First value selects which lane to read from to supply data into lane 0.
Second value controls value for lane 1 and so on.
Second value controls lane 1 and so on.
Each value may be specified as either
an :ref:`integer number<amdgpu_synid_integer_number>` or
an :ref:`absolute expression<amdgpu_synid_absolute_expression>`.
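The lane selection described above can be modelled as a permutation applied independently to each group of 8 lanes. The Python sketch below is an illustration of that description, not of the hardware; ``apply_dpp8`` is a hypothetical name, and the 0..7 bound on each selector is taken from the group size.

```python
def apply_dpp8(sel, lanes):
    """Return the data each lane receives under a dpp8_sel selection.

    sel   -- exactly 8 selectors; sel[i] names the lane (within the
             group of 8) that supplies data to lane i.
    lanes -- per-lane input data; its length must be a multiple of 8.
    """
    if len(sel) != 8 or any(not 0 <= s <= 7 for s in sel):
        raise ValueError("dpp8_sel needs exactly 8 values in the range 0..7")
    if len(lanes) % 8 != 0:
        raise ValueError("lane count must be a multiple of 8")
    out = []
    for base in range(0, len(lanes), 8):
        out.extend(lanes[base + s] for s in sel)
    return out
```

For instance, the selection ``[7, 6, 5, 4, 3, 2, 1, 0]`` reverses the lanes of every group, and ``[0, 0, 0, 0, 0, 0, 0, 0]`` broadcasts the first lane of each group.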
=============================================================== ===========================
Syntax Description
@ -811,7 +825,7 @@ fi
Controls interaction with inactive lanes for *dpp8* instructions. The default value is zero.
Note. *Inactive* lanes are those whose :ref:`exec<amdgpu_synid_exec>` mask bit is zero.
Note: *inactive* lanes are those whose :ref:`exec<amdgpu_synid_exec>` mask bit is zero.
GFX10 only.
@ -822,6 +836,9 @@ GFX10 only.
fi:1 Fetch pre-existing values from inactive lanes.
==================================== =====================================================
Note: numeric values may be specified as either :ref:`integer numbers<amdgpu_synid_integer_number>` or
:ref:`absolute expressions<amdgpu_synid_absolute_expression>`.
DPP/DPP16 Modifiers
-------------------
@ -837,7 +854,7 @@ There is no default value.
GFX8 and GFX9 only. Use :ref:`dpp16_ctrl<amdgpu_synid_dpp16_ctrl>` for GFX10.
Note. The lanes of a wavefront are organized in four *rows* and four *banks*.
Note: the lanes of a wavefront are organized in four *rows* and four *banks*.
======================================== ================================================
Syntax Description
@ -856,7 +873,7 @@ Note. The lanes of a wavefront are organized in four *rows* and four *banks*.
row_ror:{1..15} Row rotate right by 1-15 threads.
======================================== ================================================
Note: Numeric parameters may be specified as either
Note: numeric values may be specified as either
:ref:`integer numbers<amdgpu_synid_integer_number>` or
:ref:`absolute expressions<amdgpu_synid_absolute_expression>`.
@ -877,7 +894,7 @@ There is no default value.
GFX10 only. Use :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` for GFX8 and GFX9.
Note. The lanes of a wavefront are organized in four *rows* and four *banks*.
Note: the lanes of a wavefront are organized in four *rows* and four *banks*.
(There are only two rows in *wave32* mode.)
======================================== ====================================================
@ -894,7 +911,7 @@ Note. The lanes of a wavefront are organized in four *rows* and four *banks*.
row_ror:{1..15} Row rotate right by 1-15 threads.
======================================== ====================================================
Note: Numeric parameters may be specified as either
Note: numeric values may be specified as either
:ref:`integer numbers<amdgpu_synid_integer_number>` or
:ref:`absolute expressions<amdgpu_synid_absolute_expression>`.
@ -912,21 +929,21 @@ row_mask
Controls which rows are enabled for data sharing. By default, all rows are enabled.
Note. The lanes of a wavefront are organized in four *rows* and four *banks*.
Note: the lanes of a wavefront are organized in four *rows* and four *banks*.
(There are only two rows in *wave32* mode.)
======================================== =====================================================
Syntax Description
======================================== =====================================================
row_mask:{0..15} Specifies a *row mask* as a positive
:ref:`integer number <amdgpu_synid_integer_number>`.
================= ====================================================================
Syntax Description
================= ====================================================================
row_mask:{0..15} Specifies a *row mask* as a positive
:ref:`integer number <amdgpu_synid_integer_number>`
or an :ref:`absolute expression<amdgpu_synid_absolute_expression>`.
Each of 4 bits in the mask controls one
row (0 - disabled, 1 - enabled).
Each of 4 bits in the mask controls one row
(0 - disabled, 1 - enabled).
In *wave32* mode the values should be limited to
{0..7}.
======================================== =====================================================
In *wave32* mode the values should be limited to 0..7.
================= ====================================================================
Examples:
@ -934,7 +951,7 @@ Examples:
row_mask:0xf
row_mask:0b1010
row_mask:0b1111
row_mask:x|y
.. _amdgpu_synid_bank_mask:
@ -943,18 +960,19 @@ bank_mask
Controls which banks are enabled for data sharing. By default, all banks are enabled.
Note. The lanes of a wavefront are organized in four *rows* and four *banks*.
Note: the lanes of a wavefront are organized in four *rows* and four *banks*.
(There are only two rows in *wave32* mode.)
======================================== =======================================================
Syntax Description
======================================== =======================================================
bank_mask:{0..15} Specifies a *bank mask* as a positive
:ref:`integer number <amdgpu_synid_integer_number>`.
================== ====================================================================
Syntax Description
================== ====================================================================
bank_mask:{0..15} Specifies a *bank mask* as a positive
:ref:`integer number <amdgpu_synid_integer_number>`
or an :ref:`absolute expression<amdgpu_synid_absolute_expression>`.
Each of 4 bits in the mask controls one
bank (0 - disabled, 1 - enabled).
======================================== =======================================================
Each of 4 bits in the mask controls one bank
(0 - disabled, 1 - enabled).
================== ====================================================================
Examples:
@ -962,7 +980,7 @@ Examples:
bank_mask:0x3
bank_mask:0b0011
bank_mask:0b1111
bank_mask:x&y
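Both *row_mask* and *bank_mask* are 4-bit masks in which each bit enables one row or bank. A minimal Python sketch of decoding such a mask follows; ``enabled_units`` is a hypothetical helper, and mapping bit *i* to row or bank *i* is an assumption made for illustration.

```python
def enabled_units(mask):
    """Return the indices of rows (or banks) enabled by a 4-bit mask.

    Assumes bit i of the mask controls row/bank i (illustrative only).
    """
    if not 0 <= mask <= 15:
        raise ValueError(f"mask {mask} is out of range 0..15")
    return [index for index in range(4) if mask & (1 << index)]
```

Under that assumption ``row_mask:0b1010`` enables two of the four rows, and ``bank_mask:0x3`` enables the two banks selected by the low bits.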
.. _amdgpu_synid_bound_ctrl:
@ -988,7 +1006,7 @@ fi
Controls interaction with *inactive* lanes for *dpp16* instructions. The default value is zero.
Note. *Inactive* lanes are those whose :ref:`exec<amdgpu_synid_exec>` mask bit is zero.
Note: *inactive* lanes are those whose :ref:`exec<amdgpu_synid_exec>` mask bit is zero.
GFX10 only.
@ -1001,6 +1019,9 @@ GFX10 only.
fi:1 Fetch pre-existing values from inactive lanes.
======================================== ==================================================
Note: numeric values may be specified as either :ref:`integer numbers<amdgpu_synid_integer_number>` or
:ref:`absolute expressions<amdgpu_synid_absolute_expression>`.
SDWA Modifiers
--------------
@ -1037,7 +1058,6 @@ Selects which bits in the destination are affected. By default, all bits are aff
dst_sel:WORD_1 Use bits 31:16.
======================================== ================================================
.. _amdgpu_synid_dst_unused:
dst_unused
@ -1151,7 +1171,7 @@ operands (both source and destination). First value controls src0, second value
and so on, except that the last value controls destination.
The value 0 selects the low bits, while 1 selects the high bits.
Note. op_sel modifier affects 16-bit operands only. For 32-bit operands the value specified
Note: op_sel modifier affects 16-bit operands only. For 32-bit operands the value specified
by op_sel must be 0.
GFX9 and GFX10 only.
@ -1164,6 +1184,10 @@ GFX9 and GFX10 only.
op_sel:[{0..1},{0..1},{0..1},{0..1}] Select operand bits for instructions with 3 source operands.
======================================== ============================================================
Note: numeric values may be specified as either
:ref:`integer numbers<amdgpu_synid_integer_number>` or
:ref:`absolute expressions<amdgpu_synid_absolute_expression>`.
Examples:
.. parsed-literal::
@ -1189,7 +1213,7 @@ Integer clamping is not supported by GFX7.
For floating point operations, clamp modifier indicates that the result must be clamped
to the range [0.0, 1.0]. By default, there is no clamping.
Note. Clamp modifier is applied after :ref:`output modifiers<amdgpu_synid_omod>` (if any).
Note: clamp modifier is applied after :ref:`output modifiers<amdgpu_synid_omod>` (if any).
======================================== ================================================
Syntax Description
@ -1205,12 +1229,12 @@ omod
Specifies if an output modifier must be applied to the result.
By default, no output modifiers are applied.
Note. Output modifiers are applied before :ref:`clamping<amdgpu_synid_clamp>` (if any).
Note: output modifiers are applied before :ref:`clamping<amdgpu_synid_clamp>` (if any).
Output modifiers are valid for f32 and f64 floating point results only.
They must not be used with f16.
Note. *v_cvt_f16_f32* is an exception. This instruction produces f16 result
Note: *v_cvt_f16_f32* is an exception. This instruction produces f16 result
but accepts output modifiers.
======================================== ================================================
@ -1221,6 +1245,16 @@ but accepts output modifiers.
div:2 Multiply the result by 0.5.
======================================== ================================================
Note: numeric values may be specified as either :ref:`integer numbers<amdgpu_synid_integer_number>` or
:ref:`absolute expressions<amdgpu_synid_absolute_expression>`.
Examples:
.. parsed-literal::
mul:2
mul:x // x must be equal to 2 or 4
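The interaction between output modifiers and clamping can be sketched numerically. The Python below assumes that ``mul:2`` and ``mul:4`` multiply the result by 2 and 4 (only ``div:2`` is spelled out in the table fragment above), and applies the clamp after the output modifier as the notes state. ``omod_factor`` and ``apply_output`` are hypothetical names.

```python
def omod_factor(kind, value):
    """Return the scale factor for an omod such as mul:2, mul:4 or div:2."""
    factors = {("mul", 2): 2.0, ("mul", 4): 4.0, ("div", 2): 0.5}
    if (kind, value) not in factors:
        raise ValueError(f"unsupported output modifier {kind}:{value}")
    return factors[(kind, value)]


def apply_output(result, kind=None, value=None, clamp=False):
    """Apply an optional output modifier, then an optional clamp."""
    if kind is not None:
        result *= omod_factor(kind, value)
    if clamp:
        result = min(max(result, 0.0), 1.0)  # clamp to [0.0, 1.0]
    return result
```

For example, ``apply_output(0.5, "mul", 4, clamp=True)`` scales first and clamps second, so the clamp sees the already scaled value. A value such as ``mul:3`` is rejected, mirroring the comment that *x* must be equal to 2 or 4.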
.. _amdgpu_synid_vop3_operand_modifiers:
VOP3 Operand Modifiers
@ -1233,15 +1267,19 @@ Operand modifiers are not used separately. They are applied to source operands.
abs
~~~
Computes absolute value of its operand. Applied before :ref:`neg<amdgpu_synid_neg>` (if any).
Valid for floating point operands only.
Computes the absolute value of its operand. Must be applied before :ref:`neg<amdgpu_synid_neg>`
(if any). Valid for floating point operands only.
======================================== ================================================
======================================== ====================================================
Syntax Description
======================================== ================================================
abs(<operand>) Get absolute value of operand.
\|<operand>| The same as above.
======================================== ================================================
======================================== ====================================================
abs(<operand>) Get the absolute value of a floating-point operand.
\|<operand>| The same as above (an SP3 syntax).
======================================== ====================================================
Note: avoid using SP3 syntax with operands specified as expressions because the trailing '|'
may be misinterpreted. Such operands should be enclosed in additional parentheses, as shown
in the examples below.
Examples:
@ -1249,28 +1287,50 @@ Examples:
abs(v36)
\|v36|
abs(x|y) // ok
\|(x|y)| // additional parentheses are required
.. _amdgpu_synid_neg:
neg
~~~
Computes negative value of its operand. Applied after :ref:`abs<amdgpu_synid_abs>` (if any).
Valid for floating point operands only.
Computes the negative value of its operand. Must be applied after :ref:`abs<amdgpu_synid_abs>`
(if any). Valid for floating point operands only.
======================================== ================================================
Syntax Description
======================================== ================================================
neg(<operand>) Get negative value of operand.
-<operand> The same as above.
======================================== ================================================
================== ====================================================
Syntax Description
================== ====================================================
neg(<operand>) Get the negative value of a floating-point operand.
The operand may include an optional
:ref:`abs<amdgpu_synid_abs>` modifier.
-<operand> The same as above (an SP3 syntax).
================== ====================================================
Note: SP3 syntax is supported with limitations because of a potential ambiguity.
Currently it is allowed in the following cases:
* Before a register.
* Before an :ref:`abs<amdgpu_synid_abs>` modifier.
* Before an SP3 :ref:`abs<amdgpu_synid_abs>` modifier.
In all other cases "-" is handled as a part of an expression that follows the sign.
Examples:
.. parsed-literal::
// Operands with negate modifiers
neg(v[0])
-v4
neg(1.0)
neg(abs(v0))
-v5
-abs(v5)
-\|v5|
// Operands without negate modifiers
-1
-x+y
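The ordering rule for *abs* and *neg* (absolute value first, negation second) can be shown with a small numeric model. This Python sketch only illustrates the arithmetic effect on a floating-point operand; ``apply_modifiers`` is a hypothetical name and says nothing about how the assembler parses the syntax.

```python
def apply_modifiers(value, use_abs=False, use_neg=False):
    """Apply abs (if any) and then neg (if any) to a float operand."""
    if use_abs:
        value = abs(value)
    if use_neg:
        value = -value
    return value
```

With both modifiers set, as in ``neg(abs(v0))`` or ``-|v5|``, the result is never positive regardless of the sign of the input.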
VOP3P Modifiers
---------------
@ -1304,6 +1364,10 @@ The value 0 selects the low bits, while 1 selects the high bits.
op_sel:[{0..1},{0..1},{0..1}] Select operand bits for instructions with 3 source operands.
================================= =============================================================
Note: numeric values may be specified as either
:ref:`integer numbers<amdgpu_synid_integer_number>` or
:ref:`absolute expressions<amdgpu_synid_absolute_expression>`.
Examples:
.. parsed-literal::
@ -1333,6 +1397,10 @@ The value 0 selects the low bits, while 1 selects the high bits.
op_sel_hi:[{0..1},{0..1},{0..1}] Select operand bits for instructions with 3 source operands.
=================================== =============================================================
Note: numeric values may be specified as either
:ref:`integer numbers<amdgpu_synid_integer_number>` or
:ref:`absolute expressions<amdgpu_synid_absolute_expression>`.
Examples:
.. parsed-literal::
@ -1367,6 +1435,10 @@ This modifier is valid for floating point operands only.
neg_lo:[{0..1},{0..1},{0..1}] Select affected operands for instructions with 3 source operands.
================================ ==================================================================
Note: numeric values may be specified as either
:ref:`integer numbers<amdgpu_synid_integer_number>` or
:ref:`absolute expressions<amdgpu_synid_absolute_expression>`.
Examples:
.. parsed-literal::
@ -1401,6 +1473,10 @@ This modifier is valid for floating point operands only.
neg_hi:[{0..1},{0..1},{0..1}] Select affected operands for instructions with 3 source operands.
=============================== ==================================================================
Note: numeric values may be specified as either
:ref:`integer numbers<amdgpu_synid_integer_number>` or
:ref:`absolute expressions<amdgpu_synid_absolute_expression>`.
Examples:
.. parsed-literal::
@ -1419,7 +1495,7 @@ VOP3P V_MAD_MIX Modifiers
-------------------------
*v_mad_mix_f32*, *v_mad_mixhi_f16* and *v_mad_mixlo_f16* instructions
use *op_sel* and *op_sel_hi* modifiers
use *op_sel* and *op_sel_hi* modifiers
in a manner different from *regular* VOP3P instructions.
See a description below.
@ -1449,6 +1525,10 @@ By default, low bits are used for all operands.
op_sel:[{0..1},{0..1},{0..1}] Select location of each 16-bit source operand.
=============================== ================================================
Note: numeric values may be specified as either
:ref:`integer numbers<amdgpu_synid_integer_number>` or
:ref:`absolute expressions<amdgpu_synid_absolute_expression>`.
Examples:
.. parsed-literal::
@ -1477,6 +1557,10 @@ The location of 16 bits in the operand may be specified by
op_sel_hi:[{0..1},{0..1},{0..1}] Select size of each source operand.
======================================== ====================================
Note: numeric values may be specified as either
:ref:`integer numbers<amdgpu_synid_integer_number>` or
:ref:`absolute expressions<amdgpu_synid_absolute_expression>`.
Examples:
.. parsed-literal::


@ -38,7 +38,8 @@ Assembler currently supports sequences of 1, 2, 3, 4, 8 and 16 *vector* register
=================================================== ====================================================================
**v**\<N> A single 32-bit *vector* register.
*N* must be a decimal integer number.
*N* must be a decimal
:ref:`integer number<amdgpu_synid_integer_number>`.
**v[**\ <N>\ **]** A single 32-bit *vector* register.
*N* may be specified as an
@ -51,10 +52,11 @@ Assembler currently supports sequences of 1, 2, 3, 4, 8 and 16 *vector* register
or :ref:`absolute expressions<amdgpu_synid_absolute_expression>`.
**[v**\ <N>, \ **v**\ <N+1>, ... **v**\ <K>\ **]** A sequence of (\ *K-N+1*\ ) *vector* registers.
Register indices must be specified as decimal integer numbers.
Register indices must be specified as decimal
:ref:`integer numbers<amdgpu_synid_integer_number>`.
=================================================== ====================================================================
Note. *N* and *K* must satisfy the following conditions:
Note: *N* and *K* must satisfy the following conditions:
* *N* <= *K*.
* 0 <= *N* <= 255.
@ -77,26 +79,27 @@ Examples:
.. _amdgpu_synid_nsa:
*Image* instructions may use special *NSA* (Non-Sequential Address) syntax for *image addresses*:
GFX10 *Image* instructions may use special *NSA* (Non-Sequential Address) syntax for *image addresses*:
=================================================== ====================================================================
Syntax Description
=================================================== ====================================================================
**[v**\ <A>, \ **v**\ <B>, ... **v**\ <X>\ **]** A sequence of *vector* registers. At least one register
must be specified.
===================================== =================================================
Syntax Description
===================================== =================================================
**[Vm**, \ **Vn**, ... **Vk**\ **]** A sequence of 32-bit *vector* registers.
Each register may be specified using a syntax
defined :ref:`above<amdgpu_synid_v>`.
In contrast with standard syntax described above, registers in
this sequence are not required to have consecutive indices.
Moreover, the same register may appear in the list more than once.
=================================================== ====================================================================
Note. Reqister indices must be in the range 0..255. They must be specified as decimal integer numbers.
In contrast with standard syntax, registers
in *NSA* sequence are not required to have
consecutive indices. Moreover, the same register
may appear in the list more than once.
===================================== =================================================
Examples:
.. parsed-literal::
[v32,v1,v2]
[v32,v1,v[2]]
[v[32],v[1:1],[v2]]
[v4,v4,v4,v4]
.. _amdgpu_synid_s:
@ -126,7 +129,9 @@ Sequences of 4 and more *scalar* registers must be quad-aligned.
======================================================== ====================================================================
**s**\ <N> A single 32-bit *scalar* register.
*N* must be a decimal integer number.
*N* must be a decimal
:ref:`integer number<amdgpu_synid_integer_number>`.
**s[**\ <N>\ **]** A single 32-bit *scalar* register.
*N* may be specified as an
@ -137,12 +142,14 @@ Sequences of 4 and more *scalar* registers must be quad-aligned.
*N* and *K* may be specified as
:ref:`integer numbers<amdgpu_synid_integer_number>`
or :ref:`absolute expressions<amdgpu_synid_absolute_expression>`.
**[s**\ <N>, \ **s**\ <N+1>, ... **s**\ <K>\ **]** A sequence of (\ *K-N+1*\ ) *scalar* registers.
Register indices must be specified as decimal integer numbers.
Register indices must be specified as decimal
:ref:`integer numbers<amdgpu_synid_integer_number>`.
======================================================== ====================================================================
Note. *N* and *K* must satisfy the following conditions:
Note: *N* and *K* must satisfy the following conditions:
* *N* must be properly aligned based on sequence size.
* *N* <= *K*.
@ -210,7 +217,8 @@ Sequences of 4 and more *ttmp* registers must be quad-aligned.
============================================================= ====================================================================
**ttmp**\ <N> A single 32-bit *ttmp* register.
*N* must be a decimal integer number.
*N* must be a decimal
:ref:`integer number<amdgpu_synid_integer_number>`.
**ttmp[**\ <N>\ **]** A single 32-bit *ttmp* register.
*N* may be specified as an
@ -223,10 +231,11 @@ Sequences of 4 and more *ttmp* registers must be quad-aligned.
or :ref:`absolute expressions<amdgpu_synid_absolute_expression>`.
**[ttmp**\ <N>, \ **ttmp**\ <N+1>, ... **ttmp**\ <K>\ **]** A sequence of (\ *K-N+1*\ ) *ttmp* registers.
Register indices must be specified as decimal integer numbers.
Register indices must be specified as decimal
:ref:`integer numbers<amdgpu_synid_integer_number>`.
============================================================= ====================================================================
Note. *N* and *K* must satisfy the following conditions:
Note: *N* and *K* must satisfy the following conditions:
* *N* must be properly aligned based on sequence size.
* *N* <= *K*.
@ -266,8 +275,8 @@ Trap base address, 64-bits wide. Holds the pointer to the current trap handler p
Syntax Description Availability
================== ======================================================================= =============
tba 64-bit *trap base address* register. GFX7, GFX8
[tba] 64-bit *trap base address* register (an alternative syntax). GFX7, GFX8
[tba_lo,tba_hi] 64-bit *trap base address* register (an alternative syntax). GFX7, GFX8
[tba] 64-bit *trap base address* register (an SP3 syntax). GFX7, GFX8
[tba_lo,tba_hi] 64-bit *trap base address* register (an SP3 syntax). GFX7, GFX8
================== ======================================================================= =============
High and low 32 bits of *trap base address* may be accessed as separate registers:
@ -277,8 +286,8 @@ High and low 32 bits of *trap base address* may be accessed as separate register
================== ======================================================================= =============
tba_lo Low 32 bits of *trap base address* register. GFX7, GFX8
tba_hi High 32 bits of *trap base address* register. GFX7, GFX8
[tba_lo] Low 32 bits of *trap base address* register (an alternative syntax). GFX7, GFX8
[tba_hi] High 32 bits of *trap base address* register (an alternative syntax). GFX7, GFX8
[tba_lo] Low 32 bits of *trap base address* register (an SP3 syntax). GFX7, GFX8
[tba_hi] High 32 bits of *trap base address* register (an SP3 syntax). GFX7, GFX8
================== ======================================================================= =============
Note that *tba*, *tba_lo* and *tba_hi* are not accessible as assembler registers in GFX9 and GFX10,
@ -295,8 +304,8 @@ Trap memory address, 64-bits wide.
Syntax Description Availability
================= ======================================================================= ==================
tma 64-bit *trap memory address* register. GFX7, GFX8
[tma] 64-bit *trap memory address* register (an alternative syntax). GFX7, GFX8
[tma_lo,tma_hi] 64-bit *trap memory address* register (an alternative syntax). GFX7, GFX8
[tma] 64-bit *trap memory address* register (an SP3 syntax). GFX7, GFX8
[tma_lo,tma_hi] 64-bit *trap memory address* register (an SP3 syntax). GFX7, GFX8
================= ======================================================================= ==================
High and low 32 bits of *trap memory address* may be accessed as separate registers:
@ -306,8 +315,8 @@ High and low 32 bits of *trap memory address* may be accessed as separate regist
================= ======================================================================= ==================
tma_lo Low 32 bits of *trap memory address* register. GFX7, GFX8
tma_hi High 32 bits of *trap memory address* register. GFX7, GFX8
[tma_lo] Low 32 bits of *trap memory address* register (an alternative syntax). GFX7, GFX8
[tma_hi] High 32 bits of *trap memory address* register (an alternative syntax). GFX7, GFX8
[tma_lo] Low 32 bits of *trap memory address* register (an SP3 syntax). GFX7, GFX8
[tma_hi] High 32 bits of *trap memory address* register (an SP3 syntax). GFX7, GFX8
================= ======================================================================= ==================
Note that *tma*, *tma_lo* and *tma_hi* are not accessible as assembler registers in GFX9 and GFX10,
@ -324,8 +333,8 @@ Flat scratch address, 64-bits wide. Holds the base address of scratch memory.
Syntax Description
================================== ================================================================
flat_scratch 64-bit *flat scratch* address register.
[flat_scratch] 64-bit *flat scratch* address register (an alternative syntax).
[flat_scratch_lo,flat_scratch_hi] 64-bit *flat scratch* address register (an alternative syntax).
[flat_scratch] 64-bit *flat scratch* address register (an SP3 syntax).
[flat_scratch_lo,flat_scratch_hi] 64-bit *flat scratch* address register (an SP3 syntax).
================================== ================================================================
High and low 32 bits of *flat scratch* address may be accessed as separate registers:
@ -335,8 +344,8 @@ High and low 32 bits of *flat scratch* address may be accessed as separate regis
========================= =========================================================================
flat_scratch_lo Low 32 bits of *flat scratch* address register.
flat_scratch_hi High 32 bits of *flat scratch* address register.
[flat_scratch_lo] Low 32 bits of *flat scratch* address register (an alternative syntax).
[flat_scratch_hi] High 32 bits of *flat scratch* address register (an alternative syntax).
[flat_scratch_lo] Low 32 bits of *flat scratch* address register (an SP3 syntax).
[flat_scratch_hi] High 32 bits of *flat scratch* address register (an SP3 syntax).
========================= =========================================================================
.. _amdgpu_synid_xnack:
@ -355,8 +364,8 @@ received an *XNACK* due to a vector memory operation.
Syntax Description
============================== =====================================================
xnack_mask 64-bit *xnack mask* register.
[xnack_mask] 64-bit *xnack mask* register (an alternative syntax).
[xnack_mask_lo,xnack_mask_hi] 64-bit *xnack mask* register (an alternative syntax).
[xnack_mask] 64-bit *xnack mask* register (an SP3 syntax).
[xnack_mask_lo,xnack_mask_hi] 64-bit *xnack mask* register (an SP3 syntax).
============================== =====================================================
High and low 32 bits of *xnack mask* may be accessed as separate registers:
@ -366,8 +375,8 @@ High and low 32 bits of *xnack mask* may be accessed as separate registers:
===================== ==============================================================
xnack_mask_lo Low 32 bits of *xnack mask* register.
xnack_mask_hi High 32 bits of *xnack mask* register.
[xnack_mask_lo] Low 32 bits of *xnack mask* register (an alternative syntax).
[xnack_mask_hi] High 32 bits of *xnack mask* register (an alternative syntax).
[xnack_mask_lo] Low 32 bits of *xnack mask* register (an SP3 syntax).
[xnack_mask_hi] High 32 bits of *xnack mask* register (an SP3 syntax).
===================== ==============================================================
.. _amdgpu_synid_vcc:
@ -385,8 +394,8 @@ Note that GFX10 H/W does not use high 32 bits of *vcc* in *wave32* mode.
Syntax Description
================ =========================================================================
vcc 64-bit *vector condition code* register.
[vcc] 64-bit *vector condition code* register (an alternative syntax).
[vcc_lo,vcc_hi] 64-bit *vector condition code* register (an alternative syntax).
[vcc] 64-bit *vector condition code* register (an SP3 syntax).
[vcc_lo,vcc_hi] 64-bit *vector condition code* register (an SP3 syntax).
================ =========================================================================
High and low 32 bits of *vector condition code* may be accessed as separate registers:
@ -396,8 +405,8 @@ High and low 32 bits of *vector condition code* may be accessed as separate regi
================ =========================================================================
vcc_lo Low 32 bits of *vector condition code* register.
vcc_hi High 32 bits of *vector condition code* register.
[vcc_lo] Low 32 bits of *vector condition code* register (an alternative syntax).
[vcc_hi] High 32 bits of *vector condition code* register (an alternative syntax).
[vcc_lo] Low 32 bits of *vector condition code* register (an SP3 syntax).
[vcc_hi] High 32 bits of *vector condition code* register (an SP3 syntax).
================ =========================================================================
.. _amdgpu_synid_m0:
@ -412,7 +421,7 @@ including register indexing and bounds checking.
Syntax Description
=========== ===================================================
m0 A 32-bit *memory* register.
[m0] A 32-bit *memory* register (an alternative syntax).
[m0] A 32-bit *memory* register (an SP3 syntax).
=========== ===================================================
.. _amdgpu_synid_exec:
@ -430,8 +439,8 @@ Note that GFX10 H/W does not use high 32 bits of *exec* in *wave32* mode.
Syntax Description
===================== =================================================================
exec 64-bit *execute mask* register.
[exec] 64-bit *execute mask* register (an alternative syntax).
[exec_lo,exec_hi] 64-bit *execute mask* register (an alternative syntax).
[exec] 64-bit *execute mask* register (an SP3 syntax).
[exec_lo,exec_hi] 64-bit *execute mask* register (an SP3 syntax).
===================== =================================================================
High and low 32 bits of *execute mask* may be accessed as separate registers:
@ -441,8 +450,8 @@ High and low 32 bits of *execute mask* may be accessed as separate registers:
===================== =================================================================
exec_lo Low 32 bits of *execute mask* register.
exec_hi High 32 bits of *execute mask* register.
[exec_lo] Low 32 bits of *execute mask* register (an alternative syntax).
[exec_hi] High 32 bits of *execute mask* register (an alternative syntax).
[exec_lo] Low 32 bits of *execute mask* register (an SP3 syntax).
[exec_hi] High 32 bits of *execute mask* register (an SP3 syntax).
===================== =================================================================
.. _amdgpu_synid_vccz:
@ -452,7 +461,7 @@ vccz
A single bit flag indicating that the :ref:`vcc<amdgpu_synid_vcc>` is all zeros.
Note. When GFX10 operates in *wave32* mode, this register reflects state of :ref:`vcc_lo<amdgpu_synid_vcc_lo>`.
Note: when GFX10 operates in *wave32* mode, this register reflects state of :ref:`vcc_lo<amdgpu_synid_vcc_lo>`.
.. _amdgpu_synid_execz:
@ -461,7 +470,7 @@ execz
A single bit flag indicating that the :ref:`exec<amdgpu_synid_exec>` is all zeros.
Note. When GFX10 operates in *wave32* mode, this register reflects state of :ref:`exec_lo<amdgpu_synid_exec>`.
Note: when GFX10 operates in *wave32* mode, this register reflects state of :ref:`exec_lo<amdgpu_synid_exec>`.
.. _amdgpu_synid_scc:
@ -495,19 +504,20 @@ GFX10 only.
.. _amdgpu_synid_constant:
constant
--------
inline constant
---------------
A set of integer and floating-point *inline* constants and values:
An *inline constant* is an integer or a floating-point value encoded as a part of an instruction.
Compare *inline constants* with :ref:`literals<amdgpu_synid_literal>`.
Inline constants include:
* :ref:`iconst<amdgpu_synid_iconst>`
* :ref:`fconst<amdgpu_synid_fconst>`
* :ref:`ival<amdgpu_synid_ival>`
In contrast with :ref:`literals<amdgpu_synid_literal>`, these operands are encoded as a part of instruction.
If a number may be encoded as either
a :ref:`literal<amdgpu_synid_literal>` or
a :ref:`literal<amdgpu_synid_literal>` or
a :ref:`constant<amdgpu_synid_constant>`,
assembler selects the latter encoding as more efficient.
@ -516,17 +526,14 @@ assembler selects the latter encoding as more efficient.
iconst
~~~~~~
An :ref:`integer number<amdgpu_synid_integer_number>`
An :ref:`integer number<amdgpu_synid_integer_number>` or
an :ref:`absolute expression<amdgpu_synid_absolute_expression>`
encoded as an *inline constant*.
Only a small fraction of integer numbers may be encoded as *inline constants*.
They are enumerated in the table below.
Other integer numbers have to be encoded as :ref:`literals<amdgpu_synid_literal>`.
Integer *inline constants* are converted to
:ref:`expected operand type<amdgpu_syn_instruction_type>`
as described :ref:`here<amdgpu_synid_int_const_conv>`.
================================== ====================================
Value Note
================================== ====================================
@ -548,10 +555,6 @@ Only a small fraction of floating-point numbers may be encoded as *inline consta
They are enumerated in the table below.
Other floating-point numbers have to be encoded as :ref:`literals<amdgpu_synid_literal>`.
Floating-point *inline constants* are converted to
:ref:`expected operand type<amdgpu_syn_instruction_type>`
as described :ref:`here<amdgpu_synid_fp_const_conv>`.
===================== ===================================================== ==================
Value Note Availability
===================== ===================================================== ==================
@ -594,21 +597,18 @@ These operands provide read-only access to H/W registers.
literal
-------
A literal is a 64-bit value which is encoded as a separate 32-bit dword in the instruction stream.
A *literal* is a 64-bit value encoded as a separate 32-bit dword in the instruction stream.
Compare *literals* with :ref:`inline constants<amdgpu_synid_constant>`.
If a number may be encoded as either
a :ref:`literal<amdgpu_synid_literal>` or
a :ref:`literal<amdgpu_synid_literal>` or
an :ref:`inline constant<amdgpu_synid_constant>`,
assembler selects the latter encoding as more efficient.
Literals may be specified as :ref:`integer numbers<amdgpu_synid_integer_number>`,
:ref:`floating-point numbers<amdgpu_synid_floating-point_number>` or
:ref:`expressions<amdgpu_synid_expression>`
(expressions are currently supported for 32-bit operands only).
A 64-bit literal value is converted by assembler
to an :ref:`expected operand type<amdgpu_syn_instruction_type>`
as described :ref:`here<amdgpu_synid_lit_conv>`.
:ref:`floating-point numbers<amdgpu_synid_floating-point_number>`,
:ref:`absolute expressions<amdgpu_synid_absolute_expression>` or
:ref:`relocatable expressions<amdgpu_synid_relocatable_expression>`.
An instruction may use only one literal but several operands may refer to the same literal.
@ -617,30 +617,38 @@ An instruction may use only one literal but several operands may refer the same
uimm8
-----
A 8-bit positive :ref:`integer number<amdgpu_synid_integer_number>`.
The value is encoded as part of the opcode so it is free to use.
An 8-bit :ref:`integer number<amdgpu_synid_integer_number>`
or an :ref:`absolute expression<amdgpu_synid_absolute_expression>`.
The value must be in the range 0..0xFF.
.. _amdgpu_synid_uimm32:
uimm32
------
A 32-bit positive :ref:`integer number<amdgpu_synid_integer_number>`.
The value is stored as a separate 32-bit dword in the instruction stream.
A 32-bit :ref:`integer number<amdgpu_synid_integer_number>`
or an :ref:`absolute expression<amdgpu_synid_absolute_expression>`.
The value must be in the range 0..0xFFFFFFFF.
.. _amdgpu_synid_uimm20:
uimm20
------
A 20-bit positive :ref:`integer number<amdgpu_synid_integer_number>`.
A 20-bit :ref:`integer number<amdgpu_synid_integer_number>`
or an :ref:`absolute expression<amdgpu_synid_absolute_expression>`.
The value must be in the range 0..0xFFFFF.
.. _amdgpu_synid_uimm21:
uimm21
------
A 21-bit positive :ref:`integer number<amdgpu_synid_integer_number>`.
A 21-bit :ref:`integer number<amdgpu_synid_integer_number>`
or an :ref:`absolute expression<amdgpu_synid_absolute_expression>`.
The value must be in the range 0..0x1FFFFF.
.. WARNING:: Assembler currently supports 20-bit offsets only. Use :ref:`uimm20<amdgpu_synid_uimm20>` as a replacement.
@ -649,7 +657,10 @@ A 21-bit positive :ref:`integer number<amdgpu_synid_integer_number>`.
simm21
------
A 21-bit :ref:`integer number<amdgpu_synid_integer_number>`.
A 21-bit :ref:`integer number<amdgpu_synid_integer_number>`
or an :ref:`absolute expression<amdgpu_synid_absolute_expression>`.
The value must be in the range -0x100000..0x0FFFFF.
.. WARNING:: Assembler currently supports 20-bit unsigned offsets only. Use :ref:`uimm20<amdgpu_synid_uimm20>` as a replacement.
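The value ranges listed above for *uimm8*, *uimm32*, *uimm20*, *uimm21* and *simm21* can be illustrated with a short check. This is only a sketch, not the assembler's implementation: the table name ``OPERAND_RANGES`` and the helper ``fits_operand`` are hypothetical, and the sketch deliberately ignores the current 20-bit offset limitation mentioned in the warnings.

```python
# Inclusive value ranges copied from the operand descriptions above.
# OPERAND_RANGES and fits_operand are illustrative names only.
OPERAND_RANGES = {
    "uimm8": (0, 0xFF),
    "uimm32": (0, 0xFFFFFFFF),
    "uimm20": (0, 0xFFFFF),
    "uimm21": (0, 0x1FFFFF),
    "simm21": (-0x100000, 0x0FFFFF),
}


def fits_operand(kind, value):
    """Return True if `value` lies in the documented range for `kind`."""
    low, high = OPERAND_RANGES[kind]
    return low <= value <= high
```

For example, ``fits_operand("simm21", -0x100000)`` holds, while ``fits_operand("uimm20", 0x100000)`` does not.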
@ -678,27 +689,20 @@ Integer Numbers
---------------
Integer numbers are 64 bits wide.
They may be specified in binary, octal, hexadecimal and decimal formats:
They are converted to :ref:`expected operand type<amdgpu_syn_instruction_type>`
as described :ref:`here<amdgpu_synid_int_conv>`.
============== ====================================
Format Syntax
============== ====================================
Decimal [-]?[1-9][0-9]*
Binary [-]?0b[01]+
Octal [-]?0[0-7]+
Hexadecimal [-]?0x[0-9a-fA-F]+
\ [-]?[0x]?[0-9][0-9a-fA-F]*[hH]
============== ====================================
Integer numbers may be specified in binary, octal, hexadecimal and decimal formats:
Examples:
.. parsed-literal::
-1234
0b1010
010
0xff
0ffh
============ =============================== ========
Format Syntax Example
============ =============================== ========
Decimal [-]?[1-9][0-9]* -1234
Binary [-]?0b[01]+ 0b1010
Octal [-]?0[0-7]+ 010
Hexadecimal [-]?0x[0-9a-fA-F]+ 0xff
\ [-]?[0x]?[0-9][0-9a-fA-F]*[hH] 0ffh
============ =============================== ========
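The formats in the table above can be sketched as a small parser. This is an illustration under stated assumptions, not the assembler's lexer: the function name ``parse_integer`` is hypothetical, and the ``[0x]?`` part of the second hexadecimal form is assumed to mean an optional ``0x`` prefix.

```python
import re

# (pattern, base) pairs following the syntax table above.
# Assumption: "[0x]?" in the suffix form denotes an optional "0x" prefix.
_INTEGER_FORMATS = [
    (re.compile(r"-?0b([01]+)"), 2),                         # binary
    (re.compile(r"-?0x([0-9a-fA-F]+)"), 16),                 # hexadecimal, 0x prefix
    (re.compile(r"-?(?:0x)?([0-9][0-9a-fA-F]*)[hH]"), 16),   # hexadecimal, h suffix
    (re.compile(r"-?0([0-7]+)"), 8),                         # octal
    (re.compile(r"-?([1-9][0-9]*)"), 10),                    # decimal
]


def parse_integer(text):
    """Parse an integer written in one of the formats listed above."""
    for pattern, base in _INTEGER_FORMATS:
        match = pattern.fullmatch(text)
        if match:
            value = int(match.group(1), base)
            return -value if text.startswith("-") else value
    raise ValueError("not a recognized integer format: " + text)
```

With these assumptions, the examples from the table evaluate to -1234, 10, 8, 255 and 255 respectively.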
.. _amdgpu_synid_floating-point_number:
@ -706,31 +710,29 @@ Floating-Point Numbers
----------------------
All floating-point numbers are handled as double (64 bits wide).
They are converted to
:ref:`expected operand type<amdgpu_syn_instruction_type>`
as described :ref:`here<amdgpu_synid_fp_conv>`.
Floating-point numbers may be specified in hexadecimal and decimal formats:
============== ======================================================== ========================================================
Format Syntax Note
============== ======================================================== ========================================================
Decimal [-]?[0-9]*[.][0-9]*([eE][+-]?[0-9]*)? Must include either a decimal separator or an exponent.
Hexadecimal [-]0x[0-9a-fA-F]*(.[0-9a-fA-F]*)?[pP][+-]?[0-9a-fA-F]+
============== ======================================================== ========================================================
Examples:
.. parsed-literal::
-1.234
234e2
-0x1afp-10
0x.1afp10
============ ======================================================== ====================== ====================
Format Syntax Examples Note
============ ======================================================== ====================== ====================
Decimal [-]?[0-9]*[.][0-9]*([eE][+-]?[0-9]*)? -1.234, 234e2 Must include either
a decimal separator
or an exponent.
Hexadecimal [-]0x[0-9a-fA-F]*(.[0-9a-fA-F]*)?[pP][+-]?[0-9a-fA-F]+ -0x1afp-10, 0x.1afp10
============ ======================================================== ====================== ====================
.. _amdgpu_synid_expression:
Expressions
===========
An expression specifies an address or a numeric value.
An expression is evaluated to a 64-bit integer.
Note that floating-point expressions are not supported.
There are two kinds of expressions:
* :ref:`Absolute<amdgpu_synid_absolute_expression>`.
@ -741,10 +743,14 @@ There are two kinds of expressions:
Absolute Expressions
--------------------
The value of an absolute expression remains the same after program relocation.
The value of an absolute expression does not change after program relocation.
Absolute expressions must not include unassigned and relocatable values
such as labels.
Absolute expressions are evaluated to 64-bit integer values and converted to
:ref:`expected operand type<amdgpu_syn_instruction_type>`
as described :ref:`here<amdgpu_synid_int_conv>`.
Examples:
.. parsed-literal::
@ -760,46 +766,39 @@ Relocatable Expressions
The value of a relocatable expression depends on program relocation.
Note that use of relocatable expressions is limited to branch targets
and 32-bit :ref:`literals<amdgpu_synid_literal>`.
and 32-bit integer operands.
Addition information about relocation may be found :ref:`here<amdgpu-relocation-records>`.
Examples:
A relocatable expression is evaluated to a 64-bit integer value
which depends on operand kind and :ref:`relocation type<amdgpu-relocation-records>`
of symbol(s) used in the expression. For example, if an instruction refers to a label,
this reference is evaluated to an offset from the address after the instruction
to the label address:
.. parsed-literal::
y = x + 10 // x is not yet defined. Undefined symbols are assumed to be PC-relative.
z = .
label:
v_add_co_u32_e32 v0, vcc, label, v1 // 'label' operand is evaluated to -4
Expression Data Type
--------------------
Note that values of relocatable expressions are usually unknown at assembly time;
they are resolved later by a linker and converted to
:ref:`expected operand type<amdgpu_syn_instruction_type>`
as described :ref:`here<amdgpu_synid_rl_conv>`.
Expressions and operands of expressions are interpreted as 64-bit integers.
Operands and Operations
-----------------------
Expressions may include 64-bit :ref:`floating-point numbers<amdgpu_synid_floating-point_number>` (double).
However these operands are also handled as 64-bit integers
using binary representation of specified floating-point numbers.
No conversion from floating-point to integer is performed.
Examples:
.. parsed-literal::
x = 0.1 // x is assigned an integer 4591870180066957722 which is a binary representation of 0.1.
y = x + x // y is a sum of two integer values; it is not equal to 0.2!
Syntax
------
Expressions are composed of
:ref:`symbols<amdgpu_synid_symbol>`,
:ref:`integer numbers<amdgpu_synid_integer_number>`,
:ref:`floating-point numbers<amdgpu_synid_floating-point_number>`,
:ref:`binary operators<amdgpu_synid_expression_bin_op>`,
:ref:`unary operators<amdgpu_synid_expression_un_op>` and subexpressions.
Expressions are composed of 64-bit integer operands and operations.
Operands include :ref:`integer numbers<amdgpu_synid_integer_number>`
and :ref:`symbols<amdgpu_synid_symbol>`.
Expressions may also use "." which is a reference to the current PC (program counter).
:ref:`Unary<amdgpu_synid_expression_un_op>` and :ref:`binary<amdgpu_synid_expression_bin_op>`
operations produce 64-bit integer results.
Syntax of Expressions
---------------------
The syntax of expressions is shown below::
expr ::= expr binop expr | primaryexpr ;
@ -887,7 +886,7 @@ They operate on and produce 64-bit integers.
Symbols
-------
A symbol is a named 64-bit value, representing a relocatable
A symbol is a named 64-bit integer value, representing a relocatable
address or an absolute (non-relocatable) number.
Symbol names have the following syntax:
@ -907,128 +906,78 @@ The table below provides several examples of syntax used for symbol definition.
A symbol may be used before it is declared or assigned;
unassigned symbols are assumed to be PC-relative.
Addition information about symbols may be found :ref:`here<amdgpu-symbols>`.
Additional information about symbols may be found :ref:`here<amdgpu-symbols>`.
.. _amdgpu_synid_conv:
Conversions
===========
Type and Size Conversion
========================
This section describes what happens when a 64-bit
:ref:`integer number<amdgpu_synid_integer_number>`, a
:ref:`floating-point numbers<amdgpu_synid_floating-point_number>` or a
:ref:`symbol<amdgpu_synid_symbol>`
:ref:`floating-point number<amdgpu_synid_floating-point_number>` or an
:ref:`expression<amdgpu_synid_expression>`
is used for an operand which has a different type or size.
Depending on operand kind, this conversion is performed by either assembler or AMDGPU H/W:
.. _amdgpu_synid_int_conv:
* Values encoded as :ref:`inline constants<amdgpu_synid_constant>` are handled by H/W.
* Values encoded as :ref:`literals<amdgpu_synid_literal>` are converted by assembler.
Conversion of Integer Values
----------------------------
.. _amdgpu_synid_const_conv:
Instruction operands may be specified as 64-bit :ref:`integer numbers<amdgpu_synid_integer_number>` or
:ref:`absolute expressions<amdgpu_synid_absolute_expression>`. These values are converted to
the :ref:`expected operand type<amdgpu_syn_instruction_type>` using the following steps:
Inline Constants
----------------
1. *Validation*. Assembler checks if the input value may be truncated without loss to the required *truncation width*
(see the table below). There are two cases when this operation is enabled:
.. _amdgpu_synid_int_const_conv:
* The truncated bits are all 0.
* The truncated bits are all 1 and the value after truncation has its MSB bit set.
Integer Inline Constants
~~~~~~~~~~~~~~~~~~~~~~~~
In all other cases assembler triggers an error.
Integer :ref:`inline constants<amdgpu_synid_constant>`
may be thought of as 64-bit
:ref:`integer numbers<amdgpu_synid_integer_number>`;
when used as operands they are truncated to the size of
:ref:`expected operand type<amdgpu_syn_instruction_type>`.
No data type conversions are performed.
2. *Conversion*. The input value is converted to the expected type as described in the table below.
Depending on operand kind, this conversion is performed by either assembler or AMDGPU H/W (or both).
Examples:
============== ================= =============== ====================================================================
Expected type Truncation Width Conversion Description
============== ================= =============== ====================================================================
i16, u16, b16 16 num.u16 Truncate to 16 bits.
i32, u32, b32 32 num.u32 Truncate to 32 bits.
i64 32 {-1,num.i32} Truncate to 32 bits and then sign-extend the result to 64 bits.
u64, b64 32 {0,num.u32} Truncate to 32 bits and then zero-extend the result to 64 bits.
f16 16 num.u16 Use low 16 bits as an f16 value.
f32 32 num.u32 Use low 32 bits as an f32 value.
f64 32 {num.u32,0} Use low 32 bits of the number as high 32 bits
of the result; low 32 bits of the result are zeroed.
============== ================= =============== ====================================================================
Examples of enabled conversions:
.. parsed-literal::
// GFX9
v_add_u16 v0, -1, 0 // v0 = 0xFFFF
v_add_f16 v0, -1, 0 // v0 = 0xFFFF (NaN)
v_add_u16 v0, -1, 0 // src0 = 0xFFFF
v_add_f16 v0, -1, 0 // src0 = 0xFFFF (NaN)
//
v_add_u32 v0, -1, 0 // src0 = 0xFFFFFFFF
v_add_f32 v0, -1, 0 // src0 = 0xFFFFFFFF (NaN)
//
v_add_u16 v0, 0xff00, v0 // src0 = 0xff00
v_add_u16 v0, 0xffffffffffffff00, v0 // src0 = 0xff00
v_add_u16 v0, -256, v0 // src0 = 0xff00
//
s_bfe_i64 s[0:1], 0xffefffff, s3 // src0 = 0xffffffffffefffff
s_bfe_u64 s[0:1], 0xffefffff, s3 // src0 = 0x00000000ffefffff
v_ceil_f64_e32 v[0:1], 0xffefffff // src0 = 0xffefffff00000000 (-1.7976922776554302e308)
//
x = 0xffefffff //
s_bfe_i64 s[0:1], x, s3 // src0 = 0xffffffffffefffff
s_bfe_u64 s[0:1], x, s3 // src0 = 0x00000000ffefffff
v_ceil_f64_e32 v[0:1], x // src0 = 0xffefffff00000000 (-1.7976922776554302e308)
v_add_u32 v0, -1, 0 // v0 = 0xFFFFFFFF
v_add_f32 v0, -1, 0 // v0 = 0xFFFFFFFF (NaN)
.. _amdgpu_synid_fp_const_conv:
Floating-Point Inline Constants
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Floating-point :ref:`inline constants<amdgpu_synid_constant>`
may be thought of as 64-bit
:ref:`floating-point numbers<amdgpu_synid_floating-point_number>`;
when used as operands they are converted to a floating-point number of
:ref:`expected operand size<amdgpu_syn_instruction_type>`.
Examples:
.. parsed-literal::
// GFX9
v_add_f16 v0, 1.0, 0 // v0 = 0x3C00 (1.0)
v_add_u16 v0, 1.0, 0 // v0 = 0x3C00
v_add_f32 v0, 1.0, 0 // v0 = 0x3F800000 (1.0)
v_add_u32 v0, 1.0, 0 // v0 = 0x3F800000
.. _amdgpu_synid_lit_conv:
Literals
--------
.. _amdgpu_synid_int_lit_conv:
Integer Literals
~~~~~~~~~~~~~~~~
Integer :ref:`literals<amdgpu_synid_literal>`
are specified as 64-bit :ref:`integer numbers<amdgpu_synid_integer_number>`.
When used as operands they are converted to
:ref:`expected operand type<amdgpu_syn_instruction_type>` as described below.
============== ============== =============== ====================================================================
Expected type Condition Result Note
============== ============== =============== ====================================================================
i16, u16, b16 cond(num,16) num.u16 Truncate to 16 bits.
i32, u32, b32 cond(num,32) num.u32 Truncate to 32 bits.
i64 cond(num,32) {-1,num.i32} Truncate to 32 bits and then sign-extend the result to 64 bits.
u64, b64 cond(num,32) { 0,num.u32} Truncate to 32 bits and then zero-extend the result to 64 bits.
f16 cond(num,16) num.u16 Use low 16 bits as an f16 value.
f32 cond(num,32) num.u32 Use low 32 bits as an f32 value.
f64 cond(num,32) {num.u32,0} Use low 32 bits of the number as high 32 bits
of the result; low 32 bits of the result are zeroed.
============== ============== =============== ====================================================================
The condition *cond(X,S)* indicates if a 64-bit number *X*
can be converted to a smaller size *S* by truncation of upper bits.
There are two cases when the conversion is possible:
* The truncated bits are all 0.
* The truncated bits are all 1 and the value after truncation has its MSB bit set.
Examples of valid literals:
.. parsed-literal::
// GFX9
// Literal value after conversion:
v_add_u16 v0, 0xff00, v0 // 0xff00
v_add_u16 v0, 0xffffffffffffff00, v0 // 0xff00
v_add_u16 v0, -256, v0 // 0xff00
// Literal value after conversion:
s_bfe_i64 s[0:1], 0xffefffff, s3 // 0xffffffffffefffff
s_bfe_u64 s[0:1], 0xffefffff, s3 // 0x00000000ffefffff
v_ceil_f64_e32 v[0:1], 0xffefffff // 0xffefffff00000000 (-1.7976922776554302e308)
Examples of invalid literals:
Examples of disabled conversions:
.. parsed-literal::
@ -1037,49 +986,57 @@ Examples of invalid literals:
v_add_u16 v0, 0x1ff00, v0 // truncated bits are not all 0 or 1
v_add_u16 v0, 0xffffffffffff00ff, v0 // truncated bits do not match MSB of the result
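A rough Python model of the integer conversion table may help when checking expected encodings by hand. It is a sketch under assumptions: the function name ``convert_int_literal`` and the string type tags are hypothetical, and the range test is a restatement of *cond(X,S)* (a signed 64-bit value between -2^(S-1) and 2^S - 1), not code taken from the assembler.

```python
def to_signed64(x: int) -> int:
    """Interpret the low 64 bits of x as a signed 64-bit integer."""
    x &= (1 << 64) - 1
    return x - (1 << 64) if x >> 63 else x


def convert_int_literal(num: int, expected: str) -> int:
    """Return the bit pattern produced for an integer literal (hypothetical helper)."""
    size = 16 if expected in ("i16", "u16", "b16", "f16") else 32
    value = to_signed64(num)
    # Equivalent to cond(num, size): upper bits all 0, or all 1 with the MSB set.
    if not (-(1 << (size - 1)) <= value < (1 << size)):
        raise ValueError(f"literal {num:#x} cannot be truncated to {size} bits")
    low = value & ((1 << size) - 1)
    if expected == "i64":
        # Sign-extend the 32-bit value to 64 bits.
        return (low | 0xFFFFFFFF00000000) if (low >> 31) else low
    if expected == "f64":
        # The low 32 bits become the high half; the low half is zeroed.
        return low << 32
    return low  # 16/32-bit types, and u64/b64 (zero-extended)
```

With this model, ``convert_int_literal(0xffefffff, "i64")`` gives ``0xffffffffffefffff`` and ``convert_int_literal(0x1ff00, "u16")`` raises ``ValueError``, matching the examples above.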
.. _amdgpu_synid_fp_lit_conv:
.. _amdgpu_synid_fp_conv:
Floating-Point Literals
~~~~~~~~~~~~~~~~~~~~~~~
Conversion of Floating-Point Values
-----------------------------------
Floating-point :ref:`literals<amdgpu_synid_literal>` are specified as 64-bit
:ref:`floating-point numbers<amdgpu_synid_floating-point_number>`.
Instruction operands may be specified as 64-bit :ref:`floating-point numbers<amdgpu_synid_floating-point_number>`.
These values are converted to the :ref:`expected operand type<amdgpu_syn_instruction_type>` using the following steps:
When used as operands, they are converted to the
:ref:`expected operand type<amdgpu_syn_instruction_type>` as described below.
1. *Validation*. The assembler checks whether the input f64 number can be converted
   to the *required floating-point type* (see the table below) without overflow or underflow.
   Loss of precision is allowed. If the conversion is not possible, the assembler reports an error.
============== ============== ================= =================================================================
Expected type  Condition      Result            Note
============== ============== ================= =================================================================
i16, u16, b16  cond(num,16)   f16(num)          Convert to f16 and use bits of the result as an integer value.
i32, u32, b32  cond(num,32)   f32(num)          Convert to f32 and use bits of the result as an integer value.
i64, u64, b64  false          \-                Conversion disabled because the semantics are unclear.
f16            cond(num,16)   f16(num)          Convert to f16.
f32            cond(num,32)   f32(num)          Convert to f32.
f64            true           {num.u32.hi,0}    Use high 32 bits of the number as high 32 bits of the result;
                                                zero-fill low 32 bits of the result.
2. *Conversion*. The input value is converted to the expected type as described in the table below.
   Depending on the operand kind, this is performed by the assembler, by the AMDGPU hardware, or by both.
   Note that the result may differ from the original number.
============== ============== ================= =================================================================
============== ================ ================= =================================================================
Expected type  Required FP Type Conversion        Description
============== ================ ================= =================================================================
i16, u16, b16  f16              f16(num)          Convert to f16 and use bits of the result as an integer value.
i32, u32, b32  f32              f32(num)          Convert to f32 and use bits of the result as an integer value.
i64, u64, b64  \-               \-                Conversion disabled.
f16            f16              f16(num)          Convert to f16.
f32            f32              f32(num)          Convert to f32.
f64            f64              {num.u32.hi,0}    Use high 32 bits of the number as high 32 bits of the result;
                                                  zero-fill low 32 bits of the result.
The condition *cond(X,S)* indicates whether an f64 number *X* can be converted
to a smaller *S*-bit floating-point type without overflow or underflow.
Loss of precision is allowed.
Note that the result may differ from the original number.
============== ================ ================= =================================================================
Examples of valid literals:
Examples of enabled conversions:
.. parsed-literal::
// GFX9
v_add_f16 v1, 65500.0, v2
v_add_f32 v1, 65600.0, v2
v_add_f16 v0, 1.0, 0 // src0 = 0x3C00 (1.0)
v_add_u16 v0, 1.0, 0 // src0 = 0x3C00
//
v_add_f32 v0, 1.0, 0 // src0 = 0x3F800000 (1.0)
v_add_u32 v0, 1.0, 0 // src0 = 0x3F800000
// Literal value before conversion: 1.7976931348623157e308 (0x7fefffffffffffff)
// Literal value after conversion: 1.7976922776554302e308 (0x7fefffff00000000)
// src0 before conversion:
// 1.7976931348623157e308 = 0x7fefffffffffffff
// src0 after conversion:
// 1.7976922776554302e308 = 0x7fefffff00000000
v_ceil_f64 v[0:1], 1.7976931348623157e308
Examples of invalid literals:
v_add_f16 v1, 65500.0, v2 // ok for f16.
v_add_f32 v1, 65600.0, v2 // ok for f32, but would result in overflow for f16.
Examples of disabled conversions:
.. parsed-literal::
@ -1087,25 +1044,35 @@ Examples of invalid literals:
v_add_f16 v1, 65600.0, v2 // overflow
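The floating-point rules can be approximated in Python with the ``struct`` module, whose ``e`` and ``f`` formats pack IEEE half and single precision and raise ``OverflowError`` for values that are too large. This is a sketch under assumptions, not the assembler's logic: ``convert_fp_literal`` and its type tags are hypothetical, and underflow is modelled here simply as a nonzero input rounding to zero, which may be narrower than the check the assembler actually performs.

```python
import struct


def convert_fp_literal(num: float, expected: str) -> int:
    """Return the bit pattern produced for an f64 literal (hypothetical helper)."""
    if expected in ("i64", "u64", "b64"):
        raise ValueError("conversion is disabled for 64-bit integer operands")
    if expected == "f64":
        bits = struct.unpack("<Q", struct.pack("<d", num))[0]
        return bits & 0xFFFFFFFF00000000  # keep the high 32 bits, zero the low 32
    # 16-bit operands require f16, 32-bit operands require f32.
    fp_fmt, int_fmt = ("<e", "<H") if expected.endswith("16") else ("<f", "<I")
    try:
        packed = struct.pack(fp_fmt, num)  # rounds; loss of precision is allowed
    except OverflowError:
        raise ValueError(f"{num!r} overflows the required type") from None
    # Assumption of this sketch: underflow means a nonzero input rounded to zero.
    if num != 0.0 and struct.unpack(fp_fmt, packed)[0] == 0.0:
        raise ValueError(f"{num!r} underflows the required type")
    return struct.unpack(int_fmt, packed)[0]
```

Under this model ``convert_fp_literal(1.0, "u16")`` yields ``0x3C00`` and ``convert_fp_literal(65600.0, "f16")`` raises ``ValueError``, while the same value is accepted for ``"f32"``.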
.. _amdgpu_synid_exp_conv:
.. _amdgpu_synid_rl_conv:
Expressions
~~~~~~~~~~~
Conversion of Relocatable Values
--------------------------------
Expressions operate on 64-bit integers and produce 64-bit integer results.
:ref:`Relocatable expressions<amdgpu_synid_relocatable_expression>`
may be used with 32-bit integer operands and jump targets.
When used as operands, they are truncated to the
:ref:`expected operand size<amdgpu_syn_instruction_type>`.
No data type conversions are performed.
When the value of a relocatable expression is resolved by a linker, it is
converted as needed and truncated to the operand size. The conversion depends
on the :ref:`relocation type<amdgpu-relocation-records>` and the operand kind.
Examples:
For example, when a 32-bit operand of an instruction refers to a relocatable expression *expr*,
this reference is evaluated to a 64-bit offset from the address after the
instruction to the address being referenced, *counted in bytes*.
The value is then truncated to 32 bits and encoded as a literal:
.. parsed-literal::
// GFX9
expr = .
v_add_co_u32_e32 v0, vcc, expr, v1 // 'expr' operand is evaluated to -4
// and then truncated to 0xFFFFFFFC
x = 0.1
v_sqrt_f32 v0, x // v0 = [low 32 bits of 0.1 (double)]
v_sqrt_f32 v0, (0.1 + 0) // the same as above
v_sqrt_f32 v0, 0.1 // v0 = [0.1 (double) converted to float]
As another example, when a branch instruction refers to a label,
this reference is evaluated to an offset from the address after the
instruction to the label address, *counted in dwords*.
The value is then truncated to 16 bits:
.. parsed-literal::
label:
s_branch label // 'label' operand is evaluated to -1 and truncated to 0xFFFF
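The truncation in both examples is plain two's-complement masking. The Python sketch below reproduces the two encoded values; the helper names are hypothetical, the addresses are made up for illustration, and instruction addresses are assumed to be dword-aligned.

```python
def truncate(value: int, bits: int) -> int:
    """Keep the low `bits` bits of a (possibly negative) offset."""
    return value & ((1 << bits) - 1)


def branch_offset_dwords(label_addr: int, next_instr_addr: int) -> int:
    """Offset from the address after the branch to the label, in dwords.

    Assumes both addresses are multiples of 4 bytes.
    """
    return (label_addr - next_instr_addr) // 4


# A 32-bit operand whose relocatable expression evaluates to -4 bytes.
literal = truncate(-4, 32)

# 's_branch label' placed at the label itself: the instruction is 4 bytes long,
# so the address after it is 4 bytes past the label.
simm16 = truncate(branch_offset_dwords(label_addr=0, next_instr_addr=4), 16)
```

Here ``literal`` is ``0xFFFFFFFC`` and ``simm16`` is ``0xFFFF``, the values quoted in the comments of the two assembly examples.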