forked from OSchip/llvm-project
1249 lines
43 KiB
ReStructuredText
1249 lines
43 KiB
ReStructuredText
======================================
|
|
Syntax of AMDGPU Instruction Modifiers
|
|
======================================
|
|
|
|
.. contents::
|
|
:local:
|
|
|
|
Conventions
|
|
===========
|
|
|
|
The following notation is used throughout this document:
|
|
|
|
=================== =============================================================
|
|
Notation Description
|
|
=================== =============================================================
|
|
{0..N} Any integer value in the range from 0 to N (inclusive).
|
|
<x> Syntax and meaning of *x* is explained elsewhere.
|
|
=================== =============================================================
|
|
|
|
.. _amdgpu_syn_modifiers:
|
|
|
|
Modifiers
|
|
=========
|
|
|
|
DS Modifiers
|
|
------------
|
|
|
|
.. _amdgpu_synid_ds_offset8:
|
|
|
|
ds_offset8
|
|
~~~~~~~~~~
|
|
|
|
Specifies an immediate unsigned 8-bit offset, in bytes. The default value is 0.
|
|
|
|
Used with DS instructions which have 2 addresses.
|
|
|
|
=================== =====================================================
|
|
Syntax Description
|
|
=================== =====================================================
|
|
offset:{0..0xFF} Specifies an unsigned 8-bit offset as a positive
|
|
:ref:`integer number <amdgpu_synid_integer_number>`.
|
|
=================== =====================================================
|
|
|
|
Examples:
|
|
|
|
.. parsed-literal::
|
|
|
|
offset:255
|
|
offset:0xff
|
|
|
|
.. _amdgpu_synid_ds_offset16:
|
|
|
|
ds_offset16
|
|
~~~~~~~~~~~
|
|
|
|
Specifies an immediate unsigned 16-bit offset, in bytes. The default value is 0.
|
|
|
|
Used with DS instructions which have 1 address.
|
|
|
|
==================== ======================================================
|
|
Syntax Description
|
|
==================== ======================================================
|
|
offset:{0..0xFFFF} Specifies an unsigned 16-bit offset as a positive
|
|
:ref:`integer number <amdgpu_synid_integer_number>`.
|
|
==================== ======================================================
|
|
|
|
Examples:
|
|
|
|
.. parsed-literal::
|
|
|
|
offset:65535
|
|
offset:0xffff
|
|
|
|
.. _amdgpu_synid_sw_offset16:
|
|
|
|
sw_offset16
|
|
~~~~~~~~~~~
|
|
|
|
This is a special modifier which may be used with *ds_swizzle_b32* instruction only.
|
|
It specifies a swizzle pattern in numeric or symbolic form. The default value is 0.
|
|
|
|
See AMD documentation for more information.
|
|
|
|
======================================================= ===========================================================
|
|
Syntax Description
|
|
======================================================= ===========================================================
|
|
offset:{0..0xFFFF} Specifies a 16-bit swizzle pattern.
|
|
offset:swizzle(QUAD_PERM,{0..3},{0..3},{0..3},{0..3}) Specifies a quad permute mode pattern
|
|
|
|
Each number is a lane *id*.
|
|
offset:swizzle(BITMASK_PERM, "<mask>") Specifies a bitmask permute mode pattern.
|
|
|
|
The pattern converts a 5-bit lane *id* to another
|
|
lane *id* with which the lane interacts.
|
|
|
|
*mask* is a 5 character sequence which
|
|
specifies how to transform the bits of the
|
|
lane *id*.
|
|
|
|
The following characters are allowed:
|
|
|
|
* "0" - set bit to 0.
|
|
|
|
* "1" - set bit to 1.
|
|
|
|
* "p" - preserve bit.
|
|
|
|
* "i" - inverse bit.
|
|
|
|
offset:swizzle(BROADCAST,{2..32},{0..N}) Specifies a broadcast mode.
|
|
|
|
Broadcasts the value of any particular lane to
|
|
all lanes in its group.
|
|
|
|
The first numeric parameter is a group
|
|
size and must be equal to 2, 4, 8, 16 or 32.
|
|
|
|
The second numeric parameter is an index of the
|
|
lane being broadcasted.
|
|
|
|
The index must not exceed group size.
|
|
offset:swizzle(SWAP,{1..16}) Specifies a swap mode.
|
|
|
|
Swaps the neighboring groups of
|
|
1, 2, 4, 8 or 16 lanes.
|
|
offset:swizzle(REVERSE,{2..32}) Specifies a reverse mode.
|
|
|
|
Reverses the lanes for groups of 2, 4, 8, 16 or 32 lanes.
|
|
======================================================= ===========================================================
|
|
|
|
Numeric parameters may be specified as either :ref:`integer numbers<amdgpu_synid_integer_number>` or
|
|
:ref:`absolute expressions<amdgpu_synid_absolute_expression>`.
|
|
|
|
Examples:
|
|
|
|
.. parsed-literal::
|
|
|
|
offset:255
|
|
offset:0xffff
|
|
offset:swizzle(QUAD_PERM, 0, 1, 2 ,3)
|
|
offset:swizzle(BITMASK_PERM, "01pi0")
|
|
offset:swizzle(BROADCAST, 2, 0)
|
|
offset:swizzle(SWAP, 8)
|
|
offset:swizzle(REVERSE, 30 + 2)
|
|
|
|
.. _amdgpu_synid_gds:
|
|
|
|
gds
|
|
~~~
|
|
|
|
Specifies whether to use GDS or LDS memory (LDS is the default).
|
|
|
|
======================================== ================================================
|
|
Syntax Description
|
|
======================================== ================================================
|
|
gds Use GDS memory.
|
|
======================================== ================================================
|
|
|
|
|
|
EXP Modifiers
|
|
-------------
|
|
|
|
.. _amdgpu_synid_done:
|
|
|
|
done
|
|
~~~~
|
|
|
|
Specifies if this is the last export from the shader to the target. By default, current
|
|
instruction does not finish an export sequence.
|
|
|
|
======================================== ================================================
|
|
Syntax Description
|
|
======================================== ================================================
|
|
done Indicates the last export operation.
|
|
======================================== ================================================
|
|
|
|
.. _amdgpu_synid_compr:
|
|
|
|
compr
|
|
~~~~~
|
|
|
|
Indicates if the data are compressed (data are not compressed by default).
|
|
|
|
======================================== ================================================
|
|
Syntax Description
|
|
======================================== ================================================
|
|
compr Data are compressed.
|
|
======================================== ================================================
|
|
|
|
.. _amdgpu_synid_vm:
|
|
|
|
vm
|
|
~~
|
|
|
|
Specifies valid mask flag state (off by default).
|
|
|
|
======================================== ================================================
|
|
Syntax Description
|
|
======================================== ================================================
|
|
vm Set valid mask flag.
|
|
======================================== ================================================
|
|
|
|
FLAT Modifiers
|
|
--------------
|
|
|
|
.. _amdgpu_synid_flat_offset12:
|
|
|
|
flat_offset12
|
|
~~~~~~~~~~~~~
|
|
|
|
Specifies an immediate unsigned 12-bit offset, in bytes. The default value is 0.
|
|
|
|
Cannot be used with *global/scratch* opcodes. GFX9 only.
|
|
|
|
================= ======================================================
|
|
Syntax Description
|
|
================= ======================================================
|
|
offset:{0..4095} Specifies a 12-bit unsigned offset as a positive
|
|
:ref:`integer number <amdgpu_synid_integer_number>`.
|
|
================= ======================================================
|
|
|
|
Examples:
|
|
|
|
.. parsed-literal::
|
|
|
|
offset:4095
|
|
offset:0xff
|
|
|
|
.. _amdgpu_synid_flat_offset13:
|
|
|
|
flat_offset13
|
|
~~~~~~~~~~~~~
|
|
|
|
Specifies an immediate signed 13-bit offset, in bytes. The default value is 0.
|
|
|
|
Can be used with *global/scratch* opcodes only. GFX9 only.
|
|
|
|
============================ =======================================================
|
|
Syntax Description
|
|
============================ =======================================================
|
|
offset:{-4096..+4095} Specifies a 13-bit signed offset as an
|
|
:ref:`integer number <amdgpu_synid_integer_number>`.
|
|
============================ =======================================================
|
|
|
|
Examples:
|
|
|
|
.. parsed-literal::
|
|
|
|
offset:-4000
|
|
offset:0x10
|
|
|
|
glc
|
|
~~~
|
|
|
|
See a description :ref:`here<amdgpu_synid_glc>`.
|
|
|
|
slc
|
|
~~~
|
|
|
|
See a description :ref:`here<amdgpu_synid_slc>`.
|
|
|
|
tfe
|
|
~~~
|
|
|
|
See a description :ref:`here<amdgpu_synid_tfe>`.
|
|
|
|
nv
|
|
~~
|
|
|
|
See a description :ref:`here<amdgpu_synid_nv>`.
|
|
|
|
MIMG Modifiers
|
|
--------------
|
|
|
|
.. _amdgpu_synid_dmask:
|
|
|
|
dmask
|
|
~~~~~
|
|
|
|
Specifies which channels (image components) are used by the operation. By default, no channels
|
|
are used.
|
|
|
|
=============== =====================================================
|
|
Syntax Description
|
|
=============== =====================================================
|
|
dmask:{0..15} Specifies image channels as a positive
|
|
:ref:`integer number <amdgpu_synid_integer_number>`.
|
|
|
|
Each bit corresponds to one of 4 image
|
|
components (RGBA).
|
|
|
|
If the specified bit value
|
|
is 0, the component is not used, value 1 means
|
|
that the component is used.
|
|
=============== =====================================================
|
|
|
|
This modifier has some limitations depending on instruction kind:
|
|
|
|
=================================================== ========================
|
|
Instruction Kind Valid dmask Values
|
|
=================================================== ========================
|
|
32-bit atomic *cmpswap* 0x3
|
|
32-bit atomic instructions except for *cmpswap* 0x1
|
|
64-bit atomic *cmpswap* 0xF
|
|
64-bit atomic instructions except for *cmpswap* 0x3
|
|
*gather4* 0x1, 0x2, 0x4, 0x8
|
|
Other instructions any value
|
|
=================================================== ========================
|
|
|
|
Examples:
|
|
|
|
.. parsed-literal::
|
|
|
|
dmask:0xf
|
|
dmask:0b1111
|
|
dmask:3
|
|
|
|
.. _amdgpu_synid_unorm:
|
|
|
|
unorm
|
|
~~~~~
|
|
|
|
Specifies whether the address is normalized or not (the address is normalized by default).
|
|
|
|
======================== ========================================
|
|
Syntax Description
|
|
======================== ========================================
|
|
unorm Force the address to be unnormalized.
|
|
======================== ========================================
|
|
|
|
glc
|
|
~~~
|
|
|
|
See a description :ref:`here<amdgpu_synid_glc>`.
|
|
|
|
slc
|
|
~~~
|
|
|
|
See a description :ref:`here<amdgpu_synid_slc>`.
|
|
|
|
.. _amdgpu_synid_r128:
|
|
|
|
r128
|
|
~~~~
|
|
|
|
Specifies texture resource size. The default size is 256 bits.
|
|
|
|
GFX7 and GFX8 only.
|
|
|
|
=================== ================================================
|
|
Syntax Description
|
|
=================== ================================================
|
|
r128 Specifies 128 bits texture resource size.
|
|
=================== ================================================
|
|
|
|
.. WARNING:: Using this modifier should descrease *rsrc* register size from 8 to 4 dwords, but assembler does not currently support this feature.
|
|
|
|
tfe
|
|
~~~
|
|
|
|
See a description :ref:`here<amdgpu_synid_tfe>`.
|
|
|
|
.. _amdgpu_synid_lwe:
|
|
|
|
lwe
|
|
~~~
|
|
|
|
Specifies LOD warning status (LOD warning is disabled by default).
|
|
|
|
======================================== ================================================
|
|
Syntax Description
|
|
======================================== ================================================
|
|
lwe Enables LOD warning.
|
|
======================================== ================================================
|
|
|
|
.. _amdgpu_synid_da:
|
|
|
|
da
|
|
~~
|
|
|
|
Specifies if an array index must be sent to TA. By default, array index is not sent.
|
|
|
|
======================================== ================================================
|
|
Syntax Description
|
|
======================================== ================================================
|
|
da Send an array-index to TA.
|
|
======================================== ================================================
|
|
|
|
.. _amdgpu_synid_d16:
|
|
|
|
d16
|
|
~~~
|
|
|
|
Specifies data size: 16 or 32 bits (32 bits by default). Not supported by GFX7.
|
|
|
|
======================================== ================================================
|
|
Syntax Description
|
|
======================================== ================================================
|
|
d16 Enables 16-bits data mode.
|
|
|
|
On loads, convert data in memory to 16-bit
|
|
format before storing it in VGPRs.
|
|
|
|
For stores, convert 16-bit data in VGPRs to
|
|
32 bits before going to memory.
|
|
|
|
Note that GFX8.0 does not support data packing.
|
|
Each 16-bit data element occupies 1 VGPR.
|
|
|
|
GFX8.1 and GFX9 support data packing.
|
|
Each pair of 16-bit data elements
|
|
occupies 1 VGPR.
|
|
======================================== ================================================
|
|
|
|
.. _amdgpu_synid_a16:
|
|
|
|
a16
|
|
~~~
|
|
|
|
Specifies size of image address components: 16 or 32 bits (32 bits by default). GFX9 only.
|
|
|
|
======================================== ================================================
|
|
Syntax Description
|
|
======================================== ================================================
|
|
a16 Enables 16-bits image address components.
|
|
======================================== ================================================
|
|
|
|
Miscellaneous Modifiers
|
|
-----------------------
|
|
|
|
.. _amdgpu_synid_glc:
|
|
|
|
glc
|
|
~~~
|
|
|
|
This modifier has different meaning for loads, stores, and atomic operations.
|
|
The default value is off (0).
|
|
|
|
See AMD documentation for details.
|
|
|
|
======================================== ================================================
|
|
Syntax Description
|
|
======================================== ================================================
|
|
glc Set glc bit to 1.
|
|
======================================== ================================================
|
|
|
|
.. _amdgpu_synid_slc:
|
|
|
|
slc
|
|
~~~
|
|
|
|
Specifies cache policy. The default value is off (0).
|
|
|
|
See AMD documentation for details.
|
|
|
|
======================================== ================================================
|
|
Syntax Description
|
|
======================================== ================================================
|
|
slc Set slc bit to 1.
|
|
======================================== ================================================
|
|
|
|
.. _amdgpu_synid_tfe:
|
|
|
|
tfe
|
|
~~~
|
|
|
|
Controls access to partially resident textures. The default value is off (0).
|
|
|
|
See AMD documentation for details.
|
|
|
|
======================================== ================================================
|
|
Syntax Description
|
|
======================================== ================================================
|
|
tfe Set tfe bit to 1.
|
|
======================================== ================================================
|
|
|
|
.. _amdgpu_synid_nv:
|
|
|
|
nv
|
|
~~
|
|
|
|
Specifies if instruction is operating on non-volatile memory. By default, memory is volatile.
|
|
|
|
GFX9 only.
|
|
|
|
======================================== ================================================
|
|
Syntax Description
|
|
======================================== ================================================
|
|
nv Indicates that instruction operates on
|
|
non-volatile memory.
|
|
======================================== ================================================
|
|
|
|
MUBUF/MTBUF Modifiers
|
|
---------------------
|
|
|
|
.. _amdgpu_synid_idxen:
|
|
|
|
idxen
|
|
~~~~~
|
|
|
|
Specifies whether address components include an index. By default, no components are used.
|
|
|
|
Can be used together with :ref:`offen<amdgpu_synid_offen>`.
|
|
|
|
Cannot be used with :ref:`addr64<amdgpu_synid_addr64>`.
|
|
|
|
======================================== ================================================
|
|
Syntax Description
|
|
======================================== ================================================
|
|
idxen Address components include an index.
|
|
======================================== ================================================
|
|
|
|
.. _amdgpu_synid_offen:
|
|
|
|
offen
|
|
~~~~~
|
|
|
|
Specifies whether address components include an offset. By default, no components are used.
|
|
|
|
Can be used together with :ref:`idxen<amdgpu_synid_idxen>`.
|
|
|
|
Cannot be used with :ref:`addr64<amdgpu_synid_addr64>`.
|
|
|
|
======================================== ================================================
|
|
Syntax Description
|
|
======================================== ================================================
|
|
offen Address components include an offset.
|
|
======================================== ================================================
|
|
|
|
.. _amdgpu_synid_addr64:
|
|
|
|
addr64
|
|
~~~~~~
|
|
|
|
Specifies whether a 64-bit address is used. By default, no address is used.
|
|
|
|
GFX7 only. Cannot be used with :ref:`offen<amdgpu_synid_offen>` and
|
|
:ref:`idxen<amdgpu_synid_idxen>` modifiers.
|
|
|
|
======================================== ================================================
|
|
Syntax Description
|
|
======================================== ================================================
|
|
addr64 A 64-bit address is used.
|
|
======================================== ================================================
|
|
|
|
.. _amdgpu_synid_buf_offset12:
|
|
|
|
buf_offset12
|
|
~~~~~~~~~~~~
|
|
|
|
Specifies an immediate unsigned 12-bit offset, in bytes. The default value is 0.
|
|
|
|
=============================== ======================================================
|
|
Syntax Description
|
|
=============================== ======================================================
|
|
offset:{0..0xFFF} Specifies a 12-bit unsigned offset as a positive
|
|
:ref:`integer number <amdgpu_synid_integer_number>`.
|
|
=============================== ======================================================
|
|
|
|
Examples:
|
|
|
|
.. parsed-literal::
|
|
|
|
offset:0
|
|
offset:0x10
|
|
|
|
glc
|
|
~~~
|
|
|
|
See a description :ref:`here<amdgpu_synid_glc>`.
|
|
|
|
slc
|
|
~~~
|
|
|
|
See a description :ref:`here<amdgpu_synid_slc>`.
|
|
|
|
.. _amdgpu_synid_lds:
|
|
|
|
lds
|
|
~~~
|
|
|
|
Specifies where to store the result: VGPRs or LDS (VGPRs by default).
|
|
|
|
======================================== ===========================
|
|
Syntax Description
|
|
======================================== ===========================
|
|
lds Store result in LDS.
|
|
======================================== ===========================
|
|
|
|
tfe
|
|
~~~
|
|
|
|
See a description :ref:`here<amdgpu_synid_tfe>`.
|
|
|
|
.. _amdgpu_synid_dfmt:
|
|
|
|
dfmt
|
|
~~~~
|
|
|
|
TBD
|
|
|
|
.. _amdgpu_synid_nfmt:
|
|
|
|
nfmt
|
|
~~~~
|
|
|
|
TBD
|
|
|
|
SMRD/SMEM Modifiers
|
|
-------------------
|
|
|
|
glc
|
|
~~~
|
|
|
|
See a description :ref:`here<amdgpu_synid_glc>`.
|
|
|
|
nv
|
|
~~
|
|
|
|
See a description :ref:`here<amdgpu_synid_nv>`.
|
|
|
|
VINTRP Modifiers
|
|
----------------
|
|
|
|
.. _amdgpu_synid_high:
|
|
|
|
high
|
|
~~~~
|
|
|
|
Specifies which half of the LDS word to use. Low half of LDS word is used by default.
|
|
GFX9 only.
|
|
|
|
======================================== ================================
|
|
Syntax Description
|
|
======================================== ================================
|
|
high Use high half of LDS word.
|
|
======================================== ================================
|
|
|
|
VOP1/VOP2 DPP Modifiers
|
|
-----------------------
|
|
|
|
GFX8 and GFX9 only.
|
|
|
|
.. _amdgpu_synid_dpp_ctrl:
|
|
|
|
dpp_ctrl
|
|
~~~~~~~~
|
|
|
|
Specifies how data are shared between threads. This is a mandatory modifier.
|
|
There is no default value.
|
|
|
|
Note. The lanes of a wavefront are organized in four banks and four rows.
|
|
|
|
======================================== ================================================
|
|
Syntax Description
|
|
======================================== ================================================
|
|
quad_perm:[{0..3},{0..3},{0..3},{0..3}] Full permute of 4 threads.
|
|
row_mirror Mirror threads within row.
|
|
row_half_mirror Mirror threads within 1/2 row (8 threads).
|
|
row_bcast:15 Broadcast 15th thread of each row to next row.
|
|
row_bcast:31 Broadcast thread 31 to rows 2 and 3.
|
|
wave_shl:1 Wavefront left shift by 1 thread.
|
|
wave_rol:1 Wavefront left rotate by 1 thread.
|
|
wave_shr:1 Wavefront right shift by 1 thread.
|
|
wave_ror:1 Wavefront right rotate by 1 thread.
|
|
row_shl:{1..15} Row shift left by 1-15 threads.
|
|
row_shr:{1..15} Row shift right by 1-15 threads.
|
|
row_ror:{1..15} Row rotate right by 1-15 threads.
|
|
======================================== ================================================
|
|
|
|
Note: Numeric parameters may be specified as either
|
|
:ref:`integer numbers<amdgpu_synid_integer_number>` or
|
|
:ref:`absolute expressions<amdgpu_synid_absolute_expression>`.
|
|
|
|
Examples:
|
|
|
|
.. parsed-literal::
|
|
|
|
quad_perm:[0, 1, 2, 3]
|
|
row_shl:3
|
|
|
|
.. _amdgpu_synid_row_mask:
|
|
|
|
row_mask
|
|
~~~~~~~~
|
|
|
|
Controls which rows are enabled for data sharing. By default, all rows are enabled.
|
|
|
|
Note. The lanes of a wavefront are organized in four banks and four rows.
|
|
|
|
======================================== =====================================================
|
|
Syntax Description
|
|
======================================== =====================================================
|
|
row_mask:{0..15} Specifies a *row mask* as a positive
|
|
:ref:`integer number <amdgpu_synid_integer_number>`.
|
|
|
|
Each of 4 bits in the mask controls one
|
|
row (0 - disabled, 1 - enabled).
|
|
======================================== =====================================================
|
|
|
|
Examples:
|
|
|
|
.. parsed-literal::
|
|
|
|
row_mask:0xf
|
|
row_mask:0b1010
|
|
row_mask:0b1111
|
|
|
|
.. _amdgpu_synid_bank_mask:
|
|
|
|
bank_mask
|
|
~~~~~~~~~
|
|
|
|
Controls which banks are enabled for data sharing. By default, all banks are enabled.
|
|
|
|
Note. The lanes of a wavefront are organized in four banks and four rows.
|
|
|
|
======================================== =======================================================
|
|
Syntax Description
|
|
======================================== =======================================================
|
|
bank_mask:{0..15} Specifies a *bank mask* as a positive
|
|
:ref:`integer number <amdgpu_synid_integer_number>`.
|
|
|
|
Each of 4 bits in the mask controls one
|
|
bank (0 - disabled, 1 - enabled).
|
|
======================================== =======================================================
|
|
|
|
Examples:
|
|
|
|
.. parsed-literal::
|
|
|
|
bank_mask:0x3
|
|
bank_mask:0b0011
|
|
bank_mask:0b1111
|
|
|
|
.. _amdgpu_synid_bound_ctrl:
|
|
|
|
bound_ctrl
|
|
~~~~~~~~~~
|
|
|
|
Controls data sharing when accessing an invalid lane. By default, data sharing with
|
|
invalid lanes is disabled.
|
|
|
|
======================================== ================================================
|
|
Syntax Description
|
|
======================================== ================================================
|
|
bound_ctrl:0 Enables data sharing with invalid lanes.
|
|
|
|
Accessing data from an invalid lane will
|
|
return zero.
|
|
======================================== ================================================
|
|
|
|
VOP1/VOP2/VOPC SDWA Modifiers
|
|
-----------------------------
|
|
|
|
GFX8 and GFX9 only.
|
|
|
|
clamp
|
|
~~~~~
|
|
|
|
See a description :ref:`here<amdgpu_synid_clamp>`.
|
|
|
|
omod
|
|
~~~~
|
|
|
|
See a description :ref:`here<amdgpu_synid_omod>`.
|
|
|
|
GFX9 only.
|
|
|
|
.. _amdgpu_synid_dst_sel:
|
|
|
|
dst_sel
|
|
~~~~~~~
|
|
|
|
Selects which bits in the destination are affected. By default, all bits are affected.
|
|
|
|
======================================== ================================================
|
|
Syntax Description
|
|
======================================== ================================================
|
|
dst_sel:DWORD Use bits 31:0.
|
|
dst_sel:BYTE_0 Use bits 7:0.
|
|
dst_sel:BYTE_1 Use bits 15:8.
|
|
dst_sel:BYTE_2 Use bits 23:16.
|
|
dst_sel:BYTE_3 Use bits 31:24.
|
|
dst_sel:WORD_0 Use bits 15:0.
|
|
dst_sel:WORD_1 Use bits 31:16.
|
|
======================================== ================================================
|
|
|
|
|
|
.. _amdgpu_synid_dst_unused:
|
|
|
|
dst_unused
|
|
~~~~~~~~~~
|
|
|
|
Controls what to do with the bits in the destination which are not selected
|
|
by :ref:`dst_sel<amdgpu_synid_dst_sel>`.
|
|
By default, unused bits are preserved.
|
|
|
|
======================================== ================================================
|
|
Syntax Description
|
|
======================================== ================================================
|
|
dst_unused:UNUSED_PAD Pad with zeros.
|
|
dst_unused:UNUSED_SEXT Sign-extend upper bits, zero lower bits.
|
|
dst_unused:UNUSED_PRESERVE Preserve bits.
|
|
======================================== ================================================
|
|
|
|
.. _amdgpu_synid_src0_sel:
|
|
|
|
src0_sel
|
|
~~~~~~~~
|
|
|
|
Controls which bits in the src0 are used. By default, all bits are used.
|
|
|
|
======================================== ================================================
|
|
Syntax Description
|
|
======================================== ================================================
|
|
src0_sel:DWORD Use bits 31:0.
|
|
src0_sel:BYTE_0 Use bits 7:0.
|
|
src0_sel:BYTE_1 Use bits 15:8.
|
|
src0_sel:BYTE_2 Use bits 23:16.
|
|
src0_sel:BYTE_3 Use bits 31:24.
|
|
src0_sel:WORD_0 Use bits 15:0.
|
|
src0_sel:WORD_1 Use bits 31:16.
|
|
======================================== ================================================
|
|
|
|
.. _amdgpu_synid_src1_sel:
|
|
|
|
src1_sel
|
|
~~~~~~~~
|
|
|
|
Controls which bits in the src1 are used. By default, all bits are used.
|
|
|
|
======================================== ================================================
|
|
Syntax Description
|
|
======================================== ================================================
|
|
src1_sel:DWORD Use bits 31:0.
|
|
src1_sel:BYTE_0 Use bits 7:0.
|
|
src1_sel:BYTE_1 Use bits 15:8.
|
|
src1_sel:BYTE_2 Use bits 23:16.
|
|
src1_sel:BYTE_3 Use bits 31:24.
|
|
src1_sel:WORD_0 Use bits 15:0.
|
|
src1_sel:WORD_1 Use bits 31:16.
|
|
======================================== ================================================
|
|
|
|
.. _amdgpu_synid_sdwa_operand_modifiers:
|
|
|
|
VOP1/VOP2/VOPC SDWA Operand Modifiers
|
|
-------------------------------------
|
|
|
|
Operand modifiers are not used separately. They are applied to source operands.
|
|
|
|
GFX8 and GFX9 only.
|
|
|
|
abs
|
|
~~~
|
|
|
|
See a description :ref:`here<amdgpu_synid_abs>`.
|
|
|
|
neg
|
|
~~~
|
|
|
|
See a description :ref:`here<amdgpu_synid_neg>`.
|
|
|
|
.. _amdgpu_synid_sext:
|
|
|
|
sext
|
|
~~~~
|
|
|
|
Sign-extends value of a (sub-dword) operand to fill all 32 bits.
|
|
Has no effect for 32-bit operands.
|
|
|
|
Valid for integer operands only.
|
|
|
|
======================================== ================================================
|
|
Syntax Description
|
|
======================================== ================================================
|
|
sext(<operand>) Sign-extend operand value.
|
|
======================================== ================================================
|
|
|
|
Examples:
|
|
|
|
.. parsed-literal::
|
|
|
|
sext(v4)
|
|
sext(v255)
|
|
|
|
VOP3 Modifiers
|
|
--------------
|
|
|
|
.. _amdgpu_synid_vop3_op_sel:
|
|
|
|
vop3_op_sel
|
|
~~~~~~~~~~~
|
|
|
|
Selects the low [15:0] or high [31:16] operand bits for source and destination operands.
|
|
By default, low bits are used for all operands.
|
|
|
|
The number of values specified with the op_sel modifier must match the number of instruction
|
|
operands (both source and destination). First value controls src0, second value controls src1
|
|
and so on, except that the last value controls destination.
|
|
The value 0 selects the low bits, while 1 selects the high bits.
|
|
|
|
Note. op_sel modifier affects 16-bit operands only. For 32-bit operands the value specified
|
|
by op_sel must be 0.
|
|
|
|
GFX9 only.
|
|
|
|
======================================== ============================================================
|
|
Syntax Description
|
|
======================================== ============================================================
|
|
op_sel:[{0..1},{0..1}] Select operand bits for instructions with 1 source operand.
|
|
op_sel:[{0..1},{0..1},{0..1}] Select operand bits for instructions with 2 source operands.
|
|
op_sel:[{0..1},{0..1},{0..1},{0..1}] Select operand bits for instructions with 3 source operands.
|
|
======================================== ============================================================
|
|
|
|
Examples:
|
|
|
|
.. parsed-literal::
|
|
|
|
op_sel:[0,0]
|
|
op_sel:[0,1]
|
|
|
|
.. _amdgpu_synid_clamp:
|
|
|
|
clamp
|
|
~~~~~
|
|
|
|
Clamp meaning depends on instruction.
|
|
|
|
For *v_cmp* instructions, clamp modifier indicates that the compare signals
|
|
if a floating point exception occurs. By default, signaling is disabled.
|
|
Not supported by GFX7.
|
|
|
|
For integer operations, clamp modifier indicates that the result must be clamped
|
|
to the largest and smallest representable value. By default, there is no clamping.
|
|
Integer clamping is not supported by GFX7.
|
|
|
|
For floating point operations, clamp modifier indicates that the result must be clamped
|
|
to the range [0.0, 1.0]. By default, there is no clamping.
|
|
|
|
Note. Clamp modifier is applied after :ref:`output modifiers<amdgpu_synid_omod>` (if any).
|
|
|
|
======================================== ================================================
|
|
Syntax Description
|
|
======================================== ================================================
|
|
clamp Enables clamping (or signaling).
|
|
======================================== ================================================
|
|
|
|
.. _amdgpu_synid_omod:
|
|
|
|
omod
|
|
~~~~
|
|
|
|
Specifies if an output modifier must be applied to the result.
|
|
By default, no output modifiers are applied.
|
|
|
|
Note. Output modifiers are applied before :ref:`clamping<amdgpu_synid_clamp>` (if any).
|
|
|
|
Output modifiers are valid for f32 and f64 floating point results only.
|
|
They must not be used with f16.
|
|
|
|
Note. *v_cvt_f16_f32* is an exception. This instruction produces f16 result
|
|
but accepts output modifiers.
|
|
|
|
======================================== ================================================
|
|
Syntax Description
|
|
======================================== ================================================
|
|
mul:2 Multiply the result by 2.
|
|
mul:4 Multiply the result by 4.
|
|
div:2 Multiply the result by 0.5.
|
|
======================================== ================================================
|
|
|
|
.. _amdgpu_synid_vop3_operand_modifiers:
|
|
|
|
VOP3 Operand Modifiers
|
|
----------------------
|
|
|
|
Operand modifiers are not used separately. They are applied to source operands.
|
|
|
|
.. _amdgpu_synid_abs:
|
|
|
|
abs
|
|
~~~
|
|
|
|
Computes absolute value of its operand. Applied before :ref:`neg<amdgpu_synid_neg>` (if any).
|
|
Valid for floating point operands only.
|
|
|
|
======================================== ================================================
|
|
Syntax Description
|
|
======================================== ================================================
|
|
abs(<operand>) Get absolute value of operand.
|
|
\|<operand>| The same as above.
|
|
======================================== ================================================
|
|
|
|
Examples:
|
|
|
|
.. parsed-literal::
|
|
|
|
abs(v36)
|
|
\|v36|
|
|
|
|
.. _amdgpu_synid_neg:
|
|
|
|
neg
|
|
~~~
|
|
|
|
Computes negative value of its operand. Applied after :ref:`abs<amdgpu_synid_abs>` (if any).
|
|
Valid for floating point operands only.
|
|
|
|
======================================== ================================================
|
|
Syntax Description
|
|
======================================== ================================================
|
|
neg(<operand>) Get negative value of operand.
|
|
-<operand> The same as above.
|
|
======================================== ================================================
|
|
|
|
Examples:
|
|
|
|
.. parsed-literal::
|
|
|
|
neg(v[0])
|
|
-v4
|
|
|
|
VOP3P Modifiers
|
|
---------------
|
|
|
|
This section describes modifiers of *regular* VOP3P instructions.
|
|
|
|
*v_mad_mix_f32*, *v_mad_mixhi_f16* and *v_mad_mixlo_f16*
|
|
instructions use these modifiers :ref:`in a special manner<amdgpu_synid_mad_mix>`.
|
|
|
|
GFX9 only.
|
|
|
|
.. _amdgpu_synid_op_sel:
|
|
|
|
op_sel
|
|
~~~~~~
|
|
|
|
Selects the low [15:0] or high [31:16] operand bits as input to the operation
|
|
which results in the lower-half of the destination.
|
|
By default, low bits are used for all operands.
|
|
|
|
The number of values specified by the *op_sel* modifier must match the number of source
|
|
operands. First value controls src0, second value controls src1 and so on.
|
|
|
|
The value 0 selects the low bits, while 1 selects the high bits.
|
|
|
|
================================= =============================================================
|
|
Syntax Description
|
|
================================= =============================================================
|
|
op_sel:[{0..1}] Select operand bits for instructions with 1 source operand.
|
|
op_sel:[{0..1},{0..1}] Select operand bits for instructions with 2 source operands.
|
|
op_sel:[{0..1},{0..1},{0..1}] Select operand bits for instructions with 3 source operands.
|
|
================================= =============================================================
|
|
|
|
Examples:
|
|
|
|
.. parsed-literal::
|
|
|
|
op_sel:[0,0]
|
|
op_sel:[0,1,0]
|
|
|
|
.. _amdgpu_synid_op_sel_hi:
|
|
|
|
op_sel_hi
|
|
~~~~~~~~~
|
|
|
|
Selects the low [15:0] or high [31:16] operand bits as input to the operation
|
|
which results in the upper-half of the destination.
|
|
By default, high bits are used for all operands.
|
|
|
|
The number of values specified by the *op_sel_hi* modifier must match the number of source
|
|
operands. First value controls src0, second value controls src1 and so on.
|
|
|
|
The value 0 selects the low bits, while 1 selects the high bits.
|
|
|
|
=================================== =============================================================
|
|
Syntax Description
|
|
=================================== =============================================================
|
|
op_sel_hi:[{0..1}] Select operand bits for instructions with 1 source operand.
|
|
op_sel_hi:[{0..1},{0..1}] Select operand bits for instructions with 2 source operands.
|
|
op_sel_hi:[{0..1},{0..1},{0..1}] Select operand bits for instructions with 3 source operands.
|
|
=================================== =============================================================
|
|
|
|
Examples:
|
|
|
|
.. parsed-literal::
|
|
|
|
op_sel_hi:[0,0]
|
|
op_sel_hi:[0,0,1]
|
|
|
|
.. _amdgpu_synid_neg_lo:
|
|
|
|
neg_lo
|
|
~~~~~~
|
|
|
|
Specifies whether to change sign of operand values selected by
|
|
:ref:`op_sel<amdgpu_synid_op_sel>`. These values are then used
|
|
as input to the operation which results in the upper-half of the destination.
|
|
|
|
The number of values specified by this modifier must match the number of source
|
|
operands. First value controls src0, second value controls src1 and so on.
|
|
|
|
The value 0 indicates that the corresponding operand value is used unmodified,
|
|
the value 1 indicates that negative value of the operand must be used.
|
|
|
|
By default, operand values are used unmodified.
|
|
|
|
This modifier is valid for floating point operands only.
|
|
|
|
================================ ==================================================================
|
|
Syntax Description
|
|
================================ ==================================================================
|
|
neg_lo:[{0..1}] Select affected operands for instructions with 1 source operand.
|
|
neg_lo:[{0..1},{0..1}] Select affected operands for instructions with 2 source operands.
|
|
neg_lo:[{0..1},{0..1},{0..1}] Select affected operands for instructions with 3 source operands.
|
|
================================ ==================================================================
|
|
|
|
Examples:
|
|
|
|
.. parsed-literal::
|
|
|
|
neg_lo:[0]
|
|
neg_lo:[0,1]
|
|
|
|
.. _amdgpu_synid_neg_hi:
|
|
|
|
neg_hi
|
|
~~~~~~
|
|
|
|
Specifies whether to change sign of operand values selected by
|
|
:ref:`op_sel_hi<amdgpu_synid_op_sel_hi>`. These values are then used
|
|
as input to the operation which results in the upper-half of the destination.
|
|
|
|
The number of values specified by this modifier must match the number of source
|
|
operands. First value controls src0, second value controls src1 and so on.
|
|
|
|
The value 0 indicates that the corresponding operand value is used unmodified,
|
|
the value 1 indicates that negative value of the operand must be used.
|
|
|
|
By default, operand values are used unmodified.
|
|
|
|
This modifier is valid for floating point operands only.
|
|
|
|
=============================== ==================================================================
|
|
Syntax Description
|
|
=============================== ==================================================================
|
|
neg_hi:[{0..1}] Select affected operands for instructions with 1 source operand.
|
|
neg_hi:[{0..1},{0..1}] Select affected operands for instructions with 2 source operands.
|
|
neg_hi:[{0..1},{0..1},{0..1}] Select affected operands for instructions with 3 source operands.
|
|
=============================== ==================================================================
|
|
|
|
Examples:
|
|
|
|
.. parsed-literal::
|
|
|
|
neg_hi:[1,0]
|
|
neg_hi:[0,1,1]
|
|
|
|
clamp
|
|
~~~~~
|
|
|
|
See a description :ref:`here<amdgpu_synid_clamp>`.
|
|
|
|
.. _amdgpu_synid_mad_mix:
|
|
|
|
VOP3P V_MAD_MIX Modifiers
|
|
-------------------------
|
|
|
|
*v_mad_mix_f32*, *v_mad_mixhi_f16* and *v_mad_mixlo_f16* instructions
|
|
use *op_sel* and *op_sel_hi* modifiers
|
|
in a manner different from *regular* VOP3P instructions.
|
|
|
|
See a description below.
|
|
|
|
GFX9 only.
|
|
|
|
.. _amdgpu_synid_mad_mix_op_sel:
|
|
|
|
mad_mix_op_sel
|
|
~~~~~~~~~~~~~~
|
|
|
|
This operand has meaning only for 16-bit source operands as indicated by
|
|
:ref:`mad_mix_op_sel_hi<amdgpu_synid_mad_mix_op_sel_hi>`.
|
|
It specifies to select either the low [15:0] or high [31:16] operand bits
|
|
as input to the operation.
|
|
|
|
The number of values specified by the *op_sel* modifier must match the number of source
|
|
operands. First value controls src0, second value controls src1 and so on.
|
|
|
|
The value 0 indicates the low bits, the value 1 indicates the high 16 bits.
|
|
|
|
By default, low bits are used for all operands.
|
|
|
|
=============================== ================================================
|
|
Syntax Description
|
|
=============================== ================================================
|
|
op_sel:[{0..1},{0..1},{0..1}] Select location of each 16-bit source operand.
|
|
=============================== ================================================
|
|
|
|
Examples:
|
|
|
|
.. parsed-literal::
|
|
|
|
op_sel:[0,1]
|
|
|
|
.. _amdgpu_synid_mad_mix_op_sel_hi:
|
|
|
|
mad_mix_op_sel_hi
|
|
~~~~~~~~~~~~~~~~~
|
|
|
|
Selects the size of source operands: either 32 bits or 16 bits.
|
|
By default, 32 bits are used for all source operands.
|
|
|
|
The number of values specified by the *op_sel_hi* modifier must match the number of source
|
|
operands. First value controls src0, second value controls src1 and so on.
|
|
|
|
The value 0 indicates 32 bits, the value 1 indicates 16 bits.
|
|
|
|
The location of 16 bits in the operand may be specified by
|
|
:ref:`mad_mix_op_sel<amdgpu_synid_mad_mix_op_sel>`.
|
|
|
|
======================================== ====================================
|
|
Syntax Description
|
|
======================================== ====================================
|
|
op_sel_hi:[{0..1},{0..1},{0..1}] Select size of each source operand.
|
|
======================================== ====================================
|
|
|
|
Examples:
|
|
|
|
.. parsed-literal::
|
|
|
|
op_sel_hi:[1,1,1]
|
|
|
|
abs
|
|
~~~
|
|
|
|
See a description :ref:`here<amdgpu_synid_abs>`.
|
|
|
|
neg
|
|
~~~
|
|
|
|
See a description :ref:`here<amdgpu_synid_neg>`.
|
|
|
|
clamp
|
|
~~~~~
|
|
|
|
See a description :ref:`here<amdgpu_synid_clamp>`.
|