[AMDGPU] Correct gfx940 memory model documentation.
Differential Revision: https://reviews.llvm.org/D121397
parent 2ebe971103
commit 3a37d08b35
@@ -8712,12 +8712,17 @@ For GFX940:
 work-group since they execute on the same CU. The exception is when in
 tgsplit execution mode as wavefronts of the same work-group can be in
 different CUs and so a ``buffer_inv sc0`` is required which will invalidate
-the L1 cache is in tgsplit mode.
+the L1 cache.
 
-* A ``buffer_inv sc1`` is required to invalidate the L1 cache for coherence
+* A ``buffer_inv sc0`` is required to invalidate the L1 cache for coherence
 between wavefronts executing in different work-groups as they may be
 executing on different CUs.
 
+* Atomic read-modify-write instructions implicitly bypass the L1 cache.
+Therefore, they do not use the sc0 bit for coherence and instead use it to
+indicate if the instruction returns the original value being updated. They
+do use sc1 to indicate system or agent scope coherence.
+
 * The scalar memory operations access a scalar L1 cache shared by all wavefronts
 on a group of CUs. The scalar and vector L1 caches are not coherent. However,
 scalar operations are used in a restricted way so do not impact the memory
@@ -8891,8 +8896,6 @@ in table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx940-table`.
 - generic sc0=1 sc1=1
 store atomic monotonic - singlethread - global 1. buffer/global/flat_store
 - wavefront - generic
-store atomic monotonic - singlethread - global 1. buffer/global/flat_store
-- wavefront - generic
 store atomic monotonic - workgroup - global 1. buffer/global/flat_store
 - generic sc0=1
 store atomic monotonic - agent - global 1. buffer/global/flat_store
@@ -9639,7 +9642,7 @@ in table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx940-table`.
 store that is being
 released.
 
-3. buffer/global/flat_store sc1=1
+3. buffer/global/flat_store sc1=1
 store atomic release - system - global 1. buffer_wbl2 sc0=1 sc1=1
 - generic
 - Must happen before
@@ -9694,7 +9697,7 @@ in table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx940-table`.
 store that is being
 released.
 
-2. buffer/global/flat_store
+3. buffer/global/flat_store
 sc0=1 sc1=1
 atomicrmw release - singlethread - global 1. buffer/global/flat_atomic
 - wavefront - generic
@@ -10878,7 +10881,7 @@ in table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx940-table`.
 ------------------------------------------------------------------------------------
 load atomic seq_cst - singlethread - global *Same as corresponding
 - wavefront - local load atomic acquire,
-- generic except must generated
+- generic except must generate
 all instructions even
 for OpenCL.*
 load atomic seq_cst - workgroup - global 1. s_waitcnt lgkm/vmcnt(0)
@@ -10963,7 +10966,7 @@ in table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx940-table`.
 instructions same as
 corresponding load
 atomic acquire,
-except must generated
+except must generate
 all instructions even
 for OpenCL.*
 load atomic seq_cst - workgroup - local *If TgSplit execution mode,
@@ -10972,7 +10975,7 @@ in table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx940-table`.
 
 *Same as corresponding
 load atomic acquire,
-except must generated
+except must generate
 all instructions even
 for OpenCL.*
 
@@ -11066,22 +11069,22 @@ in table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx940-table`.
 instructions same as
 corresponding load
 atomic acquire,
-except must generated
+except must generate
 all instructions even
 for OpenCL.*
 store atomic seq_cst - singlethread - global *Same as corresponding
 - wavefront - local store atomic release,
-- workgroup - generic except must generated
+- workgroup - generic except must generate
 - agent all instructions even
 - system for OpenCL.*
 atomicrmw seq_cst - singlethread - global *Same as corresponding
 - wavefront - local atomicrmw acq_rel,
-- workgroup - generic except must generated
+- workgroup - generic except must generate
 - agent all instructions even
 - system for OpenCL.*
 fence seq_cst - singlethread *none* *Same as corresponding
 - wavefront fence acq_rel,
-- workgroup except must generated
+- workgroup except must generate
 - agent all instructions even
 - system for OpenCL.*
 ============ ============ ============== ========== ================================
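The store rows touched by this change suggest a simple pattern for how the synchronization scope selects the sc0/sc1 modifiers on ``buffer/global/flat_store``. The sketch below is illustrative only and is not part of the commit. The helper name ``store_modifiers`` and the lookup table are hypothetical. The workgroup pairing (``sc0=1``) and the ``sc0=1 sc1=1`` pairing are read directly from the rows quoted above; attributing ``sc1=1`` to agent scope and ``sc0=1 sc1=1`` to system scope is an assumption inferred from row order, since the hunks show only fragments of the table.

```python
# Hypothetical helper: maps an LLVM sync scope to the sc0/sc1 modifiers that
# the quoted gfx940 table rows appear to place on buffer/global/flat_store.
# The agent and system entries are inferred from row order in the hunks,
# not stated outright, so treat them as assumptions.
STORE_SCOPE_BITS = {
    "singlethread": (0, 0),  # rows show a plain store with no modifier
    "wavefront": (0, 0),     # same rows as singlethread
    "workgroup": (1, 0),     # row shows sc0=1
    "agent": (0, 1),         # assumed: the store with sc1=1
    "system": (1, 1),        # assumed: the store with sc0=1 sc1=1
}


def store_modifiers(scope: str) -> str:
    """Return the modifier string for a store at the given scope."""
    if scope not in STORE_SCOPE_BITS:
        raise ValueError(f"unknown sync scope: {scope!r}")
    sc0, sc1 = STORE_SCOPE_BITS[scope]
    parts = []
    if sc0:
        parts.append("sc0=1")
    if sc1:
        parts.append("sc1=1")
    return " ".join(parts)


if __name__ == "__main__":
    for scope in STORE_SCOPE_BITS:
        print(f"{scope:12s} -> flat_store {store_modifiers(scope)}".rstrip())
```

This mapping covers stores only. Per the bullet added in the first hunk, atomic read-modify-write instructions use sc0 to indicate whether the original value is returned, so the same table should not be reused for them.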