Stuart Adams
|
02c2468864
|
[Clang][NVPTX] Add NVPTX intrinsics and builtins for CUDA PTX cp.async instructions
Adds NVPTX builtins and intrinsics for the CUDA PTX `cp.async` instructions for
`sm_80` architecture or newer.
PTX ISA description of `cp.async`:
https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#data-movement-and-conversion-instructions-asynchronous-copy
https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#parallel-synchronization-and-communication-instructions-cp-async-mbarrier-arrive
Authored-by: Stuart Adams <stuart.adams@codeplay.com>
Co-Authored-by: Alexander Johnston <alexander@codeplay.com>
Differential Revision: https://reviews.llvm.org/D100394
|
2021-05-17 09:46:59 -07:00 |