forked from OSchip/llvm-project
875eb523c1
Add warp-synchronous matrix-multiply-accumulate ops to the GPU and NVVM dialects. Add the following three ops to the GPU dialect: (1) subgroup_mma_load_matrix, (2) subgroup_mma_store_matrix, (3) subgroup_mma_compute. Add the following three ops to the NVVM dialect: (1) wmma.m16n16k16.load.[a,b,c].[f16,f32].row.stride, (2) wmma.m16n16k16.store.d.[f16,f32].row.stride, (3) wmma.m16n16k16.mma.row.row.[f16,f32].[f16,f32]. Reviewed By: bondhugula, ftynse, ThomasRaoux. Differential Revision: https://reviews.llvm.org/D95330 |
||
---|---|---|
amx.mlir | ||
arm-neon.mlir | ||
arm-sve.mlir | ||
import.ll | ||
llvmir-debug.mlir | ||
llvmir-intrinsics.mlir | ||
llvmir-invalid.mlir | ||
llvmir-types.mlir | ||
llvmir.mlir | ||
nvvmir.mlir | ||
openmp-llvm.mlir | ||
rocdl.mlir | ||
vector-to-llvm-ir.mlir | ||
x86vector.mlir |