The commit in question changed the syntax but did not update the runner
tests. This also required registering the MemRef dialect for custom
parser to work correctly.
Summary:
Use a nested symbol to identify the kernel to be invoked by a `LaunchFuncOp` in the GPU dialect.
This replaces the two attributes that were used to identify the kernel module and the kernel within seperately.
Differential Revision: https://reviews.llvm.org/D78551
Summary:
Use the shortcu `kernel` for the `gpu.kernel` attribute of `gpu.func`.
The parser supports this and test cases are easier to read.
Differential Revision: https://reviews.llvm.org/D78542
Summary:
The test performs add on vector<16384xf32> with
number of workgroups = (128, 1, 1)
local workgroup size = (128, 1, 1)
On a NVIDIA Quadro P1000, I see the following results:
Command buffer submit time: 13us
Compute shader execution time: 19.616us
Differential Revision: https://reviews.llvm.org/D76799