Define a new dialect related to GPU kernels. Currently, it only contains a
single operation for launching a kernel on a three-dimensional grid of thread
blocks, following a model similar to that of CUDA. In particular, the body of
the kernel contains operations executed by each thread and uses region
arguments to accept thread and block identifiers (similar to how the loop body
region accepts the induction value).
--
PiperOrigin-RevId: 245713728