If MLIR_CUDA_RUNNER_ENABLED, register a 'gpu-to-cubin' conversion pass to mlir-opt.
The next step is to switch CUDA integration tests from mlir-cuda-runner to mlir-opt + mlir-cpu-runner and remove mlir-cuda-runner.
Depends On D98279
Reviewed By: herhut, rriddle, mehdi_amini
Differential Revision: https://reviews.llvm.org/D98203
Instead of configuring kernel-to-cubin/rocdl lowering through callbacks, introduce a base class that target-specific passes can derive from.
Put the base class in GPU/Transforms, according to the discussion in D98203.
The mlir-cuda-runner will go away shortly, and the mlir-rocdl-runner as well at some point. I therefore kept the existing code path working and will remove it in a separate step.
Depends On D98168
Reviewed By: herhut
Differential Revision: https://reviews.llvm.org/D98279
Until now, the GPU translation to NVVM or ROCDL intrinsics relied on the
presence of the generic `gpu.kernel` attribute to attach additional LLVM IR
metadata to the relevant functions. This would be problematic if each dialect
were to handle the conversion of its own options, which is the intended
direction for the translation infrastructure. Introduce `nvvm.kernel` and
`rocdl.kernel` in addition to `gpu.kernel` and base translation on these new
attributes instead.
Reviewed By: herhut
Differential Revision: https://reviews.llvm.org/D96591
The RewritePattern will become one of several, and will be part of the LLVM conversion pass (instead of a separate pass following LLVM conversion).
Reviewed By: herhut
Differential Revision: https://reviews.llvm.org/D84946
Summary:
This changes the casing of MLIRGPUtoGPURuntimeTransforms to be consistent
with other transform libraries.
Differential Revision: https://reviews.llvm.org/D82841
D80142 restructured MLIR-to-GPU-binary conversion to support multiple
targets. It also modified cmake files to link relevant LLVM components
in test/lib, which broke shared-library builds, and likely made the
conversions unusable outside mlir-opt (or other tools that link in test
library targets). Link these components to GPUCommon instead.
Differential Revision: https://reviews.llvm.org/D80739
Make ConvertKernelFuncToCubin pass to be generic:
- Rename to ConvertKernelFuncToBlob.
- Allow specifying triple, target chip, target features.
- Initializing LLVM backend is supplied by a callback function.
- Lowering process from MLIR module to LLVM module is via another callback.
- Change mlir-cuda-runner to adopt the revised pass.
- Add new tests for lowering to ROCm HSA code object (HSACO).
- Tests for CUDA and ROCm are kept in separate directories.
Differential Revision: https://reviews.llvm.org/D80142
Due to similar APIs between CUDA and ROCm (HIP),
ConvertGpuLaunchFuncToCudaCalls pass could be used on both platforms with some
refactoring.
In this commit:
- Migrate ConvertLaunchFuncToCudaCalls from GPUToCUDA to GPUCommon, and rename.
- Rename runtime wrapper APIs be platform-neutral.
- Let GPU binary annotation attribute be specifiable as a PassOption.
- Naming changes within the implementation and tests.
Subsequent patches would introduce ROCm-specific tests and runtime wrapper
APIs.
Differential Revision: https://reviews.llvm.org/D80167
This reverts commit cdb6f05e2d.
The build is broken with:
You have called ADD_LIBRARY for library obj.MLIRGPUtoCUDATransforms without any source files. This typically indicates a problem with your CMakeLists.txt file
Due to similar APIs between CUDA and ROCm (HIP),
ConvertGpuLaunchFuncToCudaCalls pass could be used on both platforms with some
refactoring.
In this commit:
- Migrate ConvertLaunchFuncToCudaCalls from GPUToCUDA to GPUCommon, and rename.
- Rename runtime wrapper APIs be platform-neutral.
- Let GPU binary annotation attribute be specifiable as a PassOption.
- Naming changes within the implementation and tests.
Subsequent patches would introduce ROCm-specific tests and runtime wrapper
APIs.
Differential Revision: https://reviews.llvm.org/D80167