Make ConvertKernelFuncToCubin pass to be generic:
- Rename to ConvertKernelFuncToBlob.
- Allow specifying triple, target chip, target features.
- Initializing LLVM backend is supplied by a callback function.
- Lowering process from MLIR module to LLVM module is via another callback.
- Change mlir-cuda-runner to adopt the revised pass.
- Add new tests for lowering to ROCm HSA code object (HSACO).
- Tests for CUDA and ROCm are kept in separate directories.
Differential Revision: https://reviews.llvm.org/D80142
Summary:
Add DynamicMemRefType which can reference one of the statically ranked StridedMemRefType or a UnrankedMemRefType so that runner utils only need to be implemented once.
There is definitely room for more clean up and unification, but I will keep that for follow-ups.
Reviewers: nicolasvasilache
Reviewed By: nicolasvasilache
Subscribers: mehdi_amini, rriddle, jpienaar, shauheen, antiagainst, nicolasvasilache, arpith-jacob, mgester, lucyrfox, liufengdb, stephenneuendorffer, Joonsoo, grosul1, frgossen, Kayjukh, jurahul, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D80513
Due to similar APIs between CUDA and ROCm (HIP),
ConvertGpuLaunchFuncToCudaCalls pass could be used on both platforms with some
refactoring.
In this commit:
- Migrate ConvertLaunchFuncToCudaCalls from GPUToCUDA to GPUCommon, and rename.
- Rename runtime wrapper APIs be platform-neutral.
- Let GPU binary annotation attribute be specifiable as a PassOption.
- Naming changes within the implementation and tests.
Subsequent patches would introduce ROCm-specific tests and runtime wrapper
APIs.
Differential Revision: https://reviews.llvm.org/D80167
This reverts commit cdb6f05e2d.
The build is broken with:
You have called ADD_LIBRARY for library obj.MLIRGPUtoCUDATransforms without any source files. This typically indicates a problem with your CMakeLists.txt file
Due to similar APIs between CUDA and ROCm (HIP),
ConvertGpuLaunchFuncToCudaCalls pass could be used on both platforms with some
refactoring.
In this commit:
- Migrate ConvertLaunchFuncToCudaCalls from GPUToCUDA to GPUCommon, and rename.
- Rename runtime wrapper APIs be platform-neutral.
- Let GPU binary annotation attribute be specifiable as a PassOption.
- Naming changes within the implementation and tests.
Subsequent patches would introduce ROCm-specific tests and runtime wrapper
APIs.
Differential Revision: https://reviews.llvm.org/D80167
The JitRunner library is logically very close to the execution engine,
and shares similar dependencies.
find -name "*.cpp" -exec sed -i "s/Support\/JitRunner/ExecutionEngine\/JitRunner/" "{}" \;
Differential Revision: https://reviews.llvm.org/D79899
add_llvm_library and add_llvm_executable may need to create new targets with
appropriate dependencies. As a result, it is not sufficient in some
configurations (namely LLVM_BUILD_LLVM_DYLIB=on) to only call
add_dependencies(). Instead, the explicit TableGen dependencies must
be passed to add_llvm_library() or add_llvm_executable() using the DEPENDS
keyword.
Differential Revision: https://reviews.llvm.org/D74930
CMake allows calling target_link_libraries() without a keyword,
but this usage is not preferred when also called with a keyword,
and has surprising behavior. This patch explicitly specifies a
keyword when using target_link_libraries().
Differential Revision: https://reviews.llvm.org/D75725
add_llvm_library and add_llvm_executable may need to create new targets with
appropriate dependencies. As a result, it is not sufficient in some
configurations (namely LLVM_BUILD_LLVM_DYLIB=on) to only call
add_dependencies(). Instead, the explicit TableGen dependencies must
be passed to add_llvm_library() or add_llvm_executable() using the DEPENDS
keyword.
Differential Revision: https://reviews.llvm.org/D74930
When compiling libLLVM.so, add_llvm_library() manipulates the link libraries
being used. This means that when using add_llvm_library(), we need to pass
the list of libraries to be linked (using the LINK_LIBS keyword) instead of
using the standard target_link_libraries call. This is preparation for
properly dealing with creating libMLIR.so as well.
Differential Revision: https://reviews.llvm.org/D74864
add_llvm_library and add_llvm_executable may need to create new targets with
appropriate dependencies. As a result, it is not sufficient in some
configurations (namely LLVM_BUILD_LLVM_DYLIB=on) to only call
add_dependencies(). Instead, the explicit TableGen dependencies must
be passed to add_llvm_library() or add_llvm_executable() using the DEPENDS
keyword.
Differential Revision: https://reviews.llvm.org/D74930
When compiling libLLVM.so, add_llvm_library() manipulates the link libraries
being used. This means that when using add_llvm_library(), we need to pass
the list of libraries to be linked (using the LINK_LIBS keyword) instead of
using the standard target_link_libraries call. This is preparation for
properly dealing with creating libMLIR.so as well.
Differential Revision: https://reviews.llvm.org/D74864
Previously, lib/Support/JitRunner.cpp was essentially a complete application,
performing all library initialization, along with dealing with command line
arguments and actually running passes. This differs significantly from
mlir-opt and required a dependency on InitAllDialects.h. This dependency
is significant, since it requires a dependency on all of the resulting
libraries.
This patch refactors the code so that tools are responsible for library
initialization, including registering all dialects, prior to calling
JitRunnerMain. This places the concern about what dialect to support
with the end application, enabling more extensibility at the cost of
a small amount of code duplication between tools. It also fixes
BUILD_SHARED_LIBS=on.
Differential Revision: https://reviews.llvm.org/D75272
Collect a list of conversion libraries in cmake, so we don't have to
list these explicitly in most binaries.
Differential Revision: https://reviews.llvm.org/D75222
Instead of creating extra libraries we don't really need, collect a
list of all dialects and use that instead.
Differential Revision: https://reviews.llvm.org/D75221
Summary:
This was broken recently when moving from dialect registration via
static initializers to explicit intialization.
Differential Revision: https://reviews.llvm.org/D74480
The existing (default) calling convention for memrefs in standard-to-LLVM
conversion was motivated by interfacing with LLVM IR produced from C sources.
In particular, it passes a pointer to the memref descriptor structure when
calling the function. Therefore, the descriptor is allocated on stack before
the call. This convention leads to several problems. PR44644 indicates a
problem with stack exhaustion when calling functions with memref-typed
arguments in a loop. Allocating outside of the loop may lead to concurrent
access problems in case the loop is parallel. When targeting GPUs, the contents
of the stack-allocated memory for the descriptor (passed by pointer) needs to
be explicitly copied to the device. Using an aggregate type makes it impossible
to attach pointer-specific argument attributes pertaining to alignment and
aliasing in the LLVM dialect.
Change the default calling convention for memrefs in standard-to-LLVM
conversion to transform a memref into a list of arguments, each of primitive
type, that are comprised in the memref descriptor. This avoids stack allocation
for ranked memrefs (and thus stack exhaustion and potential concurrent access
problems) and simplifies the device function invocation on GPUs.
Provide an option in the standard-to-LLVM conversion to generate auxiliary
wrapper function with the same interface as the previous calling convention,
compatible with LLVM IR porduced from C sources. These auxiliary functions
pack the individual values into a descriptor structure or unpack it. They also
handle descriptor stack allocation if necessary, serving as an allocation
scope: the memory reserved by `alloca` will be freed on exiting the auxiliary
function.
The effect of this change on MLIR-generated only LLVM IR is minimal. When
interfacing MLIR-generated LLVM IR with C-generated LLVM IR, the integration
only needs to require auxiliary functions and change the function name to call
the wrapper function instead of the original function.
This also opens the door to forwarding aliasing and alignment information from
memrefs to LLVM IR pointers in the standrd-to-LLVM conversion.
Summary:
This revision adds basic support for emitting line table information when exporting to LLVMIR. We don't yet have a story for supporting all of the LLVM debug metadata, so this revision stubs some features(like subprograms) to enable emitting line tables.
Differential Revision: https://reviews.llvm.org/D73934
This binplaces `mlir-translate`, `mlir-cuda-runner`, and `mlir-cpu-runner` when building the CMake install target.
Differential Revision: https://reviews.llvm.org/D73986
Summary:
This is based on the use of code constantly checking for an attribute on
a model and instead represents the distinct operaion with a different
op. Instead, this op can be used to provide better filtering.
Reverts "Revert "[mlir] Create a gpu.module operation for the GPU Dialect.""
This reverts commit ac446302ca4145cdc89f377c0c364c29ee303be5 after
fixing internal Google issues.
This additionally updates ROCDL lowering to use the new gpu.module.
Reviewers: herhut, mravishankar, antiagainst, nicolasvasilache
Subscribers: jholewinski, mgorny, mehdi_amini, jpienaar, burmako, shauheen, csigg, arpith-jacob, mgester, lucyrfox, aartbik, liufengdb, llvm-commits, mravishankar, rriddle, antiagainst, bkramer
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72921
Summary:
This is based on the use of code constantly checking for an attribute on
a model and instead represents the distinct operaion with a different
op. Instead, this op can be used to provide better filtering.
Reviewers: herhut, mravishankar, antiagainst, rriddle
Reviewed By: herhut, antiagainst, rriddle
Subscribers: liufengdb, aartbik, jholewinski, mgorny, mehdi_amini, rriddle, jpienaar, burmako, shauheen, antiagainst, nicolasvasilache, csigg, arpith-jacob, mgester, lucyrfox, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72336
This CL adds an extra pointer to the memref descriptor to allow specifying alignment.
In a previous implementation, we used 2 types: `linalg.buffer` and `view` where the buffer type was the unit of allocation/deallocation/alignment and `view` was the unit of indexing.
After multiple discussions it was decided to use a single type, which conflates both, so the memref descriptor now needs to carry both pointers.
This is consistent with the [RFC-Proposed Changes to MemRef and Tensor MLIR Types](https://groups.google.com/a/tensorflow.org/forum/#!searchin/mlir/std.view%7Csort:date/mlir/-wKHANzDNTg/4K6nUAp8AAAJ).
PiperOrigin-RevId: 279959463
Originally, we were attaching attributes containing CUBIN blobs to the kernel
function called by `gpu.launch_func`. This kernel is now contained in a nested
module that is used as a compilation unit. Attach compiled CUBIN blobs to the
module rather than to the function since we were compiling the module. This
also avoids duplication of the attribute on multiple kernels within the same
module.
PiperOrigin-RevId: 273497303
Now that the accessor function is a trivial getter of the global variable, it
makes less sense to have the getter generation as a separate pass. Move the
getter generation into the lowering of `gpu.launch_func` to CUDA calls. This
change is mostly code motion, but the process can be simplified further by
generating the addressof inplace instead of using a call. This is will be done
in a follow-up.
PiperOrigin-RevId: 273492517
This makes the name of the conversion pass more consistent with the naming
scheme, since it actually converts from the Loop dialect to the Standard
dialect rather than working with arbitrary control flow operations.
PiperOrigin-RevId: 272612112
This CL finishes the implementation of the lowering part of the [strided memref RFC](https://groups.google.com/a/tensorflow.org/forum/#!topic/mlir/MaL8m2nXuio).
Strided memrefs correspond conceptually to the following templated C++ struct:
```
template <typename Elem, size_t Rank>
struct {
Elem *ptr;
int64_t offset;
int64_t sizes[Rank];
int64_t strides[Rank];
};
```
The linearization procedure for address calculation for strided memrefs is the same as for linalg views:
`base_offset + SUM_i index_i * stride_i`.
The following CL will unify Linalg and Standard by removing !linalg.view in favor of strided memrefs.
PiperOrigin-RevId: 272033399
The strided MemRef RFC discusses a normalized descriptor and interaction with library calls (https://groups.google.com/a/tensorflow.org/forum/#!topic/mlir/MaL8m2nXuio).
Lowering of nested LLVM structs as value types does not play nicely with externally compiled C/C++ functions due to ABI issues.
Solving the ABI problem generally is a very complex problem and most likely involves taking
a dependence on clang that we do not want atm.
A simple workaround is to pass pointers to memref descriptors at function boundaries, which this CL implement.
PiperOrigin-RevId: 271591708
Roll forward of commit 5684a12.
When outlining GPU kernels, put the kernel function inside a nested module. Then use a nested pipeline to generate the cubins, independently per kernel. In a final pass, move the cubins back to the parent module.
PiperOrigin-RevId: 270639748
When outlining GPU kernels, put the kernel function inside a nested module. Then use a nested pipeline to generate the cubins, independently per kernel. In a final pass, move the cubins back to the parent module.
PiperOrigin-RevId: 269987720
This change generalizes the structure of the pass manager to allow arbitrary nesting pass managers for other operations, at any level. The only user visible change to existing code is the fact that a PassManager must now provide an MLIRContext on construction. A new class `OpPassManager` has been added that represents a pass manager on a specific operation type. `PassManager` will remain the top-level entry point into the pipeline, with OpPassManagers being nested underneath. OpPassManagers will still be implicitly nested if the operation type on the pass differs from the pass manager. To explicitly build a pipeline, the 'nest' methods on OpPassManager may be used:
// Pass manager for the top-level module.
PassManager pm(ctx);
// Nest a pipeline operating on FuncOp.
OpPassManager &fpm = pm.nest<FuncOp>();
fpm.addPass(...);
// Nest a pipeline under the FuncOp pipeline that operates on spirv::ModuleOp
OpPassManager &spvModulePM = pm.nest<spirv::ModuleOp>();
// Nest a pipeline on FuncOps inside of the spirv::ModuleOp.
OpPassManager &spvFuncPM = spvModulePM.nest<FuncOp>();
To help accomplish this a new general OperationPass is added that operates on opaque Operations. This pass can be inserted in a pass manager of any type to operate on any operation opaquely. An example of this opaque OperationPass is a VerifierPass, that simply runs the verifier opaquely on the current operation.
/// Pass to verify an operation and signal failure if necessary.
class VerifierPass : public OperationPass<VerifierPass> {
void runOnOperation() override {
Operation *op = getOperation();
if (failed(verify(op)))
signalPassFailure();
markAllAnalysesPreserved();
}
};
PiperOrigin-RevId: 266840344
Instead of lowering the program in two steps (Standard->LLVM followed
by GPU->NVVM), leading to invalid IR inbetween, the runner now uses
one pattern based rewrite step to go directly from Standard+GPU to
LLVM+NVVM.
PiperOrigin-RevId: 265861934
Switch to C++14 standard method as llvm::make_unique has been removed (
https://reviews.llvm.org/D66259). Also mark some targets as c++14 to ease next
integrates.
PiperOrigin-RevId: 263953918
This allows for proper forward declaration, as opposed to leaking the internal implementation via a using directive. This also allows for all pattern building to go through 'insert' methods on the OwningRewritePatternList, replacing uses of 'push_back' and 'RewriteListBuilder'.
PiperOrigin-RevId: 261816316