There is no need for the interface implementations to be exposed, opaque
registration functions are sufficient for all users, similarly to passes.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D97852
Rationale:
Providing the wrong number of sparse/dense annotations was silently
ignored or caused unrelated crashes. This minor change verifies that
the provided number matches the rank.
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D97034
Rationale:
Narrower types for overhead storage yield a smaller memory footprint for
sparse tensors and thus needs to be supported. Also, more value types
need to be supported to deal with all kinds of kernels. Since the
"one-size-fits-all" sparse storage scheme implementation is used
instead of actual codegen, the library needs to be able to support
all combinations of desired types. With some crafty templating and
overloading, the actual code for this is kept reasonably sized though.
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D96819
A series of preceding patches changed the mechanism for translating MLIR to
LLVM IR to use dialect interface with delayed registration. It is no longer
necessary for specific dialects to derive from ModuleTranslation. Remove all
virtual methods from ModuleTranslation and factor out the entry point to be a
free function.
Also perform some cleanups in ModuleTranslation internals.
Depends On D96774
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D96775
The patch extends the runner utils by verification methods that compare two memrefs. The methods compare the content of the two memrefs and print success if the data is identical up to a small numerical error. The methods are meant to simplify the development of integration tests that compare the results against a reference implementation (cf. the updates to the linalg matmul integration tests).
Originally landed in 5fa893c (https://reviews.llvm.org/D96326) and reverted in dd719fd due to a Windows build failure.
Changes:
- Remove the max function that requires the "algorithm" header on Windows
- Eliminate the truncation warning in the float specialization of verifyElem by using a float constant
Reviewed By: Kayjukh
Differential Revision: https://reviews.llvm.org/D96593
Historically, JitRunner has been registering all available dialects with the
context and depending on them without the real need. Make it take a registry
that contains only the dialects that are expected in the input and stop linking
in all dialects.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D96436
This revision connects the generated sparse code with an actual
sparse storage scheme, which can be initialized from a test file.
Lacking a first-class citizen SparseTensor type (with buffer),
the storage is hidden behind an opaque pointer with some "glue"
to bring the pointer back to tensor land. Rather than generating
sparse setup code for each different annotated tensor (viz. the
"pack" methods in TACO), a single "one-size-fits-all" implementation
has been added to the runtime support library. Many details and
abstractions need to be refined in the future, but this revision
allows full end-to-end integration testing and performance
benchmarking (with on one end, an annotated Lingalg
op and, on the other end, a JIT/AOT executable).
Reviewed By: nicolasvasilache, bixia
Differential Revision: https://reviews.llvm.org/D95847
MLIRContext allows its users to access directly to the DialectRegistry it
contains. While sometimes useful for registering additional dialects on an
already existing context, this breaks the encapsulation by essentially giving
raw accesses to a part of the context's internal state. Remove this mutable
access and instead provide a method to append a given DialectRegistry to the
one already contained in the context. Also provide a shortcut mechanism to
construct a context from an already existing registry, which seems to be a
common use case in the wild. Keep read-only access to the registry contained in
the context in case it needs to be copied or used for constructing another
context.
With this change, DialectRegistry is no longer concerned with loading the
dialects and deciding whether to invoke delayed interface registration. Loading
is concentrated in the MLIRContext, and the functionality of the registry
better reflects its name.
Depends On D96137
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D96331
The patch extends the runner utils by verification methods that compare two memrefs. The methods compare the content of the two memrefs and print success if the data is identical up to a small numerical error. The methods are meant to simplify the development of integration tests that for example compare optimized and unoptimized code paths (cf. the updates to the linalg matmul integration tests).
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D96326
This new invoke will pack a list of argument before calling the
`invokePacked` method. It accepts returned value as output argument
wrapped in `ExecutionEngine::Result<T>`, and delegate the packing of
arguments to a trait to allow for customization for some types.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D95961
The AsyncRuntime declares prototypes for extern "C" functions inside a
namespace in the header, but not inside that namespace in the
definition. This causes Visual Studio to treat them as different
entities and thus the dllexport is ignored for the definitions.
Using the same namespace fixes this issue.
Secondly, this commit moves the dllexport to be consistent with the
JITs expectation.
This is an update to https://reviews.llvm.org/D95386 that fixes the
compile issues in old versions of Visual studio.
Differential Revision: https://reviews.llvm.org/D95933
The MLIR Async runtime uses different namespacing for the header file,
and the definitions of its C API. The header file places the extern "C"
functions inside namespace mlir::runtime, and the definitions are not
in a namespace. This causes issues in cl.exe. It treats the declaration
and definition as different, and thus does not apply dllexport to the
definition, which leads to the mlir_async_runtime.dll containing no
definitions, and the mlir_async_runtime.lib not being generated.
This patch moves the namespace to cover the definitions, and thus
generates the dll correctly on Windows with cl.exe.
This was tested with Visual Studio C++ 19.28.29336.
Differential Revision: https://reviews.llvm.org/D95386
The library is not actually static when BUILD_SHARED_LIBS is on, and tests need to explicitly load it already. Also, the shared objects it was linked to did not use any symbols from it and it was therefore never linked to it.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D95612
`emplace???` functions running concurrently can set the ready flag and then pending awaiter will never be executed
Differential Revision: https://reviews.llvm.org/D95517
Rationale:
Since I made the argument that metadata helps with extra
verification checks, I better actually do that ;-)
Reviewed By: penpornk
Differential Revision: https://reviews.llvm.org/D95072
Resumed coroutine potentially can deallocate the token/value/group and destroy the mutex before the std::unique_ptr destructor.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D95037
The runtime-wrappers depend on LLVMSupport, pulling in static initialization code (e.g. command line arguments). Dynamically loading multiple such libraries results in ODR violoations.
So far this has not been an issue, but in D94421, I would like to load both the async-runtime and the cuda-runtime-wrappers as part of a cuda-runner integration test. When doing this, code that asserts that an option category is only registered once fails (note that I've only experienced this in Google's bazel where the async-runtime depends on LLVMSupport, but a similar issue would happen in cmake if more than one runtime-wrapper starts to depend on LLVMSupport).
The underlying issue is that we have a mix of static and dynamic linking. If all dependencies were loaded as shared objects (i.e. if LLVMSupport was linked dynamically to the runtime wrappers), each dependency would only get loaded once. However, linking dependencies dynamically would require special attention to paths (one could dynamically load the dependencies first given explicit paths). The simpler approach seems to be to link all dependencies statically into a single shared object.
This change basically applies the same logic that we have in the c_runner_utils: we have a shared object target that can be loaded dynamically, and we have a static library target that can be linked to other runtime-wrapper shared object targets.
Reviewed By: herhut
Differential Revision: https://reviews.llvm.org/D94399
Added the ability to read (an extended version of) the FROSTT
file format, so that we can now read in sparse tensors of arbitrary
rank. Generalized the API to deal with more than two dimensions.
Also added the ability to sort the indices of sparse tensors
lexicographically. This is an important step towards supporting
auto gen of initialization code, since sparse storage formats
are easier to initialize if the indices are sorted. Since most
external formats don't enforce such properties, it is convenient
to have this ability in our runtime support library.
Lastly, the re-entrant problem of the original implementation
is fixed by passing an opaque object around (rather than having
a single static variable, ugh!).
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D94852
Use custom mlir runner init/destroy functions to safely init and destroy shared libraries loaded by the JitRunner.
This mechanism is ignored for Windows builds (for now) because init/destroy functions are not exported, and library unloading relies on static destructors.
Re-submit https://reviews.llvm.org/D94270 with a temporary workaround for windows
Differential Revision: https://reviews.llvm.org/D94312
Continue the convergence between LLVM dialect and built-in types by replacing
the bfloat, half, float and double LLVM dialect types with their built-in
counterparts. At the API level, this is a direct replacement. At the syntax
level, we change the keywords to `bf16`, `f16`, `f32` and `f64`, respectively,
to be compatible with the built-in type syntax. The old keywords can still be
parsed but produce a deprecation warning and will be eventually removed.
Depends On D94178
Reviewed By: mehdi_amini, silvas, antiagainst
Differential Revision: https://reviews.llvm.org/D94179
Use custom mlir runner init/destroy functions to safely init and destroy shared libraries loaded by the JitRunner.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D94270
The LLVM dialect type system has been closed until now, i.e. did not support
types from other dialects inside containers. While this has had obvious
benefits of deriving from a common base class, it has led to some simple types
being almost identical with the built-in types, namely integer and floating
point types. This in turn has led to a lot of larger-scale complexity: simple
types must still be converted, numerous operations that correspond to LLVM IR
intrinsics are replicated to produce versions operating on either LLVM dialect
or built-in types leading to quasi-duplicate dialects, lowering to the LLVM
dialect is essentially required to be one-shot because of type conversion, etc.
In this light, it is reasonable to trade off some local complexity in the
internal implementation of LLVM dialect types for removing larger-scale system
complexity. Previous commits to the LLVM dialect type system have adapted the
API to support types from other dialects.
Replace LLVMIntegerType with the built-in IntegerType plus additional checks
that such types are signless (these are isolated in a utility function that
replaced `isa<LLVMType>` and in the parser). Temporarily keep the possibility
to parse `!llvm.i32` as a synonym for `i32`, but add a deprecation notice.
Reviewed By: mehdi_amini, silvas, antiagainst
Differential Revision: https://reviews.llvm.org/D94178
1. Add new methods to Async runtime API to support yielding async values
2. Add lowering from `async.yield` with value payload to the new runtime API calls
`async.value` lowering requires that payload type is convertible to LLVM and supported by `llvm.mlir.cast` (DialectCast) operation.
Reviewed By: csigg
Differential Revision: https://reviews.llvm.org/D93592
LLVMType contains multiple instance methods that were introduced initially for
compatibility with LLVM API. These methods boil down to `cast` followed by
type-specific call. Arguably, they are mostly used in an LLVM cast-follows-isa
anti-pattern. This doesn't connect nicely to the rest of the MLIR
infrastructure and actively prevents it from making the LLVM dialect type
system more open, e.g., reusing built-in types when appropriate. Remove such
instance methods and replaces their uses with apporpriate casts and methods on
derived classes. In some cases, the result may look slightly more verbose, but
most cases should actually use a stricter subtype of LLVMType anyway and avoid
the isa/cast.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D93680
Define Async runtime related typedefs in the `mlir::runtime` namespace.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D93391
This is part of a larger refactoring the better congregates the builtin structures under the BuiltinDialect. This also removes the problematic "standard" naming that clashes with the "standard" dialect, which is not defined within IR/. A temporary forward is placed in StandardTypes.h to allow time for downstream users to replaced references.
Differential Revision: https://reviews.llvm.org/D92435
ExecutionEngine/LLJIT do not run globals destructors in loaded dynamic libraries when destroyed, and threads managed by ThreadPool can race with program termination, and it leads to segfaults.
TODO: Re-enable threading after fixing a problem with destructors, or removing static globals from dynamic library.
Differential Revision: https://reviews.llvm.org/D92368
1. Move ThreadPool ownership to the runtime, and wait for the async tasks completion in the destructor.
2. Remove MLIR_ASYNCRUNTIME_EXPORT from method definitions because they are unnecessary in .cpp files, as only function declarations need to be exported, not their definitions.
3. Fix concurrency bugs in group emplace and potential use-after-free in token emplace.
Tested internally 10k runs in `async.mlir` and `async-group.mlir`.
Fixed: https://bugs.llvm.org/show_bug.cgi?id=48267
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D91988
All these potential null pointer dereferences are reported by my static analyzer for null smart pointer dereferences, which has a different implementation from `alpha.cplusplus.SmartPtr`.
The checked pointers in this patch are initialized by Target::createXXX functions. When the creator function pointer is not correctly set, a null pointer will be returned, or the creator function may originally return a null pointer.
Some of them may not make sense as they may be checked before entering the function, but I fixed them all in this patch. I submit this fix because 1) similar checks are found in some other places in the LLVM codebase for the same return value of the function; and, 2) some of the pointers are dereferenced before they are checked, which may definitely trigger a null pointer dereference if the return value is nullptr.
Reviewed By: tejohnson, MaskRay, jpienaar
Differential Revision: https://reviews.llvm.org/D91410
Depends On D89963
**Automatic reference counting algorithm outline:**
1. `ReturnLike` operations forward the reference counted values without
modifying the reference count.
2. Use liveness analysis to find blocks in the CFG where the lifetime of
reference counted values ends, and insert `drop_ref` operations after
the last use of the value.
3. Insert `add_ref` before the `async.execute` operation capturing the
value, and pairing `drop_ref` before the async body region terminator,
to release the captured reference counted value when execution
completes.
4. If the reference counted value is passed only to some of the block
successors, insert `drop_ref` operations in the beginning of the blocks
that do not have reference coutned value uses.
Reviewed By: silvas
Differential Revision: https://reviews.llvm.org/D90716
These includes have been deprecated in favor of BuiltinDialect.h, which contains the definitions of ModuleOp and FuncOp.
Differential Revision: https://reviews.llvm.org/D91572
Depends On D89958
1. Adds `async.group`/`async.awaitall` to group together multiple async tokens/values
2. Rewrite scf.parallel operation into multiple concurrent async.execute operations over non overlapping subranges of the original loop.
Example:
```
scf.for (%i, %j) = (%lbi, %lbj) to (%ubi, %ubj) step (%si, %sj) {
"do_some_compute"(%i, %j): () -> ()
}
```
Converted to:
```
%c0 = constant 0 : index
%c1 = constant 1 : index
// Compute blocks sizes for each induction variable.
%num_blocks_i = ... : index
%num_blocks_j = ... : index
%block_size_i = ... : index
%block_size_j = ... : index
// Create an async group to track async execute ops.
%group = async.create_group
scf.for %bi = %c0 to %num_blocks_i step %c1 {
%block_start_i = ... : index
%block_end_i = ... : index
scf.for %bj = %c0 t0 %num_blocks_j step %c1 {
%block_start_j = ... : index
%block_end_j = ... : index
// Execute the body of original parallel operation for the current
// block.
%token = async.execute {
scf.for %i = %block_start_i to %block_end_i step %si {
scf.for %j = %block_start_j to %block_end_j step %sj {
"do_some_compute"(%i, %j): () -> ()
}
}
}
// Add produced async token to the group.
async.add_to_group %token, %group
}
}
// Await completion of all async.execute operations.
async.await_all %group
```
In this example outer loop launches inner block level loops as separate async
execute operations which will be executed concurrently.
At the end it waits for the completiom of all async execute operations.
Reviewed By: ftynse, mehdi_amini
Differential Revision: https://reviews.llvm.org/D89963
Exposing the C versions of the methods of the sparse runtime support lib
through header files will enable using the same methods in an MLIR program
as well as a C++ program, which will simplify future benchmarking comparisons
(e.g. comparing MLIR generated code with eigen for Matrix Market sparse matrices).
Reviewed By: penpornk
Differential Revision: https://reviews.llvm.org/D91316