Depends On D105037
Avoid creating too many tasks when the number of workers is large.
Reviewed By: herhut
Differential Revision: https://reviews.llvm.org/D105126
Depends On D104999
Automatic reference counting based on the liveness analysis can add a lot of reference counting overhead at runtime. If the IR is known to be constrained to few particular "shapes", it's much more efficient to provide a custom reference counting policy that will specify where it is required to update the async value reference count.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D105037
This revision adds the minimal plumbing to create a simple ComprehensiveModuleBufferizePass that can behave conservatively in the presence of CallOps.
A topological sort of caller/callee is performed and, if the call-graph is cycle-free, analysis can proceed.
Differential revision: https://reviews.llvm.org/D104859
Depends On D104998
Function calls "transfer ownership" to the callee and it puts additional constraints on the reference counting optimization pass
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D104999
Depends On D104891
Outlining scf.parallel body as a function requires async-parallel-for pass to be a ModuleOp pass
Reviewed By: bondhugula
Differential Revision: https://reviews.llvm.org/D104998
Depends On D104850
Add a test that verifies that canonicalization removes all async overheads if it is statically known that the scf.parallel operation will be computed using a single block.
Reviewed By: herhut
Differential Revision: https://reviews.llvm.org/D104891
The case where a non-dominating read can be found is captured by slightly generalizing `AliasInfo::wouldCreaateReadAfterWriteInterference`.
This simplification will make it easier to implement bufferization across function call.
APIs are also simplified were possible.
Differential revision: https://reviews.llvm.org/D104845
This patch brings support for setting runtime preemption specifiers of
LLVM's GlobalValues. In LLVM semantics, if the `dso_local` attribute
is not explicitly requested, then it is inferred based on linkage and
visibility. We model this same behavior with a UnitAttribute: if it is
present, then we explicitly request the GlobalValue to marked as
`dso_local`, otherwise we rely on the GlobalValue itself to make this
decision.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D104983
Without it BufferDeallocationPass process only CloneOps created during pass itself and ignore all CloneOps that were already present in IR.
For our specific usecase:
```
func @dealloc_existing_clones(%arg0: memref<?x?xf64>, %arg1: memref<?x?xf64>) -> memref<?x?xf64> {
return %arg0 : memref<?x?xf64>
}
```
Input arguments will be freed immediately after return from function and we want to prolong lifetime for the returned argument.
To achieve this we explicitly add clones to all input memrefs and expect that BufferDeallocationPass will add correct deallocs to them (unnessesary clone+dealloc pairs will be canonicalized away later).
Differential Revision: https://reviews.llvm.org/D104973
Adapt the StructuredOp verifier to ensure all operands are either in the input or the output group. The change is possible after adding support for scalar input operands (https://reviews.llvm.org/D104220).
Differential Revision: https://reviews.llvm.org/D104783
The current code does not preserve the order of the parallel
dimensions when doing multi-reductions and thus we can end
up in scenarios where the result shape does not match the
desired shape after reduction.
This patch fixes that by ensuring that the parallel indices
are in order and then concatenates them to the reduction dimensions
so that the reduction dimensions are innermost.
Differential Revision: https://reviews.llvm.org/D104884
This revision adds detection for changes to either the mlir-lsp-server binary or the setting, and prompts the user to restart the server. Whether the user gets prompted or not is a configurable setting in the extension, and this setting may updated based on the user response to the prompt.
Differential Revision: https://reviews.llvm.org/D104501
This enables creating a replacement rule where range of positional replacements
need not be spelled out, or are not known (e.g., enable having a rewrite that
forward all operands to a call generically).
Differential Revision: https://reviews.llvm.org/D104955
Input/output types can be integers, which represent a quantized convolution.
Update verifier to expect this behavior.
Reviewed By: sjarus
Differential Revision: https://reviews.llvm.org/D104949
The executeregionop is used to allow multiple blocks within SCF constructs. If the container allows multiple blocks, inline the region
Differential Revision: https://reviews.llvm.org/D104960
MemRefDataFlow performs mem2reg style operations for affine load/stores. Unfortunately, it is not presently correct in the presence of external operations such as memref.cast, or function calls. This diff extends the functionality of the pass to remain correct in the presence of such ops.
Differential Revision: https://reviews.llvm.org/D104053
A canonicalization accidentally will remove a memref allocation if it is only stored into. However, this is incorrect if the allocation is the value being stored, not the allocation being stored into.
Differential Revision: https://reviews.llvm.org/D104947
Given a select that returns the logical negation of the condition, replace it with a not of the condition.
Differential Revision: https://reviews.llvm.org/D104966
Reduce code duplication: Move various helper functions, that are duplicated in TensorDialect, MemRefDialect, LinalgDialect, StandardDialect, into a new StaticValueUtils.cpp.
Differential Revision: https://reviews.llvm.org/D104687
Depends On D104780
Recursive work splitting instead of sequential async tasks submission gives ~20%-30% speedup in microbenchmarks.
Algorithm outline:
1. Collapse scf.parallel dimensions into a single dimension
2. Compute the block size for the parallel operations from the 1d problem size
3. Launch parallel tasks
4. Each parallel task reconstructs its own bounds in the original multi-dimensional iteration space
5. Each parallel task computes the original parallel operation body using scf.for loop nest
Reviewed By: herhut
Differential Revision: https://reviews.llvm.org/D104850
Specify the `!async.group` size (the number of tokens that will be added to it) at construction time. `async.await_all` operation can potentially race with `async.execute` operations that keep updating the group, for this reason it is required to know upfront how many tokens will be added to the group.
Reviewed By: ftynse, herhut
Differential Revision: https://reviews.llvm.org/D104780
Moves iteration lattice/merger code into new SparseTensor/Utils directory. A follow-up CL will add lattice/merger unit tests.
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D104757
This commit moves the type translator from LLVM to MLIR to a public header for use by external projects or other code.
Unlike a previous attempt (https://reviews.llvm.org/D104726), this patch moves the type conversion into separate files which remedies the linker error which was only caught by CI.
Differential Revision: https://reviews.llvm.org/D104834
scf::ForOp bufferization analysis proceeds just like for any other op (including FuncOp) at its boundaries; i.e. if:
1. The tensor operand is inplaceable.
2. The matching result has no subsequent read (i.e. all reads dominate the scf::ForOp).
3. In and does not create a RAW interference.
then it can bufferize inplace.
Still there are a few differences:
1. bbArgs for an scf::ForOp are always considered inplaceable when seen from ops inside the body. This is because a) either the matching tensor operand is not inplaceable and an alloc will be inserted (which makes bbArg itself inplaceable); or b) the tensor operand and bbArg are both already inplaceable.
2. Bufferization within the scf::ForOp body has implications to the outside world : the scf.yield terminator may well ping-pong values of the same type. This muddies the water for alias analysis and is not supported atm. Such cases result in a pass failure.
Differential revision: https://reviews.llvm.org/D104490
Add an index_dim annotation to specify the shape to loop mapping of shape-only tensors. A shape-only tensor serves is not accessed withing the body of the operation but is required to span the iteration space of certain operations such as pooling.
Differential Revision: https://reviews.llvm.org/D104767
This test shows how convert-linalg-to-std rewrites named linalg ops as library calls.
This can be coupled with a C++ shim to connect to existing libraries such as https://gist.github.com/nicolasvasilache/691ef992404c49dc9b5d543c4aa6db38.
Everything can then be linked together with mlir-cpu-runner and MLIR can call C++ (which can itself call MLIR if needed).
This should evolve into specific rewrite patterns that can be applied on op instances independently rather than having to use a full conversion.
Differential Revision: https://reviews.llvm.org/D104842
Extend the OpDSL with index attributes. After tensors and scalars, index attributes are the third operand type. An index attribute represents a compile-time constant that is limited to index expressions. A use cases are the strides and dilations defined by convolution and pooling operations.
The patch only updates the OpDSL. The C++ yaml codegen is updated by a followup patch.
Differential Revision: https://reviews.llvm.org/D104711