- Generic visitors invoke operation callbacks before/in-between/after visiting the regions
attached to an operation and use a `WalkStage` to indicate which regions have been
visited.
- This can be useful for cases where we need to visit the operation in between visiting
regions attached to the operation.
Differential Revision: https://reviews.llvm.org/D116230
Clean up return value on affineDataCopyGenerate utility. Return the
actual success/failure status instead of the "number of bytes" which
isn't being used in the codebase in any way. The success/failure status
wasn't being sent out earlier.
Differential Revision: https://reviews.llvm.org/D117209
By default, copies are inserted right before the tensor OpOperand use. With this change, `bufferize` implementation can change the insertion point. This is needed for some ops where it would be illegal to insert a copy right before the use.
Differential Revision: https://reviews.llvm.org/D117291
LLVM dialect supports terminators with repeated successor blocks that take
different operands. This cannot be directly expressed in LLVM IR though since
it uses the number of the predecessor block to differentiate values in its PHI
nodes. Therefore, the translation to LLVM IR inserts dummy blocks to forward
arguments in case of repeated successors with arguments. The insertion works
correctly. However, when connecting PHI nodes to their source values, the
assertion of the insertion having worked correctly was incorrect: it would only
trigger if repeated blocks were adjacent in the successor list (not guaranteed
by anything) and would not check if the successors have operands (no need for
dummy blocks in absence of operands since no PHIs are being created). Change
the assertion to only trigger in case of duplicate successors with operands,
and don't expect them to be adjacent.
Reviewed By: wsmoses
Differential Revision: https://reviews.llvm.org/D117214
When constructing an OperationName, the overwhelming majority of
cases are from registered operations. This revision adds a non-locked
lookup into the currently registered operations, which prevents locking
in the common case. This revision also optimizes several uses of
RegisteredOperationName that expect the operation to be registered,
e.g. such as in OpBuilder.
These changes provide a reasonable speedup (5-10%) in some
compilations, especially on platforms where locking is expensive.
Differential Revision: https://reviews.llvm.org/D117187
This guarantees the preconditions of fromCOO; whereas prior to this, one could call the constructor directly with an unsorted tensor, which would cause fromCOO to misbehave.
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D117167
The optional decompositions have been verified to improve memory
usage on common models. Added the decompositions to the default TOSA-to-Linalg
passes.
Reviewed By: NatashaKnk
Differential Revision: https://reviews.llvm.org/D117175
Spotted this in a final grep of projects I don't usually build before
pushing https://reviews.llvm.org/D115380, which makes
`SmallVector::set_size()` private.
Update to `truncate()`, a new-ish variant of `resize()` that asserts the
new size is not bigger and that avoids pulling in the allocation and
initialization code for growing. Doesn't really look like the perf
impact of that would matter here, but since `dirLength` is known to be a
smaller size, we might as well.
Differential Revision: https://reviews.llvm.org/D117073
Enable ReassociatingReshapeOpConversion with "non-identity" layouts.
This removes an early-return in this function, which seems unnecessary and is
preventing some memref.collapse_shape from converting to LLVM (see included lit test).
It seems unnecessary because the return message says "only empty layout map is supported"
but there actually is code in this function to deal with non-empty layout maps. Maybe
it refers to an earlier state of implementation and is just out of date?
Though, there is another concern about this early return: the condition that it actually
checks, `{src,dst}MemrefType.getLayout().isIdentity()`, is not quite the same as what the
return message says, "only empty layout map is supported". Stepping through this
`getLayout().isIdentity()` code in GDB, I found that it evaluates to `.getAffineMap().isIdentity()`
which does (AffineMap.cpp:271):
```
if (getNumDims() != getNumResults())
return false;
```
It seems that this would always return false for memrefs of rank greater than 1?
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D114808
LLVM Dialect Constant Op translations assume that if the attribute is a
vector, it's a fixed length one, generating an invalid translation for
constant scalable vector initializations.
Differential Revision: https://reviews.llvm.org/D117125
* This is useful when you need to build a mixed array from external static/dynamic arrays (e.g. from an adaptor during dialect conversion)
* Also, to reduce C++ code in td and generated files
Differential Revision: https://reviews.llvm.org/D117106
This overload parses a pipeline string that contains the anchor operation type, and returns an OpPassManager
corresponding to the provided pipeline. This is useful for various situations, such as dynamic pass pipelines
which are not anchored within a parent pass pipeline.
Fixes #52813
Differential Revision: https://reviews.llvm.org/D116525
Dynamic batch for rescale, gather, max_pool, avg_pool, conv2D and depthwise_conv2D. Split helper functions into a separate header file.
Reviewed By: rsuderman
Differential Revision: https://reviews.llvm.org/D117031
ShapedType was created in a time before interfaces, and is one of the earliest
type base classes in the ecosystem. This commit refactors ShapedType into
an interface, which is what it would have been if interfaces had existed at that
time. The API of ShapedType and its derived classes is essentially untouched
by this refactor, with the exception being the API surrounding kDynamicIndex
(which requires a sole home).
For now, the API of ShapedType and its name have been kept as consistent to
the current state of the world as possible (to help with potential migration churn,
among other reasons). Moving forward though, we should look into potentially
restructuring its API and possibly its name as well (it should really have "Interface"
at the end like other interfaces at the very least).
One other potentially interesting note is that I've attached the ShapedType::Trait
to TensorType/BaseMemRefType to act as mixins for the ShapedType API. This
is kind of weird, but allows for sharing the same API (i.e. preventing API loss from
the transition from base class -> Interface). This inheritance doesn't affect any
of the derived classes; it is just an API mixin.
Differential Revision: https://reviews.llvm.org/D116962
This field allows for defining a code block that is placed in both the interface
and trait declarations. This is very useful when defining a set of utilities to
expose on both the Interface class and the derived attribute/operation/type.
In non-static methods, `$_attr`/`$_op`/`$_type` (depending on the type of
interface) may be used to refer to an instance of the IR entity. In the interface
declaration, this is an instance of the interface class. In the trait declaration,
this is an instance of the concrete entity class (e.g. `IntegerAttr`, `FuncOp`, etc.).
Differential Revision: https://reviews.llvm.org/D116961
Apply scale may encounter scalar, tensor, or vector operations. Expand the
lowering so that it can lower arbitrary container types.
Reviewed By: NatashaKnk
Differential Revision: https://reviews.llvm.org/D117080
This method simply forwards to populateFunctionLikeTypeConversionPattern,
which is more general. This also helps to remove special treatment of FuncOp from
DialectConversion.
Differential Revision: https://reviews.llvm.org/D116624
This patch fixes:
mlir/lib/Dialect/Arithmetic/Transforms/ExpandOps.cpp:161:52: error:
'static_assert' with no message is a C++17 extension
[-Werror,-Wc++17-extensions]
There have been a few API pieces remaining to allow for a smooth transition for
downstream users, but these have been up for a few months now. After this only
the C API will have reference to "Identifier", but those will be reworked in a followup.
The main updates are:
* Identifier -> StringAttr
* StringAttr::get requires the context as the first parameter
- i.e. `Identifier::get("...", ctx)` -> `StringAttr::get(ctx, "...")`
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D116626
If any of the operands is NaN, return the operand instead of a new constant.
When the rhs operand is a constant, the second arith.cmpf+select ops will be folded away.
https://reviews.llvm.org/D117010 marks the two ops commutative, which will place the constant on the rhs.
Reviewed By: herhut
Differential Revision: https://reviews.llvm.org/D117011
Enable constant folding of ops within the math dialect, and introduce constant folders for ceil and log2
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D117085
This change makes it possible to use a different buffer deallocation strategy. E.g., `-buffer-deallocation` can be used, which also works for allocations that are not in destination-passing style.
Differential Revision: https://reviews.llvm.org/D117096
This op is an example of how to deal with ops whose OpResult may alias with one of multiple OpOperands.
Differential Revision: https://reviews.llvm.org/D116868
Add inliner interface for GPU dialect. The interface marks all GPU
dialect ops legal to inline anywhere.
Differential Revision: https://reviews.llvm.org/D116889
Given a while loop whose condition is given by a cmp, don't recompute the comparison (or its inverse) in the after region; instead, use a constant, since the original condition must be true if we branched to the after region.
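A rough sketch of the kind of IR this targets (hypothetical example; the values and constants are made up for illustration):
```
%c0 = arith.constant 0 : i32
%c10 = arith.constant 10 : i32
%res = scf.while (%i = %c0) : (i32) -> i32 {
  %cond = arith.cmpi slt, %i, %c10 : i32
  scf.condition(%cond) %i : i32
} do {
^bb0(%j: i32):
  // Recomputes the loop condition on the value forwarded by scf.condition.
  // Since the "after" region only executes when %cond was true, this cmp can
  // be replaced by a constant true (an inverted cmp would become false).
  %again = arith.cmpi slt, %j, %c10 : i32
  %step = arith.extui %again : i1 to i32
  %next = arith.addi %j, %step : i32
  scf.yield %next : i32
}
```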
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D117047
Most convolution operations need explicit padding of the input to
ensure all accesses are inbounds. In such cases, having a pad
operation can be a significant overhead. One way to reduce that
overhead is to try to fuse the pad operation with the producer of its
source.
A sequence
```
linalg.generic -> linalg.pad_tensor
```
can be replaced with
```
linalg.fill -> tensor.extract_slice -> linalg.generic ->
tensor.insert_slice.
```
if the `linalg.generic` has all parallel iterator types.
Differential Revision: https://reviews.llvm.org/D116418
This patch fixes:
mlir/lib/Dialect/Tosa/Transforms/TosaOptionalDecompositions.cpp:28:8:
error: 'runOnFunction' overrides a member function but is not marked
'override' [-Werror,-Wsuggest-override]
Moved all TOSA decomposition patterns so that they can be optionally populated
and used by external rewrites. This avoids decomposing TOSA operations when
backends may benefit from the non-decomposed version.
Reviewed By: rsuderman, mehdi_amini
Differential Revision: https://reviews.llvm.org/D116526
Given an if of the form below, simplify it by eliminating the `not` and swapping the regions:
```
scf.if not(c) {
  yield origTrue
} else {
  yield origFalse
}
```
becomes
```
scf.if c {
  yield origFalse
} else {
  yield origTrue
}
```
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D116990
Previously, CallOps did not have any aliasing OpResult/OpOperand pairs. Therefore, CallOps were mostly ignored by the analysis and buffer copies were not inserted when necessary.
This commit introduces the following changes:
* Function bbArgs are writable by default. A function can now be bufferized without inspecting its callers.
* Callers must introduce buffer copies of function arguments when necessary. If a function is external, the caller must conservatively assume that a function argument is modified by the callee after bufferization. If the function is not external, the caller inspects the callee to determine if a function argument is modified.
Differential Revision: https://reviews.llvm.org/D116457
Changed the remaining appearances of alloc to memref.alloc in several
documentation sections, since they can lead to misunderstandings.
Differential Revision: https://reviews.llvm.org/D116999
D115722 added a DL spec to GPU modules. It happens that the DL
default interface implementation is sensitive to the name of the
DL spec attribute. This patch fixes the name of the attribute
to be the expected one.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D116956
When the unroll factor is 1, we should only fail "unrolling" when the trip count is also determined to be 1 and it cannot be promoted.
Reviewed By: bondhugula
Differential Revision: https://reviews.llvm.org/D115365
Given a select whose result is an i1, we can eliminate the conditional in the select completely by adding a few arithmetic operations.
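For illustration, one way such a select can be rewritten with plain arithmetic (a sketch only; the exact pattern implemented by this revision may differ, and `select` is spelled without a dialect prefix as elsewhere in these messages):
```
%r = select %cond, %a, %b : i1
// can be expressed purely with i1 arithmetic:
%true = arith.constant true
%not_cond = arith.xori %cond, %true : i1
%lhs = arith.andi %cond, %a : i1
%rhs = arith.andi %not_cond, %b : i1
%r2 = arith.ori %lhs, %rhs : i1   // equivalent to %r
```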
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D116839
This revision fixes SubviewOp, InsertSliceOp, ExtractSliceOp construction during bufferization
where not all offset/size/stride operands were properly specified.
A test that exhibited problematic behaviors related to incorrect memref casts is introduced.
Init tensor optimization is disabled in the testing func bufferize pass.
Differential Revision: https://reviews.llvm.org/D116899
init_tensor elimination is arguably a pre-optimization that should be separated from comprehensive bufferization.
In any case it is still experimental and easily results in wrong IR with violated SSA def-use orderings.
Isolate the optimization behind a flag, separate the test cases and add a test case that would result in wrong IR.
Differential Revision: https://reviews.llvm.org/D116936
During iterative inlining of the functions in a multi-step call chain, the
inliner could add the same call operation several times to the worklist, which
led to use-after-free when this op was considered more than once.
Closes #52887.
Reviewed By: wsmoses
Differential Revision: https://reviews.llvm.org/D116820
We allow the omission of a map in memref.reinterpret_cast under the assumption
that the cast might cast to an identity layout. This change adds verification
that the static knowledge that is present in the reinterpret_cast supports
this assumption.
Differential Revision: https://reviews.llvm.org/D116601
This patch changes the syntax of omp.atomic.read to take the address of
destination, instead of having the value in a result. This will allow
using omp.atomic.read operation within an omp.atomic.capture operation
thus making its implementation less complex.
Reviewed By: peixin
Differential Revision: https://reviews.llvm.org/D116396
Fix crash in the presence of yield values. Multiple fixes to affine loop
tiling pre-condition checks and return status. Do not signal pass
failure on a failure to tile since the IR is still valid. Detect index
set computation failure in checkIfHyperrectangular and return failure.
Replace assertions with proper status return. Move checks to an
appropriate place earlier in the utility before mutation happens.
Differential Revision: https://reviews.llvm.org/D116738
This patch moves PresburgerSet to Presburger/ directory. This patch is purely
mechanical; it only moves and renames functionality and tests.
This patch is part of a series of patches to move presburger functionality to
Presburger/ directory.
Reviewed By: arjunp
Differential Revision: https://reviews.llvm.org/D116836
a32300a changed it to create a ThreadPool eagerly so that it gets reused
across buffers; however, it also made it so that we create a ThreadPool
early even if we're not going to use it later because the command line
option `--mlir-disable-threading` is provided.
Fixes #53056
Reland 45adf60802 after build fixes
Differential Revision: https://reviews.llvm.org/D116848
a32300a changed it to create a ThreadPool eagerly so that it gets reused
across buffers; however, it also made it so that we create a ThreadPool
early even if we're not going to use it later because the command line
option `--mlir-disable-threading` is provided.
Fixes #53056
Differential Revision: https://reviews.llvm.org/D116848
This patch moves all presburger functionality from FlatAffineConstraints to
IntegerPolyhedron. This patch is purely mechanical, it only moves and renames
functionality and tests.
This patch is part of a series of patches to move presburger functionality to
Presburger/ directory.
Reviewed By: bondhugula
Differential Revision: https://reviews.llvm.org/D116681
This patch fixes:
mlir/lib/Dialect/Linalg/ComprehensiveBufferize/LinalgInterfaceImpl.cpp:292:12:
error: comparison of integers of different signs: 'int' and
'unsigned int' [-Werror,-Wsign-compare]
This function runs just the analysis of Comprehensive Bufferize, but does not bufferize the IR yet.
This is in preparation of fixing CallOp bufferization. Also needed for unifying Comprehensive Bufferize and core bufferization; the new partial bufferization can simply run bufferization without an analysis.
Differential Revision: https://reviews.llvm.org/D116456
Instead of `lookupBuffer` and `getResultBuffer`, there is now a single `getBuffer` function. This simplifies the `BufferizableOpInterface` API and is less confusing to users. They could previously have called the wrong function.
Furthermore, since `getBuffer` now takes an `OpOperand &` instead of a `Value`, users can no longer accidentally use one of the previous two functions incorrectly, which would have resulted in missing buffer copies.
Differential Revision: https://reviews.llvm.org/D116455
With this change, the analysis takes a look at OpOperands instead of OpResults. OpOperands can bufferize out-of-place (even if they have no aliasing OpResults). The analysis does no longer care about OpResults.
Previously, only OpResults could bufferize out-of-place, so OpOperands that have no aliasing OpResults were never copied by Comprehensive Bufferize. This does not fit well with the new CallOp bufferization that is introduced in a subsequent change. In essence, called FuncOps can then be treated as "black boxes" that may read/write to any bbArg, even if they do not return anything.
Differential Revision: https://reviews.llvm.org/D115706
This is in preparation of fixing CallOp bufferization. Add explicit linalg.inplaceable attrs to all bbArgs, except for the ones where inplaceability should be decided by the analysis.
Differential Revision: https://reviews.llvm.org/D115840
The revision renames `PrimFn` to `ArithFn`. The name resembles the newly introduced arith dialect that implements most of the arithmetic functions. Exceptions are log/exp, which are part of the math dialect.
Depends On D115239
Reviewed By: stellaraccident
Differential Revision: https://reviews.llvm.org/D115240
This revision introduces the `TypeFn` class that, similar to the `PrimFn` class, contains an extensible set of type conversion functions. Having the same mechanism for both type conversion functions and arithmetic functions improves code consistency. Additionally, having an explicit function class and function name is a prerequisite to specify a conversion or arithmetic function via attribute. In follow-up commits, we will introduce function attributes to make OpDSL operations more generic. In particular, the goal is to handle signed and unsigned computation in one operation. Today, there is a linalg.matmul and a linalg.matmul_unsigned.
The commit implements the following changes:
- Introduce the class of type conversion functions `TypeFn`
- Replace the hardwired cast and cast_unsigned ops by the `TypeFn` counterparts
- Adapt the python and C++ code generation paths to support the new cast operations
Example:
```
cast(U, A[D.m, D.k])
```
changes to
```
TypeFn.cast(U, A[D.m, D.k])
```
Depends On D115237
Reviewed By: stellaraccident
Differential Revision: https://reviews.llvm.org/D115239
Recent commits added a possibility for indices in LLVM dialect GEP operations
to be supplied directly as constant attributes to ensure they remain such until
translation to LLVM IR happens. Make this required for indexing into LLVM
struct types to match LLVM IR requirements, otherwise the translation would
assert on constructing such IR.
For better compatibility with MLIR-style operation construction interface,
allow GEP operations to be constructed programmatically using Values pointing
to known constant operations as struct indices.
Depends On D116758
Reviewed By: wsmoses
Differential Revision: https://reviews.llvm.org/D116759
This makes LLVM dialect constants work with `m_constant` matchers. Implement
the folding hook for this operation as required by the trait. This in turn
allows LLVM::ConstantOp to properly participate in constant-folding.
Depends On D116757
Reviewed By: wsmoses
Differential Revision: https://reviews.llvm.org/D116758
In LLVM IR, the GEP indices that correspond to structures are required to be
i32 constants. MLIR models constants as just values defined by special
operations, and there is no verification that it is the case for structure
indices in GEP. Furthermore, some common transformations such as control flow
simplification may lead to the operands becoming non-constant. Make it possible
to directly supply constant values to LLVM GEPOp to guarantee they remain
constant until the translation to LLVM IR. This is not yet a requirement and
the verifier is not modified; this will be introduced separately.
Reviewed By: wsmoses
Differential Revision: https://reviews.llvm.org/D116757
This patch adds two folds: one for a repeated xor (e.g. xor(xor(x, a), a)) and one for a repeated trunc (e.g. trunc(trunc(x))).
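For illustration, a sketch of the two folds on hypothetical values:
```
// xor(xor(x, a), a) folds back to x:
%0 = arith.xori %x, %a : i32
%1 = arith.xori %0, %a : i32       // folds to %x
// trunc(trunc(x)) folds to a single trunc:
%2 = arith.trunci %y : i64 to i32
%3 = arith.trunci %2 : i32 to i16  // folds to arith.trunci %y : i64 to i16
```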
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D116383
This patch adds omp.atomic.write lowering to LLVM IR.
Also, changed the syntax to use an equals symbol instead of a comma to
make it more intuitive.
Reviewed By: kiranchandramohan, peixin
Differential Revision: https://reviews.llvm.org/D116416
This has two advantages.
1. It is more efficient. No need to clone the entire region.
2. Recreating ops (via cloning) invalidates analysis results. Previously, an OpResult could have bufferized out-of-place, even though the analysis requested an in-place bufferization. That is because BufferizationState keeps track of OpResults for storing bufferization analysis results (and cloned ops have new OpResults).
Differential Revision: https://reviews.llvm.org/D116453
In addition, all functions that call `allocationFn` now return FailureOr<Value>. This resolves a few TODOs in the code base.
Differential Revision: https://reviews.llvm.org/D116452
If `createDealloc` (enabled by default) is deactivated, newly allocated buffers are no longer deallocated. In such a case, the missing deallocations can be inserted by the existing "BufferDeallocation" pass.
This change is needed for unifying core bufferization and Comprehensive Bufferize. Core bufferization has a separate pass for generating deallocations.
Note: In the future, this will evolve towards generating deallocation ops only for buffer allocations that do not escape block boundaries (i.e., that are in destination passing style).
Differential Revision: https://reviews.llvm.org/D116450
The old function names (e.g., `replaceOp`) could have been confusing to users because they sound similar to rewriter functions, but have slightly different semantics.
Differential Revision: https://reviews.llvm.org/D116449
`tensor.collapse_shape` op when fused with a consumer elementwise
`linalg.generic` operation results in creation of tensor.expand_shape
ops. In purely dynamic cases this can end up with a dynamic dimension
being expanded to more than one dynamic dimension. This is disallowed
by the semantics of `tensor.expand_shape` operation. (While the
transformation is itself correct, it's a gap in the specification of
`tensor.expand_shape` that is the issue). So disallow fusions which
result in such a pattern.
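For example, a fusion whose result would require an expansion like the following (hypothetical IR) is now rejected, because a single dynamic dimension would be expanded into two dynamic dimensions:
```
%e = tensor.expand_shape %t [[0, 1]] : tensor<?xf32> into tensor<?x?xf32>
```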
Differential Revision: https://reviews.llvm.org/D116703
This change simplifies BufferizableOpInterface and other functions. Overall, the API will get smaller: functions related to custom IR traversal are deleted entirely. This makes it easier to write BufferizableOpInterface implementations.
This is also in preparation of unifying Comprehensive Bufferize and core bufferization. While Comprehensive Bufferize could theoretically maintain its own IR traversal, there is no reason to do so, because all bufferize implementations in BufferizableOpInterface have to support partial bufferization anyway. And we can share a larger part of the code base between the two bufferizations.
Differential Revision: https://reviews.llvm.org/D116448
This is mostly for documentation purposes: Passing the object as a const reference signifies that analysis decisions cannot be changed after the analysis.
Differential Revision: https://reviews.llvm.org/D116742
Initialize some variables to zero to avoid a warning about them possibly being
used uninitialized. In actuality, they will never be used before initialization.
This commit adds division normalization to the `getDivRepr` function, which extracts
the gcd from the dividend and divisor and normalizes them.
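As a worked example (assuming the gcd is taken across the dividend coefficients, its constant term, and the divisor), a division such as floor((4x + 8) / 12) normalizes to floor((x + 2) / 3), since gcd(4, 8, 12) = 4 and dividing both dividend and divisor by it leaves the value unchanged.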
Signed-off-by: Prashant Kumar <pk5561@gmail.com>
Reviewed By: bondhugula
Differential Revision: https://reviews.llvm.org/D115595
This patch allows the usage of the normalDestOperands and unwindDestOperands operands of llvm.invoke and has them correctly mapped to PHIs in the successor when exporting to LLVM IR.
Differential Revision: https://reviews.llvm.org/D116706
Historically, the bindings for the Linalg dialect were included into the
"core" bindings library because they depended on the C++ implementation
of the "core" bindings. The other dialects followed the pattern. Now
that this dependency is gone, split out each dialect into a separate
Python extension library.
Depends On D116649, D116605
Reviewed By: stellaraccident
Differential Revision: https://reviews.llvm.org/D116662
Until now, bufferization assumed that the yielded tensor of a linalg.tiled_loop is an output tensor. This is not necessarily the case.
Differential Revision: https://reviews.llvm.org/D116685
No need to keep track of equivalent extract_slice / insert_slice tensors during bufferization. Just emit a copy, it will fold away.
Note: The analysis still keeps track of equivalent tensors to make the correct inplace bufferization decisions.
Differential Revision: https://reviews.llvm.org/D116684
Since the analysis is described to be suitable for a forward
data-flow analysis, maintaining the worklist as a queue mimics
RPO ordering of block visits, thus reaching the fixpoint earlier.
Differential Revision: https://reviews.llvm.org/D116393
Extra definitions are placed in the generated source file for each op class. The substitution `$cppClass` is replaced by the op's C++ class name.
This is useful when declaring but not defining methods in TableGen base classes:
```
class BaseOp<string mnemonic>
: Op<MyDialect, mnemonic, [DeclareOpInterfaceMethods<SomeInterface>]> {
let extraClassDeclaration = [{
// ZOp is declared at the bottom of the file and is incomplete here
ZOp getParent();
}];
let extraClassDefinition = [{
int $cppClass::someInterfaceMethod() {
return someUtilityFunction(*this);
}
ZOp $cppClass::getParent() {
return dyn_cast<ZOp>(this->getParentOp());
}
}];
}
```
Certain things may prevent defining these functions inline, in the declaration. In this example, `ZOp` in the same dialect is incomplete at the function declaration because op classes are declared in alphabetical order. Alternatively, functions may be too big to be desirable as inline definitions, or they may require dependencies that create cyclic includes, or they may be calling a templated utility function that one may not want to expose in a header. If the functions are not inlined, then inheriting from the base class N times means that each function will need to be defined N times. With `extraClassDefinition`, they only need to be defined once.
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D115783
Instead of failing when it encounters a reference to an unknown symbol, Symbol DCE should ignore them. References to unknown symbols do not affect the overall function of Symbol DCE, so it should not need to fail when it encounters one.
In general, requiring that symbol references always be valid rather than only when necessary can be overly conservative.
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D116047
This patch adds clearAndCopyFrom to IntegerPolyhedron. This requires moving
LLVM-style RTTI from FlatAffineConstraints to IntegerPolyhedron.
This patch is part of a series of patches to move presburger math to Presburger
directory.
Reviewed By: arjunp
Differential Revision: https://reviews.llvm.org/D116533
This patch fixes:
mlir/lib/Dialect/Linalg/ComprehensiveBufferize/ModuleBufferization.cpp:635:23:
error: comparison of integers of different signs: 'int' and 'size_t'
(aka 'unsigned long') [-Werror,-Wsign-compare]
There is no need to inspect the ReturnOp of the called function.
This change also refactors the bufferization of CallOps in such a way that `lookupBuffer` is called only a single time. This is important for a later change that fixes CallOp bufferization. (There is currently a TODO among the test cases.)
Note: This change modifies a test case but is marked as NFC. There is no change of functionality, but FuncOps with empty bodies are now reported with a different error message.
Differential Revision: https://reviews.llvm.org/D116446
So far, only the custom dialect types are exposed.
The build and packaging are the same as for Linalg and SparseTensor, and are in
need of refactoring that is beyond the scope of this patch.
Reviewed By: stellaraccident
Differential Revision: https://reviews.llvm.org/D116605
Such CallOps were not handled properly. When computing the new result types (and replacement values) of a CallOp, non-tensor return values were not accounted for.
Differential Revision: https://reviews.llvm.org/D116445
Previously, the Python bindings for the Linalg dialect relied on the internal
implementation of core bindings. Most of that functionality was moved, and the
remaining one does not need access to the implementation: it used to accept a
dialect pointer as argument, but it can always be extracted from the operation
that it also accepts; operations are available through PybindAdaptors in an
opaque way. Change the bindings in that direction.
This enables the decoupling of the Linalg dialect Python extension from the
core IR Python extension.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D116649
This change simplifies BufferizationState. Having `rewriter` in BufferizationState could be confusing to users because a rewriter is also passed to each `bufferize` function and it is not obvious (by looking at the API) that these two rewriters are the same.
Differential Revision: https://reviews.llvm.org/D116444
LICM checks that nested ops depend only on values defined outside
before performing hoisting.
However, it specifically omits checking for terminators, which can
lead to SSA violations.
This revision fixes the incorrect behavior.
Differential Revision: https://reviews.llvm.org/D116657
Pass unique_ptr<BufferizationOptions> to the bufferization. This allows the bufferization to enqueue additional PostAnalysisSteps. When running bufferization a second time, a new BufferizationOptions must be constructed.
Differential Revision: https://reviews.llvm.org/D116101
This revision refactors the implementation of outlineIfOp to expose
a finer-grain functionality `outlineSingleBlockRegion` that will be
reused in other contexts.
Differential Revision: https://reviews.llvm.org/D116591
Each attribute has two accessors: one suffixed with `Attr`, which returns the attribute itself,
and one without the suffix, which unwraps the attribute.
For example, for a StringAttr attribute with a field named `kind`, we'll generate:
```
StringAttr getKindAttr();
StringRef getKind();
```
Differential Revision: https://reviews.llvm.org/D116466
This fixes bug 49264.
Simply put, a coroutine shouldn't be inlined before CoroSplit. The marker for a
pre-split coroutine is created in the CoroEarly pass, which runs after the
AlwaysInliner pass in the O0 pipeline, so the AlwaysInliner couldn't detect
that it shouldn't inline a coroutine. Hence the error.
This patch sets the presplit attribute in clang and mlir, so the inliner
always detects the attribute before splitting.
Reviewed By: rjmccall, ezhulenev
Differential Revision: https://reviews.llvm.org/D115790
Depends On D115010
This changes a couple of places that used to `return failure();` to now use `llvm_unreachable()` instead. However, `Transforms/Sparsification.cpp` should be doing the necessary type checks to ensure that those cases are in fact unreachable.
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D115012
Depends On D115008
This change opens the way for D115012, and removes some corner cases in `CodegenUtils.cpp`. The `SparseTensorAttrDefs.td` already specifies that we allow `0` bitwidth for the two overhead types and that it is interpreted to mean the architecture's native width.
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D115010
This moves a bunch of helper functions from `Transforms/SparseTensorConversion.cpp` into `Transforms/CodegenUtils.{cpp,h}` so that they can be reused by `Transforms/Sparsification.cpp`, etc.
See also the dependent D115010 which cleans up some corner cases in this change.
Reviewed By: aartbik, rriddle
Differential Revision: https://reviews.llvm.org/D115008
Previously, if a FusedLoc was created with a single location, no FusedLoc was
created and the single location was returned instead. In the case where there
is metadata associated with the location, this results in discarding the
metadata. Instead, only canonicalize when there is no loss of information.
Differential Revision: https://reviews.llvm.org/D115605
I considered multiple approaches for this but settled on this one because I could make the lifetime management work in a reasonably easy way (others had issues with not being able to cast to a Python reference from a C++ constructor). We could stand to have more formatting helpers, but best to get the core mechanism in first.
Differential Revision: https://reviews.llvm.org/D116568
The cleanup flag of an llvm.landingpad operation is currently ignored when exporting to LLVM IR.
Differential Revision: https://reviews.llvm.org/D116565
By default, the listeners do nothing unless linked in.
This revision allows the "Perf" and "Intel" Jit event listeners to be used.
The "OProfile" event listener is not enabled at this time,
the associated library structure is not well-isolated.
Differential Revision: https://reviews.llvm.org/D116552
This diff adds support for printing a Value when it is null. We encounter this situation when debugging the PDL bytecode execution (where a null Value is perfectly valid). Currently, the AsmPrinter crashes (with an assert in a cast) when it encounters such a Value.
We follow the same format used in other printed entities (e.g., null attribute).
Reviewed By: mehdi_amini, bondhugula
Differential Revision: https://reviews.llvm.org/D116084
Presently the result type verification checks if the type is used by a `pdl::OperationOp` inside the matcher. This is unnecessarily restrictive; the type could come from a `pdl::OperandOp` or `pdl::OperandsOp` and still be inferrable.
Reviewed By: rriddle, Mogball
Differential Revision: https://reviews.llvm.org/D116083
This diff adds an integration test to multi-root PDL matching. It consists of two subtests:
1) A 1-layer perceptron with split forward / backward operations.
2) A 2-layer perceptron with fused forward / backward operations.
These tests use a collection of hand-written patterns and TensorFlow operations to be matched. The first test has a DAG / SSA dominant resulting match; the second does not and is therefore stored in a graph region.
This diff also includes two bug fixes:
1) Mark the pdl_interp dialect as a dependent in the TestPDLByteCodePass. This is needed, because we create ops from that dialect as a part of the PDL-to-PDLInterp lowering.
2) Fix of the starting index in the liveness range for the ForEach operations (bug exposed by the integration test).
Reviewed By: Mogball
Differential Revision: https://reviews.llvm.org/D116082
The tree merging of pattern predicates places the predicates in an unordered set. When the predicates are sorted, they are taken in the set order, not the insertion order. This results in nondeterministic behavior.
One solution to this problem would be to use `SetVector`. However, `SetVector` does not provide a `find` function for fast O(1) lookups and stores the predicates twice -- once in the set and once in the vector -- which is undesirable, because we store patternToAnswer in each predicate. A simpler solution is to store the tie-breaking ID (which follows the insertion order), and use this ID to break any ties when comparing predicates.
Reviewed By: Mogball
Differential Revision: https://reviews.llvm.org/D116081
When the original version of multi-root patterns was reviewed, several improvements were made to the pdl_interp operations during the review process. Specifically, the "get users of a value at the specified operand index" was split up into "get users" and "compare the users' operands with that value". The iterative execution was also cleaned up to `pdl_interp.foreach`. However, the positions in the pdl-to-pdl_interp lowering were not similarly refactored. This introduced several problems, including hard-to-detect bugs in the lowering and duplicate evaluation of `pdl_interp.get_users`.
This diff cleans up the positions. The "upward" `OperationPosition` was split-out into `UsersPosition` and `ForEachPosition`, and the operand comparison was replaced with a simple predicate. In the process, I fixed three bugs:
1. When multiple roots had the same connector (i.e., a node that they shared with a subtree at the previously visited root), we would generate a single foreach loop rather than one foreach loop for each such root. The reason for this is that such connectors shared the position. The solution is to add the root index as an id to the newly introduced `ForEachPosition`.
2. Previously, we would use `pdl_interp.get_operands` indiscriminately, whether or not the operand was variadic. We now correctly detect variadic operands and insert `pdl_interp.get_operand` when needed.
3. In certain corner cases, we would trigger the "connector has not been traversed yet" assertion. This was caused by not inserting the values during the upward traversal correctly. This has now been fixed.
Reviewed By: Mogball
Differential Revision: https://reviews.llvm.org/D116080
The result value of a llvm.invoke operation is currently not mapped to the corresponding llvm::Value* when exporting to LLVM IR. This leads to any later operations using the result to crash as it receives a nullptr.
Differential Revision: https://reviews.llvm.org/D116564
After https://reviews.llvm.org/D115821 it became possible to create
`tensor<elem_type>` with a single `tensor.from_elements` operation without
collapsing tensor shape from `tensor<1xelem_type>` to `tensor<elem_type>`
Differential Revision: https://reviews.llvm.org/D115891
Fix confusing diagnostic during partial dialect conversion. A failure to
legalize is not the same as an operation being illegal: e.g., an
operation neither explicitly marked legal nor explicitly marked illegal
could have been generated and may have failed to legalize further. The
op isn't an illegal one per
https://mlir.llvm.org/docs/DialectConversion/#conversion-target
which is an op that is explicitly marked illegal.
Differential Revision: https://reviews.llvm.org/D116152
This patch removes unnecessary dependency on IR for Simplex. This patch allows
users to use Presburger library without depending on MLIRIR.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D116530
Two canonicalizations for `select %x, 1, 0`:
if the return type is i1, simply return the condition %x; otherwise, `extui %x` to the return type.
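A sketch of the two cases on hypothetical values (constants assumed to be defined elsewhere; `select` is spelled without a dialect prefix as above):
```
// i1 result: the select is just the condition.
%0 = select %x, %true, %false : i1        // canonicalizes to %x
// Wider result: zero-extend the condition.
%1 = select %x, %c1_i32, %c0_i32 : i32    // canonicalizes to arith.extui %x : i1 to i32
```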
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D116517
Replace and(ext(a), ext(b)) with ext(and(a, b)). This both saves one instruction and results in the computation (and/or) being done on a smaller type.
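A sketch on hypothetical i8 inputs (zero-extension shown; the same idea applies to the other supported extensions):
```
%ea = arith.extui %a : i8 to i32
%eb = arith.extui %b : i8 to i32
%r0 = arith.andi %ea, %eb : i32
// becomes: the and is computed on the narrower type, followed by one extension.
%ab = arith.andi %a, %b : i8
%r1 = arith.extui %ab : i8 to i32
```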
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D116519
I'm not sure what the right fix is here, but adding a name to all of these
leads to many segfaults.
Reviewed By: stellaraccident
Differential Revision: https://reviews.llvm.org/D116506
This patch moves LinearTransform to Presburger/ and makes it use
IntegerPolyhedron instead of FlatAffineConstraints. Also modifies its usage in
`FlatAffineConstraints::findIntegerSample` to support the changes.
This patch is part of a series of patches for moving presburger math functionality into Presburger directory.
Reviewed By: arjunp
Differential Revision: https://reviews.llvm.org/D116311
This reduces unnecessary copies of non-trivial objects, like
APFloat.
Reviewed By: rriddle, jpienaar
Differential Revision: https://reviews.llvm.org/D116505
This patch creates folds for cmpi( ext(%x : i1, iN) != 0) -> %x
In essence, this matches an extension of a boolean that is compared != 0, which is equivalent to the original condition.
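A sketch of the matched pattern on a hypothetical i1 value %x:
```
%c0_i32 = arith.constant 0 : i32
%e = arith.extui %x : i1 to i32
%r = arith.cmpi ne, %e, %c0_i32 : i32   // folds to %x
```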
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D116504
https://reviews.llvm.org/D109555 added support to APInt for this, so the special case to disable it is no longer valid. It is in fact legal to construct these programmatically today, and they print properly but do not parse.
Justification: zero bit integers arise naturally in various bit reduction optimization problems, and having them defined for MLIR reduces special casing.
I think there is a solid case for i0 and ui0 being supported. I'm less convinced about si0 and opted to just allow the parser to round-trip values that already verify. The counter argument is that the proper singular value for an si0 is -1. But the counter to this counter is that the sign bit is N-1, which does not exist for si0 and it is not unreasonable to consider this non-existent bit to be 0. Various sources consider it having the singular value "0" to be the least surprising.
Reviewed By: lattner
Differential Revision: https://reviews.llvm.org/D116413
These methods currently take a SmallVector<AffineExpr> & as an
argument to return the dims as AffineExprs. This creation of
AffineExpr objects is unnecessary.
Differential Revision: https://reviews.llvm.org/D116422
Per the discussion in https://reviews.llvm.org/D116345 it makes sense
to move AtomicRMWOp out of the standard dialect. This was accentuated by the
need to add a fold op with a memref::cast. The only dialect
that would permit this is the memref dialect (keeping it in the standard dialect
or moving it to the arithmetic dialect would require those dialects to have a
dependency on the memref dialect, which breaks linking).
As the AtomicRMWKind enum is used throughout, this has been moved to Arith.
Reviewed By: Mogball
Differential Revision: https://reviews.llvm.org/D116392
vector.transfer operations do not have rank-reducing semantics.
Bail on illegal rank-reduction: we need to check that the rank-reduced
dims are exactly the leading dims. I.e. the following is illegal:
```
%0 = vector.transfer_write %v, %t[0,0], %cst :
vector<2x4xf32>, tensor<2x4xf32>
%1 = tensor.insert_slice %0 into %tt[0,0,0][2,1,4][1,1,1] :
tensor<2x4xf32> into tensor<2x1x4xf32>
```
Cannot fold into:
```
%0 = vector.transfer_write %v, %t[0,0,0], %cst :
vector<2x4xf32>, tensor<2x1x4xf32>
```
For this, check the trailing `vectorRank` dims of the insert_slice result
tensor match the trailing dims of the inferred result tensor.
Differential Revision: https://reviews.llvm.org/D116409
The semantics of the ops that implement the
`OffsetSizeAndStrideOpInterface` is that if the number of offsets,
sizes or strides are less than the rank of the source, then some
default values are filled along the trailing dimensions (0 for offset,
source dimension of sizes, and 1 for strides). This is confusing,
especially with rank-reducing semantics. The immediate issue here is that
the methods of `OffsetSizeAndStrideOpInterface` assume that the
number of values is the same as the source rank. This causes out-of-bounds
errors.
So simplify the specification of `OffsetSizeAndStrideOpInterface`
to make it invalid to specify a number of offsets/sizes/strides not
equal to the source rank.
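For example, with the stricter specification every source dimension must be covered explicitly, even for a rank-reducing slice (hypothetical IR):
```
%s = tensor.extract_slice %t[0, 0, 0] [1, 4, 8] [1, 1, 1]
    : tensor<2x4x8xf32> to tensor<4x8xf32>
```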
Differential Revision: https://reviews.llvm.org/D115677
LLVM (dialect and IR) have atomics for and/or. This patch enables atomic_rmw ops in the standard dialect for and/or that lower to these (in addition to the existing atomics such as addi, etc).
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D116345
Includes a dependency fix that resulted in the canonicalizer pass not linking in.
Linalg named-op lowerings are moved to a separate pass. This allows TOSA
canonicalizers to run between the named-op lowerings and the general TOSA
lowerings.
Differential Revision: https://reviews.llvm.org/D116057
This patch replaces usage of FlatAffineConstraints in Simplex with
IntegerPolyhedron. This removes dependency of Simplex on FlatAffineConstraints
and puts it on IntegerPolyhedron, which is part of Presburger library.
Reviewed By: arjunp
Differential Revision: https://reviews.llvm.org/D116287
This patch moves `FlatAffineConstraints::print` and
`FlatAffineConstraints::dump()` to IntegerPolyhedron.
Reviewed By: arjunp
Differential Revision: https://reviews.llvm.org/D116289
This patch moves some static functions from AffineStructures.cpp to
Presburger/Utils.cpp and some to be private members of FlatAffineConstraints
(which will later be moved to IntegerPolyhedron) to allow for a smoother
transition for moving FlatAffineConstraints math functionality to
Presburger/IntegerPolyhedron.
This patch is part of a series of patches for moving math functionality to
Presburger directory.
Reviewed By: arjunp, bondhugula
Differential Revision: https://reviews.llvm.org/D115869
Querying threads directly from the thread pool fails if there is no thread pool or if multithreading is not enabled. Returns 1 by default.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D116259
The computed number of hardware threads can change over the life of the process based on affinity changes. Since we need a data structure that is at least as large as the maximum parallelism, it is important to use the value that was actually latched for the thread pool we will be dispatching work to.
Also adds an assert specifically for when it doesn't line up (I was getting a crash on an index into the vector).
Differential Revision: https://reviews.llvm.org/D116257
This reverts commit 313de31fbb.
There is a missing CMake dependency, building with shared libraries is
broken:
55.509 [45/4/3061] Linking CXX shared library lib/libMLIRTosaToLinalg.so.14git
FAILED: lib/libMLIRTosaToLinalg.so.14git
...
TosaToLinalgPass.cpp: undefined reference to `mlir::createCanonicalizerPass()'
Linalg named-op lowerings are moved to a separate pass. This allows TOSA
canonicalizers to run between the named-op lowerings and the general TOSA
lowerings.
Reviewed By: NatashaKnk
Differential Revision: https://reviews.llvm.org/D116057
Previously, we defined a struct named `RootOrderingCost`, which stored the cost (a pair consisting of the depth of the connector and a tie breaking ID), as well as the connector itself. This created some confusion, because we would sometimes write, e.g., `cost.cost.first` (the first `cost` referring to the struct, the second one referring to the `cost` field, and `first` referring to the depth). In order to address this confusion, here we rename `RootOrderingCost` to `RootOrderingEntry` (keeping the fields and their names as-is).
This clarification exposed non-determinism in the optimal branching algorithm. When choosing the best local parent, we were previously only considering its depth (`cost.first`) and not the tie-breaking ID (`cost.second`). This led to a non-deterministic choice of the parent when multiple potential parents had the same depth. The solution is to compare both the depth and the tie-breaking ID.
Testing: Rely on existing unit tests. Non-determinism is hard to unit-test.
Reviewed By: rriddle, Mogball
Differential Revision: https://reviews.llvm.org/D116079
There is no way to programmatically configure the list of disabled and enabled patterns in the canonicalizer pass, other than duplicating the whole pass. This patch exposes the `disabledPatterns` and `enabledPatterns` options.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D116055
* There is no reason to forbid that case
* Also, the user will get a very unfriendly error like `expected result type with offset = -9223372036854775808 instead of 1`
Differential Revision: https://reviews.llvm.org/D114678
These conversions are better suited to be applied at whole tensor
level. Applying these as canonicalizations end up triggering such
canonicalizations at all levels of the stack which might be
undesirable. For example, some of the resulting code patterns won't
bufferize in-place and need additional stack buffers. Best is to be
more deliberate in when these canonicalizations apply.
Differential Revision: https://reviews.llvm.org/D115912
This method is more suitable as an opinterface: it seems intrinsic to
individual instances of the operation instead of the dialect.
Also remove the restriction on the interface being applicable to the entry block only.
Differential Revision: https://reviews.llvm.org/D116018
This is a purely mechanical patch moving some functionality out from the
`Simplex` class out into a `SimplexBase` class. This paves the way for
a future patch adding support for lexicographic optimization with a class
`LexSimplex`, which will inherit from `SimplexBase`. Inheriting directly
from `Simplex` would bring many additional functions that would not work in
`LexSimplex` because it operates slightly differently from `Simplex`. So we
split out only the basic functionality it needs to inherit into `SimplexBase`.
Reviewed By: Groverkss
Differential Revision: https://reviews.llvm.org/D115831
TOSA's canonicalizers that change dense operations should be moved to a
separate optimization pass to avoid canonicalizing to operations not supported
for relevant backends.
Reviewed By: rsuderman
Differential Revision: https://reviews.llvm.org/D115890
It is possible for the shift value to exceed the number of bits. In these
cases we can just multiply by zero. This is a relatively rare occurrence but
should be handled.
Reviewed By: not-jenni
Differential Revision: https://reviews.llvm.org/D115779
When the input and output of a pool2d op are both 1x1, it can be canonicalized to a no-op
Reviewed By: rsuderman
Differential Revision: https://reviews.llvm.org/D115908