Currently, the parser for IntegerSet, only allows constraints like:
```
affine-constraint ::= affine-expr `>=` `0`
| affine-expr `==` `0`
```
This form is sometimes unreadable and painful to use when writing unittests
for Presburger library and tests in general.
This patch extends the parser to allow affine constraints with affine-expr on
the RHS:
```
affine-constraint ::= affine-expr `>=` `affine-expr`
| affine-expr `==` `affine-expr`
```
The internal storage and printing of IntegerSet is still in the original format.
Reviewed By: bondhugula
Differential Revision: https://reviews.llvm.org/D128915
We can canonicalize consecutive complex.exp and complex.log which are inverse functions each other.
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D128966
This is a followup to D128847. The `AffineMap::getPermutedPosition` method performs a linear scan of the map, thus the previous implementation had asymptotic complexity of `O(|topSort| * |m|)`. This change reduces that to `O(|topSort| + |m|)`.
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D129011
The TOSA Specification doesn't have a dilation attribute for transpose_conv2d,
and the padding array is of size 4. (top,bottom,left,right).
This change updates the dialect to match the specification, and updates the lit
tests to match the dialect changes.
Differential Revision: https://reviews.llvm.org/D127332
This also changes the space of the returned lexmin for IntegerPolyhedrons;
the symbols in the poly now correspond to symbols in the result rather than dims.
Reviewed By: Groverkss
Differential Revision: https://reviews.llvm.org/D128933
This allows purging references of scf.ForeachThreadOp and scf.PerformConcurrentlyOp from
ParallelInsertSliceOp.
This will allowmoving the op closer to tensor::InsertSliceOp with which it should share much more
code.
In the future, the decoupling will also allow extending the type of ops that can be used in the
parallel combinator as well as semantics related to multiple concurrent inserts to the same
result.
Differential Revision: https://reviews.llvm.org/D128857
This revision avoids a crash in the 0-D case of distributing vector.transfer ops out of
vector.warp_execute_on_lane_0.
Due to the code complexity and lack of documentation, it took untangling the implementation
before realizing that the simple fix was to fail in the 0-D case.
The rewrite is still very useful to understand this code better.
Differential Revision: https://reviews.llvm.org/D128793
This differential improves two error conditions, by detecting them earlier and by providing better messages to help users understand what went wrong.
Reviewed By: jpienaar
Differential Revision: https://reviews.llvm.org/D128555
This patch implements the analysis state classes needed for sparse data-flow analysis and implements a dead-code analysis using those states to determine liveness of blocks, control-flow edges, region predecessors, and function callsites.
Depends on D126751
Reviewed By: rriddle, phisiart
Differential Revision: https://reviews.llvm.org/D127064
For the conversion to nvgpu `mma.sync` and `ldmatrix` pathways, the code
was missing support for the `i4` data type. While fixing this, another
bug was discoverd that caused the number of ldmatrix tiles calculated for
certain operand types and configurations to be incorrect. This change
fixes both issues and adds additional tests.
Differential Revision: https://reviews.llvm.org/D128074
This revision merges the 2 split_reduction transforms and adds extra control by using attributes.
SplitReduction is known to require a concrete additional buffer to store tempoaray information.
Add an option to introduce a `bufferization.alloc_tensor` instead of `linalg.init_tensor`.
This behaves better with subset-based tiling and bufferization.
Differential Revision: https://reviews.llvm.org/D128722
When the iteration graph is cyclic (even after several attempts using less and less constraints), the current sparse compiler bails out, and no rewriting hapens. However, this revision adds some new logic where the sparse compiler tries to find a single input sparse tensor that breaks the cycle, and then adds a proper sparse conversion operation. This way, more incoming kernels can be handled!
Note, the resulting code is not optimal (although it keeps more or less proper "sparse" complexity), and more improvements should be added (especially when the kernel directly yields without computation, such as the transpose example). However, handling is better than not handling ;-)
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D128847
This feature is tested by unit test since not many places in the codebase
use SubElementTypeInterface.
Differential Revision: https://reviews.llvm.org/D127539
Since only mutable types and attributes can go into infinite recursion
inside SubElementInterface::walkSubElement, and there are only a few of
them (mutable types and attributes), we introduce new traits for Type
and Attribute: TypeTrait::IsMutable and AttributeTrait::IsMutable,
respectively. They indicate whether a type or attribute is mutable.
Such traits are required if the ImplType defines a `mutate` function.
Then, inside SubElementInterface, we use a set to record visited mutable
types and attributes that have been visited before.
Differential Revision: https://reviews.llvm.org/D127537
We currently generate reproducer configurations using a comment placed at
the top of the generated .mlir file. This is kind of hacky given that comments
have no semantic context in the source file and can easily be dropped. This
strategy also wouldn't work if/when we have a bitcode format. This commit
switches to using an external assembly resource, which is verifiable/can
work with a hypothetical bitcode naturally/and removes the awkward processing
from mlir-opt for splicing comments and re-applying command line options. With
the removal of command line munging, this opens up new possibilities for
executing reproducers in memory.
Differential Revision: https://reviews.llvm.org/D126447
This commit enables support for providing and processing external
resources within MLIR assembly formats. This is a mechanism with which
dialects, and external clients, may attach additional information when
printing IR without that information being encoded in the IR itself.
External resources are not uniqued within the MLIR context, are not
attached directly to any operation, and are solely intended to live and be
processed outside of the immediate IR. There are many potential uses of this
functionality, for example MLIR's pass crash reproducer could utilize this to
attach the pass resource executing when a crash occurs. Other types of
uses may be embedding large amounts of binary data, such as weights in ML
applications, that shouldn't be copied directly into the MLIR context, but
need to be kept adjacent to the IR.
External resources are encoded using a key-value pair nested within a
dictionary anchored by name either on a dialect, or an externally registered
entity. The key is an identifier used to disambiguate the data. The value
may be stored in various limited forms, but general encodings use a string
(human readable) or blob format (binary). Within the textual format, an
example may be of the form:
```mlir
{-#
// The `dialect_resources` section within the file-level metadata
// dictionary is used to contain any dialect resource entries.
dialect_resources: {
// Here is a dictionary anchored on "foo_dialect", which is a dialect
// namespace.
foo_dialect: {
// `some_dialect_resource` is a key to be interpreted by the dialect,
// and used to initialize/configure/etc.
some_dialect_resource: "Some important resource value"
}
},
// The `external_resources` section within the file-level metadata
// dictionary is used to contain any non-dialect resource entries.
external_resources: {
// Here is a dictionary anchored on "mlir_reproducer", which is an
// external entity representing MLIR's crash reproducer functionality.
mlir_reproducer: {
// `pipeline` is an entry that holds a crash reproducer pipeline
// resource.
pipeline: "func.func(canonicalize,cse)"
}
}
```
Differential Revision: https://reviews.llvm.org/D126446
The original code is more readable because the goal is to check if the given
value does *not* lie in the range. It is harder to understand this by
reading the rewritten code.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D128753
Should be NFC. We can just do the base conversion manually and avoid
warnings about it. Clang before Clang 13 didn't implement P1825 and
complains:
mlir/lib/Analysis/Presburger/IntegerRelation.cpp:226:10: warning: local variable 'result' will be copied
despite being returned by name [-Wreturn-std-move]
return result;
^~~~~~
mlir/lib/Analysis/Presburger/IntegerRelation.cpp:226:10: note: call 'std::move' explicitly to avoid copying
return result;
^~~~~~
std::move(result)
Consecutive complex.neg are redundant so that we can canonicalize them to the original operands.
Reviewed By: pifon2a
Differential Revision: https://reviews.llvm.org/D128781
1. Remove the redundant collapse clause in MLIR OpenMP worksharing-loop
operation.
2. Fix several typos.
3. Refactor the chunk size type conversion since CreateSExtOrTrunc has
both type check and type conversion.
Reviewed By: kiranchandramohan
Differential Revision: https://reviews.llvm.org/D128338
`enableSplitting` simply enables/disables whether we should split
or use the full buffer. `insertMarkerInOutput` toggles if split markers
should be inserted in between prcessed output chunks.
These options allow for merging the duplicate code paths we have
when splitting is optional.
Differential Revision: https://reviews.llvm.org/D128764