Querying threads directly from the thread pool fails if there is no thread pool or if multithreading is not enabled. Returns 1 by default.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D116259
This reverts commit 313de31fbb.
There is a missing CMake dependency, building with shared libraries is
broken:
55.509 [45/4/3061] Linking CXX shared library lib/libMLIRTosaToLinalg.so.14git
FAILED: lib/libMLIRTosaToLinalg.so.14git
...
TosaToLinalgPass.cpp: undefined reference to `mlir::createCanonicalizerPass()'
Linalg named ops lowering are moved to a separate pass. This allows TOSA
canonicalizers to run between named-ops lowerings and the general TOSA
lowerings. This allows the TOSA canonicalizers to run between lowerings.
Reviewed By: NatashaKnk
Differential Revision: https://reviews.llvm.org/D116057
There is no way to programmatically configure the list of disabled and enabled patterns in the canonicalizer pass, other than the duplicate the whole pass. This patch exposes the `disabledPatterns` and `enabledPatterns` options.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D116055
These conversions are better suited to be applied at whole tensor
level. Applying these as canonicalizations end up triggering such
canonicalizations at all levels of the stack which might be
undesirable. For example some of the resulting code patterns wont
bufferize in-place and need additional stack buffers. Best is to be
more deliberate in when these canonicalizations apply.
Differential Revision: https://reviews.llvm.org/D115912
This method is more suitable as an opinterface: it seems intrinsic to
individual instances of the operation instead of the dialect.
Also remove the restriction on the interface being applicable to the entry block only.
Differential Revision: https://reviews.llvm.org/D116018
This is a purely mechanical patch moving some functionality out from the
`Simplex` class out into a `SimplexBase` class. This pavees the way for
a future patch adding support for lexicographic optimization with a class
`LexSimplex`, which will inherit from `SimplexBase`. Inheriting directly
from `Simplex` would bring many additional functions that would not work in
`LexSimplex` because it operates slighty differently from `Simplex`. So We
split out only the basic functionality it needs to inherit into `SimplexBase`.
Reviewed By: Groverkss
Differential Revision: https://reviews.llvm.org/D115831
TOSA's canonicalizers that change dense operations should be moved to a
seperate optimization pass to avoid canonicalizing to operations not supported
for relevant backends.
Reviewed By: rsuderman
Differential Revision: https://reviews.llvm.org/D115890
`EnumAttr` is a pure TableGen implementation of enum attributes using `AttrDef`. This is meant as a drop-in replacement for `StrEnumAttr`, which is soon to be deprecated. `StrEnumAttr` is often used over `IntEnumAttr` because its more readable in MLIR assembly formats. However, storing and manipulating strings is not efficient. Defining `StrEnumAttr` can also be awkward and relies on a lot of special logic in `EnumsGen`, and has some hidden sharp edges.
Also, `EnumAttr` stores the enum directly, removing the need to convert to/from integers when calling attribute getters on ops.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D115181
When the input and output of a pool2d op are both 1x1, it can be canonicalized to a no-op
Reviewed By: rsuderman
Differential Revision: https://reviews.llvm.org/D115908
This patch extends the GPU kernel outlining pass so that it can take in
an optional data layout specification that will be attached to the GPU
module operation generated. If the data layout specification is not provided
the default data layout is used instead.
Reviewed By: herhut, mehdi_amini
Differential Revision: https://reviews.llvm.org/D115722
Having a default value for the lowering strategy of the multi-reduction op has proven
to be unexpected by users. This patch is dropping the default value so that users have
to explicitly choose the lowering strategy to be applied.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D115805
This allows op interface implementations to make decisions based on dialect-specific bufferization state.
This is in preparation of fixing conflict detection of CallOps in ModuleBufferization.
Differential Revision: https://reviews.llvm.org/D115705
The `rewrite` statement allows for rewriting a given root
operation with a block of nested rewriters. The root operation is
not implicitly erased or replaced, and any transformations to it
must be expressed within the nested rewrite block. The inner body
may contain any number of other rewrite statements, variables, or
expressions.
Differential Revision: https://reviews.llvm.org/D115299
This statement acts as a companion to the existing `erase`
statement, and is the corresponding PDLL construct for the
`PatternRewriter::replaceOp` C++ API. This statement replaces a
given operation with a set of values.
Differential Revision: https://reviews.llvm.org/D115298
Tuples are used to group multiple elements into a single
compound value. The values in a tuple can be of any type, and
do not need to be of the same type. There is also no limit to
the number of elements held by a tuple.
Tuples will be used to support multiple results from
Constraints and Rewrites (added in a followup), and will also
make it easier to support more complex primitives (such as
range based maps that can operate on multiple values).
Differential Revision: https://reviews.llvm.org/D115297
An operation expression in PDLL represents an MLIR operation. In
the match section of a pattern, this expression models one of
the input operations to the pattern. In the rewrite section of
a pattern, this expression models one of the operations to
create. The general structure of the operation expression is very
similar to that of the "generic form" of textual MLIR assembly:
```
let root = op<my_dialect.foo>(operands: ValueRange) {attr = attr: Attr} -> (resultTypes: TypeRange);
```
For now we only model the components that are within PDL, as PDL
gains support for blocks and regions so will this expression.
Differential Revision: https://reviews.llvm.org/D115296
This allows for using literal attributes and types within PDLL,
which simplifies building both constraints and rewriters. For
example, checking if an attribute is true is as simple as
`attr<"true">`.
Differential Revision: https://reviews.llvm.org/D115295
This is a new pattern rewrite frontend designed from the ground
up to support MLIR constructs, and to target PDL. This frontend
language was proposed in https://llvm.discourse.group/t/rfc-pdll-a-new-declarative-rewrite-frontend-for-mlir/4798
This commit starts sketching out the base structure of the
frontend, and is intended to be a minimal starting point for
building up the language. It essentially contains support for
defining a pattern, variables, and erasing an operation. The
features mentioned in the proposal RFC (including IDE support)
will be added incrementally in followup commits.
I intend to upstream the documentation for the language in a
followup when a bit more of the pieces have been landed.
Differential Revision: https://reviews.llvm.org/D115093
Implements the RegionBranchOpInterface method getNumRegionInvocations to `scf::IfOp` so that, when the condition is constant, the number of region executions can be analyzed by `NumberOfExecutions`.
Reviewed By: jpienaar, ftynse
Differential Revision: https://reviews.llvm.org/D115087
* Call `replaceOp` instead of `mapBuffer`.
* Remove bvm and all helper functions around bvm.
* Simplify FuncOp bufferization and rely on existing functionality to generate ToMemrefOps for function BlockArguments.
Differential Revision: https://reviews.llvm.org/D115515
Ops for the signed counterparts "llvm.smin" and "llvm.smax" already exist. This patch adds the unsigned versions as well.
Differential Revision: https://reviews.llvm.org/D115796
After removing the range type, Linalg does not define any type. The revision thus consolidates the LinalgOps.h and LinalgTypes.h into a single Linalg.h header. Additionally, LinalgTypes.cpp is renamed to LinalgDialect.cpp to follow the convention adopted by other dialects such as the tensor dialect.
Depends On D115727
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D115728
With VectorType supporting scalable dimensions, we don't need many of
the operations currently present in ArmSVE, like mask generation and
basic arithmetic instructions. Therefore, this patch also gets
rid of those.
Having built-in scalable vector support also simplifies the lowering of
scalable vector dialects down to LLVMIR.
Scalable dimensions are indicated with the scalable dimensions
between square brackets:
vector<[4]xf32>
Is a scalable vector of 4 single precission floating point elements.
More generally, a VectorType can have a set of fixed-length dimensions
followed by a set of scalable dimensions:
vector<2x[4x4]xf32>
Is a vector with 2 scalable 4x4 vectors of single precission floating
point elements.
The scale of the scalable dimensions can be obtained with the Vector
operation:
%vs = vector.vscale
This change is being discussed in the discourse RFC:
https://llvm.discourse.group/t/rfc-add-built-in-support-for-scalable-vector-types/4484
Differential Revision: https://reviews.llvm.org/D111819
Instead of modifying the existing scf.for op, create a new op with memref OpOperands/OpResults and delete the old op.
New allocations / other memrefs can now be yielded from the loop. This functionality is deactivated by default and guarded against by AssertDestinationPassingStyle.
This change also introduces `replaceOp`, which will be utilized by all other `bufferize` implementations in future commits. Bufferization will then no longer rely on old (pre-bufferize) ops to DCE away. Instead old ops are deleted on the spot. This improves debuggability because there won't be any duplicate ops anymore (bufferized + not-yet-bufferized) when dumping IR during bufferization. It is also less fragile because unbufferized IR can no longer silently "hang around" due to an implementation bug.
Differential Revision: https://reviews.llvm.org/D114926
Added documentation to clearify the purpose of the bufferization to memref pass
and added some remarks.
Differential Revision: https://reviews.llvm.org/D115326
Remove the RangeOp and the RangeType that are not actively used anymore. After removing RangeType, the LinalgTypes header only includes the generated dialect header.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D115727
Break up the vectorization pre-condition into the part checking for
static shape and the rest checking if the linalg op is supported by
vectorization. This allows checking if an op could be vectorized if it
had static shapes.
Differential Revision: https://reviews.llvm.org/D115754
Instead of printing analysis debug information to stderr, annotate the IR. This makes it easier to understand decisions made by the analysis, especially in larger input IR.
Differential Revision: https://reviews.llvm.org/D115575
Implementation of the interface allows querying the size and alignments of an LLVMArrayType as well as query the size and alignment of a struct containing an LLVMArrayType.
The implementation should yield the same results as llvm::DataLayout, including support for over aligned element types.
There is no customization point for adjusting an arrays alignment; it is simply taken from the element type.
Differential Revision: https://reviews.llvm.org/D115704
This is the second part of https://reviews.llvm.org/D114993 after slicing
into 2 independent commits.
This is needed at the moment to get good codegen from 2d vector.transfer
ops that aim to compile to SIMD load/store instructions but that can
only do so if the whole 2d transfer shape is handled in one piece, in
particular taking advantage of the memref being contiguous rowmajor.
For instance, if the target architecture has 128bit SIMD then we would
expect that contiguous row-major transfers of <4x4xi8> map to one SIMD
load/store instruction each.
The current generic lowering of multi-dimensional vector.transfer ops
can't achieve that because it peels dimensions one by one, so a transfer
of <4x4xi8> becomes 4 transfers of <4xi8>.
The new patterns here are only enabled for now by
-test-vector-transfer-flatten-patterns.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D114993
This is the first part of https://reviews.llvm.org/D114993 which has been
split into small independent commits.
This is needed at the moment to get good codegen from 2d vector.transfer
ops that aim to compile to SIMD load/store instructions but that can
only do so if the whole 2d transfer shape is handled in one piece, in
particular taking advantage of the memref being contiguous rowmajor.
For instance, if the target architecture has 128bit SIMD then we would
expect that contiguous row-major transfers of <4x4xi8> map to one SIMD
load/store instruction each.
The current generic lowering of multi-dimensional vector.transfer ops
can't achieve that because it peels dimensions one by one, so a transfer
of <4x4xi8> becomes 4 transfers of <4xi8>.
The new patterns here are only enabled for now by
-test-vector-transfer-flatten-patterns.
Reviewed By: nicolasvasilache
* Generalizes passes linalg-detensorize, linalg-fold-unit-extent-dims, convert-elementwise-to-linalg.
* I feel that more work could be done in the future (i.e. make FunctionLike into a proper OpInterface and extend actions in dialect conversion to be trait based), and this patch would be a good record of why that is useful.
* Note for downstreams:
* Since these passes are now generic, they do not automatically nest with pass managers set up for implicit nesting.
* The Detensorize pass must run on a FunctionLike, and this requires explicit nesting.
* Addressed missed comments from the original and per-suggestion removed the assert on FunctionLike in ElementwiseToLinalg and DropUnitDims.cpp, which also is what was causing the integration test to fail.
This reverts commit aa8815e42e.
Differential Revision: https://reviews.llvm.org/D115671
* Generalizes passes linalg-detensorize, linalg-fold-unit-extent-dims, convert-elementwise-to-linalg.
* I feel that more work could be done in the future (i.e. make FunctionLike into a proper OpInterface and extend actions in dialect conversion to be trait based), and this patch would be a good record of why that is useful.
* Note for downstreams:
* Since these passes are now generic, they do not automatically nest with pass managers set up for that.
* If running them over nested functions, you must nest explicitly. Upstream has adopted this style but *-opt still has some uses of implicit pipelines via args. See tests for argument changes needed.
Differential Revision: https://reviews.llvm.org/D115645
Using this implementation of the interface it is possible to query the size, ABI alignment as well as the preferred alignment of a struct. It should yield the same results as LLVMs `llvm::DataLayout` on an equivalent `llvm::StructType`, including for packed structs.
Additionally it is also possible to increase the ABI and preferred alignment using a data layout entry with the type `llvm.struct<()>, which serves the same functionality as the `a:` component in LLVMs data layout string.
Differential Revision: https://reviews.llvm.org/D115600
The 0-D case gets lowered in almost the same way that the 1-D case does
in VectorCreateMaskOpConversion. I also had to slightly update the
verifier for the op to always require exactly 1 operand in the 0-D case.
Depends On D115220
Reviewed by: ftynse
Differential revision: https://reviews.llvm.org/D115221
Following the example of `VectorOfAnyRankOf`, I've done a few changes in the
`.td` files to help with adding the support for the 0-D case gradually.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D115220
`(void)` was added when LogicalResult was marked as non
discard. This commit cleans them up to properly propagate
failures.
Reviewed By: scotttodd
Differential Revision: https://reviews.llvm.org/D115541
This should really come from a matching target environment. But
as a default, it can be handy (to avoid always listing the full
resource limits attribute in IR, etc.). It's common to see 32
so use that as the subgroup size.
Reviewed By: scotttodd
Differential Revision: https://reviews.llvm.org/D115534
In SPIR-V, symbol names are encoded as `OpName` instructions.
They are not semantic impacting and can be omitted, which can
reduce the binary size.
Reviewed By: scotttodd
Differential Revision: https://reviews.llvm.org/D115531
Introduce a function `getNumIdKind` that returns the number of ids of the
specified kind. Remove the function `assertAtMostNumIdKind` and instead just
directly assert the inequality with a call to `getNumIdKind`.
NFC. Move out and expose affine scalar replacement utility through
affine utils. Renaming misleading forwardStoreToLoad ->
affineScalarReplace. Update a stale doc comment.
Differential Revision: https://reviews.llvm.org/D115495
Switch the attribute creation operations to use attr-dict-with-
keyword to avoid conflicts (in the case of pdl.attribute) and
confusion(in the case of pdl_interp.create_attribute) with
having a DictionaryAttr as a value and specifying the
attributes of the operation itself (as a dictionary).
Differential Revision: https://reviews.llvm.org/D114815
The results of a rewrite are optional, but we currently require
them to be present in the assembly format. This commit
makes the results component in the format optional.
Differential Revision: https://reviews.llvm.org/D114814
Custom ops that have no parser or printer should fall back to the dialect's parser and/or printer hooks. This avoids the need to define parsers and printers that simply dispatch to the dialect hook.
Reviewed By: mehdi_amini, rriddle
Differential Revision: https://reviews.llvm.org/D115481
This patch adds the documentation for the operations `omp.atomic.read`,
`omp.atomic.write` and `omp.atomic.update`.
Reviewed By: peixin
Differential Revision: https://reviews.llvm.org/D115445
- Define a gpu.printf op, which can be lowered to any GPU printf() support (which is present in CUDA, HIP, and OpenCL). This op only supports constant format strings and scalar arguments
- Define the lowering of gpu.pirntf to a call to printf() (which is what is required for AMD GPUs when using OpenCL) as well as to the hostcall interface present in the AMD Open Compute device library, which is the interface present when kernels are running under HIP.
- Add a "runtime" enum that allows specifying which of the possible runtimes a ROCDL kernel will be executed under or that the runtime is unknown. This enum controls how gpu.printf is lowered
This change does not enable lowering for Nvidia GPUs, but such a lowering should be possible in principle.
And:
[MLIR][AMDGPU] Always set amdgpu-implicitarg-num-bytes=56 on kernels
This is something that Clang always sets on both OpenCL and HIP kernels, and failing to include it causes mysterious crashes with printf() support.
In addition, revert the max-flat-work-group-size to (1, 256) to avoid triggering bugs in the AMDGPU backend.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D110448
This patch factors out math functionality that is a subset of Presburger arithmetic and moves it from FlatAffineConstraints to Presburger/IntegerPolyhedron. This patch only moves some parts of the functionality planned to be moved, with subsequent patches moving more functionality. There are three main reasons for this:
1. This split makes the Presburger Library easier and more flexible to use
across MLIR, by not depending on IR.
2. This split allows the Presburger library to be developed independently from
Affine Analysis, with Affine Analysis using this library.
3. With more functionality being upstreamed to the Presburger Library, the
mlir/Analysis directory will be cluttered with Presburger library components
since they depend on math functionality from FlatAffineConstraints. Moving this
functionality to the Presburger directory allows keeping the new functionality
in the Presburger directory.
This patch is part of an ongoing effort to make the Presburger Library easier to use. The motivation for this effort is the feedback received at the LLVM conference from Mehdi and others.
Reviewed By: bondhugula
Differential Revision: https://reviews.llvm.org/D114674
This patch provides functionality for simplifying `PresburgerSet`s by checking if any `FlatAffineConstraints` in the set is contained in another, and removing such redundant FACs.
This is part of a series of patches to provide functionality for [integer set coalescing](http://impact.gforge.inria.fr/impact2015/papers/impact2015-verdoolaege.pdf) in MLIR.
Reviewed By: arjunp
Differential Revision: https://reviews.llvm.org/D110617
This patch supports the atomic construct (update) following section 2.17.7 of OpenMP 5.0 standard. Also added tests and verifier for the same.
Reviewed By: kiranchandramohan, peixin
Differential Revision: https://reviews.llvm.org/D112982
Count leading/trailing zeros are an existing LLVM intrinsic. Added LLVM
support for the intrinsics with lowerings from the math dialect to LLVM
dialect.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D115206
This adds a new option `dialectFilter` to BufferizationOptions. Only ops from dialects that are allow-listed in the filter are bufferized. Other ops are left unbufferized. Note: This option requires `allowUnknownOps = true`.
To make use of `dialectFilter`, BufferizationOptions or BufferizationState must be passed to various helper functions.
The purpose of this change is to provide a better infrastructure for partial bufferization, which will be fully activated in a subsequent change.
Differential Revision: https://reviews.llvm.org/D114691
The new form of printing attribute in the declarative assembly is eliding the `#dialect.mnemonic` prefix to only keep the `<....>` part.
Differential Revision: https://reviews.llvm.org/D113873
This revision implements sparse outputs (from scratch) in all cases where
the loops can be reordered with all but one parallel loops outer. If the
inner parallel loop appears inside one or more reductions loops, then an
access pattern expansion is required (aka. workspaces in TACO speak).
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D115091
These functions are generic utility functions that operates on
affine ops within SCF regions. Moving them to their own files
for a better code structure, instead of mixing with loop
specialization logic.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D115245
This change mainly changes the API. There is no mentioning of FuncOps in ComprehensiveBufferize anymore.
Also, bufferize methods of the op interface are called for ops without tensor operands/results if they have a region.
Differential Revision: https://reviews.llvm.org/D115212
For a 1x1 weight and stride of 1, the input/weight can be reshaped and
multiplied elementwise then reshaped back
Reviewed By: rsuderman, KoolJBlack
Differential Revision: https://reviews.llvm.org/D115207
Make fields private and clean up the interface. In particular, BufferizableOpInterface::bufferize no longer has access to `aliasInfo`. This was potentially dangerous because some of the ops registered in BufferizationAliasInfo may have been deleted.
Differential Revision: https://reviews.llvm.org/D114931
Fixed the tosa.conv2d to tosa.fully_connected canonicalization for incorrect
output channels. Included uptes to tests to include checks for the result
shapes during canonicalization.
This allows conv2d to transform to the simpler fully_connected operation.
Reviewed By: mravishankar
Differential Revision: https://reviews.llvm.org/D115170
Conversion of LLVM named structs leads to them being renamed since we cannot
modify the body of the struct type once it is set. Previously, this applied to
all named struct types, even if their element types were not affected by the
conversion. Make this behvaior only applicable when element types are changed.
This requires making the LLVM dialect type-compatibility check recursively look
at the element types (arguably, it should have been doing than since the moment
the LLVM dialect type system stopped being closed). In addition, have a more
lax check for outer types only to avoid repeated check when necessary (e.g.,
parser, verifiers that are going to also look at the inner type).
Reviewed By: wsmoses
Differential Revision: https://reviews.llvm.org/D115037
This is a cleanup of ModuleBufferization. Instead of storing information about writable function arguments in BufferizationAliasInfo, we can use isWritable and make the decision there, based on dialect-specifc bufferization state.
Differential Revision: https://reviews.llvm.org/D114930
Remove all function calls related to buffer equivalence from bufferize implementations.
Add a new PostAnalysisStep for scf.for that ensures that yielded values are equivalent to the corresponding BBArgs. (This was previously checked in `bufferize`.) This will be relaxed in a subsequent commit.
Note: This commit changes two test cases. These were broken by design
and should not have passed. With the new scf.for PostAnalysisStep, this
bug was fixed.
Differential Revision: https://reviews.llvm.org/D114927
Adding the default implementation of `getLoopIteratorTypes` and
`getLoopBounds` allows ExternalModels to override these methods.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D115101
Collect equivalent BBArgs right after the equivalence analysis of the FuncOp and before bufferizing. This is in preparation of decoupling bufferization from aliasInfo.
Also gather equivalence info for CallOps, which was missing in the
previous commit.
Differential Revision: https://reviews.llvm.org/D114847
To support creating both a mask with just a single `true` and `false` values,
I had to relax the restriction in the verifier that the rank is always equal to
the length of the attribute array, in other words, we now allow:
- `vector.constant_mask [0] : vector<i1>` which gets lowered to
`arith.constant dense<false> : vector<i1>`
- `vector.constant_mask [1] : vector<i1>` which gets lowered to
`arith.constant dense<true> : vector<i1>`
(the attribute list for the 0-D case must be a singleton containing
either `0` or `1`)
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D115023
Let the user registers their own handler to processing the matching
failure information.
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D110896
Instead of checking buffer equivalence during bufferization, gather buffer equivalence information right after the analysis. This is in preparation of decoupling bufferization from BufferizationAliasInfo.
This change also fixes equivalence analysis for scf.if op results, which was not fully implemented. scf.if op results are equivalent to their corresponding yield values if both yield values are equivalent.
Differential Revision: https://reviews.llvm.org/D114774
Also set insertion point right before calling `bufferize`. No need to put an InsertionGuard anymore.
Differential Revision: https://reviews.llvm.org/D114928
This reverts commit 13bdb7ab4a. The commit introduced/uncovered an unintended bug in models containing Conv2D.
Reviewed By: mravishankar
Differential Revision: https://reviews.llvm.org/D115079
BufferizationState had map/lookup overloads for non-tensor values. This was necessary for IREE. There is now a better way to do this, so these overloads can be removed.
Differential Revision: https://reviews.llvm.org/D114929
Allow ops that are not bufferizable in the input IR. (Deactivated by default.)
bufferization::ToMemrefOp and bufferization::ToTensorOp are generated at the bufferization boundaries.
Differential Revision: https://reviews.llvm.org/D114669
Also store a reference to BufferizationOptions in BufferizationState. This is in preparation of adding support for partial bufferization.
Differential Revision: https://reviews.llvm.org/D114661
The implementation only allows to bit-cast between two 0-D vectors. We could
probably support casting from/to vectors like `vector<1xf32>`, but I wasn't
convinced that this would be important and it would require breaking the
invariant that `BitCastOp` works only on vectors with equal rank.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D114854
This change provides `BufferizableOpInterface` implementations for ops from the Bufferization dialects. These ops are needed at the bufferization boundaries for partial bufferization.
Differential Revision: https://reviews.llvm.org/D114618
Affine maps and integer sets previously relied on a single lock for creating unique instances. In a multi-threaded setting, this lock becomes a contention point. This commit updates AffineMap and IntegerSet to use StorageUniquer instead. StorageUniquer internally uses sharded locks and thread-local caches to reduce contention. It is already used for affine expressions, types and attributes. On my local machine, this gives me a 5X speedup for an application that manipulates a lot of affine maps and integer sets.
This commit also removes the integer set uniquer threshold. The threshold was used to avoid adding integer sets with a lot of constraints to the hash_map containing unique instances, but the constraints and the integer set were still allocated in the same allocator and never freed, thus not saving any space expect for the hash-map entry.
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D114942
This patch implements detecting duplicate local identifiers by extracting their
division representation while merging local identifiers.
For example, given the FACs A, B:
```
A: (x, y)[s0] : (exists d0 = [x / 4], d1 = [y / 4]: d0 <= s0, d1 <= s0, x + y >= 2)
B: (x, y)[s0] : (exists d0 = [x / 4], d1 = [y / 4]: d0 <= s0, d1 <= s0, x + y >= 5)
```
The intersection of A and B without this patch would lead to the following FAC:
```
(x, y)[s0] : (exists d0 = [x / 4], d1 = [y / 4], d2 = [x / 4], d3 = [x / 4]: d0 <= s0, d1 <= s0, d2 <= s0, d3 <= s0, x + y >= 2, x + y >= 5)
```
after this patch, merging of local ids will detect that `d0 = d2` and `d1 = d3`,
and the intersection of these two FACs will be (after removing duplicate constraints):
```
(x, y)[s0] : (exists d0 = [x / 4], d1 = [y / 4] : d0 <= s0, d1 <= s0, x + y >= 2, x + y >= 5)
```
This reduces the number of constraints by 2 (constraints) + 4 (2 constraints for each extra division) for this case.
This is used to reduce the output size representation of operations like
PresburgerSet::subtract, PresburgerSet::intersect which require merging local
variables.
Reviewed By: arjunp, bondhugula
Differential Revision: https://reviews.llvm.org/D112867
Revert changes that were meant to be sent as a single commit with
summary for the differential review, but were accidently sent directly.
This reverts commit 3bc5353fc6.