This ensures that we generate memref types with matching layout maps. (Especially when using partial bufferization passes.)
Differential Revision: https://reviews.llvm.org/D120893
This commit deletes the old dialect conversion-based bufferization patterns, which are now obsolete.
Differential Revision: https://reviews.llvm.org/D120883
The loopInfos gets invalidated after collapsing nested loops. Use the
saved afterIP since the returned afterIP by applyDynamicWorkshareLoop
may be not valid.
Reviewed By: shraiysh
Differential Revision: https://reviews.llvm.org/D120294
StandardToSPIRV currently contains an assortment of patterns converting from
different dialects to SPIRV. This commit splits up StandardToSPIRV into separate
conversions for each of the dialects involved (some of which already exist).
Differential Revision: https://reviews.llvm.org/D120767
Add support for extensible dialects, which are dialects that can be
extended at runtime with new operations and types.
These operations and types cannot at the moment implement traits
or interfaces.
Differential Revision: https://reviews.llvm.org/D104554
LLVM defines several default datalayouts for integer and floating point types that are not being considered when importing into MLIR. This patch remedies this.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D120832
https://reviews.llvm.org/D120423 replaced the use of stacksave/restore with memref.alloca_scope, but kept the save/restore at the same location. This PR places the allocation scope within the wsloop, thus keeping the same allocation scope as the original scf.parallel (e.g. no longer over stack allocating).
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D120772
This patch moves all functionality from IntegerPolyhedron to IntegerRelation.
IntegerPolyhedron is now implemented as a relation with no domain. All existing
functionality is extended to work on relations.
This patch does not affect external users like FlatAffineConstraints as they
can still continue to use IntegerPolyhedron abstraction.
This patch is part of a series of patches to support relations in Presburger
library.
Reviewed By: arjunp
Differential Revision: https://reviews.llvm.org/D120652
Add support for translating data layout specifications for integer and float
types between MLIR and LLVM IR. This is a first step towards removing the
string-based LLVM dialect data layout attribute on modules. The latter is still
available and will remain so until the first-class MLIR modeling can fully
replace it.
Depends On D120739
Reviewed By: wsmoses
Differential Revision: https://reviews.llvm.org/D120740
Add support for integer and float types into the data layout subsystem with
default logic similar to LLVM IR. Given the flexibility of the sybsystem, the
logic can be easily overwritten by operations if necessary. This provides the
connection necessary, e.g., for the GPU target where alignment requirements for
integers and floats differ from those provided by default (although still
compatible with the LLVM IR model). Previously, it was impossible to use
non-default alignment requirements for integer and float types, which could
lead to incorrect address and size calculations when targeting GPUs.
Depends On D120737
Reviewed By: wsmoses
Differential Revision: https://reviews.llvm.org/D120739
This patch adds assemblyFormat for `omp.critical.declare`, `omp.atomic.read`,
`omp.atomic.write`, `omp.atomic.update` and `omp.atomic.capture`.
Also removing those clauses from `parseClauses` that aren't needed
anymore, thanks to the new assemblyFormats.
Reviewed By: NimishMishra, rriddle
Differential Revision: https://reviews.llvm.org/D120248
sparsity values.
Previously, we can't properly handle input tensors with a dimension
ordering that is different from the natural ordering or with a mixed of
compressed and dense dimensions. This change fixes the problems by
passing the dimension ordering and sparsity values to the runtime
routine.
Modify an existing test to test the situation.
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D120777
This adds an option to configure the CMake python search priming
behaviour that was introduced in D118148. In some environments the
priming would cause the "real" search to fail. The default behaviour is
unchanged, i.e. the search will be primed.
Reviewed By: stellaraccident
Differential Revision: https://reviews.llvm.org/D120765
The remanants of Standard was renamed to Func, but the test directory
remained named as Standard. In adidition to fixing the name, this commit
also moves the tests for operations not in the Func dialect to the proper
parent dialect test directory.
The Func has a large number of legacy dependencies carried over from the old
Standard dialect, which was pervasive and contained a large number of varied
operations. With the split of the standard dialect and its demise, a lot of lingering
dead dependencies have survived to the Func dialect. This commit removes a
large majority of then, greatly reducing the dependence surface area of the
Func dialect.
The last remaining operations in the standard dialect all revolve around
FuncOp/function related constructs. This patch simply handles the initial
renaming (which by itself is already huge), but there are a large number
of cleanups unlocked/necessary afterwards:
* Removing a bunch of unnecessary dependencies on Func
* Cleaning up the From/ToStandard conversion passes
* Preparing for the move of FuncOp to the Func dialect
See the discussion at https://discourse.llvm.org/t/standard-dialect-the-final-chapter/6061
Differential Revision: https://reviews.llvm.org/D120624
Previously, convertToMLIRSparseTensor assumes identity storage ordering and all
compressed dimensions. This change extends the function with two parameters for
users to specify the storage ordering and the sparsity of each dimension.
Modify PyTACO to reflect this change.
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D120643
As discussed in https://reviews.llvm.org/D119743 scf.parallel would continuously stack allocate since the alloca op was placd in the wsloop rather than the omp.parallel. This PR is the second stage of the fix for that problem. Specifically, we now introduce an alloca scope around the inlined body of the scf.parallel and enable a canonicalization to hoist the allocations to the surrounding allocation scope (e.g. omp.parallel).
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D120423
The llvm.mlir.global operation accepts a region as initializer. This region
corresponds to an LLVM IR constant expression and therefore should not accept
operations with side effects. Add a corresponding verifier.
Reviewed By: wsmoses, bondhugula
Differential Revision: https://reviews.llvm.org/D120632
The revision renames the following OpDSL functions:
```
TypeFn.cast -> TypeFn.cast_signed
BinaryFn.min -> BinaryFn.min_signed
BinaryFn.max -> BinaryFn.max_signed
```
The corresponding enum values on the C++ side are renamed accordingly:
```
#linalg.type_fn<cast> -> #linalg.type_fn<cast_signed>
#linalg.binary_fn<min> -> #linalg.binary_fn<min_signed>
#linalg.binary_fn<max> -> #linalg.binary_fn<max_signed>
```
Depends On D120110
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D120562
The revision extends OpDSL with unary and binary function attributes. A function attribute, makes the operations used in the body of a structured operation configurable. For example, a pooling operation may take an aggregation function attribute that specifies if the op shall implement a min or a max pooling. The goal of this revision is to define less and more flexible operations.
We may thus for example define an element wise op:
```
linalg.elem(lhs, rhs, outs=[out], op=BinaryFn.mul)
```
If the op argument is not set the default operation is used.
Depends On D120109
Reviewed By: nicolasvasilache, aartbik
Differential Revision: https://reviews.llvm.org/D120110
Add applyStaticChunkedWorkshareLoop method implementing static schedule when chunk-size is specified. Unlike a static schedule without chunk-size (where chunk-size is chosen by the runtime such that each thread receives one chunk), we need two nested loops: one for looping over the iterations of a chunk, and a second for looping over all chunks assigned to the threads.
This patch includes the following related changes:
* Adapt applyWorkshareLoop to triage between the schedule types, now possible since all schedules have been implemented. The default schedule is assumed to be non-chunked static, as without OpenMPIRBuilder.
* Remove the chunk parameter from applyStaticWorkshareLoop, it is ignored by the runtime. Change the value for the value passed to the init function to 0, as without OpenMPIRBuilder.
* Refactor CanonicalLoopInfo::setTripCount and CanonicalLoopInfo::mapIndVar as used by both, applyStaticWorkshareLoop and applyStaticChunkedWorkshareLoop.
* Enable Clang to use the OpenMPIRBuilder in the presence of the schedule clause.
Differential Revision: https://reviews.llvm.org/D114413
Add a pattern matcher for ExtractSliceOp when its source is a constant.
The matching heuristics can be governed by the control function since
generating a new constant is not always beneficial.
Differential Revision: https://reviews.llvm.org/D119605
If we have a chain of `tensor.insert_slice` ops inserting some
`tensor.pad` op into a `linalg.fill` and ranges do not overlap,
we can also elide the `tensor.pad` later.
Reviewed By: ThomasRaoux
Differential Revision: https://reviews.llvm.org/D120446
Fold tensor.insert_slice(tensor.pad(<input>), linalg.fill) into
tensor.insert_slice(<input>, linalg.fill) if the padding value and
the filling value are the same.
Reviewed By: ThomasRaoux
Differential Revision: https://reviews.llvm.org/D120410
Improve the LinalgOp verification to ensure the iterator types is known. Previously, unknown iterator types have been ignored without warning, which can lead to confusing bugs.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D120649
This patch removes the builders for `omp.wsloop` operation that aren't
specifically needed anywhere. We can add them later if the need arises.
Reviewed By: kiranchandramohan
Differential Revision: https://reviews.llvm.org/D120533
This patch moves IntegerPolyhedron::reset to FlatAffineConstraints::reset. This
function is not required in IntegerPolyhedron and creates ambiguity while
shifting implementations to IntegerRelation.
This patch is part of a series of patches to introduce relations in Presburger
library.
Reviewed By: arjunp
Differential Revision: https://reviews.llvm.org/D120628
PDL currently doesn't support result values from constraints, meaning we need
to error out until this is actually supported to avoid crashes.
Differential Revision: https://reviews.llvm.org/D119782
This commits adds a C++ generator to PDLL that generates wrapper PDL patterns
directly usable in C++ code, and also generates the definitions of native constraints/rewrites
that have code bodies specified in PDLL. This generator is effectively the PDLL equivalent of
the current DRR generator, and will allow easy replacement of DRR patterns with PDLL patterns.
A followup will start to utilize this for end-to-end integration testing and show case how to
use this as a drop-in replacement for DRR tablegen usage.
Differential Revision: https://reviews.llvm.org/D119781
If the operand list or result list of an operation expression is not specified, we interpret
this as meaning that the operands/results are "unconstraint" (i.e. "could be anything").
We currently don't properly handle differentiating this case from the case of
"no operands/results". This commit adds the insertion of implicit value/type range
variables when these lists are unspecified. This allows for adding proper support
for when zero operands or results are expected.
Differential Revision: https://reviews.llvm.org/D119780
This commits starts to plumb PDLL down into MLIR and adds an initial
PDL generator. After this commit, we will have conceptually support
end-to-end execution of PDLL. Followups will add CPP generation to
match the current DRR setup, and begin to add various end-to-end
tests to test PDLL execution.
Differential Revision: https://reviews.llvm.org/D119779
This patch removes a redundant check in hasConsistentState which is always true
after introduction of PresburgerSpace.
Reviewed By: arjunp
Differential Revision: https://reviews.llvm.org/D120615
This patch moves identifier kind specific insert/append functions like
`insertDimId`, `appendSymbolId`, etc. from IntegerPolyhedron to
FlatAffineConstraints.
This change allows for a smoother transition to IntegerRelation.
This change is part of a series of patches to introduce Relations in Presburger
library.
Reviewed By: arjunp
Differential Revision: https://reviews.llvm.org/D120576
This patch factors out various checks for dimension compatibility to
PresburgerSpace::isEqual and PresburgerLocalSpace::isEqual (for local
identifiers).
Reviewed By: arjunp
Differential Revision: https://reviews.llvm.org/D120565
In the main-loop of the current coalesce implementation `i` was incremented
twice for some cases. This patch fixes this bug and adds a regression
testcase.
Reviewed By: arjunp
Differential Revision: https://reviews.llvm.org/D120613
The PyTACO DSL doesn't support reduction to scalars. This change
enhances the MLIR-PyTACO implementation to support reduction to scalars.
Extend an existing test to show the syntax of reduction to scalars and
two methods to retrieve the scalar values.
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D120572
The AVX2 lowering for transpose operations is only applicable to f32 vector types.
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D120427
The existing AVX2 lowering patterns for the transpose op only triggers if the
input vector is 2-D. This patch extends the patterns to trigger for n-D vectors
which are effectively 2-D vectors (e.g., vector<1x4x1x8x1). The main constraint
for the generalized AVX2 patterns to be applicable to these vectors is that the
dimensions that are greater than one must be transposed. Otherwise, the existing
patterns are not applicable.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D119505
This change gives explicit order of verifier execution and adds
`hasRegionVerifier` and `verifyWithRegions` to increase the granularity
of verifier classification. The orders are as below,
1. InternalOpTrait will be verified first, they can be run independently.
2. `verifyInvariants` which is constructed by ODS, it verifies the type,
attributes, .etc.
3. Other Traits/Interfaces that have marked their verifier as
`verifyTrait` or `verifyWithRegions=0`.
4. Custom verifier which is defined in the op and has marked
`hasVerifier=1`
If an operation has regions, then it may have the second phase,
5. Traits/Interfaces that have marked their verifier as
`verifyRegionTrait` or
`verifyWithRegions=1`. This implies the verifier needs to access the
operations in its regions.
6. Custom verifier which is defined in the op and has marked
`hasRegionVerifier=1`
Note that the second phase will be run after the operations in the
region are verified. Based on the verification order, you will be able to
avoid verifying duplicate things.
Reviewed By: Mogball
Differential Revision: https://reviews.llvm.org/D116789
Fix MLIR-PyTACO and some tests to use np.array_equal to compare integer
values.
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D120526
Split arithmetic function into unary and binary functions. The revision prepares the introduction of unary and binary function attributes that work similar to type function attributes.
Depends On D120108
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D120109
This change allows the use of scalar tensors with index 0 in tensor index
expressions. In this case, the scalar value is broadcast to match the
dimensions of other tensors in the same expression.
Using scalar tensors as a destination in tensor index expressions is not
supported in the PyTACO DSL.
Add a PyTACO test to show the use of scalar tensors.
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D120524
Prepare the OpDSL function handling to introduce more function classes. A follow up commit will split ArithFn into UnaryFn and BinaryFn. This revision prepares the split by adding a function kind enum to handle different function types using a single class on the various levels of the stack (for example, there is now one TensorFn and one ScalarFn).
Depends On D119718
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D120108
This patch refactors the looping strategy of coalesce for future patches. The new strategy works in-place and uses IneqType to organize inequalities into vectors of the same type. Future coalesce cases will pattern match on this organization. E.g. the contained case needs all inequalities and equalities to be redundant, so this case becomes checking whether the respective vectors are empty. For other cases, the patterns consider the types of all inequalities of both sets making it wasteful to only consider whether a can be coalesced with b in one step, as inequalities would need to be typed again for the opposite case. Therefore, the new strategy tries to coalesce a with b and b with a in a single step.
Reviewed By: Groverkss, arjunp
Differential Revision: https://reviews.llvm.org/D120392
This patch replaces various functions over inequalities/equalities in
IntegerPolyhedron with Matrix functions already implementing them or refactors
them to a Matrix function.
Reviewed By: arjunp
Differential Revision: https://reviews.llvm.org/D120482
This patch moves the Presburger library to a new `presburger` namespace.
This allows to shorten some names, helps to avoid polluting the mlir namespace,
and also provides some structure.
Reviewed By: arjunp
Differential Revision: https://reviews.llvm.org/D120505
This patch removes redundant code from fourierMotzkinEliminate implementation
using existing functions in IntegerPolyhedron.
Reviewed By: arjunp
Differential Revision: https://reviews.llvm.org/D120502
Previously, OpDSL operation used hardcoded type conversion operations (cast or cast_unsigned). Supporting signed and unsigned casts thus meant implementing two different operations. Type function attributes allow us to define a single operation that has a cast type function attribute which at operation instantiation time may be set to cast or cast_unsigned. We may for example, defina a matmul operation with a cast argument:
```
@linalg_structured_op
def matmul(A=TensorDef(T1, S.M, S.K), B=TensorDef(T2, S.K, S.N), C=TensorDef(U, S.M, S.N, output=True),
cast=TypeFnAttrDef(default=TypeFn.cast)):
C[D.m, D.n] += cast(U, A[D.m, D.k]) * cast(U, B[D.k, D.n])
```
When instantiating the operation the attribute may be set to the desired cast function:
```
linalg.matmul(lhs, rhs, outs=[out], cast=TypeFn.cast_unsigned)
```
The revsion introduces a enum in the Linalg dialect that maps one-by-one to the type functions defined by OpDSL.
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D119718
This patch removes the `const` from `usingBigM` to enable the implicit copy assignment operator for Simplex.
Reviewed By: Groverkss
Differential Revision: https://reviews.llvm.org/D120542
This transformation is useful to break dependency between consecutive loop
iterations by increasing the size of a temporary buffer. This is usually
combined with heavy software pipelining.
Differential Revision: https://reviews.llvm.org/D119406
This adds a variable op, emitted as C/C++ locale variable, which can be
used if the `emitc.constant` op is not sufficient.
As an example, the canonicalization pass would transform
```mlir
%0 = "emitc.constant"() {value = 0 : i32} : () -> i32
%1 = "emitc.constant"() {value = 0 : i32} : () -> i32
%2 = emitc.apply "&"(%0) : (i32) -> !emitc.ptr<i32>
%3 = emitc.apply "&"(%1) : (i32) -> !emitc.ptr<i32>
emitc.call "write"(%2, %3) : (!emitc.ptr<i32>, !emitc.ptr<i32>) -> ()
```
into
```mlir
%0 = "emitc.constant"() {value = 0 : i32} : () -> i32
%1 = emitc.apply "&"(%0) : (i32) -> !emitc.ptr<i32>
%2 = emitc.apply "&"(%0) : (i32) -> !emitc.ptr<i32>
emitc.call "write"(%1, %2) : (!emitc.ptr<i32>, !emitc.ptr<i32>) -> ()
```
resulting in pointer aliasing, as %1 and %2 point to the same address.
In such a case, the `emitc.variable` operation can be used instead.
Reviewed By: jpienaar
Differential Revision: https://reviews.llvm.org/D120098
This patch removes binary operator enum which was introduced with `omp.atomic.update`. Now the update operation handles update in a region so this is no longer required.
Reviewed By: kiranchandramohan
Differential Revision: https://reviews.llvm.org/D120458
Documentation exists about the details of the API but is missing a
description of the overall structure per dialect.
Reviewed By: shabalin
Differential Revision: https://reviews.llvm.org/D117002
The current implementation of ShuffleVectorOp assumes all vectors are
scalable. LLVM IR allows shufflevector operations on scalable vectors,
and the current translation between LLVM Dialect and LLVM IR does the
rigth thing when the shuffle mask is all zeroes. This is required to
do a splat operation on a scalable vector, but it doesn't make sense
for scalable vectors outside of that operation, i.e.: with non-all zero
masks.
Differential Revision: https://reviews.llvm.org/D118371
In D115022, we introduced an optimization where OpResults of a `linalg.generic` may bufferize in-place with an "in" OpOperand if the corresponding "out" OpOperand is not used in the computation.
This optimization can lead to unexpected behavior if the newly chosen OpOperand is in the same alias set as another OpOperand (that is used in the computation). In that case, the newly chosen OpOperand must bufferize out-of-place. This can be confusing to users, as always choosing the "out" OpOperand (regardless of whether it is used) would be expected when having the notion of "destination-passing style" in mind.
With this change, we go back to always bufferizing in-place with "out" OpOperands by default, but letting users override the behavior with a bufferization option.
Differential Revision: https://reviews.llvm.org/D120182
Previously only accessing values for `index` and signless int types
would work; signed and unsigned ints would hit an assert in
`IntegerAttr::getInt`. This exposes `IntegerAttr::get{S,U}Int` to the C
API and calls the appropriate function from the python bindings.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D120194
Previously, we only support float64. We now support float32 and float64. When
constructing a tensor without providing a data type, the default is float32.
Fix the tests to data type consistency. All PyTACO application tests now use
float32 to match the default data type of TACO. Other tests may use float32 or
float64.
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D120356
Now that sparse tensor types are first-class citizens and the sparse compiler
is taking shape, it is time to make sure other compiler optimizations compose
well with sparse tensors. Mostly, this should be completely transparent (i.e.,
dense and sparse take the same path). However, in some cases, optimizations
only make sense in the context of sparse tensors. This is a first example of
such an optimization, where fusing a sampled elt-wise multiplication only makes
sense when the resulting kernel has a potential lower asymptotic complexity due
to the sparsity.
As an extreme example, running SDDMM with 1024x1024 matrices and a sparse
sampling matrix with only two elements runs in 463.55ms in the unfused
case but just 0.032ms in the fused case, with a speedup of 14485x that
is only possible in the exciting world of sparse computations!
Reviewed By: mravishankar
Differential Revision: https://reviews.llvm.org/D120429
By specifying a sectionMemoryMapper, users can control how
memory for JIT code is allocated.
In particular, I need this in order to use a named memory
region so that profilers such as perf(1) can correctly label
execution cycles coming from JIT'ed code.
Reviewed-by: ezhulenev
Differential Revision: https://reviews.llvm.org/D120415
Given a cmpf of either uitofp or sitofp and a constant, attempt to canonicalize it to a cmpi.
This PR rewrites equivalent code within LLVM to now apply to MLIR arith.
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D117257
+ compare block size with the unrollable inner dimension
+ reduce nesting in the code and simplify a bit IR building
Reviewed By: cota
Differential Revision: https://reviews.llvm.org/D120075
This eliminates the requirement that pass-related strings outlive pass
instances, which will facilitate future work enabling dynamic passes
written in other languages.
Differential Revision: https://reviews.llvm.org/D120341
Its number of optional parameters has grown too large,
which makes adding new optional parameters quite a chore.
Fix this by using an options struct.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D120380
Use an `MLIRContext` declared in a single place in the `parsePoly` function that almost all Presburger unit tests use for parsing sets. This function is only used in tests.
This saves us from having to declare and pass a new `MLIRContext` in every test.
Reviewed By: bondhugula, mehdi_amini
Differential Revision: https://reviews.llvm.org/D119251
This trait results in PDL ops being erroneously CSE'd. These ops are side-effect free in the rewriter but not in the matcher (where unused values aren't allowed anyways). These ops should have a more nuanced side-effect modeling, this is fixing a bug introduced by a previous change.
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D120222
Header class in SPIR-V HTML spec has changed. Update script to reflect that.
Reviewed By: antiagainst
Differential Revision: https://reviews.llvm.org/D120179
The related functionality is moved over to the bufferization dialect. Test cases are cleaned up a bit.
Differential Revision: https://reviews.llvm.org/D120191
This patch adds a class to represent a relation in Presburger Library.
This patch only adds the skeleton class. Functionality from IntegerPolyhedron
will be moved to IntegerRelation in later patches to make it easier to review.
This patch is a part of a series of patches adding support for relations in
Presburger Library.
Reviewed By: arjunp
Differential Revision: https://reviews.llvm.org/D120156
The `.def` and `.def_property_readonly` functions in PybindAdaptors.h should
construct the functions as method of the current class rather than as method of
pybind11:none(), which is an object and not even a class.
Depends On D117658
Reviewed By: gysit
Differential Revision: https://reviews.llvm.org/D117659
This commit adds canonicalization pattern in `linalg.generic` op
for static shape inference. If any of the inputs or outputs have
static shape or is casted from a tensor of static shape, then
shapes of all the inputs and outputs can be inferred by using the
affine map of the static shape input/output.
Signed-Off-By: Prateek Gupta <prateek@nod-labs.com>
Reviewed By: mravishankar
Differential Revision: https://reviews.llvm.org/D118929
This patch adds assemblyFormat for omp.sections operation.
Some existing functions have been altered to fit the custom directive
in assemblyFormat. This has led to their callsites to get modified too,
but those will be removed in later patches, when other operations get
their assemblyFormat. All operations were not changed in one patch for
ease of review.
Reviewed By: Mogball
Differential Revision: https://reviews.llvm.org/D120176
This patch adds typing of inequalities to the simplex. This is a cental part of the coalesce algorithm and will be heavily used in later coalesce patches. Currently, only the three most basic types are supported with more to be introduced when they are needed.
Reviewed By: arjunp
Differential Revision: https://reviews.llvm.org/D119925
This allows to differentiate between the cases where the optimum does not
exist due to being unbounded and due to the polytope being empty.
Reviewed By: Groverkss
Differential Revision: https://reviews.llvm.org/D120127
Add `BufferizableOpInterface::verifyAnalysis`. Ops can implement this method to check for expected invariants and limitations.
The purpose of this change is to introduce a modular way of checking assertions such as `assertScfForAliasingProperties`.
Differential Revision: https://reviews.llvm.org/D120189
This patch adds assemblyFormat for omp.parallel operation.
Some existing functions have been altered to fit the custom directive
in assemblyFormat. This has led to their callsites to get modified too,
but those will be removed in later patches, when other operations get
their assemblyFormat. All operations were not changed in one patch for
ease of review.
Reviewed By: Mogball
Differential Revision: https://reviews.llvm.org/D120157
These routines will need to be specialized a lot more based on value types,
index types, pointer types, and permutation/dimension ordering. This is a
careful first step, providing some functionality needed in PyTACO bridge.
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D120154
This patch removes the following clauses from OpenMP Dialect:
- private
- firstprivate
- lastprivate
- shared
- default
- copyin
- copyprivate
The privatization clauses are being handled in the flang frontend. The
data copying clauses are not being handled anywhere for now. Once
we have a better picture of how to handle these clauses in OpenMP
Dialect, we can add these. For the time being, removing unneeded
clauses.
For detailed discussion about this refer to [[ https://discourse.llvm.org/t/rfc-privatisation-in-openmp-dialect/3526 | Privatisation in OpenMP dialect ]]
Reviewed By: kiranchandramohan, clementval
Differential Revision: https://reviews.llvm.org/D120029
This patch introducing seperating dimensions into two types: Domain and Range.
This allows building relations over PresburgerSpace.
This patch is part of a series of patches to introduce relations in Presburger
library.
Reviewed By: arjunp
Differential Revision: https://reviews.llvm.org/D119709
MLIR has the notion of allocation scopes which specify that stack allocations (e.g. memref.alloca, llvm.alloca) should be freed or equivalently aren't available at the end of the corresponding region.
Currently neither OpenMP parallel nor SCF parallel regions have the notion of such a scope.
This clearly makes sense for an OpenMP parallel as this is implemented in with a new function which outlines the region, and clearly any allocations in that newly outlined function have a lifetime that ends at the return of the function, by definition.
While SCF.parallel doesn't have a guaranteed runtime which it is implemented with, this similarly makes sense for SCF.parallel since otherwise an allocation within an SCF.parallel will needlessly continue to allocate stack memory that isn't cleaned up until the function (or other allocation scope op) which contains the SCF.parallel returns. This means that it is impossible to represent thread or iteration-local memory without causing a stack blow-up. In the case that this stack-blow-up behavior is intended, this can be equivalently represented with an allocation outside of the SCF.parallel with a size equal to the number of iterations.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D119743
insert is soft deprecated, so remove all references so it's less likely
to be used and can be easily removed in the future.
Differential Revision: https://reviews.llvm.org/D120021
This is a bit awkward since ExtractOp allows both `f32` and
`vector<1xf32>` results for a scalar extraction. Allow both, but make
inference return the scalar to make this as NFC as possible.
This change changes the handling of trailing dimensions with unknown
extent. Users of the changessociationIndicesForReshape helper should
see benefits when transforming reshape like operations into
expand/collapse pairs if the higher-rank type has trailing unknown
dimensions.
The motivating example is a reshape from tensor<16x1x?xi32> to
tensor<16xi32> that can be modeled as collapsing the three dimensions.
Differential Revision: https://reviews.llvm.org/D119730
Previously, NaNs would be dropped in favor of bounded values which was
strictly incorrect. Now the min/max operation propagate this
information. Not all uses of min/max need this, but the given change
will help protect future additions, and this prevents the need for an
additional cmpf and select operation to handle NaNs.
Differential Revision: https://reviews.llvm.org/D120020
This op is added to allow MLIR code running on multi-GPU systems to
select the GPU they want to execute operations on when no GPU is
otherwise specified.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D119883
NamedAttrList.append(StringAttr, StringAttr) fails to compile because it is
matched to the IteratorT append. Fixes the method to only match if the type is
an iterator.
It is time to compose Linalg related optimizations with SparseTensor
related optimizations. This is a careful first start by adding some
general Linalg optimizations "upstream" of the sparse compiler in the
full sparse compiler pipeline. Some minor changes were needed to make
those optimizations aware of sparsity.
Note that after this, we will add a sparse specific fusion rule,
just to demonstrate the power of the new composition.
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D119971
In SPIR-V, resources are represented as global variables that
are bound to certain descriptor. SPIR-V requires those global
variables to be declared as aliased if multiple ones are bound
to the same slot. Such aliased decorations can cause issues
for transcompilers like SPIRV-Cross when converting to source
shading languages like MSL.
So this commit adds a pass to perform analysis of aliased
resources and see if we can unify them into one.
Reviewed By: ThomasRaoux
Differential Revision: https://reviews.llvm.org/D119872
This would create a double free when a memref is passed twice to the
same op. This wasn't a problem at the time the pass was written but is
common since the introduction of scf.while.
There's a latent non-determinism that's triggered by the test, but this
change is messy enough as-is so I'll leave that for later.
Differential Revision: https://reviews.llvm.org/D120044
The MLIR parser allows regions to have an unnamed entry block.
Make this explicit in the language grammar.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D119950
During dialect conversion, target materialization is triggered to create
cast-like operations when a type mismatch occurs between the value that
replaces a rewritten operation and the type that another operations expects as
operands processed by the type conversion. First, a dummy cast is inserted to
make sure the pattern application can proceed. The decision to trigger the
user-provided materialization hook is taken later based on the result of the
dummy cast having uses. However, it only has uses if other patterns constructed
new operations using the casted value as operand. If existing (legal)
operations use the replaced value, they may have not been updated to use the
casted value yet. The conversion infra would then delete the dummy cast first,
and then would replace the uses with now-invalid (null in the bast case) value.
When deciding whether to trigger cast materialization, check for liveness the
uses not only of the casted value, but also of all the values that it replaces.
This was discovered in the finalizing bufferize pass that cleans up
mutually-cancelling casts without touching other operations. It is not
impossible that there are other scenarios where the dialect converison infra
could produce invalid operand uses because of dummy casts erased too eagerly.
Reviewed By: springerm
Differential Revision: https://reviews.llvm.org/D119937
Previously `gpu-kernel-outlining` pass was also doing index computation sinking into gpu.launch before actual outlining.
Split ops sinking from `gpu-kernel-outlining` pass into separate pass, so users can use theirs own sinking pass before outlining.
To achieve old behavior users will need to call both passes: `-gpu-launch-sink-index-computations -gpu-kernel-outlining`.
Differential Revision: https://reviews.llvm.org/D119932
Currently some of the nested IR building inconsistently uses `nb` and `b`, it's very easy to call wrong builder outside of the current scope, so for simplicity all builders are always called `b`, and in nested IR building regions they just shadow the "parent" builder.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D120003
A very small refactoring, but a big impact on tests that expect an exact order.
This revision fixes the tests, but also makes them less brittle for similar
minor changes in the future!
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D119992
This commit adds a pattern to wrap a tensor.pad op with
an scf.if op to separate the cases where we don't need padding
(all pad sizes are actually zeros) and where we indeed need
padding.
This pattern is meant to handle padding inside tiled loops.
Under such cases the padding sizes typically depend on the
loop induction variables. Splitting them would allow treating
perfect tiles and edge tiles separately.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D117018
The pad-slice swap pattern generates `scf.if` and `tensor.generate`
to guard against zero-sized slices if it cannot prove the slice is
always non-zero. This is safe but quite conservative. It can be
unnecessary for cases where we know by problem definition such cases
does not exist, even if with dynamic shaped ops or unknown tile/slice
sizes, e.g., convolution padding size = 1 with kernel dim size = 3.
So this commit introduces a control to the pattern to specify
whether to generate the if constructs to handle such cases better,
given that once the if constructs is materialized, it's very hard
to analyze and simplify.
Reviewed By: mravishankar
Differential Revision: https://reviews.llvm.org/D117017
Fusion of `linalg.generic` with
`tensor.expand_shape/tensor.collapse_shape` currently handles fusion
with reshape by expanding the dimensionality of the `linalg.generic`
operation. This helps fuse elementwise operations better since they
are fused at the highest dimensionality while keeping all indexing
maps involved projected permutations. The intent of these is to push
the reshape to the boundaries of functions.
The presence of named ops (or other ops across which the reshape
cannot be propagated) stops the propagation to the edges of the
function. At this stage, the converse patterns that fold the reshapes
with generic ops by collapsing the dimensions of the generic op can
push the reshape towards edges. In particular it helps the case where
reshapes exist in between named ops and generic ops.
`linalg.named_op` -> `tensor.expand_shape` -> `linalg.generic`
Pushing the reshape down will help fusion of `linalg.named_op` ->
`linalg.generic` using tile + fuse transformations.
This pattern is intended to replace the following patterns
1) FoldReshapeByLinearization : These patterns create indexing maps
that are not projected permutations that affect future
transformations. They are only useful for folding unit-dimensions.
2) PushReshapeByExpansion : This pattern has the same functionality
but has some restrictions
a) It tries to avoid creating new reshapes that limits its
applicability. The pattern added here can achieve the same
functionality through use of the `controlFn` that allows clients
of the pattern freedom to make this decision.
b) It does not work for ops with indexing semantics.
These patterns will be deprecated in a future patch.
Differential Revision: https://reviews.llvm.org/D119365
This change exposes printer flags in AsmState and AsmStateImpl. All functions
receiving AsmState as a parameter now use the flags from the AsmState instead of
taking an additional OpPrintingFlags parameter.
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D119870
This allow user to register a callback that can annotate operations
during software pipelining. This allows user potential annotate op to
know what part of the pipeline they correspond to.
Differential Revision: https://reviews.llvm.org/D119866
Without results, there is no getType injected and so generating one in prefixed form doesn't result in any failures during C++ compilation.
Differential Revision: https://reviews.llvm.org/D119871
This test shows that when access patterns do not match (e.g. transposing
a row-wise sparse matrix into another row-wise sparse matrix), a conversion
operation in between can enable codegen (i.e. avoid cycle in iteration graph).
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D119864
Optional parameters with `defaultValue` set will be populated with that value if they aren't encountered during parsing. Moreover, parameters equal to their default values are elided when printing.
Depends on D118210
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D118544
This change is needed when lowering alloca()-using code on targets
such as ROCDL that represent private scratch space as a separate
address space.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D119775
Casting between scalable vectors and fixed-length vectors doesn't make
sense. If one of the operands is scalable, the other has to be scalable
to be able to guarantee they have the same shape at runtime.
Differential Revision: https://reviews.llvm.org/D119568
This patch changes the syntax of omp.atomic.update to allow the other
dialects to modify the variable with appropriate operations in the
region.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D119522
Add verifier for gpu.alloc op to verify if the dimension operand counts
and symbol operand counts are same as their memref counterparts.
Differential Revision: https://reviews.llvm.org/D117427
Support ALLOW filters and DENY filters. This is needed for compatibility with existing code that specifies more complex op filters.
Differential Revision: https://reviews.llvm.org/D119820
Also, it seems Khronos has changed html spec format so small adjustment to script was needed.
Base op parsing is also probably broken.
Differential Revision: https://reviews.llvm.org/D119678
The only method to create a true dense tensor (i.e un-annotated) in MLIR-PyTACO
is through the from_array method. However, the annotated all dense tensors are
also implemented as true dense tensor currently. The PR fixes the
implementation to support annotated all dense sparse tensors.
Extend the tensor init method to support the construction of a tensor without
any sparsity annotation.
Change the tensor to_file method to only support writing unpacked sparse
tensors to file through the MLIR sparse tensor dialect.
Add unit tests for true dense tensors and all dense sparse tensors.
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D119500
This patch changes the argument from template-IRBuilder to IRBuilderBase
thus allowing us to write less code while getting the location from a
builder.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D119717
* While annoying, this is the only way to get C++ exception handling out of the happy path for normal iteration.
* Implements sq_length and sq_item for the sequence protocol (used for iteration, including list() construction).
* Implements mp_subscript for general use (i.e. foo[1] and foo[1:1]).
* For constructing a `list(op.results)`, this reduces the time from ~4-5us to ~1.5us on my machine (give or take measurement overhead) and eliminates C++ exceptions, which is a worthy goal in itself.
* Compared to a baseline of similar construction of a three-integer list, which takes 450ns (might just be measuring function call overhead).
* See issue discussed on the pybind side: https://github.com/pybind/pybind11/issues/2842
Differential Revision: https://reviews.llvm.org/D119691
Rationale:
empty line between main include for this file
moved include that actually defines code into right section
Note that this revision started as breaking up ops/attrs even more
(for bug https://github.com/llvm/llvm-project/issues/52748), but due
the the connection in Dialect.initalize(), this cannot be split further).
All heavy lifting refactoring was already done by River in previous cleanup.
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D119617
Adds a pointer type to EmitC. The emission of pointers is so far only
possible by using the `emitc.opaque` type
Co-authored-by: Simon Camphausen <simon.camphausen@iml.fraunhofer.de>
Reviewed By: jpienaar
Differential Revision: https://reviews.llvm.org/D119337
Adapt the region builder signature to hand in the attributes of the created ops. The revision is a preparation step the support named ops that need access to the operation attributes during op creation.
Depends On D119692
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D119693
Group and reorder the classed defined by comprehension.py and add type annotations.
Depends On D119126
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D119692
Index attributes had no default value, which means the attribute values had to be set on the operation. This revision adds a default parameter to `IndexAttrDef`. After the change, every index attribute has to define a default value. For example, we may define the following strides attribute:
```
```
When using the operation the default stride is used if the strides attribute is not set. The mechanism is implemented using `DefaultValuedAttr`.
Additionally, the revision uses the naming index attribute instead of attribute more consistently, which is a preparation for follow up revisions that will introduce function attributes.
Depends On D119125
Reviewed By: stellaraccident
Differential Revision: https://reviews.llvm.org/D119126
This pass doesn't have any limitations specific to FuncOp and it will be useful to be able to run it on other ops (e.g. gpu.func).
Differential Revision: https://reviews.llvm.org/D119662
The module flag to indicate use of hostcall is insufficient to catch
all cases where hostcall might be in use by a kernel. This is now
replaced by a function attribute that gets propagated to top-level
kernel functions via their respective call-graph.
If the attribute "amdgpu-no-hostcall-ptr" is absent on a kernel, the
default behaviour is to emit kernel metadata indicating that the
kernel uses the hostcall buffer pointer passed as an implicit
argument.
The attribute may be placed explicitly by the user, or inferred by the
AMDGPU attributor by examining the call-graph. The attribute is
inferred only if the function is not being sanitized, and the
implictarg_ptr does not result in a load of any byte in the hostcall
pointer argument.
Reviewed By: jdoerfert, arsenm, kpyzhov
Differential Revision: https://reviews.llvm.org/D119216
When lowering to memrefCopy call, the size for i1 type was calculated as 0.
Instead of using getTypeSizeInBits() and dividing by 8, we should just use getTypeSize().
Differential Revision: https://reviews.llvm.org/D119540
The lowering creates llvm.insertvalue with the rank value, so it needs to use
index type instead of 64 bit integer type. Otherwise, we get an error:
llvm.insertvalue' op Type mismatch: cannot insert 'i64' into '!llvm.struct<(i32, ptr<i8>)>'
Differential Revision: https://reviews.llvm.org/D119534
This patch simply adds an optional garbage collector attribute to LLVMFuncOp which maps 1:1 to the "gc" property of functions in LLVM.
Differential Revision: https://reviews.llvm.org/D119492
This allows operations to control the block ids used by the printer in nested regions.
Reviewed By: Mogball
Differential Revision: https://reviews.llvm.org/D115849
Previously, OpDSL did not support rank polymorphism, which required a separate implementation of linalg.fill. This revision extends OpDSL to support rank polymorphism for a limited class of operations that access only scalars and tensors of rank zero. At operation instantiation time, it scales these scalar computations to multi-dimensional pointwise computations by replacing the empty indexing maps with identity index maps. The revision does not change the DSL itself, instead it adapts the Python emitter and the YAML generator to generate different indexing maps and and iterators depending on the rank of the first output.
Additionally, the revision introduces a `linalg.fill_tensor` operation that in a future revision shall replace the current handwritten `linalg.fill` operation. `linalg.fill_tensor` is thus only temporarily available and will be renamed to `linalg.fill`.
Reviewed By: nicolasvasilache, stellaraccident
Differential Revision: https://reviews.llvm.org/D119003
Add new operations to the gpu dialect to represent device side
asynchronous copies. This also add the lowering of those operations to
nvvm dialect.
Those ops are meant to be low level and map directly to llvm dialects
like nvvm or rocdl.
We can further add higher level of abstraction by building on top of
those operations.
This has been discuss here:
https://discourse.llvm.org/t/modeling-gpu-async-copy-ampere-feature/4924
Differential Revision: https://reviews.llvm.org/D119191
These functions allow for defining pattern fragments usable within the `match` and `rewrite` sections of a pattern. The main structure of Constraints and Rewrites functions are the same, and are similar to functions in other languages; they contain a signature (i.e. name, argument list, result list) and a body:
```pdll
// Constraint that takes a value as an input, and produces a value:
Constraint Cst(arg: Value) -> Value { ... }
// Constraint that returns multiple values:
Constraint Cst() -> (result1: Value, result2: ValueRange);
```
When returning multiple results, each result can be optionally be named (the result of a Constraint/Rewrite in the case of multiple results is a tuple).
These body of a Constraint/Rewrite functions can be specified in several ways:
* Externally
In this case we are importing an external function (registered by the user outside of PDLL):
```pdll
Constraint Foo(op: Op);
Rewrite Bar();
```
* In PDLL (using PDLL constructs)
In this case, the body is defined using PDLL constructs:
```pdll
Rewrite BuildFooOp() {
// The result type of the Rewrite is inferred from the return.
return op<my_dialect.foo>;
}
// Constraints/Rewrites can also implement a lambda/expression
// body for simple one line bodies.
Rewrite BuildFooOp() => op<my_dialect.foo>;
```
* In PDLL (using a native/C++ code block)
In this case the body is specified using a C++(or potentially other language at some point) code block. When building PDLL in AOT mode this will generate a native constraint/rewrite and register it with the PDL bytecode.
```pdll
Rewrite BuildFooOp() -> Op<my_dialect.foo> [{
return rewriter.create<my_dialect::FooOp>(...);
}];
```
Differential Revision: https://reviews.llvm.org/D115836
This allows for defining simple patterns in a single line. The lambda
body of a Pattern expects a single operation rewrite statement:
```
Pattern => replace op<my_dialect.foo>(operands: ValueRange) with operands;
```
Differential Revision: https://reviews.llvm.org/D115835
Having clarified that executing the SerializeToHsaco pass can
depend on a ROCm installation, switch from calling lld as a library to
using the copy of lld guaranteed to be included in a ROCm install.
This removes the workaround introduced in D119277
Reviewed By: whchung
Differential Revision: https://reviews.llvm.org/D119463
If the result operand has a unit leading dim it is removed from all operands.
Reviewed By: ThomasRaoux
Differential Revision: https://reviews.llvm.org/D119206
This patch factors out space information from IntegerPolyhedron, PresburgerSet
and PWMAFunction to PresburgerSpace and its extension with local variables,
PresburgerLocalSpace.
Generally any new data structure additions in Presburger library will require
space information. This patch removes the need to duplicate the space
information.
Reviewed By: arjunp
Differential Revision: https://reviews.llvm.org/D119280
Fix fold-memref-subview-ops for affine.load/store. We need to expand out
the affine apply on its operands.
Differential Revision: https://reviews.llvm.org/D119402
Move expandAffineMap and expandAffineApplyExpr out to AffineUtils. This
is a useful method. The child revision uses it. NFC.
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D119401
removed obsoleted TODO
removed strange Fp precision for coordinates
lined up meta data testing code for readability
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D119377
TypeIDAllocator enables the allocation of new TypeIDs at runtime,
that are unique during the lifetime of the allocator.
NonMovableTypeIDOwner is a class used to define a new TypeID for each instance
of a class, using the instance address. This class cannot be copied or moved.
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D104534
While testing LLVM 14.0.0 rc1 on Solaris, I ran into a compile failure:
from /var/llvm/llvm-14.0.0-rc1/rc1/llvm-project/mlir/lib/ExecutionEngine/SparseTensorUtils.cpp:22:
/usr/include/sys/types.h:103:16: error: conflicting declaration ‘typedef short int index_t’
103 | typedef short index_t;
| ^~~~~~~
In file included from
/var/llvm/llvm-14.0.0-rc1/rc1/llvm-project/mlir/lib/ExecutionEngine/SparseTensorUtils.cpp:17:
/var/llvm/llvm-14.0.0-rc1/rc1/llvm-project/mlir/include/mlir/ExecutionEngine/SparseTensorUtils.h:26:7:
note: previous declaration as ‘using index_t = uint64_t’
26 | using index_t = uint64_t;
| ^~~~~~~
The same issue had already occured in the past and fixed in D72619
<https://reviews.llvm.org/D72619>. More detailed explanation can also be
found there.
Tested on `amd64-pc-solaris2.11` and `sparcv9-solaris2.11`.
Differential Revision: https://reviews.llvm.org/D119323
This makes getAliasingOpResult symmetric to getAliasingOpOperand. The previous implementation was confusing for users and implemented in such a way only because there are currently no bufferizable ops that have multiple aliasing OpResults.
Differential Revision: https://reviews.llvm.org/D119259
They used to be classes with a virtual `run` function. This was inconvenient because post analysis steps are stored in BufferizationOptions. Because of this design choice, BufferizationOptions were not copyable.
Differential Revision: https://reviews.llvm.org/D119258
Supports whitespace elements: ` ` and `\\n` as well as the "empty" whitespace `` that removes an otherwise printed space.
Depends on D118208
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D118210
Reuse the higher precision F32 approximation for the F16 one (by expanding and
truncating). This is partly RFC as I'm not sure what the expectations are here
(e.g., these are only for F32 and should not be expanded, that reusing
higher-precision ones for lower precision is undesirable due to increased
compute cost and only approximations per exact type is preferred, or this is
appropriate [at least as fallback] but we need to see how to make it more
generic across all the patterns here).
Differential Revision: https://reviews.llvm.org/D118968
Implements optional attribute or type parameters, including support for such parameters in the assembly format `struct` directive. Also implements optional groups.
Depends on D117971
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D118208
For 0-D as well as 1-D vectors, both these patterns should
return a failure as there is no need to collapse the shape
of the source. Currently, only 1-D vectors were handled. This
patch handles the 0-D case as well.
Reviewed By: Benoit, ThomasRaoux
Differential Revision: https://reviews.llvm.org/D119202
There are a few different test passes that check elementwise fusion in
Linalg. Consolidate them to a single pass controlled by different pass
options (in keeping with how `TestLinalgTransforms` exists).
Add a Python method, output_sparse_tensor, to use sparse_tensor.out to write
a sparse tensor value to a file.
Modify the method that evaluates a tensor expression to return a pointer of the
MLIR sparse tensor for the result to delay the extraction of the coordinates and
non-zero values.
Implement the Tensor to_file method to evaluate the tensor assignment and write
the result to a file.
Add unit tests. Modify test golden files to reflect the change that TNS outputs
now have a comment line and two meta data lines.
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D118956
The conversion to the new ControlFlow dialect didn't change the
GPUToROCDL pass - this commit fixes this issue.
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D119188
Add support for computing an overapproximation of the number of integer points
in a polyhedron. The returned result is actually the number of integer points
one gets by computing the "rational shadow" obtained by projecting out the
local IDs, finding the minimal axis-parallel hyperrectangular approximation
of the shadow, and returning the number of integer points in that. This does
not currently support symbols.
Reviewed By: Groverkss
Differential Revision: https://reviews.llvm.org/D119228
This makes it applicable to both ArrayAttr and its typed subclasses instead of
only the latter. There is no good reason why ArrayAttr shouldn't be
const-buildable while its typed subclasses are, this was likely just an
omission.
Depends On D119113
Reviewed By: rriddle, nicolasvasilache
Differential Revision: https://reviews.llvm.org/D119114
ODS provides a mechanism for defalut-valued attributes based on a wrapper
TableGen class that is recognized by mlir-tblgen. Such attributes, if not set
on the operaiton, can be construted on-the-fly in their getter given a constant
value. In order for this construction to work, the attribute specificaiton in
ODS must set the constBuilderCall field correctly. This has not been verified,
which could lead to invalid C++ code being generated by mlir-tblgen.
Closes#53588.
Reviewed By: rriddle, mehdi_amini
Differential Revision: https://reviews.llvm.org/D119113
FunctionPass has been deprecated in favor of OperationPass<FuncOp>
for a few weeks, and this commit finished the deprecation with deletion.
The only difference between the two is that FunctionPass filters out function
declarations. When updating references to FunctionPass, ensure that
the pass either can handle declarations or explicitly add in filtering.
See https://llvm.discourse.group/t/functionpass-deprecated-in-favor-of-operationpass-funcop
Differential Revision: https://reviews.llvm.org/D118735
These have generally been replaced by better ODS functionality, and do not
need to be explicitly provided anymore.
Differential Revision: https://reviews.llvm.org/D119065
Currently if an operation wants a C++ implemented parser/printer, it specifies inline
code blocks. This is quite problematic for various reasons, e.g. it requires defining
C++ inside of Tablegen which is discouraged when possible, but mainly because
nearly all usages simply forward to static functions (e.g. `static void parseSomeOp(...)`)
with users devising their own standards for how these are defined.
This commit adds support for a `hasCustomAssemblyFormat` bit field that specifies if
a C++ parser/printer is needed, and when set to 1 declares the parse/print methods for
operations to override. For migration purposes, the existing behavior is untouched. Upstream
usages will be replaced in a followup to keep this patch focused on the new implementation.
Differential Revision: https://reviews.llvm.org/D119054
There are a few different test passes that check elementwise fusion in
Linalg. Consolidate them to a single pass controlled by different pass
options (in keeping with how `TestLinalgTransforms` exists).
Fix the verification function of spirv::ConstantOp to allow nesting
array attributes.
Reviewed By: antiagainst
Differential Revision: https://reviews.llvm.org/D118939
Add the class MultiAffineFunction which represents functions whose domain is an
IntegerPolyhedron and which produce an output given by a tuple of affine
expressions in the IntegerPolyhedron's ids.
Also add support for piece-wise MultiAffineFunctions, which are defined on a
union of IntegerPolyhedrons, and may have different output affine expressions
on each IntegerPolyhedron. Thus the function is affine on each individual
IntegerPolyhedron piece in the domain.
This is part of a series of patches leading up to parametric integer programming.
Depends on D118778.
Reviewed By: Groverkss
Differential Revision: https://reviews.llvm.org/D118779
* Implement `FlatAffineConstraints::getConstantBound(EQ)`.
* Inject a simpler constraint for loops that have at most 1 iteration.
* Taking into account constant EQ bounds of FlatAffineConstraint dims/symbols during canonicalization of the resulting affine map in `canonicalizeMinMaxOp`.
Differential Revision: https://reviews.llvm.org/D119153
This is both more efficient and more ergonomic to use, as inverting a
bit vector is trivial while inverting a set is annoying.
Sadly this leaks into a bunch of APIs downstream, so adapt them as well.
This would be NFC, but there is an ordering dependency in MemRefOps's
computeMemRefRankReductionMask. This is now deterministic, previously it
was dependent on SmallDenseSet's unspecified iteration order.
Differential Revision: https://reviews.llvm.org/D119076
This patch makes IntegerPolyhedron and derived classes use of getters to access
IntegerPolyhedron space information (`numIds, numDims, numSymbols`) instead of
directly accessing them.
This patch makes it easier to change the underlying implementation of the way
identifiers are stored, making it easier to extend/modify existing implementation.
Reviewed By: arjunp
Differential Revision: https://reviews.llvm.org/D118888
Adapt `tileConsumerAndFuseProducers` to return failure if the generated tile loop nest is empty since all tile sizes are zero. Additionally, fix `LinalgTileAndFuseTensorOpsPattern` to return success if the pattern applied successfully.
Reviewed By: mravishankar
Differential Revision: https://reviews.llvm.org/D118878
This header is very large (3M Lines once expended) and was included in location
where dwarf-specific information were not needed.
More specifically, this commit suppresses the dependencies on
llvm/BinaryFormat/Dwarf.h in two headers: llvm/IR/IRBuilder.h and
llvm/IR/DebugInfoMetadata.h. As these headers (esp. the former) are widely used,
this has a decent impact on number of preprocessed lines generated during
compilation of LLVM, as showcased below.
This is achieved by moving some definitions back to the .cpp file, no
performance impact implied[0].
As a consequence of that patch, downstream user may need to manually some extra
files:
llvm/IR/IRBuilder.h no longer includes llvm/BinaryFormat/Dwarf.h
llvm/IR/DebugInfoMetadata.h no longer includes llvm/BinaryFormat/Dwarf.h
In some situations, codes maybe relying on the fact that
llvm/BinaryFormat/Dwarf.h was including llvm/ADT/Triple.h, this hidden
dependency now needs to be explicit.
$ clang++ -E -Iinclude -I../llvm/include ../llvm/lib/Transforms/Scalar/*.cpp -std=c++14 -fno-rtti -fno-exceptions | wc -l
after: 10978519
before: 11245451
Related Discourse thread: https://llvm.discourse.group/t/include-what-you-use-include-cleanup
[0] https://llvm-compile-time-tracker.com/compare.php?from=fa7145dfbf94cb93b1c3e610582c495cb806569b&to=995d3e326ee1d9489145e20762c65465a9caeab4&stat=instructions
Differential Revision: https://reviews.llvm.org/D118781
Simple pass that changes all symbols to private unless symbol is excluded (and
in which case there is no change to symbol's visibility).
Differential Revision: https://reviews.llvm.org/D118752
I see a lot of array sorting in stack traces of our compiler, canonicalizer traverses this list every time it builds a pattern set, and it gets expensive very quickly.
Reviewed By: rriddle, mehdi_amini
Differential Revision: https://reviews.llvm.org/D118937
Induction variable calculation was ignoring scf.for step value. Fix it to get
the correct induction variable value in the prologue.
Differential Revision: https://reviews.llvm.org/D118932
Reorder the methods and patterns to move related patterns/methods
closer (textually).
Reviewed By: gysit
Differential Revision: https://reviews.llvm.org/D118870
Some translations do work with unregistered dialects, this allows one
to write testcases against them. It works the same way as it does for
mlir-opt.
Differential Revision: https://reviews.llvm.org/D118872
Replace the Python implementation for reading tensor input data from files with
create_sparse_tensor that uses sparse_tensor.new.
The MLIR TNS format has two extra meta data lines. Add the extra meta data to a
test data file.
Implement TACO tensor methods evaluate and unpack.
Add unit tests.
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D118803
-- This commit adds a canonicalization pattern on scf.while to remove
the loop invariant arguments.
-- An argument is considered loop invariant if the iteration argument value is
the same as the corresponding one being yielded (at the same position) in both
the before/after block of scf.while.
-- For the arguments removed, their use within scf.while and their corresponding
scf.while's result are replaced with their corresponding initial value.
Signed-off-by: Abhishek Varma <abhishek.varma@polymagelabs.com>
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D116923
Prior to this patch, using an operation without any results as the location would result in the generation of invalid C++ code. It'd try to format using the result values, which would would end up being an empty string for an operation without any.
This patch fixes that issue by instead using getValueAndRangeUse which handles both ranges as well as the case for an op without any results.
Differential Revision: https://reviews.llvm.org/D118885
The code assumes that a TypeConstraint in the additional constraints list specifies precisely one argument.
If the user were to not specify any, it'd result in a crash. If given more than one, the additional ones were ignored.
This patch fixes the crash and disallows user errors by adding a check that a single argument is supplied to the TypeConstraint
Differential Revision: https://reviews.llvm.org/D118763
This is completely unused upstream, and does not really have well defined semantics
on what this is supposed to do/how this fits into the ecosystem. Given that, as part of
splitting up the standard dialect it's best to just remove this behavior, instead of try
to awkwardly fit it somewhere upstream. Downstream users are encouraged to
define their own operations that clearly can define the semantics of this.
This also uncovered several lingering uses of ConstantOp that weren't
updated to use arith::ConstantOp, and worked during conversions because
the constant was removed/converted into something else before
verification.
See https://llvm.discourse.group/t/standard-dialect-the-final-chapter/ for more discussion.
Differential Revision: https://reviews.llvm.org/D118654
The Utils.cpp file in StandardOps essentially just contains utilities for interacting with arithmetic
operations, and at this point makes more sense as a utility file for the arithemtic dialect.
Differential Revision: https://reviews.llvm.org/D118280
This is part of the larger effort to split the standard dialect. This will also allow for pruning some
additional dependencies on Standard (done in a followup).
Differential Revision: https://reviews.llvm.org/D118202
Currently if an operation requires additional verification, it specifies an inline
code block (`let verifier = "blah"`). This is quite problematic for various reasons, e.g.
it requires defining C++ inside of Tablegen which is discouraged when possible, but mainly because
nearly all usages simply forward to a static function `static LogicalResult verify(SomeOp op)`.
This commit adds support for a `hasVerifier` bit field that specifies if an additional verifier
is needed, and when set to `1` declares a `LogicalResult verify()` method for operations to
override. For migration purposes, the existing behavior is untouched. Upstream usages will
be replaced in a followup to keep this patch focused on the hasVerifier implementation.
One main user facing change is that what was one `MyOp::verify` is now `MyOp::verifyInvariants`.
This better matches the name this method is called everywhere else, and also frees up `verify` for
the user defined additional verification. The `verify` function when generated now (for additional
verification) is private to the operation class, which should also help avoid accidental usages after
this switch.
Differential Revision: https://reviews.llvm.org/D118742
This CL supports adding dependency between traits verifiers and the
dependency will be checked statically.
Reviewed By: jpienaar
Differential Revision: https://reviews.llvm.org/D115135
Use `SmallVector` instead of `std::vector` in `getLocalRepr` function.
Also, fix the casing of a variable.
Reviewed By: arjunp
Differential Revision: https://reviews.llvm.org/D118722
This revision avoids incorrect hoisting of alloca'd buffers across an AutomaticAllocationScope boundary.
In the more general case, we will probably need a ParallelScope-like interface.
Differential Revision: https://reviews.llvm.org/D118768
By fully qualifying the use of any types and functions from the mlir namespace, users are not required to add using namespace mlir; into the C++ file including the Tablegen output.
Differential Revision: https://reviews.llvm.org/D118767
Use type inference when building the TransferWriteOp in the TransferWritePermutationLowering. Previously, the result type has been set to Type() which triggers an assertion if the pattern is used with tensors instead of memrefs.
Reviewed By: springerm
Differential Revision: https://reviews.llvm.org/D118758
Some FormatElement subclasses contain `std::vector`. Since these use
BumpPtrAllocator, they need to be converted to trailing objects.
However, this is not a trivial fix so I will leave it as a FIXME and use
a workaround.
Move the functions that retrieve the supporting C library, compile an MLIR
module and build a JIT execution engine to mlir_pytaco_utils.
Add a function to create an MLIR sparse tensor from a file and return a pointer
to the MLIR sparse tensor as well as the shape of the sparse tensor.
Add unit tests.
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D118496
Exposes mlir::DialectRegistry to the C API as MlirDialectRegistry along with
helper functions. A hook has been added to MlirDialectHandle that inserts
the dialect into a registry.
A future possible change is removing mlirDialectHandleRegisterDialect in
favor of using mlirDialectHandleInsertDialect, which it is now implemented with.
Differential Revision: https://reviews.llvm.org/D118293
When attempting to cast a pybind11 handle to an MLIR C API object through
capsules, the binding code would attempt to directly access the "_CAPIPtr"
attribute on the object, leading to a rather obscure AttributeError when the
attribute was missing, e.g., on non-MLIR types. Check for its presence and
throw a TypeError instead.
Depends On D117646
Reviewed By: stellaraccident
Differential Revision: https://reviews.llvm.org/D117658
Somehow the test introduced in https://reviews.llvm.org/D118006 produces the expected result but running
through lli with Intel SDE activated sneaks in an error code 2 (before this commit) or an error code 10
(after this commit).
The test as is is still meaningful in that the LLVMIR generation would crash if the `elementtype` is set
improperly.
Still, this should run with lli turned on.
This revision adds enough support to allow InlineAsmOp to work properly with indirect memory constraints "*m".
These require an explicit "elementtype" TypeAttr on the operands to pass LLVM verification and need to be provided.
Reviewed By: bkramer
Differential Revision: https://reviews.llvm.org/D118006
Extract the division representation from equality constraints.
For example:
32*k == 16*i + j - 31 <-- k is the localVariable
expr = 16*i + j - 31, divisor = 32
k = (16*i + j - 32) floordiv 32
The dividend of the division is set to [16, 1, -32] and the divisor is set
to 32.
Reviewed By: Groverkss
Differential Revision: https://reviews.llvm.org/D117959
Following the discussion in D118318, mark `arith.addf/mulf` commutative.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D118600
Part 2 of 3 of unifying the assembly formats of attributes/types and operations.The last patch that introduced attribute/type formats (D111594) factored out the format lexer entirely. This patch factors out most of the format parsers such that the attribute/type and op parsers only need to implement handling for specific elements.
Certain things could be factored better (element verification, 'seen' variables) but the primary goal of factoring is so that features can be used across both assembly formats.
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D117971
This matches the same API usage as attributes/ops/types. For example:
```c++
Dialect *dialect = ...;
// Instead of this:
if (auto *interface = dialect->getRegisteredInterface<DialectInlinerInterface>())
// You can do this:
if (auto *interface = dyn_cast<DialectInlinerInterface>(dialect))
```
Differential Revision: https://reviews.llvm.org/D117859
- Remove the `{Op,Attr,Type}Trait` TableGen classes and replace with `Trait`
- Rename `OpTraitList` to `TraitList` and use it in a few places
The bulk of this change is a mechanical s/OpTrait/Trait/ throughout the codebase.
Reviewed By: rriddle, jpienaar, herhut
Differential Revision: https://reviews.llvm.org/D118543
This reduces the dependencies of the MLIRVector target and makes the dialect consistent with other dialects.
Differential Revision: https://reviews.llvm.org/D118533
Support affine.load/store ops in fold-memref-subview ops pass. The
existing pass just "inlines" the subview operation on load/stores by
inserting affine.apply ops in front of the memref load/store ops: this
is by design always consistent with the semantics on affine.load/store
ops and the same would work even more naturally/intuitively with the
latter.
Differential Revision: https://reviews.llvm.org/D118565
Update SCF pass cmd line names to prefix `scf`. This is consistent with
guidelines/convention on how to name dialect passes. This also avoids
ambiguity on the context given the multiple `for` operations in the
tree.
NFC.
Differential Revision: https://reviews.llvm.org/D118564
There was a bug where some of the OpOperands needed in the replacement op were not in scope.
It does not matter where the replacement op is inserted. Any insertion point is OK as long as there are no dominance errors. In the worst case, the newly inserted op will bufferize out-of-place. This is no worse than not eliminating the InitTensorOp at all.
Differential Revision: https://reviews.llvm.org/D117685
Also reimplement `std-bufferize` in terms of BufferizableOpInterface-based bufferization. The old `std.select` bufferization pattern is no longer needed and deleted.
Differential Revision: https://reviews.llvm.org/D118559
The bufferization of arith.constant ops is also switched over to BufferizableOpInterface-based bufferization. The old implementation is deleted. Both implementations utilize GlobalCreator, now renamed to just `getGlobalFor`.
GlobalCreator no longer maintains a set of all created allocations to avoid duplicate allocations of the same constant. Instead, `getGlobalFor` scans the module to see if there is already a global allocation with the same constant value.
For compatibility reasons, it is still possible to create a pass that bufferizes only `arith.constant`. This pass (createConstantBufferizePass) could be deleted once all users were switched over to One-Shot bufferization.
Differential Revision: https://reviews.llvm.org/D118483
llvm/Support/Path.h was likely previously implicitly included, and a
refactoring removed that inclusion, breaking the pass.
Differential Revision: https://reviews.llvm.org/D118508
Type constraints such as BoolLike and SignlessIntegerLike appear to
have been defined by copy-paste and all share an underlying TypesLike
structure that can be factored out.
This also allows for defining additional constraints of a similar
form, such as F32Like.
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D118382
This patch adds the vector.scan op which computes the
scan for a given n-d vector. It requires specifying the operator,
the identity element and whether the scan is inclusive or
exclusive.
TEST: Added test in ops.mlir
Reviewed By: ThomasRaoux
Differential Revision: https://reviews.llvm.org/D117171
The unit tests for PyTACO hasn't been upstreamed yet. A unit test for this
change will be added when we upstream all the unit tests for PyTACO.
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D118417
mlir-cpu-runner has a dependency on ExecutionEngine which is only built for the native arch. So currently mlir-cpu-runner does not link correctly when the native arch is not targeted.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D118422
This patch introduces a class LexSimplex that can currently be used to find the
lexicographically minimal rational point in an IntegerPolyhedron. This is a
series of patches leading to computing the lexicographically minimal integer
lattice point as well parametric lexicographic minimization.
Reviewed By: Groverkss
Differential Revision: https://reviews.llvm.org/D117437
With opaque pointers, we can no longer derive this from the pointer
type, so we need to explicitly provide the element type the atomic
operation should work with.
Differential Revision: https://reviews.llvm.org/D118359
This is is the start of the MLIR benchmarks. It sets up a command
line tool along with conventions to define and run benchmarks
using mlir's python bindings.
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D115174
Although cmake should be platform-independent, we've observed
that some aspects of Python detection don't work on all platforms,
even with recent versions of cmake. This appears to be due to bugs
in the python detection logic, especially when the NumPy component
is required and not located within the python installation.
As a workaround, this patch first searches for "Development" before
searching for "Development.Module", which seems to workaround the
issue.
Differential Revision: https://reviews.llvm.org/D118148
This is in preparation of switching `-tensor-constant-bufferize` and `-arith-bufferize` to BufferizableOpInterface-based implementations.
Differential Revision: https://reviews.llvm.org/D118324
This commit switches the `tensor-bufferize` pass over to BufferizableOpInterface-based bufferization.
Differential Revision: https://reviews.llvm.org/D118246
The pass can currently not handle to_memref(to_tensor(x)) folding where a cast is necessary. This is required with the new unified bufferization. There is already a canonicalization pattern that handles such foldings and it should be used during this pass.
Differential Revision: https://reviews.llvm.org/D117988
Stop the Cpp target from emitting unused labels. The previosly generated
code generated warning if `-Wunused-label` is passed to a compiler.
Co-authored-by: Simon Camphausen <simon.camphausen@iml.fraunhofer.de>
Reviewed By: jpienaar
Differential Revision: https://reviews.llvm.org/D118154
Prefix "affine-" to affine transform passes that were missing it -- to
avoid ambiguity and for uniformity. There were only two needed this.
Move mispaced affine coalescing test case file.
NFC.
Differential Revision: https://reviews.llvm.org/D118314
OwningRewritePatternList has been deprecated for ~10 months now, we can remove
the leftover using directives at this point.
Differential Revision: https://reviews.llvm.org/D118287
These transformations already operate on memref operations (as part of
splitting up the standard dialect). Now that the operations have moved,
it's time for these transformations to move as well.
Differential Revision: https://reviews.llvm.org/D118285
We already convert to BitVector internally, and other APIs (namely Operation::eraseOperands)
already use BitVector as well. Switching over provides a common format between
API and also reduces the amount of format conversions necessary.
Fixes#53325
Differential Revision: https://reviews.llvm.org/D118083
The option parser currently does not properly handle nested options, meaning that
in some cases we can print pass pipelines that we can't actually parse back in. For example,
from #52885 we currently can't parse in inliner pipelines that have nested pipeline strings.
This commit adds handling for string values (e.g. "...") and nested options
(e.g. `foo{baz{bar=10 pizza=11}}`).
Fixes#52885
Differential Revision: https://reviews.llvm.org/D118078
Rationale:
Demonstrates the maximum tile size allowed for the f32 <= bf16 x bf16 op
Reviewed By: dcaballe
Differential Revision: https://reviews.llvm.org/D118277
This diff modifies the tablegen specification and code generation for
BitEnumAttr attributes in MLIR Operation Definition Specification (ODS) files.
Specifically:
- there is a new tablegen class for "none" values (i.e. no bits set)
- single-bit enum cases are specified via bit index (i.e. [0, 31]) instead of
the resulting enum integer value
- there is a new tablegen class to represent a "grouped" bitwise OR of other
enum values
This diff is intended as an initial step towards improving "fastmath"
optimization support in MLIR, to allow more precise control of whether certain
floating point optimizations are applied in MLIR passes. "Fast" math options
for floating point MLIR operations would (following subsequent RFC and
discussion) be specified by using the improved enum bit support in this diff.
For example, a "fast" enum value would act as an alias for a group of other
cases (e.g. finite-math-only, no-signed-zeros, etc.), in a way that is similar
to support in C/C++ compilers (clang, gcc).
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D117029
This is part of splitting up the standard dialect. The move makes sense anyways,
given that the memref dialect already holds memref.atomic_rmw which is the non-region
sibling operation of std.generic_atomic_rmw (the relationship is even more clear given
they have nearly the same description % how they represent the inner computation).
Differential Revision: https://reviews.llvm.org/D118209
The GPU dialect currently contains an explicit reference to LLVMFuncOp
during verification to handle the situation where the kernel has already been
converted. This commit changes that reference to instead use FunctionOpInterface,
which has two main benefits:
* It allows for removing an otherwise unnecessary dependency on the LLVM dialect
* It removes hardcoded assumptions about the lowering path and use of the GPU dialect
Differential Revision: https://reviews.llvm.org/D118172
This reverts commit ef82063207.
- It conflicts with the existing llvm::size in STLExtras, which will now
never be called.
- Calling it without llvm:: breaks C++17 compat
This is for compatibility with existing bufferization passes. Also clean up memref type generation a bit.
Differential Revision: https://reviews.llvm.org/D118243
This revision adds enough support to allow InlineAsmOp to work properly with indirect memory constraints "*m".
These require an explicit "elementtype" TypeAttr on the operands to pass LLVM verification and need to be provided.
Reviewed By: bkramer
Differential Revision: https://reviews.llvm.org/D118006
The existing implementation called DenseMap::insert, which is a no-op if the map already contains an entry with the same key.
Differential Revision: https://reviews.llvm.org/D118165
If we are extracting it is more useful to push the index_cast past the
extraction. This increases the chance the tensor.extract can evaluated at
compile time.
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D118204
There is not much of a benefit to reshape a from element vs reloading it.
Updated to progagate shape manipulations into the output type of
tensor.from_elements.
Reviewed By: NatashaKnk
Differential Revision: https://reviews.llvm.org/D118201
Add methods to
- Get block argument that is tied with an opOperand
- Get the yield value that is tied with a output opOperand.
Differential Revision: https://reviews.llvm.org/D118085
Fusion of reshape ops by linearization incorrectly inverted the
indexing map before linearizing dimensions. This leads to incorrect
indexing maps used in the fused operation.
Differential Revision: https://reviews.llvm.org/D117908
This pattern is not written to handle operations with `linalg.index`
operations in its body, i.e. operations that have index semantics.
Differential Revision: https://reviews.llvm.org/D117856
This is mostly a copy of the existing tensor.from_elements bufferization. Once TensorInterfaceImpl.cpp is moved to the tensor dialect, the existing rewrite pattern can be deleted.
Differential Revision: https://reviews.llvm.org/D117775
This is mostly a copy of the existing tensor.generate bufferization. Once TensorInterfaceImpl.cpp is moved to the tensor dialect, the existing rewrite pattern can be deleted.
Differential Revision: https://reviews.llvm.org/D117770
This patch supports the atomic construct (capture) following section 2.17.7 of OpenMP 5.0 standard. Also added tests for the same.
Reviewed By: peixin, kiranchandramohan
Differential Revision: https://reviews.llvm.org/D115851
This is a pretty important debugging option to stay hidden. Also,
improve its cmd-line description; the current description gives no hint
that this is the one to use to have locations printed inline.
Out-of-line locations are also unproductive to work with in many cases
where the locations are actually compact, which is also why this option
should be more visible. This revision doesn't change the default on it
though.
Reviewed By: rriddle, jpienaar
Differential Revision: https://reviews.llvm.org/D117186
A lot of dialects have dependencies that are unnecessary, either because of copy/paste
of files when creating things or some other means. This commit cleans up a bunch of
the simple ones:
* Copy/Paste or missed during refactoring
Most of the dependencies cleaned up here look like copy/paste errors when creating
new dialects/transformations, or because the dependency wasn't removed during a
refactoring (e.g. when splitting the standard dialect).
* Unnecessary hard coding of constant operations in matchers
There are a few instances where a dialect had a dependency because it
was hardcoding checks for constant operations instead of using the better m_Constant
approach.
Differential Revision: https://reviews.llvm.org/D118062
This has been a major TODO for a very long time, and is necessary for establishing a proper
dialect-free dependency layering for the Transforms library. Code was moved to effectively
two main locations:
* Affine/
There was quite a bit of affine dialect related code in Transforms/ do to historical reasons
(of a time way into MLIR's past). The following headers were moved to:
Transforms/LoopFusionUtils.h -> Dialect/Affine/LoopFusionUtils.h
Transforms/LoopUtils.h -> Dialect/Affine/LoopUtils.h
Transforms/Utils.h -> Dialect/Affine/Utils.h
The following transforms were also moved:
AffineLoopFusion, AffinePipelineDataTransfer, LoopCoalescing
* SCF/
Only one SCF pass was in Transforms/ (likely accidentally placed here): ParallelLoopCollapsing
The SCF specific utilities in LoopUtils have been moved to SCF/Utils.h
* Misc:
mlir::moveLoopInvariantCode was also moved to LoopLikeInterface.h given
that it is a simple utility defined in terms of LoopLikeOpInterface.
Differential Revision: https://reviews.llvm.org/D117848
Transforms/ should only contain transformations that are dialect-independent and
this pass interacts with MemRef operations (making it a better fit for living in that
dialect).
Differential Revision: https://reviews.llvm.org/D117841
Transforms/ should only contain dialect-independent transformations,
and these files are a much better fit for the bufferization dialect anyways.
Differential Revision: https://reviews.llvm.org/D117839
The current lowering from GPU to NVVM does
not correctly handle the following cases when
lowering the gpu shuffle op.
1. When the active width is set to 32 (all lanes),
then the current approach computes (1 << 32) -1 which
results in poison values in the LLVM IR. We fix this by
defining the active mask as (-1) >> (32 - width).
2. In the case of shuffle up, the computation of the third
operand c has to be different from the other 3 modes due to
the op definition in the ISA reference.
(https://docs.nvidia.com/cuda/parallel-thread-execution/index.html)
Specifically, the predicate value is computed as j >= maxLane
for up and j <= maxLane for all other modes. We fix this by
computing maskAndClamp as 32 - width for this mode.
TEST: We modify the existing test and add more checks for the up mode.
Reviewed By: ThomasRaoux
Differential Revision: https://reviews.llvm.org/D118086