This patch supports the atomic construct (read and write) following
section 2.17.7 of OpenMP 5.0 standard. Also added tests and
verifier for the same.
Reviewed By: kiranchandramohan
Differential Revision: https://reviews.llvm.org/D111992
This also fixes the vector.shuffle C++ builder which had an incorrect type assumption that triggers with this new rewrite.
The vector.shuffle semantics were correct though.
Differential revision: https://reviews.llvm.org/D112578
This patch fixes:
mlir/lib/Transforms/Utils/DialectConversion.cpp:2775:5: error:
default label in switch which covers all enumeration values
[-Werror,-Wcovered-switch-default]
by removing the default case. This way, the compiler should issue a
warning in the future when somebody adds a new enum value without a
corresponding case in the switch statement.
The current implementation invokes materializations
whenever an input operand does not have a mapping for the
desired type, i.e. it requires materialization at the earliest possible
point. This conflicts with goal of dialect conversion (and also the
current documentation) which states that a materialization is only
required if the materialization is supposed to persist after the
conversion process has finished.
This revision refactors this such that whenever a target
materialization "might" be necessary, we insert an
unrealized_conversion_cast to act as a temporary materialization.
This allows for deferring the invocation of the user
materialization hooks until the end of the conversion process,
where we actually have a better sense if it's actually
necessary. This has several benefits:
* In some cases a target materialization hook is no longer
necessary
When performing a full conversion, there are some situations
where a temporary materialization is necessary. Moving forward,
these users won't need to provide any target materializations,
as the temporary materializations do not require the user to
provide materialization hooks.
* getRemappedValue can now handle values that haven't been
converted yet
Before this commit, it wasn't well supported to get the remapped
value of a value that hadn't been converted yet (making it
difficult/impossible to convert multiple operations in many
situations). This commit updates getRemappedValue to properly
handle this case by inserting temporary materializations when
necessary.
Another code-health related benefit is that with this change we
can move a majority of the complexity related to materializations
to the end of the conversion process, instead of handling adhoc
while conversion is happening.
Differential Revision: https://reviews.llvm.org/D111620
Dyn-cast should be checked and bailed out if the dyn_cast failed.
Reviewed By: sjarus, NatashaKnk
Differential Revision: https://reviews.llvm.org/D112574
Rationale:
The currently used trait was demanding that all types are the same
which is not true (since the sparse part may change and the dim sizes
may be relaxed). This revision uses the correct trait and makes the
rank match test explicit in the verify method.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D112576
This refactoring adds a few "event" functions (start/end loop-seq/loop) for
readability of the core function of codegen. This also prepares sparse tensor
output codegen, where these "event" functions will provide convenient
placeholders to start or stop insertion bookkeeping.
This revision also includes a few various minor changes that kept on
pending in my local workspace.
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D112506
Polynomial approximation can be extented to support N-d vectors.
N-dimensional vectors are useful when vectorizing operations on N-dimensional
tiles. Before lowering to LLVM these vectors are usually unrolled or flattened
to 1-dimensional vectors.
Differential Revision: https://reviews.llvm.org/D112566
1.Combining kind min/max of Vector reduction op has been changed to
minf/maxf, minsi/maxsi, and minui/maxui. Modify getVectorReductionOp
accordingly.
2.Add min/max to supported reductions.
Reviewed By: dcaballe, nicolasvasilache
Differential Revision: https://reviews.llvm.org/D112246
Fix AffineExpr `getLargestKnownDivisor` for ceil/floor div cases.
In these cases, nothing can be inferred on the divisor of the
result.
Add test case for `mod` as well.
Differential Revision: https://reviews.llvm.org/D112523
The current behavior is conveniently allowing to iterate on the regions of an operation
implicitly by exposing an operation as Iterable. However this is also error prone and
code that may intend to iterate on the results or the operands could end up "working"
apparently instead of throwing a runtime error.
The lack of static type checking in Python contributes to the ambiguity here, it seems
safer to not do this and require and explicit qualification to iterate (`op.results`, `op.regions`, ...).
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D111697
Specification specified the output type for quantized average pool should be
an i32. Only accumulator should be an i32, result type should match the input
type.
Caused in https://reviews.llvm.org/D111590
Reviewed By: sjarus, GMNGeoffrey
Differential Revision: https://reviews.llvm.org/D112484
Using callbacks for allocation/deallocation allows users to override
the default.
Also add an option to comprehensive bufferization pass to use `alloca`
instead of `alloc`s. Note that this option is just for testing. The
option to use `alloca` does not work well with the option to allow for
returning memrefs.
Even though tensor.cast is not part of the sparse tensor dialect,
it may be used to cast static dimension sizes to dynamic dimension
sizes for sparse tensors without changing the actual sparse tensor
itself. Those cases should be lowered properly when replacing sparse
tensor types with their opaque pointers. Likewise, no op sparse
conversions are handled by this revision in a similar manner.
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D112173
Using callbacks for allocation/deallocation allows users to override
the default.
Also add an option to comprehensive bufferization pass to use `alloca`
instead of `alloc`s. Note that this option is just for testing. The
option to use `alloca` does not work well with the option to allow for
returning memrefs.
Differential Revision: https://reviews.llvm.org/D112166
In several cases, operation result types can be unambiguously inferred from
operands and attributes at operation construction time. Stop requiring the user
to provide these types as arguments in the ODS-generated constructors in Python
bindings. In particular, handle the SameOperandAndResultTypes and
FirstAttrDerivedResultType traits as well as InferTypeOpInterface using the
recently added interface support. This is a significant usability improvement
for IR construction, similar to what C++ ODS provides.
Depends On D111656
Reviewed By: gysit
Differential Revision: https://reviews.llvm.org/D111811
Introduce the initial support for operation interfaces in C API and Python
bindings. Interfaces are a key component of MLIR's extensibility and should be
available in bindings to make use of full potential of MLIR.
This initial implementation exposes InferTypeOpInterface all the way to the
Python bindings since it can be later used to simplify the operation
construction methods by inferring their return types instead of requiring the
user to do so. The general infrastructure for binding interfaces is defined and
InferTypeOpInterface can be used as an example for binding other interfaces.
Reviewed By: gysit
Differential Revision: https://reviews.llvm.org/D111656
This removes duplication and makes nesting more clear.
It also reduces the amount of changes necessary for exposing future options.
Differential revision: https://reviews.llvm.org/D112344
This allows to clear an OpPassManager and populated it again with a new
pipeline, while preserving all the other options (including instrumentations).
Differential Revision: https://reviews.llvm.org/D112393
This patch fixes a bug in implementation `mergeSymbolIds` where symbol
identifiers were not unique after merging them. Asserts for checking uniqueness
before and after the merge are also added. The asserts checking uniqueness
after the merge fail without the fix on existing test cases.
Reviewed By: arjunp
Differential Revision: https://reviews.llvm.org/D111958
This removes duplication and makes nesting more clear.
It also reduces the amount of changes necessary for exposing future options.
Differential revision: https://reviews.llvm.org/D112344
This patch adds a polynomial approximation that matches the
approximation in Eigen.
Note that the approximation only applies to vectorized inputs;
the scalar rsqrt is left unmodified.
The approximation is protected with a flag since it emits an AVX2
intrinsic (generated via the X86Vector). This is the only reasonably
clean way that I could find to generate the exact approximation that
I wanted (i.e. an identical one to Eigen's).
I considered two alternatives:
1. Introduce a Rsqrt intrinsic in LLVM, which doesn't exist yet.
I believe this is because there is no definition of Rsqrt that
all backends could agree on, since hardware instructions that
implement it have widely varying degrees of precision.
This is something that the standard could mandate, but Rsqrt is
not part of IEEE754, so I don't think this option is feasible.
2. Emit fdiv(1.0, sqrt) with fast math flags to allow reciprocal
transformations. Although portable, this doesn't allow us
to generate exactly the code we want; it is the LLVM backend,
and not MLIR, who controls what code is generated based on the
target CPU.
Reviewed By: ezhulenev
Differential Revision: https://reviews.llvm.org/D112192
Pass the modifiers from the Flang parser to FIR/MLIR workshare
loop operation.
Not yet supporting the SIMD modifier, which is a bit more work
than just adding it to the list of modifiers, so will go in a
separate patch.
This adds a new field to the WsLoopOp.
Also add test for dynamic WSLoop, checking that dynamic schedule calls
the init and next functions as expected.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D111053
This commit adds support for scf::IfOp to comprehensive bufferization. Support is currently limited to cases where both branches yield tensors that bufferize to the same buffer.
To keep the analysis simple, scf::IfOp are treated as memory writes for analysis purposes, even if no op inside any branch is writing. (scf::ForOps are handled in the same way.)
Differential Revision: https://reviews.llvm.org/D111929
ConstantOp should be used instead of ConstantIntOp to be able to support index type.
Reviewed By: Mogball
Differential Revision: https://reviews.llvm.org/D112191
Handle contraction op like all the other generic op reductions. This
simpifies the code. We now rely on contractionOp canonicalization to
keep the same code quality.
Differential Revision: https://reviews.llvm.org/D112171
add several patterns that will simplify contraction vectorization in the
future. With those canonicalizationns we will be able to remove the special
case for contration during vectorization and rely on those transformations to
avoid materizalizing broadcast ops.
Differential Revision: https://reviews.llvm.org/D112121
This effectively mirrors the logging in dialect conversion, which has proven
very useful for understanding the pattern application process.
Differential Revision: https://reviews.llvm.org/D112120
In the stride == 1 case, conv1d reads contiguous data along the input dimension. This can be advantageaously used to bulk memory transfers and compute while avoiding unrolling. Experimentally, this can yield speedups of up to 50%.
Reviewed By: antiagainst
Differential Revision: https://reviews.llvm.org/D112139
An InitTensorOp is replaced with an ExtractSliceOp on the InsertSliceOp's destination. This optimization is applied after analysis and only to InsertSliceOps that were decided to bufferize inplace. Another analysis on the new ExtractSliceOp is needed after the rewrite.
Differential Revision: https://reviews.llvm.org/D111955
This commit is in preparation for scf.if support.
* `condition` in findValueInReverseUseDefChain takes a Value instead of OpOperand*.
* Return a SetVector<Value> instead of a single Value. This SetVector always contains exactly one Value at the moment.
Differential Revision: https://reviews.llvm.org/D111928
This patch supports the ordered construct in OpenMP dialect following
Section 2.19.9 of the OpenMP 5.1 standard. Also lowering to LLVM IR
using OpenMP IRBduiler. Lowering to LLVM IR for ordered simd directive
is not supported yet since LLVM optimization passes do not support it
for now.
Reviewed By: kiranchandramohan, clementval, ftynse, shraiysh
Differential Revision: https://reviews.llvm.org/D110015
In a subsequent commit, getResultBuffer can return a "null" Value. This is the case when the returned buffer from an scf.if is not unique.
This commit is in preparation for scf.if support to keep the next commit smaller.
Differential Revision: https://reviews.llvm.org/D111927
This is required for bufferization of scf::IfOp, which is added in a subsequent commit.
Some ops (scf::ForOp, TiledLoopOp) require PreOrder traversal to make sure that bbArgs are mapped before bufferizing the loop body.
Differential Revision: https://reviews.llvm.org/D111924
This patch supports the ordered construct in OpenMP dialect following
Section 2.19.9 of the OpenMP 5.1 standard. Also lowering to LLVM IR
using OpenMP IRBduiler. Lowering to LLVM IR for ordered simd directive
is not supported yet since LLVM optimization passes do not support it
for now.
Reviewed By: kiranchandramohan, clementval, ftynse, shraiysh
Differential Revision: https://reviews.llvm.org/D110015
The current implementation used explicit index->int64_t casts for some, but
not all instances of passing values of type "index" in and from the sparse
support library. This revision makes the situation more consistent by
using new "index_t" type at all such places (which allows for less trivial
casting in the generated MLIR code). Note that the current revision still
assumes that "index" is 64-bit wide. If we want to support targets with
alternative "index" bit widths, we need to build the support library different.
But the current revision is a step forward by making this requirement explicit
and more visible.
Reviewed By: wrengr
Differential Revision: https://reviews.llvm.org/D112122
Add a pattern to take a rank-reducing subview and drop inner most
contiguous unit dim.
This is useful when lowering vector to backends with 1d vector types.
Reviewed By: ThomasRaoux
Differential Revision: https://reviews.llvm.org/D111561
According to the OpenMP 5.0 standard, names and hints of critical operation are
closely related. The following are the restrictions on them:
- Unless the effect is as if `hint(omp_sync_hint_none)` was specified, the
critical construct must specify a name.
- If the hint clause is specified, each of the critical constructs with the
same name must have a hint clause for which the hint-expression evaluates to
the same value.
These restrictions will be enforced by design if the hint expression is a part
of the `omp.critical.declare` operation.
- Any operation with no "name" will be considered to have
`hint(omp_sync_hint_none)`.
- All the operations with the same "name" will have the same hint value.
Reviewed By: kiranchandramohan
Differential Revision: https://reviews.llvm.org/D112134
Follow up to also use the prefixed emitters in OpFormatGen (moved
getGetterName(s) and getSetterName(s) to Operator as that is most
convenient usage wise even though it just depends on Dialect). Prefix
accessors in Test dialect and follow up on missed changes in
OpDefinitionsGen.
Differential Revision: https://reviews.llvm.org/D112118
This revision uses the newly refactored StructuredGenerator to create a simple vectorization for conv1d_nwc_wcf.
Note that the pattern is not specific to the op and is technically not even specific to the ConvolutionOpInterface (modulo minor details related to dilations and strides).
The overall design follows the same ideas as the lowering of vector::ContractionOp -> vector::OuterProduct: it seeks to be minimally complex, composable and extensible while avoiding inference analysis. Instead, we metaprogram the maps/indexings we expect and we match against them.
This is just a first stab and still needs to be evaluated for performance.
Other tradeoffs are possible that should be explored.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D111894
This canonicalizer replaces reshapes of constant tensors that contain the updated shape (skipping the reshape operation).
Differential Revision: https://reviews.llvm.org/D112038
The functions are moved above the parseClauses function as they
will be used inside it to parse `hint` clause
Reviewed By: clementval
Differential Revision: https://reviews.llvm.org/D112071
Code reorganized in OpenMPDialect.cpp to have all functions corresponding to an operation together.
Added parseClauses function to avoid code duplication while parsing clauses in OpenMP operations. Also added printers and verifiers for clauses, which are being used for multiple operations.
Reviewed By: kiranchandramohan, peixin
Differential Revision: https://reviews.llvm.org/D110903
The change is based on the proposal from the following discussion:
https://llvm.discourse.group/t/rfc-memreftype-affine-maps-list-vs-single-item/3968
* Introduce `MemRefLayoutAttr` interface to get `AffineMap` from an `Attribute`
(`AffineMapAttr` implements this interface).
* Store layout as a single generic `MemRefLayoutAttr`.
This change removes the affine map composition feature and related API.
Actually, while the `MemRefType` itself supported it, almost none of the upstream
can work with more than 1 affine map in `MemRefType`.
The introduced `MemRefLayoutAttr` allows to re-implement this feature
in a more stable way - via separate attribute class.
Also the interface allows to use different layout representations rather than affine maps.
For example, the described "stride + offset" form, which is currently supported in ASM parser only,
can now be expressed as separate attribute.
Reviewed By: ftynse, bondhugula
Differential Revision: https://reviews.llvm.org/D111553
- `assign` with ArrayRef was calling `append`
- `assign` with empty ArrayRef was not clearing storage
Reviewed By: jpienaar
Differential Revision: https://reviews.llvm.org/D112043
This helper function checks if two given ops are in mutually exclusive branches of the same scf::IfOp.
Differential Revision: https://reviews.llvm.org/D111957
This revison lifts the artificial restriction on having exact matches between
source and destination type shapes. A static size may become dynamic. We still
reject changing a dynamic size into a static size to avoid the need for a
runtime "assert" on the conversion. This revision also refactors some of the
conversion code to share same-content buffers.
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D111915
The functionality already exists in AsmParser to parse optional ArrayAttrs and
StringAttrs, but only if they are added to a NamedAttrList. This moves the
code to parse an optional attribute and add it to an list into a common
template, and exposes the simpler functionality of just parsing the optional
attributes.
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D111918
Use wider range for approximating Tanh to match results computed in Eigen with AVX.
Reviewed By: cota
Differential Revision: https://reviews.llvm.org/D112011
This removes edge cases where the default flags we want to use
during printing (e.g. local scope, eliding attributes, etc.)
get missed/dropped.
Differential Revision: https://reviews.llvm.org/D111761
When folding A->B->C => A->C only accept A->C that is valid shape cast
Reviewed By: ThomasRaoux, nicolasvasilache
Differential Revision: https://reviews.llvm.org/D111473
The rules were too restrictive, causing out-of-place bufferization when the result of two ExtractSliceOp is fed into an InsertSliceOp.
Differential Revision: https://reviews.llvm.org/D111861
This patch removes code very specific to affine dependence analysis and
refactors it as a FlatAfffineRelation.
A FlatAffineRelation represents a set of ordered pairs (domain -> range) where
"domain" and "range" are tuples of identifiers. These relations are used to
represent an "access relation" for memory access on a memref. An access
relation maps elements of an iteration domain to the element(s) of an array
domain accessed by that iteration of the associated statement through some
array reference. The dependence relation representing the dependence
constraints between two memory accesses can be built by composing the access
relation of the destination access by the inverse of the access relation of
source access.
This patch does not change the functionality of the existing dependence
analysis in checkMemrefAccessDependence, but refactors it to use
FlatAffineRelations to deduplicate code and enable code reuse for future
development of features like scheduling, value-based dependence analysis, etc.
Reviewed By: bondhugula
Differential Revision: https://reviews.llvm.org/D110563
This is the only lowering to Linalg Tosa has, so it's needlessly
verbose. Likely this was a carry over from IREE's usage where we
originally lowered to linalg on buffers (the only linalg that existed at
the time), so the everything on tensors needed the suffix. We're dropping
it in IREE also, having transitioned entirely to using Linalg on
tensors.
Reviewed By: sjarus
Differential Revision: https://reviews.llvm.org/D111911
Next step towards supporting sparse tensors outputs.
Also some minor refactoring of enum constants as well
as replacing tensor arguments with proper buffer arguments
(latter is required for more general sizes arguments for
the sparse_tensor.init operation, as well as more general
spares_tensor.convert operations later)
Reviewed By: wrengr
Differential Revision: https://reviews.llvm.org/D111771
This adds a new parser and printer for text which may be a keyword or a
string. When printing, it will attempt to print the text as a keyword,
but if it has any special or non-printable characters, it will be
printed as an escaped string. When parsing, it will parse either a
valid keyword or a potentially escaped string. The printer allows for an
empty string, in which case it prints `""`.
This new function is used for printing the name in NamedAttributes, and
for printing the symbol name after the `@`. In CIRCT we are using this
to print module port names, which are conceptually similar to named
function arguments.
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D111683
From the perspective of analysis, scf::ForOp is treated as a black box. Basic block arguments do not alias with their respective OpOperands on the ForOp, so they do not participate in conflict analysis with ops defined outside of the loop.
However, bufferizesToMemoryRead and bufferizesToMemoryWrite on the scf::ForOp itself are used to determine how the scf::ForOp interacts with its surrounding ops.
Differential Revision: https://reviews.llvm.org/D111775
For each memory read, follow SSA use-def chains to find the op that produces the data being read (i.e., the most recent write). A memory write to an alias is a conflict if it takes places after the "most recent write" but before the read.
This CL introduces two main changes:
* There is a concise definition of a conflict. Given a piece of IR with InPlaceSpec annotations and a computes alias set, it is easy to compute whether this program has a conflict. No need to consider multiple cases such as "read of operand after in-place write" etc.
* No need to check for clobbering.
Differential Revision: https://reviews.llvm.org/D111287
Allow emitting get & set prefix for accessors generated for ops. If
enabled, then the argument/return/region name gets converted from
snake_case to UpperCamel and prefix added. The attribute also allows
generating both the current "raw" method along with the prefix'd one to
make it easier to stage changes.
The option is added on the dialect and currently defaults to existing
raw behavior. The expectation is that the staging where both are
generated would be short lived and so optimized to keeping the changes
local/less invasive (it just generates two functions for each accessor
with the same body - most of these internally again call a helper
function). But generation can be optimized if needed.
I'm unsure about OpAdaptor classes as there it is all get methods (it is
a named view into raw data structures), so prefix doesn't add much.
This starts with emitting raw-only form (as current behavior) as
default, then one can opt-in to raw & prefixed, then just prefixed. The
default in OpBase will switch to prefixed-only to be consistent with
MLIR style guide. And the option potentially removed later (considered
enabling specifying prefix but current discussion more pro keeping it
limited and stuck with that).
Also add more explicit checking for pruned functions to avoid emitting
where no function was added (and so avoiding dereferencing nullptr)
during op def/decl generation.
See https://bugs.llvm.org/show_bug.cgi?id=51916 for further discussion.
Differential Revision: https://reviews.llvm.org/D111033
Emit reduction during op vectorization instead of doing it when creating the
transfer write. This allow us to not broadcast output arguments for reduction
initial value.
Differential Revision: https://reviews.llvm.org/D111825
Part of the arith update broke UiToFp32. Fixed the lowering and included a new
test to detect a regression.
Differential Revision: https://reviews.llvm.org/D111772
I am unclear this is reproducible with correct IR but atm the verifier for InsertSliceOp
is not powerful enough and this triggers an infinite loop that is worth fixing independently.
Differential Revision: https://reviews.llvm.org/D111812