output has zero rank.
While lowering to loops, no indices should be used in the load/store
operation if the buffer is zero-rank.
Differential Revision: https://reviews.llvm.org/D75391
This commit updates SPIR-V dialect to support integer signedness
by relaxing various checks for signless to just normal integers.
The hack for spv.Bitcast can now be removed.
Differential Revision: https://reviews.llvm.org/D75611
A previous commit added support for integer signedness in C++
IntegerType. This change introduces ODS definitions for
integer types and integer (element) attributes w.r.t. signedness.
This commit also updates various existing definitions' descriptions
to mention signless where suitable to make it more clear.
Positive and non-negative integer attributes are removed to avoid
the explosion of subclasses. Instead, one should use more atmoic
constraints together with Confined to model that. For example,
`Confined<..., [IntPositive]>`.
Differential Revision: https://reviews.llvm.org/D75610
This matches loops with a affine.min upper bound, limiting the trip
count to a constant, and rewrites them into two loops, one with constant
upper bound and one with variable upper bound. The assumption is that
the constant upper bound loop will be unrolled and vectorized, which is
preferable if this is the hot path.
Differential Revision: https://reviews.llvm.org/D75240
Summary:
AffineApplyNormalizer provides common logic for folding affine maps that appear
in affine.apply into other affine operations that use the result of said
affine.apply. In the process, affine maps of both operations are composed.
During the composition `A.compose(B)` the symbols from the map A are placed
before those of the map B in a single concatenated symbol list. However,
AffineApplyNormalizer was ordering the operands of the operation being
normalized by iteratively appending the symbols into a single list accoridng to
the operand order, regardless of whether these operands are symbols of the
current operation or of the map that is being folded into it. This could lead
to wrong order of symbols and, when the symbols were bound to constant values,
to visibly incorrect folding of constants into affine maps as reported in
PR45031. Make sure symbols operands to the current operation are always placed
before symbols coming from the folded maps.
Update the test that was exercising the incorrect folder behavior. For some
reason, the order of symbol operands was swapped in the test input compared to
the previous operations, making it easy to assume the correct maps were
produced whereas they were swapping the symbols back due to the problem
described above.
Closes https://bugs.llvm.org/show_bug.cgi?id=45031
Differential Revision: https://reviews.llvm.org/D75247
This commit handles folding spv.LogicalAnd/spv.LogicalOr when
one of the operands is constant true/false.
Differential Revision: https://reviews.llvm.org/D75195
Summary:
The mapper assigns annotations to loop.parallel operations that
are compatible with the loop to gpu mapping pass. The outermost
loop uses the grid dimensions, followed by block dimensions. All
remaining loops are mapped to sequential loops.
Differential Revision: https://reviews.llvm.org/D74963
Previously C++ test passes for SPIR-V were put under
test/Dialect/SPIRV. Move them to test/lib/Dialect/SPIRV
to create a better structure.
Also fixed one of the test pass to use new
PassRegistration mechanism.
Differential Revision: https://reviews.llvm.org/D75066
This exploits the fact that the iterations of parallel loops are
independent so tiling becomes just an index transformation. This pass
only tiles the innermost loop of a loop nest.
The ultimate goal is to allow vectorization of the tiled loops, but I
don't think we're there yet with the current rewriting, as the tiled
loops don't have a constant trip count.
Differential Revision: https://reviews.llvm.org/D74954
This revision add support for formatting successor variables in a similar way to operands, attributes, etc.
Differential Revision: https://reviews.llvm.org/D74789
This revision add support in ODS for specifying the successors of an operation. Successors are specified via the `successors` list:
```
let successors = (successor AnySuccessor:$target, AnySuccessor:$otherTarget);
```
Differential Revision: https://reviews.llvm.org/D74783
This patch implements the RFCs proposed here:
https://llvm.discourse.group/t/rfc-modify-ifop-in-loop-dialect-to-yield-values/463https://llvm.discourse.group/t/rfc-adding-operands-and-results-to-loop-for/459/19.
It introduces the following changes:
- All Loop Ops region, except for ReduceOp, terminate with a YieldOp.
- YieldOp can have variadice operands that is used to return values out of IfOp and ForOp regions.
- Change IfOp and ForOp syntax and representation to define values.
- Add unit-tests and update .td documentation.
- YieldOp is a terminator to loop.for/if/parallel
- YieldOp custom parser and printer
Lowering is not supported at the moment, and will be in a follow-up PR.
Thanks.
Reviewed By: bondhugula, nicolasvasilache, rriddle
Differential Revision: https://reviews.llvm.org/D74174
Summary:
This implements the last step for lowering vector.contract progressively
to LLVM IR (except for masks). Multi-dimensional reductions that remain
after expanding all parallel dimensions are lowered into into simpler
vector.contract operations until a trivial 1-dim reduction remains.
Reviewers: nicolasvasilache, andydavis1
Reviewed By: andydavis1
Subscribers: mehdi_amini, rriddle, jpienaar, burmako, shauheen, antiagainst, nicolasvasilache, arpith-jacob, mgester, lucyrfox, liufengdb, Joonsoo, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D74880
Summary:
Lowers all free/batch dimensions in a vector.contract progressively
into simpler vector.contract operations until a direct vector.reduction
operation is reached. Then lowers 1-D reductions into vector.reduce.
Still TBD:
multi-dimensional contractions that remain after removing all the parallel dims
Reviewers: nicolasvasilache, andydavis1, rriddle
Reviewed By: andydavis1
Subscribers: mehdi_amini, rriddle, jpienaar, burmako, shauheen, antiagainst, nicolasvasilache, arpith-jacob, mgester, lucyrfox, liufengdb, Joonsoo, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D74797
Summary:
This trait takes three arguments: lhs, rhs, transformer. It verifies that the type of 'rhs' matches the type of 'lhs' when the given 'transformer' is applied to 'lhs'. This allows for adding constraints like: "the type of 'a' must match the element type of 'b'". A followup revision will add support in the declarative parser for using these equality constraints to port more c++ parsers to the declarative form.
Differential Revision: https://reviews.llvm.org/D74647
Summary:
the .row.col variant turns out to be the popular one, contrary to what I
thought as .row.row. Since .row.col is so prevailing (as I inspect
cuDNN's behavior), I'm going to remove the .row.row support here, which
makes the patch a little bit easier.
Reviewers: ftynse
Subscribers: jholewinski, bixia, sanjoy.google, mehdi_amini, rriddle, jpienaar, burmako, shauheen, antiagainst, nicolasvasilache, arpith-jacob, mgester, lucyrfox, liufengdb, Joonsoo, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D74655
Fixing a bug where using a zero-rank shaped type operand to
linalg.generic ops hit an unrelated assert. This also meant that
lowering the operation to loops was not supported. Adding roundtrip
tests and lowering to loops test for zero-rank shaped type operand
with fixes to make the test pass.
Differential Revision: https://reviews.llvm.org/D74638
Summary:
Linalg's promotion pass was only supporting f32 buffers due to how the
zero value was build for the `fill` operation.
Moreover, `promoteSubViewOperands` was returning a vector with one entry
per float subview while omitting integer subviews. For a program
with only integer subviews the return vector would be of size 0.
However, `promoteSubViewsOperands` would try to access a non zero
number of entries of this vector, resulting in a sefgault.
Reviewers: nicolasvasilache, ftynse
Reviewed By: ftynse
Subscribers: mehdi_amini, rriddle, jpienaar, burmako, shauheen, antiagainst, nicolasvasilache, arpith-jacob, mgester, lucyrfox, liufengdb, Joonsoo, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D74532
Summary: This revision adds support to the declarative parser for formatting enum attributes in the symbolized form. It uses this new functionality to port several of the SPIRV parsers over to the declarative form.
Differential Revision: https://reviews.llvm.org/D74525
Summary:
This sets the basic framework for lowering vector.contract progressively
into simpler vector.contract operations until a direct vector.reduction
operation is reached. More details will be filled out progressively as well.
Reviewers: nicolasvasilache
Reviewed By: nicolasvasilache
Subscribers: mehdi_amini, rriddle, jpienaar, burmako, shauheen, antiagainst, nicolasvasilache, arpith-jacob, mgester, lucyrfox, liufengdb, Joonsoo, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D74520
Thus far we have been using builtin func op to model SPIR-V functions.
It was because builtin func op used to have special treatment in
various parts of the core codebase (e.g., pass pipelines, etc.) and
it's easy to bootstrap the development of the SPIR-V dialect. But
nowadays with general op concepts and region support we don't have
such limitations and it's time to tighten the SPIR-V dialect for
completeness.
This commits introduces a spv.func op to properly model SPIR-V
functions. Compared to builtin func op, it can provide the following
benefits:
* We can control the full op so we can integrate SPIR-V information
bits (e.g., function control) in a more integrated way and define
our own assembly form and enforcing better verification.
* We can have a better dialect and library boundary. At the current
moment only functions are modelled with an external op. With this
change, all ops modelling SPIR-V concpets will be spv.* ops and
registered to the SPIR-V dialect.
* We don't need to special-case func op anymore when creating
ConversionTarget declaring SPIR-V dialect as legal. This is quite
important given we'll see more and more conversions in the future.
In the process, bumps a few FuncOp methods to the FunctionLike trait.
Differential Revision: https://reviews.llvm.org/D74226
In the previous state, we were relying on forcing the linker to include
all libraries in the final binary and the global initializer to self-register
every piece of the system. This change help moving away from this model, and
allow users to compose pieces more freely. The current change is only "fixing"
the dialect registration and avoiding relying on "whole link" for the passes.
The translation is still relying on the global registry, and some refactoring
is needed to make this all more convenient.
Differential Revision: https://reviews.llvm.org/D74461
Summary:
After D72555 has been landed, `linalg.indexed_generic` also accepts ranked
tensor as input and output. Add a test for it.
Differential Revision: https://reviews.llvm.org/D74267
The existing (default) calling convention for memrefs in standard-to-LLVM
conversion was motivated by interfacing with LLVM IR produced from C sources.
In particular, it passes a pointer to the memref descriptor structure when
calling the function. Therefore, the descriptor is allocated on stack before
the call. This convention leads to several problems. PR44644 indicates a
problem with stack exhaustion when calling functions with memref-typed
arguments in a loop. Allocating outside of the loop may lead to concurrent
access problems in case the loop is parallel. When targeting GPUs, the contents
of the stack-allocated memory for the descriptor (passed by pointer) needs to
be explicitly copied to the device. Using an aggregate type makes it impossible
to attach pointer-specific argument attributes pertaining to alignment and
aliasing in the LLVM dialect.
Change the default calling convention for memrefs in standard-to-LLVM
conversion to transform a memref into a list of arguments, each of primitive
type, that are comprised in the memref descriptor. This avoids stack allocation
for ranked memrefs (and thus stack exhaustion and potential concurrent access
problems) and simplifies the device function invocation on GPUs.
Provide an option in the standard-to-LLVM conversion to generate auxiliary
wrapper function with the same interface as the previous calling convention,
compatible with LLVM IR porduced from C sources. These auxiliary functions
pack the individual values into a descriptor structure or unpack it. They also
handle descriptor stack allocation if necessary, serving as an allocation
scope: the memory reserved by `alloca` will be freed on exiting the auxiliary
function.
The effect of this change on MLIR-generated only LLVM IR is minimal. When
interfacing MLIR-generated LLVM IR with C-generated LLVM IR, the integration
only needs to require auxiliary functions and change the function name to call
the wrapper function instead of the original function.
This also opens the door to forwarding aliasing and alignment information from
memrefs to LLVM IR pointers in the standrd-to-LLVM conversion.
Summary:
The `vector.fma` operation is portable enough across targets that we do not want
to keep it wrapped under `vector.outerproduct` and `llvm.intrin.fmuladd`.
This revision lifts the op into the vector dialect and implements the lowering to LLVM by using two patterns:
1. a pattern that lowers from n-D to (n-1)-D by unrolling when n > 2
2. a pattern that converts from 1-D to the proper LLVM representation
Reviewers: ftynse, stellaraccident, aartbik, dcaballe, jsetoain, tetuante
Reviewed By: aartbik
Subscribers: fhahn, dcaballe, merge_guards_bot, mehdi_amini, rriddle, jpienaar, burmako, shauheen, antiagainst, arpith-jacob, mgester, lucyrfox, aartbik, liufengdb, Joonsoo, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D74075
Summary:
This revision exposes the portable `llvm.fma` intrinsic in LLVMOps and uses it
in lieu of `llvm.fmuladd` when lowering the `vector.outerproduct` op to LLVM.
This guarantees proper `fma` instructions will be emitted if the target ISA
supports it.
`llvm.fmuladd` does not have this guarantee in its semantics, despite evidence
that the proper x86 instructions are emitted.
For more details, see https://llvm.org/docs/LangRef.html#llvm-fmuladd-intrinsic.
Reviewers: ftynse, aartbik, dcaballe, fhahn
Reviewed By: aartbik
Subscribers: mehdi_amini, rriddle, jpienaar, burmako, shauheen, antiagainst, arpith-jacob, mgester, lucyrfox, liufengdb, Joonsoo, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D74219
The initial implementation of the fusion operation exposes a method to
fuse a consumer with its producer, when
- both the producer and consumer operate on tensors
- the producer has only a single result value
- the producer has only "parallel" iterator types
A new interface method hasTensorSemantics is added to verify that an
operation has all operands and results of type RankedTensorType.
Differential Revision: https://reviews.llvm.org/D74172
Summary:
Previously, vector.contract did not allow an empty set of
free or batch dimensions (K = 0) which defines a basic
reduction into a scalar (like a dot product). This CL
relaxes that restriction. Also adds constraints on
element type of operands and results. With tests.
Reviewers: nicolasvasilache, andydavis1, rriddle
Reviewed By: andydavis1
Subscribers: merge_guards_bot, mehdi_amini, rriddle, jpienaar, burmako, shauheen, antiagainst, nicolasvasilache, arpith-jacob, mgester, lucyrfox, liufengdb, Joonsoo, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D74014
Summary:
[mlir][VectorOps] Support vector transfer_read/write unrolling for memrefs with vector element type. When unrolling vector transfer read/write on memrefs with vector element type, the indices used to index the memref argument must be updated to reflect the unrolled operation. However, in the case of memrefs with vector element type, we need to be careful to only update the relevant memref indices.
For example, a vector transfer read with the following source/result types, memref<6x2x1xvector<2x4xf32>>, vector<2x1x2x4xf32>, should only update memref indices 1 and 2 during unrolling.
Reviewers: nicolasvasilache, aartbik
Reviewed By: nicolasvasilache, aartbik
Subscribers: lebedev.ri, Joonsoo, merge_guards_bot, mehdi_amini, rriddle, jpienaar, burmako, shauheen, antiagainst, arpith-jacob, mgester, lucyrfox, liufengdb, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72965
Summary:
Add ShapeCastOp to the vector ops dialect.
The shape_cast operation casts between an n-D source vector shape and a k-D result vector shape (the element type remains the same).
Reviewers: nicolasvasilache, aartbik
Reviewed By: nicolasvasilache
Subscribers: Joonsoo, merge_guards_bot, mehdi_amini, rriddle, jpienaar, burmako, shauheen, antiagainst, arpith-jacob, mgester, lucyrfox, liufengdb, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73635
We were using normal dictionary attribute for target environment
specification. It becomes cumbersome with more and more fields.
This commit changes the modelling to a dialect-specific attribute,
where we can have control over its storage and assembly form.
Differential Revision: https://reviews.llvm.org/D73959
Summary:
This patch is a step towards enabling BUILD_SHARED_LIBS=on, which
builds most libraries as DLLs instead of statically linked libraries.
The main effect of this is that incremental build times are greatly
reduced, since usually only one library need be relinked in response
to isolated code changes.
The bulk of this patch is fixing incorrect usage of cmake, where library
dependencies are listed under add_dependencies rather than under
target_link_libraries or under the LINK_LIBS tag. Correct usage should be
like this:
add_dependencies(MLIRfoo MLIRfooIncGen)
target_link_libraries(MLIRfoo MLIRlib1 MLIRlib2)
A separate issue is that in cmake, dependencies between static libraries
are automatically included in dependencies. In the above example, if MLIBlib1
depends on MLIRlib2, then it is sufficient to have only MLIRlib1 in the
target_link_libraries. When compiling with shared libraries, it is necessary
to have both MLIRlib1 and MLIRlib2 specified if MLIRfoo uses symbols from both.
Reviewers: mravishankar, antiagainst, nicolasvasilache, vchuravy, inouehrs, mehdi_amini, jdoerfert
Reviewed By: nicolasvasilache, mehdi_amini
Subscribers: Joonsoo, merge_guards_bot, jholewinski, mgorny, mehdi_amini, rriddle, jpienaar, burmako, shauheen, antiagainst, csigg, arpith-jacob, mgester, lucyrfox, herhut, aartbik, liufengdb, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73653
This commit adds two resource limits, max_compute_workgroup_size
and max_compute_workgroup_invocations as resource limits to
the target environment. They are not used at the current moment,
but they will affect the SPIR-V CodeGen. Adding for now to have
a proper target environment modelling.
Differential Revision: https://reviews.llvm.org/D73905
Summary: This revision add support for accepting a few type constraints, e.g. AllTypesMatch, when inferring types for operands and results. This is used to remove the c++ parsers for several additional operations.
Differential Revision: https://reviews.llvm.org/D73735
LinalgDependenceGraph was not updated after successful producer-consumer
fusion for linalg ops. In this patch it is fixed by reconstructing
LinalgDependenceGraph on every iteration. This is very ineffective and
should be improved by updating LDGraph only when it is necessary.
Summary:
In the original design, gpu.launch required explicit capture of uses
and passing them as operands to the gpu.launch operation. This was
motivated by infrastructure restrictions rather than design. This
change lifts the requirement and removes the concept of kernel
arguments from gpu.launch. Instead, the kernel outlining
transformation now does the explicit capturing.
This is a breaking change for users of gpu.launch.
Differential Revision: https://reviews.llvm.org/D73769
Summary:
After the `subview` operation was migrated from Linalg to Standard, it changed
semantics and does not guarantee the absence of out-of-bounds accesses through
the created view anymore. Compute the size of the subview to make sure it
always fits within the view (subviews in last iterations of the loops may be
smaller than those in other iterations).
Differential Revision: https://reviews.llvm.org/D73614
Summary:
This revision switches over many operations to use the declarative methods for defining the assembly specification. This updates operations in the NVVM, ROCDL, Standard, and VectorOps dialects.
Differential Revision: https://reviews.llvm.org/D73407
Summary:
The 'gpu.terminator' operation is used as the terminator for the
regions of gpu.launch. This is to disambugaute them from the
return operation on 'gpu.func' functions.
This is a breaking change and users of the gpu dialect will need
to adapt their code when producting 'gpu.launch' operations.
Reviewers: nicolasvasilache
Subscribers: mehdi_amini, rriddle, jpienaar, burmako, shauheen, antiagainst, csigg, arpith-jacob, mgester, lucyrfox, liufengdb, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73620
Summary:
Canonicalization and folding patterns in StandardOps may interfere with the needs
of Linalg. This revision introduces specific foldings for dynamic memrefs that can
be proven to be static.
Very concretely:
Determines whether it is possible to fold it away in the parent Linalg op:
```mlir
%1 = memref_cast %0 : memref<8x16xf32> to memref<?x?xf32>
%2 = linalg.slice %1 ... : memref<?x?xf32> ...
// or
%1 = memref_cast %0 : memref<8x16xf32, affine_map<(i, j)->(16 * i + j)>>
to memref<?x?xf32>
linalg.generic(%1 ...) : memref<?x?xf32> ...
```
into
```mlir
%2 = linalg.slice %0 ... : memref<8x16xf32> ...
// or
linalg.generic(%0 ... : memref<8x16xf32, affine_map<(i, j)->(16 * i + j)>>
```
Reviewers: ftynse, aartbik, jsetoain, tetuante, asaadaldien
Subscribers: mehdi_amini, rriddle, jpienaar, burmako, shauheen, antiagainst, arpith-jacob, mgester, lucyrfox, liufengdb, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73565
Summary:
Barrier is a simple operation that takes no arguments and returns
nothing, but implies a side effect (synchronization of all threads)
Reviewers: jdoerfert
Subscribers: mgorny, guansong, mehdi_amini, rriddle, jpienaar, burmako, shauheen, antiagainst, nicolasvasilache, arpith-jacob, mgester, lucyrfox, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72400
Summary:
This diff adds a transformation patter to rewrite linalg.fill as broadcasting a scaler into a vector.
It uses the same preconditioning as matmul (memory is contiguous).
Reviewers: nicolasvasilache
Subscribers: mehdi_amini, rriddle, jpienaar, burmako, shauheen, antiagainst, arpith-jacob, mgester, lucyrfox, aartbik, liufengdb, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73391
Thus far certain SPIR-V ops have been required to be in spv.module.
While this provides strong verification to catch unexpected errors,
it's quite rigid and makes progressive lowering difficult. Sometimes
we would like to partially lower ops from other dialects, which may
involve creating ops like global variables that should be placed in
other module-like ops. So this commit relaxes the requirement of
such SPIR-V ops' scope to module-like ops. Similarly for function-
like ops.
Differential Revision: https://reviews.llvm.org/D73415
Summary:
Rewrites the extract/insert_slices operation in terms of
strided_slice/insert_strided_slice ops with intermediate
tuple uses (that should get optimimized away with typical
usage). This is done in a separate "pass" to enable testing
this particular rewriting in isolation.
Reviewers: nicolasvasilache, andydavis1, ftynse
Reviewed By: nicolasvasilache
Subscribers: merge_guards_bot, mehdi_amini, rriddle, jpienaar, burmako, shauheen, antiagainst, arpith-jacob, mgester, lucyrfox, liufengdb, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73295
Summary:
This is based on the use of code constantly checking for an attribute on
a model and instead represents the distinct operaion with a different
op. Instead, this op can be used to provide better filtering.
Reverts "Revert "[mlir] Create a gpu.module operation for the GPU Dialect.""
This reverts commit ac446302ca4145cdc89f377c0c364c29ee303be5 after
fixing internal Google issues.
This additionally updates ROCDL lowering to use the new gpu.module.
Reviewers: herhut, mravishankar, antiagainst, nicolasvasilache
Subscribers: jholewinski, mgorny, mehdi_amini, jpienaar, burmako, shauheen, csigg, arpith-jacob, mgester, lucyrfox, aartbik, liufengdb, llvm-commits, mravishankar, rriddle, antiagainst, bkramer
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72921
Summary:
Add a `llvm.cmpxchg` op as a counterpart to LLVM IR's `cmpxchg` instruction.
Note that the `weak`, `volatile`, and `syncscope` attributes are not yet supported.
This will be useful for upcoming parallel versions of affine.for and generally
for reduction-like semantics (especially for reductions that can't make use
of `atomicrmw`, e.g. `fmax`).
Reviewers: ftynse, nicolasvasilache
Reviewed By: ftynse
Subscribers: merge_guards_bot, jfb, mehdi_amini, rriddle, jpienaar, burmako, shauheen, antiagainst, arpith-jacob, mgester, lucyrfox, liufengdb, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72995
Summary:
Generalize broadcastable trait to variadic operands. Update the
documentation that still talked about element type as part of
broadcastable trait (that bug was already fixed). Also rename
Broadcastable to ResultBroadcastableShape to be more explicit that the
trait affects the result shape (it is possible for op to allow
broadcastable operands but not have result shape that is broadcast
compatible with operands).
Doing some intermediate work to have getBroadcastedType take an optional
elementType as input and use that if specified, instead of the common
element type of type1 and type2 in this function.
Differential Revision: https://reviews.llvm.org/D72559
Summary:
Modernize some of the existing custom parsing code in the LLVM dialect.
While this reduces some boilerplate code, it also reduces the precision
of the diagnostic error messges.
Reviewers: ftynse, nicolasvasilache, rriddle
Reviewed By: rriddle
Subscribers: merge_guards_bot, mehdi_amini, rriddle, jpienaar, burmako, shauheen, antiagainst, arpith-jacob, mgester, lucyrfox, liufengdb, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72967
Summary:
This is a simple extension to allow vectorization to work not only on GenericLinalgOp
but more generally across named ops too.
For now, this still only vectorizes matmul-like ops but is a step towards more
generic vectorization of Linalg ops.
Reviewers: ftynse
Subscribers: mehdi_amini, rriddle, jpienaar, burmako, shauheen, antiagainst, arpith-jacob, mgester, lucyrfox, aartbik, liufengdb, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72942
Summary:
This op is the counterpart to LLVM's atomicrmw instruction. Note that
volatile and syncscope attributes are not yet supported.
This will be useful for upcoming parallel versions of `affine.for` and generally
for reduction-like semantics.
Differential Revision: https://reviews.llvm.org/D72741
Summary:
This is a temporary implementation to support Flang. The LLVM-IR parser
will need to be extended in some way to support recursive types. The
exact approach here is still a work-in-progress.
Unfortunately, this won't pass roundtrip testing yet. Adding a comment
to the test file as a reminder.
Differential Revision: https://reviews.llvm.org/D72542
In SPIR-V, when a new version is introduced, it is possible some
existing extensions will be incorporated into it so that it becomes
implicitly declared if targeting the new version. This affects
conversion target specification because we need to take this into
account when allowing what extensions to use.
For a capability, it may also implies some other capabilities,
for example, the `Shader` capability implies `Matrix` the capability.
This should also be taken into consideration when preparing the
conversion target: when we specify an capability is allowed, all
its recursively implied capabilities are also allowed.
This commit adds utility functions to query implied extensions for
a given version and implied capabilities for a given capability
and updated SPIRVConversionTarget to use them.
This commit also fixes a bug in availability spec. When a symbol
(op or enum case) can be enabled by an extension, we should drop
it's minimal version requirement. Being enabled by an extension
naturally means the symbol can be used by *any* SPIR-V version
as long as the extension is supported. The grammar still encodes
the 'version' field for such cases, but it should be interpreted
as a different way: rather than meaning a minimal version
requirement, it says the symbol becomes core at that specific
version.
Differential Revision: https://reviews.llvm.org/D72765
This commit defines a new SPIR-V dialect attribute for specifying
a SPIR-V target environment. It is a dictionary attribute containing
the SPIR-V version, supported extension list, and allowed capability
list. A SPIRVConversionTarget subclass is created to take in the
target environment and sets proper dynmaically legal ops by querying
the op availability interface of SPIR-V ops to make sure they are
available in the specified target environment. All existing conversions
targeting SPIR-V is changed to use this SPIRVConversionTarget. It
probes whether the input IR has a `spv.target_env` attribute,
otherwise, it uses the default target environment: SPIR-V 1.0 with
Shader capability and no extra extensions.
Differential Revision: https://reviews.llvm.org/D72256
Summary:
This diff fixes issues with the semantics of linalg.generic on tensors that appeared when converting directly from HLO to linalg.generic.
The changes are self-contained within MLIR and can be captured and tested independently of XLA.
The linalg.generic and indexed_generic are updated to:
To allow progressive lowering from the value world (a.k.a tensor values) to
the buffer world (a.k.a memref values), a linalg.generic op accepts
mixing input and output ranked tensor values with input and output memrefs.
```
%1 = linalg.generic #trait_attribute %A, %B {other-attributes} :
tensor<?x?xf32>,
memref<?x?xf32, stride_specification>
-> (tensor<?x?xf32>)
```
In this case, the number of outputs (args_out) must match the sum of (1) the
number of output buffer operands and (2) the number of tensor return values.
The semantics is that the linalg.indexed_generic op produces (i.e.
allocates and fills) its return values.
Tensor values must be legalized by a buffer allocation pass before most
transformations can be applied. Such legalization moves tensor return values
into output buffer operands and updates the region argument accordingly.
Transformations that create control-flow around linalg.indexed_generic
operations are not expected to mix with tensors because SSA values do not
escape naturally. Still, transformations and rewrites that take advantage of
tensor SSA values are expected to be useful and will be added in the near
future.
Subscribers: bmahjour, mehdi_amini, rriddle, jpienaar, burmako, shauheen, antiagainst, arpith-jacob, mgester, lucyrfox, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72555
Summary:
This is based on the use of code constantly checking for an attribute on
a model and instead represents the distinct operaion with a different
op. Instead, this op can be used to provide better filtering.
Reviewers: herhut, mravishankar, antiagainst, rriddle
Reviewed By: herhut, antiagainst, rriddle
Subscribers: liufengdb, aartbik, jholewinski, mgorny, mehdi_amini, rriddle, jpienaar, burmako, shauheen, antiagainst, nicolasvasilache, csigg, arpith-jacob, mgester, lucyrfox, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72336
Summary: The current syntax for AffineMapAttr and IntegerSetAttr conflict with function types, making it currently impossible to round-trip function types(and e.g. FuncOp) in the IR. This revision changes the syntax for the attributes by wrapping them in a keyword. AffineMapAttr is wrapped with `affine_map<>` and IntegerSetAttr is wrapped with `affine_set<>`.
Reviewed By: nicolasvasilache, ftynse
Differential Revision: https://reviews.llvm.org/D72429
Introduce a set of function that promote a memref argument of a `gpu.func` to
workgroup memory using memory attribution. The promotion boils down to
additional loops performing the copy from the original argument to the
attributed memory in the beginning of the function, and back at the end of the
function using all available threads. The loop bounds are specified so as to
adapt to any size of the workgroup. These utilities are intended to compose
with other existing utilities (loop coalescing and tiling) in cases where the
distribution of work across threads is uneven, e.g. copying a 2D memref with
only the threads along the "x" dimension. Similarly, specialization of the
kernel to specific launch sizes should be implemented as a separate pass
combining constant propagation and canonicalization.
Introduce a simple attribute-driven pass to test the promotion transformation
since we don't have a heuristic at the moment.
Differential revision: https://reviews.llvm.org/D71904
Summary:
This diff adds lowering of the linalg.reshape op to LLVM.
A new descriptor is created with fields initialized as follows:
1. allocatedPTr, alignedPtr and offset are copied from the source descriptor
2. sizes are copied from the static destination shape
3. strides are copied from the static strides collected with `getStridesAndOffset`
Only the static case in which the target view conforms to strided memref
semantics is supported. Other cases are left for future work and will be added on
a per-need basis.
Reviewers: ftynse, mravishankar
Subscribers: mehdi_amini, rriddle, jpienaar, burmako, shauheen, antiagainst, arpith-jacob, mgester, lucyrfox, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72316
Summary:
This diff adds a new operation to linalg to allow reshaping of an
existing view into a new view in the same buffer at the same offset.
More specifically:
The `linalg.reshape` op produces a new view whose sizes are a reassociation
of the original `view`. Depending on whether or not the reassociated
MemRefType is contiguous, the resulting memref may require explicit alloc
and copies.
A reassociation is defined as a continous grouping of dimensions and is
represented with a affine map array attribute. In the future, non-continous
groupings may be allowed (i.e. permutations, reindexings etc).
For now, it is assumed that either:
1. a reassociation produces and consumes contiguous MemRefType or,
2. the reshape op will be folded into its consumers (by changing the shape
of the computations).
All other cases are undefined behavior and a reshape op may not lower to
LLVM if it cannot be proven statically that it does not require alloc+copy.
A reshape may either collapse or expand dimensions, depending on the
relationship between source and target memref ranks. The verification rule
is that the reassociation maps are applied to the memref with the larger
rank to obtain the memref with the smaller rank. In the case of a dimension
expansion, the reassociation maps can be interpreted as inverse maps.
Examples:
```mlir
// Dimension collapse (i, j) -> i' and k -> k'
%1 = linalg.reshape %0 [(i, j, k) -> (i, j),
(i, j, k) -> (k)] :
memref<?x?x?xf32, stride_spec> into memref<?x?xf32, stride_spec_2>
```
```mlir
// Dimension expansion i -> (i', j') and (k) -> (k')
%1 = linalg.reshape %0 [(i, j, k) -> (i, j),
(i, j, k) -> (k)] :
memref<?x?xf32, stride_spec> into memref<?x?x?xf32, stride_spec_2>
```
The relevant invalid and roundtripping tests are added.
Reviewers: AlexEichenberger, ftynse, rriddle, asaadaldien, yangjunpro
Subscribers: kiszk, merge_guards_bot, mehdi_amini, jpienaar, burmako, shauheen, antiagainst, arpith-jacob, mgester, lucyrfox, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72168
This commit fixes shader ABI attributes to use `spv.` as the prefix
so that they match the dialect's namespace. This enables us to add
verification hooks in the SPIR-V dialect to verify them.
Reviewed By: mravishankar
Differential Revision: https://reviews.llvm.org/D72062
This commit updates gen_spirv_dialect.py to query the grammar and
generate availability spec for various enum attribute definitions
and all defined ops.
Reviewed By: mravishankar
Differential Revision: https://reviews.llvm.org/D72095
Summary:
This diff adds support to allow `linalg.generic` and
`linalg.indexed_generic` to take tensor input and output
arguments.
The subset of output tensor operand types must appear
verbatim in the result types after an arrow. The parser,
printer and verifier are extended to accomodate this
behavior.
The Linalg operations now support variadic ranked tensor
return values. This extension exhibited issues with the
current handling of NativeCall in RewriterGen.cpp. As a
consequence, an explicit cast to `SmallVector<Value, 4>`
is added in the proper place to support the new behavior
(better suggestions are welcome).
Relevant cleanups and name uniformization are applied.
Relevant invalid and roundtrip test are added.
Reviewers: mehdi_amini, rriddle, jpienaar, antiagainst, ftynse
Subscribers: burmako, shauheen, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72022
Lots of SPIR-V ops take enum attributes and certain enum cases
need extra capabilities or extensions to be available. This commit
extends to allow specifying availability spec on enum cases.
Extra utility functions are generated for the corresponding enum
classes to return the availability requirement. The availability
interface implemention for a SPIR-V op now goes over all enum
attributes to collect the availability requirements.
Reviewed By: mravishankar
Differential Revision: https://reviews.llvm.org/D71947
SPIR-V has a few mechanisms to control op availability: version,
extension, and capabilities. These mechanisms are considered as
different availability classes.
This commit introduces basic definitions for modelling SPIR-V
availability classes. Specifically, an `Availability` class is
added to SPIRVBase.td, along with two subclasses: MinVersion
and MaxVersion for versioning. SPV_Op is extended to take a
list of `Availability`. Each `Availability` instance carries
information for generating op interfaces for the corresponding
availability class and also the concrete availability
requirements.
With the availability spec on ops, we can now auto-generate the
op interfaces of all SPIR-V availability classes and also
synthesize the op's implementations of these interfaces. The
interface generation is done via new TableGen backends
-gen-avail-interface-{decls|defs}. The op's implementation is
done via -gen-spirv-avail-impls.
Differential Revision: https://reviews.llvm.org/D71930
This will allow us to lower most of gpu.all_reduce (when all_reduce
doesn't exist in the target dialect) within the GPU dialect, and only do
target-specific lowering for the shuffle op.
PiperOrigin-RevId: 286548256
Update vector transfer_read/write ops to operatate on memrefs with vector element type.
This handle cases where the memref vector element type represents the minimal memory transfer unit (or multiple of the minimal memory transfer unit).
PiperOrigin-RevId: 286482115
Adds vector ReshapeOp to the VectorOps dialect. An aggregate vector reshape operation, which aggregates multiple hardware vectors, can enable optimizations during decomposition (e.g. loading one input hardware vector and performing multiple rotate and scatter store operations to the vector output).
PiperOrigin-RevId: 286440658
Introduces some centralized methods to move towards
consistent use of i32 as vector subscripts.
Note: sizes/strides/offsets attributes are still i64
PiperOrigin-RevId: 286434133
When memory attributions are present in `gpu.func`, require that they are of
memref type and live in memoryspaces 3 and 5 for workgroup and private memory
attributions, respectively. Adapt the conversion from the GPU dialect to the
NVVM dialect to drop the private memory space from attributions as NVVM is able
to model them as local `llvm.alloca`s in the default memory space.
PiperOrigin-RevId: 286161763
The inline interface uses two methods to check legality of inling:
1) Can a region be inlined into another.
2) Can an operation be inlined into another.
Setting the former to true, allows the inliner to use the second for
legality checks. Add this method to the SPIR-V dialect inlining
interface.
PiperOrigin-RevId: 286041734
This updates the lowering pipelines from the GPU dialect to lower-level
dialects (NVVM, SPIRV) to use the recently introduced gpu.func operation
instead of a standard function annotated with an attribute. In particular, the
kernel outlining is updated to produce gpu.func instead of std.func and the
individual conversions are updated to consume gpu.funcs and disallow standard
funcs after legalization, if necessary. The attribute "gpu.kernel" is preserved
in the generic syntax, but can also be used with the custom syntax on
gpu.funcs. The special kind of function for GPU allows one to use additional
features such as memory attribution.
PiperOrigin-RevId: 285822272
This PR targest issue tensorflow/mlir#295. It exposes the already existing
subiew promotion pass as a declarative pattern
Change-Id: If901ebef9fb53fcd0b12ecc536f6b174ce320b92
Closestensorflow/mlir#315
COPYBARA_INTEGRATE_REVIEW=https://github.com/tensorflow/mlir/pull/315 from tetuante:issue295 8e5f268b6d85f31015c33505329dbd7a4db97ac5
PiperOrigin-RevId: 285801463
Similar to insert/extract vector instructions but
(1) work on 1-D vectors only
(2) allow for a dynamic index
%c3 = constant 3 : index
%0 = vector.insertelement %arg0, %arg1[%c : index] : vector<4xf32>
%1 = vector.extractelement %arg0[%c3 : index] : vector<4xf32>
PiperOrigin-RevId: 285792205
ExtractSlicesOp extracts slices of its vector operand and with a specified tiling scheme.
This operation centralizes the tiling scheme around a single op, which simplifies vector op unrolling and subsequent pattern rewrite transformations.
PiperOrigin-RevId: 285761129
Both work for the current use case, but the latter allows implementing
prefix sums and is a little easier to understand for partial warps.
PiperOrigin-RevId: 285145287
This CL adds more common information to StructuredOpsUtils.h
The n_view attribute is retired in favor of args_in + args_out but the CL is otherwise NFC.
PiperOrigin-RevId: 285000621
This patch closes issue tensorflow/mlir#272
We add a standalone iterator permutation transformation to Linalg.
This transformation composes a permutation map with the maps in the
"indexing_maps" attribute. It also permutes "iterator_types"
accordingly.
Change-Id: I7c1e693b8203aeecc595a7c012e738ca1100c857
Closestensorflow/mlir#307
COPYBARA_INTEGRATE_REVIEW=https://github.com/tensorflow/mlir/pull/307 from tetuante:issue272 f7908d58792f4111119721885e247045104f1131
PiperOrigin-RevId: 284824102
This reorganizes the vector transformations to be more easily testable as patterns and more easily composable into fused passes in the future.
PiperOrigin-RevId: 284817474
For example
%0 = vector.shuffle %x, %y [3 : i32, 2 : i32, 1 : i32, 0 : i32] : vector<2xf32>, vector<2xf32>
yields a vector<4xf32> result with a permutation of the elements of %x and %y
PiperOrigin-RevId: 284657191
This CL uses the newly expanded matcher support to easily detect when a linalg.generic has a multiply-accumulate body. A linalg.generic with such a body is rewritten as a vector contraction.
This CL additionally limits the rewrite to the case of matrix multiplication on contiguous and statically shaped memrefs for now.
Before expanding further, we should harden the infrastructure for expressing custom ops with the structured ops abstraction.
PiperOrigin-RevId: 284566659
Since these operations lower to [insert|extract][element|value] at LLVM
dialect level, neither element nor value would correctly reflect the meaning.
PiperOrigin-RevId: 284240727
Updates vector ContractionOp to use proper vector masks (produced by CreateMaskOp/ConstantMaskOp).
Leverages the following canonicalizations in unrolling unit test: CreateMaskOp -> ConstantMaskOp, StridedSliceOp(ConstantMaskOp) -> ConstantMaskOp
Removes IndexTupleOp (no longer needed now that we have vector mask ops).
Updates all unit tests.
PiperOrigin-RevId: 284182168
The AddressOf operation in the LLVM dialect return a pointer to a global
variable. The latter may be in a non-default address space as indicated by the
"addr_space" attribute. Check that the address space of the pointer returned by
AddressOfOp matches that of the referenced GlobalOp. Update the AddressOfOp
builder to respect this constraint.
PiperOrigin-RevId: 284138860
This patch closes issue tensorflow/mlir#271.
It adds an optional permutation map to declarative tiling transformations.
The map is expressed as a list of integers.
Closestensorflow/mlir#288
COPYBARA_INTEGRATE_REVIEW=https://github.com/tensorflow/mlir/pull/288 from tetuante:issue271 2df2938d6a1f01b3bc404ded08dea2dd1e10b588
PiperOrigin-RevId: 284064151
A CompositeInsertOp operation make a copy of a composite object,
while modifying one part of it.
Closestensorflow/mlir#292
COPYBARA_INTEGRATE_REVIEW=https://github.com/tensorflow/mlir/pull/292 from denis0x0D:sandbox/composite_insert 2200962b9057bda53cd2f2866b461e2797196380
PiperOrigin-RevId: 284036551
For serialization, when we have nested ops, the inner loop will create multiple
SPIR-V blocks. If the outer loop has block arguments (which corresponds to
OpPhi instructions), we defer the handling of OpPhi's parent block handling
until we serialized all blocks and then fix it up with the result <id>. These two
cases happening together was generating invalid SPIR-V blob because we
previously assume the parent block to be the block containing the terminator.
That is not true anymore when the block contains structured control flow ops.
If that happens, it should be fixed to use the structured control flow op's
merge block.
For deserialization, we record a map from header blocks to their corresponding
merge and continue blocks during the initial deserialization and then use the
info to construct spv.selection/spv.loop. The existing implementation will also
fall apart when we have nested loops. If so, we clone all blocks for the outer
loop, including the ones for the inner loop, to the spv.loop's region. So the map
for header blocks' merge info need to be updated; otherwise we are operating
on already deleted blocks.
PiperOrigin-RevId: 283949230
Adds a ConstantMaskOp to the vector ops dialect.
Adds the following canonicalization patterns:
CreateMaskOp -> ConstantMaskOp
StridedSliceOp(ConstantMaskOp) -> ConstantMaskOp
PiperOrigin-RevId: 283816752
This CL also did the following cleanup:
- Moved the test for spv.SubgroupBallotKHR to its own file
- Wrapped generated canonicalization patterns in anonymous namespace
- Updated header comments in SPVOps.td
PiperOrigin-RevId: 283650091
As described in the documentation, ViewOp is expected to take an optional
dynamic offset followed by a list of dynamic sizes. However, the ViewOp parser
did not include a check for the offset being a single value and accepeted a
list of values instead.
Furthermore, several tests have been exercising the wrong syntax of a ViewOp,
passing multiple values to the dyanmic stride list, which was not caught by the
parser. The trailing values could have been erronously interpreted as dynamic
sizes. This is likely due to resyntaxing of the ViewOp, with the previous
syntax taking the list of sizes before the offset. Update the tests to use the
syntax with the offset preceding the sizes.
Worse, the conversion of ViewOp to the LLVM dialect assumed the wrong order of
operands with offset in the trailing position, and erronously relied on the
permissive parsing that interpreted trailing dynamic offset values as leading
dynamic sizes. Fix the lowering to use the correct order of operands.
PiperOrigin-RevId: 283532506
A recent commit introduced the Linkage attribute to the LLVM dialect and used
it in the Global Op. Also use it in LLVMFuncOp. As per LLVM Language Reference,
if the linkage attribute is omitted, the function is assumed to have external
linkage.
PiperOrigin-RevId: 283493299
LLVM IR supports linkage on global objects such as global variables and
functions. Introduce the Linkage attribute into the LLVM dialect, backed by an
integer storage. Use this attribute on LLVM::GlobalOp and make it mandatory.
Implement parsing/printing of the attribute and conversion to LLVM IR.
See tensorflow/mlir#277.
PiperOrigin-RevId: 283309328