Adds vector ReshapeOp to the VectorOps dialect. An aggregate vector reshape operation, which aggregates multiple hardware vectors, can enable optimizations during decomposition (e.g. loading one input hardware vector and performing multiple rotate and scatter store operations to the vector output).
PiperOrigin-RevId: 286440658
Introduces some centralized methods to move towards
consistent use of i32 as vector subscripts.
Note: sizes/strides/offsets attributes are still i64
PiperOrigin-RevId: 286434133
When memory attributions are present in `gpu.func`, require that they are of
memref type and live in memoryspaces 3 and 5 for workgroup and private memory
attributions, respectively. Adapt the conversion from the GPU dialect to the
NVVM dialect to drop the private memory space from attributions as NVVM is able
to model them as local `llvm.alloca`s in the default memory space.
PiperOrigin-RevId: 286161763
The inline interface uses two methods to check legality of inling:
1) Can a region be inlined into another.
2) Can an operation be inlined into another.
Setting the former to true, allows the inliner to use the second for
legality checks. Add this method to the SPIR-V dialect inlining
interface.
PiperOrigin-RevId: 286041734
This updates the lowering pipelines from the GPU dialect to lower-level
dialects (NVVM, SPIRV) to use the recently introduced gpu.func operation
instead of a standard function annotated with an attribute. In particular, the
kernel outlining is updated to produce gpu.func instead of std.func and the
individual conversions are updated to consume gpu.funcs and disallow standard
funcs after legalization, if necessary. The attribute "gpu.kernel" is preserved
in the generic syntax, but can also be used with the custom syntax on
gpu.funcs. The special kind of function for GPU allows one to use additional
features such as memory attribution.
PiperOrigin-RevId: 285822272
This PR targest issue tensorflow/mlir#295. It exposes the already existing
subiew promotion pass as a declarative pattern
Change-Id: If901ebef9fb53fcd0b12ecc536f6b174ce320b92
Closestensorflow/mlir#315
COPYBARA_INTEGRATE_REVIEW=https://github.com/tensorflow/mlir/pull/315 from tetuante:issue295 8e5f268b6d85f31015c33505329dbd7a4db97ac5
PiperOrigin-RevId: 285801463
Similar to insert/extract vector instructions but
(1) work on 1-D vectors only
(2) allow for a dynamic index
%c3 = constant 3 : index
%0 = vector.insertelement %arg0, %arg1[%c : index] : vector<4xf32>
%1 = vector.extractelement %arg0[%c3 : index] : vector<4xf32>
PiperOrigin-RevId: 285792205
ExtractSlicesOp extracts slices of its vector operand and with a specified tiling scheme.
This operation centralizes the tiling scheme around a single op, which simplifies vector op unrolling and subsequent pattern rewrite transformations.
PiperOrigin-RevId: 285761129
Both work for the current use case, but the latter allows implementing
prefix sums and is a little easier to understand for partial warps.
PiperOrigin-RevId: 285145287
This CL adds more common information to StructuredOpsUtils.h
The n_view attribute is retired in favor of args_in + args_out but the CL is otherwise NFC.
PiperOrigin-RevId: 285000621
This patch closes issue tensorflow/mlir#272
We add a standalone iterator permutation transformation to Linalg.
This transformation composes a permutation map with the maps in the
"indexing_maps" attribute. It also permutes "iterator_types"
accordingly.
Change-Id: I7c1e693b8203aeecc595a7c012e738ca1100c857
Closestensorflow/mlir#307
COPYBARA_INTEGRATE_REVIEW=https://github.com/tensorflow/mlir/pull/307 from tetuante:issue272 f7908d58792f4111119721885e247045104f1131
PiperOrigin-RevId: 284824102
This reorganizes the vector transformations to be more easily testable as patterns and more easily composable into fused passes in the future.
PiperOrigin-RevId: 284817474
For example
%0 = vector.shuffle %x, %y [3 : i32, 2 : i32, 1 : i32, 0 : i32] : vector<2xf32>, vector<2xf32>
yields a vector<4xf32> result with a permutation of the elements of %x and %y
PiperOrigin-RevId: 284657191
This CL uses the newly expanded matcher support to easily detect when a linalg.generic has a multiply-accumulate body. A linalg.generic with such a body is rewritten as a vector contraction.
This CL additionally limits the rewrite to the case of matrix multiplication on contiguous and statically shaped memrefs for now.
Before expanding further, we should harden the infrastructure for expressing custom ops with the structured ops abstraction.
PiperOrigin-RevId: 284566659
Since these operations lower to [insert|extract][element|value] at LLVM
dialect level, neither element nor value would correctly reflect the meaning.
PiperOrigin-RevId: 284240727
Updates vector ContractionOp to use proper vector masks (produced by CreateMaskOp/ConstantMaskOp).
Leverages the following canonicalizations in unrolling unit test: CreateMaskOp -> ConstantMaskOp, StridedSliceOp(ConstantMaskOp) -> ConstantMaskOp
Removes IndexTupleOp (no longer needed now that we have vector mask ops).
Updates all unit tests.
PiperOrigin-RevId: 284182168
The AddressOf operation in the LLVM dialect return a pointer to a global
variable. The latter may be in a non-default address space as indicated by the
"addr_space" attribute. Check that the address space of the pointer returned by
AddressOfOp matches that of the referenced GlobalOp. Update the AddressOfOp
builder to respect this constraint.
PiperOrigin-RevId: 284138860
This patch closes issue tensorflow/mlir#271.
It adds an optional permutation map to declarative tiling transformations.
The map is expressed as a list of integers.
Closestensorflow/mlir#288
COPYBARA_INTEGRATE_REVIEW=https://github.com/tensorflow/mlir/pull/288 from tetuante:issue271 2df2938d6a1f01b3bc404ded08dea2dd1e10b588
PiperOrigin-RevId: 284064151
A CompositeInsertOp operation make a copy of a composite object,
while modifying one part of it.
Closestensorflow/mlir#292
COPYBARA_INTEGRATE_REVIEW=https://github.com/tensorflow/mlir/pull/292 from denis0x0D:sandbox/composite_insert 2200962b9057bda53cd2f2866b461e2797196380
PiperOrigin-RevId: 284036551
For serialization, when we have nested ops, the inner loop will create multiple
SPIR-V blocks. If the outer loop has block arguments (which corresponds to
OpPhi instructions), we defer the handling of OpPhi's parent block handling
until we serialized all blocks and then fix it up with the result <id>. These two
cases happening together was generating invalid SPIR-V blob because we
previously assume the parent block to be the block containing the terminator.
That is not true anymore when the block contains structured control flow ops.
If that happens, it should be fixed to use the structured control flow op's
merge block.
For deserialization, we record a map from header blocks to their corresponding
merge and continue blocks during the initial deserialization and then use the
info to construct spv.selection/spv.loop. The existing implementation will also
fall apart when we have nested loops. If so, we clone all blocks for the outer
loop, including the ones for the inner loop, to the spv.loop's region. So the map
for header blocks' merge info need to be updated; otherwise we are operating
on already deleted blocks.
PiperOrigin-RevId: 283949230
Adds a ConstantMaskOp to the vector ops dialect.
Adds the following canonicalization patterns:
CreateMaskOp -> ConstantMaskOp
StridedSliceOp(ConstantMaskOp) -> ConstantMaskOp
PiperOrigin-RevId: 283816752
This CL also did the following cleanup:
- Moved the test for spv.SubgroupBallotKHR to its own file
- Wrapped generated canonicalization patterns in anonymous namespace
- Updated header comments in SPVOps.td
PiperOrigin-RevId: 283650091
As described in the documentation, ViewOp is expected to take an optional
dynamic offset followed by a list of dynamic sizes. However, the ViewOp parser
did not include a check for the offset being a single value and accepeted a
list of values instead.
Furthermore, several tests have been exercising the wrong syntax of a ViewOp,
passing multiple values to the dyanmic stride list, which was not caught by the
parser. The trailing values could have been erronously interpreted as dynamic
sizes. This is likely due to resyntaxing of the ViewOp, with the previous
syntax taking the list of sizes before the offset. Update the tests to use the
syntax with the offset preceding the sizes.
Worse, the conversion of ViewOp to the LLVM dialect assumed the wrong order of
operands with offset in the trailing position, and erronously relied on the
permissive parsing that interpreted trailing dynamic offset values as leading
dynamic sizes. Fix the lowering to use the correct order of operands.
PiperOrigin-RevId: 283532506
A recent commit introduced the Linkage attribute to the LLVM dialect and used
it in the Global Op. Also use it in LLVMFuncOp. As per LLVM Language Reference,
if the linkage attribute is omitted, the function is assumed to have external
linkage.
PiperOrigin-RevId: 283493299
LLVM IR supports linkage on global objects such as global variables and
functions. Introduce the Linkage attribute into the LLVM dialect, backed by an
integer storage. Use this attribute on LLVM::GlobalOp and make it mandatory.
Implement parsing/printing of the attribute and conversion to LLVM IR.
See tensorflow/mlir#277.
PiperOrigin-RevId: 283309328
Adding zero and multiplying one can be common when generating code
for index calculation.
This CL also sorted canonicalize.mlir to alphabetical order.
PiperOrigin-RevId: 282828055
This CL rewrites the linalg ops to loops transformations as patterns that can be targeted directly from Tablegen. Reliance on OpFolder is removed and to cope with it we introduce local folding patterns that are applied greedily.
PiperOrigin-RevId: 282765550
Since second argument is always fully overwritten and
shape is define in "to" clause, it is not needed.
Also renamed "into" to "to" now that arg is dropped.
PiperOrigin-RevId: 282686475
These changes to SPIR-V lowering while adding support for lowering
SUbViewOp, but are not directly related.
- Change the lowering of MemRefType to
!spv.ptr<!spv.struct<!spv.array<...>[offset]>, ..>
This is consistent with the Vulkan spec.
- To enable testing a simple pattern of lowering functions is added to
ConvertStandardToSPIRVPass. This is just used to convert the type of
the arguments of the function. The added function lowering itself is
not meant to be the way functions are eventually lowered into SPIR-V
dialect.
PiperOrigin-RevId: 282589644
This new op is the counterpart of vector.StridedSliceOp and will be used for in the pattern rewrites for vector unrolling.
PiperOrigin-RevId: 282447414
To simplify the lowering into SPIR-V, while still respecting the ABI
requirements of SPIR-V/Vulkan, split the process into two
1) While lowering a function to SPIR-V (when the function is an entry
point function), allow specifying attributes on arguments and
function itself that describe the ABI of the function.
2) Add a pass that materializes the ABI described in the function.
Two attributes are needed.
1) Attribute on arguments of the entry point function that describe
the descriptor_set, binding, storage class, etc, of the
spv.globalVariable this argument will be replaced by
2) Attribute on function that specifies workgroup size, etc. (for now
only workgroup size).
Add the pass -spirv-lower-abi-attrs to materialize the ABI described
by the attributes.
This change makes the SPIRVBasicTypeConverter class unnecessary and is
removed, further simplifying the SPIR-V lowering path.
PiperOrigin-RevId: 282387587
This is the counterpart of vector.extractelement op and has the same
limitations at the moment (static I64IntegerArrayAttr to express position).
This restriction will be filterd in the future.
LLVM lowering will be added in a subsequent commit.
PiperOrigin-RevId: 282365760
Introduce a new function-like operation to the GPU dialect to provide a
placeholder for the execution semantic description and to add support for GPU
memory hierarchy. This aligns with the overall goal of the dialect to expose
the common abstraction layer for GPU devices, in particular by providing an
MLIR unit of semantics (i.e. an operation) for memory modeling.
This proposal has been discussed in the mailing list:
https://groups.google.com/a/tensorflow.org/d/msg/mlir/RfXNP7Hklsc/MBNN7KhjAgAJ
As decided, the "convergence" aspect of the execution model will be factored
out into a new discussion and therefore is not included in this commit. This
commit only introduces the operation but does not hook it up with the remaining
flow. The intention is to develop the new flow while keeping the old flow
operational and do the switch in a simple, separately reversible commit.
PiperOrigin-RevId: 282357599
Add a canonicalizer for `spirv::LogicalNotOp`.
Converts:
* spv.LogicalNot(spv.IEqual(...)) -> spv.INotEqual(...)
* spv.LogicalNot(spv.INotEqual(...)) -> spv.IEqual(...)
* spv.LogicalNot(spv.LogicalEqual(...)) -> spv.LogicalNotEqual(...)
* spv.LogicalNot(spv.LogicalNotEqual(...)) -> spv.LogicalEqual(...)
Also moved the test for spv.IMul to arithemtic tests.
Closestensorflow/mlir#256
COPYBARA_INTEGRATE_REVIEW=https://github.com/tensorflow/mlir/pull/256 from denis0x0D:sandbox/canon_logical_not 76ab5787b2c777f948c8978db061d99e76453d44
PiperOrigin-RevId: 282012356
Due to legacy reasons, a newline character followed by two spaces was always
inserted before the attributes of the function Op in pretty form. This breaks
formatting when functions are nested in some other operations. Don't print the
newline and just put the attributes on the same line, which is also more
consistent with module Op. Line breaking aware of indentation can be introduced
separately into the parser if deemed useful.
PiperOrigin-RevId: 281721793
The `vector.strided_slice` takes an n-D vector, k-D `offsets` integer array attribute, a
k-D `sizes` integer array attribute, a k-D `strides` integer array attribute and extracts
the n-D subvector at the proper offset.
Returns an n-D vector where the first k-D dimensions match the `sizes` attribute.
The returned subvector contains the elements starting at offset `offsets` and ending at
`offsets + sizes`.
Example:
```
%1 = vector.strided_slice %0
{offsets : [0, 2], sizes : [2, 4], strides : [1, 1]}:
vector<4x8x16xf32> // returns a vector<2x4x16xf32>
```
This op will be useful for progressive lowering within the VectorOp dialect.
PiperOrigin-RevId: 281352749
Iterates each element to build the array. This includes a little refactor to
combine bool/int/float into a function, since they are similar. The only
difference is calling different function in the end.
PiperOrigin-RevId: 281210288
The variant that accepts a type will check that the parsed attribute is a valid instance of AttrType. The non-type variant would silently fail in this case, leading to garbage attribute values.
PiperOrigin-RevId: 281136528
Convert chained `spirv::BitcastOp` operations into
one `spirv::BitcastOp` operation.
Closestensorflow/mlir#238
COPYBARA_INTEGRATE_REVIEW=https://github.com/tensorflow/mlir/pull/238 from denis0x0D:sandbox/canon_bitcast 4352ed4f81b959ec92f849c599e733b62a99c010
PiperOrigin-RevId: 281129234
This CL added op definitions for a few bit operations:
* OpBitFieldInsert
* OpBitFieldSExtract
* OpBitFieldUExtract
Closestensorflow/mlir#233
COPYBARA_INTEGRATE_REVIEW=https://github.com/tensorflow/mlir/pull/233 from denis0x0D:sandbox/bit_field_ops e7fd85b00d72d483d7992dc42b9cc4d673903455
PiperOrigin-RevId: 280691816
This CL moves VectorOps to Tablegen and cleans up the implementation.
This is almost NFC but 2 changes occur:
1. an interface change occurs in the padding value specification in vector_transfer_read:
the value becomes non-optional. As a shortcut we currently use %f0 for all paddings.
This should become an OpInterface for vectorization in the future.
2. the return type of vector.type_cast is trivial and simplified to `memref<vector<...>>`
Relevant roundtrip and invalid tests that used to sit in core are moved to the vector dialect.
The op documentation is moved to the .td file.
PiperOrigin-RevId: 280430869
This CL uses the now standard std.subview in linalg.
Two shortcuts are currently taken to allow this port:
1. the type resulting from a view is currently degraded to fully dynamic to pass the SubViewOp verifier.
2. indexing into SubViewOp may access out of bounds since lowering to LLVM does not currently enforce it by construction.
These will be fixed in subsequent commits after discussions.
PiperOrigin-RevId: 280250129
Since VariableOp is serialized during processBlock, we add two more fields,
`functionHeader` and `functionBody`, to collect instructions for a function.
After all the blocks have been processed, we append them to the `functions`.
Also, fix a bug in processGlobalVariableOp. The global variables should be
encoded into `typesGlobalValues`.
PiperOrigin-RevId: 280105366
During deserialization, the loop header block will be moved into the
spv.loop's region. If the loop header block has block arguments,
we need to make sure it is correctly carried over to the block where
the new spv.loop resides.
During serialization, we need to make sure block arguments from the
spv.loop's entry block are not silently dropped.
PiperOrigin-RevId: 280021777
This CL adds an extra pointer to the memref descriptor to allow specifying alignment.
In a previous implementation, we used 2 types: `linalg.buffer` and `view` where the buffer type was the unit of allocation/deallocation/alignment and `view` was the unit of indexing.
After multiple discussions it was decided to use a single type, which conflates both, so the memref descriptor now needs to carry both pointers.
This is consistent with the [RFC-Proposed Changes to MemRef and Tensor MLIR Types](https://groups.google.com/a/tensorflow.org/forum/#!searchin/mlir/std.view%7Csort:date/mlir/-wKHANzDNTg/4K6nUAp8AAAJ).
PiperOrigin-RevId: 279959463
This code should be exercised using the existing kernel outlining unit test, but
let me know if I should add a dedicated unit test using a fake call instruction
as well.
PiperOrigin-RevId: 279436321
This CL added op definitions for a few bit operations:
* OpShiftLeftLogical
* OpShiftRightArithmetic
* OpShiftRightLogical
* OpBitCount
* OpBitReverse
* OpNot
Also moved the definition of spv.BitwiseAnd to follow the
lexicographical order.
Closestensorflow/mlir#215
COPYBARA_INTEGRATE_REVIEW=https://github.com/tensorflow/mlir/pull/215 from denis0x0D:sandbox/bit_ops d9b0852b689ac6c4879a9740b1740a2357f44d24
PiperOrigin-RevId: 279350470
Now that a view op has graduated to the std dialect, we can update Linalg to use it and remove ops that have become obsolete. As a byproduct, the linalg buffer and associated ops can also disappear.
PiperOrigin-RevId: 279073591
This allows GlobalOp to either take a value attribute (for simple constants) or a region that can
contain IR instructions (that must be constant-foldable) to create a ConstantExpr initializer.
Example:
// A complex initializer is constructed with an initializer region.
llvm.mlir.global constant @int_gep() : !llvm<"i32*"> {
%0 = llvm.mlir.addressof @g2 : !llvm<"i32*">
%1 = llvm.mlir.constant(2 : i32) : !llvm.i32
%2 = llvm.getelementptr %0[%1] : (!llvm<"i32*">, !llvm.i32) -> !llvm<"i32*">
llvm.return %2 : !llvm<"i32*">
}
PiperOrigin-RevId: 278717836
This simplifies the implementation quite a bit, and removes the need for explicit string munging. One change is made to some of the enum elements of SPV_DimAttr to ensure that they are proper identifiers; The string form is now prefixed with 'Dim'.
PiperOrigin-RevId: 278027132
This simplifies the implementation, and removes the need to do explicit string manipulation. A utility method 'parseDimensionList' is added to the DialectAsmParser to simplify defining types and attributes that contain shapes.
PiperOrigin-RevId: 278020604
This greatly simplifies the implementation and removes custom parser functionality. The necessary methods are added to the DialectAsmParser.
PiperOrigin-RevId: 278015983
This CL adds a simple pattern for specifying producer-consumer fusion on Linalg operations.
Implementing such an extension reveals some interesting properties.
Since Linalg operates on a buffer abstraction, the output buffers are specified as in/out parameters to the ops. As a consequence, there are no SSA use-def chains and one cannot specify complex dag input patterns with the current infrastructure.
Instead this CL uses constraints based on the existing linalg dependence analysis to focus the pattern and refine patterns based on the type of op that last wrote in a buffer.
This is a very local property and is less powerful than the generic dag specification based on SSA use-def chains.
This will be generalized in the future.
PiperOrigin-RevId: 277931503
This CL added op definitions for a few cast operations:
* OpConvertFToU
* OpConvertFToS
* OpConvertSToF
* OpConvertUToF
* OpUConvert
* OpSConvert
* OpFConvert
Also moved the definition of spv.Bitcast to the new file.
Closestensorflow/mlir#208 and tensorflow/mlir#174
COPYBARA_INTEGRATE_REVIEW=https://github.com/tensorflow/mlir/pull/208 from denis0x0D:sandbox/cast_ops 79bc9b37398aafddee6cf6beb301807988fe67f9
PiperOrigin-RevId: 277587891
Linalg ops provide a good anchor for pattern matching/rewriting transformations.
This CL adds a simple example of how multi-level tiling may be specified by attaching a simple StringAttr to ops as they are transformed so we can easily specify partial lowering to control transformation application.
This is a first stab at taking advantage of higher-level information contained in Linalg ops and will evolve in the future.
PiperOrigin-RevId: 277497958
This CL fixed gen_spirv_dialect.py to support nested delimiters when
chunking existing ODS entries in .td files and to allow ops without
correspondence in the spec. This is needed to pull in the definition
of OpUnreachable.
PiperOrigin-RevId: 277486465
This CL adds another control flow instruction in SPIR-V: OpPhi.
It is modelled as block arguments to be idiomatic with MLIR.
See the rationale.md doc for "Block Arguments vs PHI nodes".
Serialization and deserialization is updated to convert between
block arguments and SPIR-V OpPhi instructions.
PiperOrigin-RevId: 277161545
Combine chained `spirv::AccessChainOp` operations into one
`spirv::AccessChainOp` operation.
Closestensorflow/mlir#198
COPYBARA_INTEGRATE_REVIEW=https://github.com/tensorflow/mlir/pull/198 from denis0x0D:sandbox/canon_access_chain 0cb87955a85511071143d62637ff939d0dabc2bd
PiperOrigin-RevId: 276609345
This allows for them to be used on other non-function, or even other function-like, operations. The algorithms are already generic, so this is simply changing the derived pass type. The majority of this change is just ensuring that the nesting of these passes remains the same, as the pass manager won't auto-nest them anymore.
PiperOrigin-RevId: 276573038
We will use block arguments as the way to model SPIR-V OpPhi in
the SPIR-V dialect.
This CL also adds a few useful helper methods to both ops to
get the block arguments.
Also added tests for branch weight (de)serialization.
PiperOrigin-RevId: 275960797