This CL adds more common information to StructuredOpsUtils.h
The n_view attribute is retired in favor of args_in + args_out but the CL is otherwise NFC.
PiperOrigin-RevId: 285000621
This patch closes issue tensorflow/mlir#272
We add a standalone iterator permutation transformation to Linalg.
This transformation composes a permutation map with the maps in the
"indexing_maps" attribute. It also permutes "iterator_types"
accordingly.
Change-Id: I7c1e693b8203aeecc595a7c012e738ca1100c857
Closestensorflow/mlir#307
COPYBARA_INTEGRATE_REVIEW=https://github.com/tensorflow/mlir/pull/307 from tetuante:issue272 f7908d58792f4111119721885e247045104f1131
PiperOrigin-RevId: 284824102
This reorganizes the vector transformations to be more easily testable as patterns and more easily composable into fused passes in the future.
PiperOrigin-RevId: 284817474
For example
%0 = vector.shuffle %x, %y [3 : i32, 2 : i32, 1 : i32, 0 : i32] : vector<2xf32>, vector<2xf32>
yields a vector<4xf32> result with a permutation of the elements of %x and %y
PiperOrigin-RevId: 284657191
This CL uses the newly expanded matcher support to easily detect when a linalg.generic has a multiply-accumulate body. A linalg.generic with such a body is rewritten as a vector contraction.
This CL additionally limits the rewrite to the case of matrix multiplication on contiguous and statically shaped memrefs for now.
Before expanding further, we should harden the infrastructure for expressing custom ops with the structured ops abstraction.
PiperOrigin-RevId: 284566659
Since these operations lower to [insert|extract][element|value] at LLVM
dialect level, neither element nor value would correctly reflect the meaning.
PiperOrigin-RevId: 284240727
Updates vector ContractionOp to use proper vector masks (produced by CreateMaskOp/ConstantMaskOp).
Leverages the following canonicalizations in unrolling unit test: CreateMaskOp -> ConstantMaskOp, StridedSliceOp(ConstantMaskOp) -> ConstantMaskOp
Removes IndexTupleOp (no longer needed now that we have vector mask ops).
Updates all unit tests.
PiperOrigin-RevId: 284182168
The AddressOf operation in the LLVM dialect return a pointer to a global
variable. The latter may be in a non-default address space as indicated by the
"addr_space" attribute. Check that the address space of the pointer returned by
AddressOfOp matches that of the referenced GlobalOp. Update the AddressOfOp
builder to respect this constraint.
PiperOrigin-RevId: 284138860
This patch closes issue tensorflow/mlir#271.
It adds an optional permutation map to declarative tiling transformations.
The map is expressed as a list of integers.
Closestensorflow/mlir#288
COPYBARA_INTEGRATE_REVIEW=https://github.com/tensorflow/mlir/pull/288 from tetuante:issue271 2df2938d6a1f01b3bc404ded08dea2dd1e10b588
PiperOrigin-RevId: 284064151
A CompositeInsertOp operation make a copy of a composite object,
while modifying one part of it.
Closestensorflow/mlir#292
COPYBARA_INTEGRATE_REVIEW=https://github.com/tensorflow/mlir/pull/292 from denis0x0D:sandbox/composite_insert 2200962b9057bda53cd2f2866b461e2797196380
PiperOrigin-RevId: 284036551
For serialization, when we have nested ops, the inner loop will create multiple
SPIR-V blocks. If the outer loop has block arguments (which corresponds to
OpPhi instructions), we defer the handling of OpPhi's parent block handling
until we serialized all blocks and then fix it up with the result <id>. These two
cases happening together was generating invalid SPIR-V blob because we
previously assume the parent block to be the block containing the terminator.
That is not true anymore when the block contains structured control flow ops.
If that happens, it should be fixed to use the structured control flow op's
merge block.
For deserialization, we record a map from header blocks to their corresponding
merge and continue blocks during the initial deserialization and then use the
info to construct spv.selection/spv.loop. The existing implementation will also
fall apart when we have nested loops. If so, we clone all blocks for the outer
loop, including the ones for the inner loop, to the spv.loop's region. So the map
for header blocks' merge info need to be updated; otherwise we are operating
on already deleted blocks.
PiperOrigin-RevId: 283949230
Adds a ConstantMaskOp to the vector ops dialect.
Adds the following canonicalization patterns:
CreateMaskOp -> ConstantMaskOp
StridedSliceOp(ConstantMaskOp) -> ConstantMaskOp
PiperOrigin-RevId: 283816752
This CL also did the following cleanup:
- Moved the test for spv.SubgroupBallotKHR to its own file
- Wrapped generated canonicalization patterns in anonymous namespace
- Updated header comments in SPVOps.td
PiperOrigin-RevId: 283650091
As described in the documentation, ViewOp is expected to take an optional
dynamic offset followed by a list of dynamic sizes. However, the ViewOp parser
did not include a check for the offset being a single value and accepeted a
list of values instead.
Furthermore, several tests have been exercising the wrong syntax of a ViewOp,
passing multiple values to the dyanmic stride list, which was not caught by the
parser. The trailing values could have been erronously interpreted as dynamic
sizes. This is likely due to resyntaxing of the ViewOp, with the previous
syntax taking the list of sizes before the offset. Update the tests to use the
syntax with the offset preceding the sizes.
Worse, the conversion of ViewOp to the LLVM dialect assumed the wrong order of
operands with offset in the trailing position, and erronously relied on the
permissive parsing that interpreted trailing dynamic offset values as leading
dynamic sizes. Fix the lowering to use the correct order of operands.
PiperOrigin-RevId: 283532506
A recent commit introduced the Linkage attribute to the LLVM dialect and used
it in the Global Op. Also use it in LLVMFuncOp. As per LLVM Language Reference,
if the linkage attribute is omitted, the function is assumed to have external
linkage.
PiperOrigin-RevId: 283493299
LLVM IR supports linkage on global objects such as global variables and
functions. Introduce the Linkage attribute into the LLVM dialect, backed by an
integer storage. Use this attribute on LLVM::GlobalOp and make it mandatory.
Implement parsing/printing of the attribute and conversion to LLVM IR.
See tensorflow/mlir#277.
PiperOrigin-RevId: 283309328
Adding zero and multiplying one can be common when generating code
for index calculation.
This CL also sorted canonicalize.mlir to alphabetical order.
PiperOrigin-RevId: 282828055
This CL rewrites the linalg ops to loops transformations as patterns that can be targeted directly from Tablegen. Reliance on OpFolder is removed and to cope with it we introduce local folding patterns that are applied greedily.
PiperOrigin-RevId: 282765550
Since second argument is always fully overwritten and
shape is define in "to" clause, it is not needed.
Also renamed "into" to "to" now that arg is dropped.
PiperOrigin-RevId: 282686475
These changes to SPIR-V lowering while adding support for lowering
SUbViewOp, but are not directly related.
- Change the lowering of MemRefType to
!spv.ptr<!spv.struct<!spv.array<...>[offset]>, ..>
This is consistent with the Vulkan spec.
- To enable testing a simple pattern of lowering functions is added to
ConvertStandardToSPIRVPass. This is just used to convert the type of
the arguments of the function. The added function lowering itself is
not meant to be the way functions are eventually lowered into SPIR-V
dialect.
PiperOrigin-RevId: 282589644
This new op is the counterpart of vector.StridedSliceOp and will be used for in the pattern rewrites for vector unrolling.
PiperOrigin-RevId: 282447414
To simplify the lowering into SPIR-V, while still respecting the ABI
requirements of SPIR-V/Vulkan, split the process into two
1) While lowering a function to SPIR-V (when the function is an entry
point function), allow specifying attributes on arguments and
function itself that describe the ABI of the function.
2) Add a pass that materializes the ABI described in the function.
Two attributes are needed.
1) Attribute on arguments of the entry point function that describe
the descriptor_set, binding, storage class, etc, of the
spv.globalVariable this argument will be replaced by
2) Attribute on function that specifies workgroup size, etc. (for now
only workgroup size).
Add the pass -spirv-lower-abi-attrs to materialize the ABI described
by the attributes.
This change makes the SPIRVBasicTypeConverter class unnecessary and is
removed, further simplifying the SPIR-V lowering path.
PiperOrigin-RevId: 282387587
This is the counterpart of vector.extractelement op and has the same
limitations at the moment (static I64IntegerArrayAttr to express position).
This restriction will be filterd in the future.
LLVM lowering will be added in a subsequent commit.
PiperOrigin-RevId: 282365760
Introduce a new function-like operation to the GPU dialect to provide a
placeholder for the execution semantic description and to add support for GPU
memory hierarchy. This aligns with the overall goal of the dialect to expose
the common abstraction layer for GPU devices, in particular by providing an
MLIR unit of semantics (i.e. an operation) for memory modeling.
This proposal has been discussed in the mailing list:
https://groups.google.com/a/tensorflow.org/d/msg/mlir/RfXNP7Hklsc/MBNN7KhjAgAJ
As decided, the "convergence" aspect of the execution model will be factored
out into a new discussion and therefore is not included in this commit. This
commit only introduces the operation but does not hook it up with the remaining
flow. The intention is to develop the new flow while keeping the old flow
operational and do the switch in a simple, separately reversible commit.
PiperOrigin-RevId: 282357599
Add a canonicalizer for `spirv::LogicalNotOp`.
Converts:
* spv.LogicalNot(spv.IEqual(...)) -> spv.INotEqual(...)
* spv.LogicalNot(spv.INotEqual(...)) -> spv.IEqual(...)
* spv.LogicalNot(spv.LogicalEqual(...)) -> spv.LogicalNotEqual(...)
* spv.LogicalNot(spv.LogicalNotEqual(...)) -> spv.LogicalEqual(...)
Also moved the test for spv.IMul to arithemtic tests.
Closestensorflow/mlir#256
COPYBARA_INTEGRATE_REVIEW=https://github.com/tensorflow/mlir/pull/256 from denis0x0D:sandbox/canon_logical_not 76ab5787b2c777f948c8978db061d99e76453d44
PiperOrigin-RevId: 282012356
Due to legacy reasons, a newline character followed by two spaces was always
inserted before the attributes of the function Op in pretty form. This breaks
formatting when functions are nested in some other operations. Don't print the
newline and just put the attributes on the same line, which is also more
consistent with module Op. Line breaking aware of indentation can be introduced
separately into the parser if deemed useful.
PiperOrigin-RevId: 281721793
The `vector.strided_slice` takes an n-D vector, k-D `offsets` integer array attribute, a
k-D `sizes` integer array attribute, a k-D `strides` integer array attribute and extracts
the n-D subvector at the proper offset.
Returns an n-D vector where the first k-D dimensions match the `sizes` attribute.
The returned subvector contains the elements starting at offset `offsets` and ending at
`offsets + sizes`.
Example:
```
%1 = vector.strided_slice %0
{offsets : [0, 2], sizes : [2, 4], strides : [1, 1]}:
vector<4x8x16xf32> // returns a vector<2x4x16xf32>
```
This op will be useful for progressive lowering within the VectorOp dialect.
PiperOrigin-RevId: 281352749
Iterates each element to build the array. This includes a little refactor to
combine bool/int/float into a function, since they are similar. The only
difference is calling different function in the end.
PiperOrigin-RevId: 281210288
The variant that accepts a type will check that the parsed attribute is a valid instance of AttrType. The non-type variant would silently fail in this case, leading to garbage attribute values.
PiperOrigin-RevId: 281136528
Convert chained `spirv::BitcastOp` operations into
one `spirv::BitcastOp` operation.
Closestensorflow/mlir#238
COPYBARA_INTEGRATE_REVIEW=https://github.com/tensorflow/mlir/pull/238 from denis0x0D:sandbox/canon_bitcast 4352ed4f81b959ec92f849c599e733b62a99c010
PiperOrigin-RevId: 281129234
This CL added op definitions for a few bit operations:
* OpBitFieldInsert
* OpBitFieldSExtract
* OpBitFieldUExtract
Closestensorflow/mlir#233
COPYBARA_INTEGRATE_REVIEW=https://github.com/tensorflow/mlir/pull/233 from denis0x0D:sandbox/bit_field_ops e7fd85b00d72d483d7992dc42b9cc4d673903455
PiperOrigin-RevId: 280691816
This CL moves VectorOps to Tablegen and cleans up the implementation.
This is almost NFC but 2 changes occur:
1. an interface change occurs in the padding value specification in vector_transfer_read:
the value becomes non-optional. As a shortcut we currently use %f0 for all paddings.
This should become an OpInterface for vectorization in the future.
2. the return type of vector.type_cast is trivial and simplified to `memref<vector<...>>`
Relevant roundtrip and invalid tests that used to sit in core are moved to the vector dialect.
The op documentation is moved to the .td file.
PiperOrigin-RevId: 280430869
This CL uses the now standard std.subview in linalg.
Two shortcuts are currently taken to allow this port:
1. the type resulting from a view is currently degraded to fully dynamic to pass the SubViewOp verifier.
2. indexing into SubViewOp may access out of bounds since lowering to LLVM does not currently enforce it by construction.
These will be fixed in subsequent commits after discussions.
PiperOrigin-RevId: 280250129
Since VariableOp is serialized during processBlock, we add two more fields,
`functionHeader` and `functionBody`, to collect instructions for a function.
After all the blocks have been processed, we append them to the `functions`.
Also, fix a bug in processGlobalVariableOp. The global variables should be
encoded into `typesGlobalValues`.
PiperOrigin-RevId: 280105366
During deserialization, the loop header block will be moved into the
spv.loop's region. If the loop header block has block arguments,
we need to make sure it is correctly carried over to the block where
the new spv.loop resides.
During serialization, we need to make sure block arguments from the
spv.loop's entry block are not silently dropped.
PiperOrigin-RevId: 280021777
This CL adds an extra pointer to the memref descriptor to allow specifying alignment.
In a previous implementation, we used 2 types: `linalg.buffer` and `view` where the buffer type was the unit of allocation/deallocation/alignment and `view` was the unit of indexing.
After multiple discussions it was decided to use a single type, which conflates both, so the memref descriptor now needs to carry both pointers.
This is consistent with the [RFC-Proposed Changes to MemRef and Tensor MLIR Types](https://groups.google.com/a/tensorflow.org/forum/#!searchin/mlir/std.view%7Csort:date/mlir/-wKHANzDNTg/4K6nUAp8AAAJ).
PiperOrigin-RevId: 279959463
This code should be exercised using the existing kernel outlining unit test, but
let me know if I should add a dedicated unit test using a fake call instruction
as well.
PiperOrigin-RevId: 279436321
This CL added op definitions for a few bit operations:
* OpShiftLeftLogical
* OpShiftRightArithmetic
* OpShiftRightLogical
* OpBitCount
* OpBitReverse
* OpNot
Also moved the definition of spv.BitwiseAnd to follow the
lexicographical order.
Closestensorflow/mlir#215
COPYBARA_INTEGRATE_REVIEW=https://github.com/tensorflow/mlir/pull/215 from denis0x0D:sandbox/bit_ops d9b0852b689ac6c4879a9740b1740a2357f44d24
PiperOrigin-RevId: 279350470
Now that a view op has graduated to the std dialect, we can update Linalg to use it and remove ops that have become obsolete. As a byproduct, the linalg buffer and associated ops can also disappear.
PiperOrigin-RevId: 279073591
This allows GlobalOp to either take a value attribute (for simple constants) or a region that can
contain IR instructions (that must be constant-foldable) to create a ConstantExpr initializer.
Example:
// A complex initializer is constructed with an initializer region.
llvm.mlir.global constant @int_gep() : !llvm<"i32*"> {
%0 = llvm.mlir.addressof @g2 : !llvm<"i32*">
%1 = llvm.mlir.constant(2 : i32) : !llvm.i32
%2 = llvm.getelementptr %0[%1] : (!llvm<"i32*">, !llvm.i32) -> !llvm<"i32*">
llvm.return %2 : !llvm<"i32*">
}
PiperOrigin-RevId: 278717836
This simplifies the implementation quite a bit, and removes the need for explicit string munging. One change is made to some of the enum elements of SPV_DimAttr to ensure that they are proper identifiers; The string form is now prefixed with 'Dim'.
PiperOrigin-RevId: 278027132
This simplifies the implementation, and removes the need to do explicit string manipulation. A utility method 'parseDimensionList' is added to the DialectAsmParser to simplify defining types and attributes that contain shapes.
PiperOrigin-RevId: 278020604
This greatly simplifies the implementation and removes custom parser functionality. The necessary methods are added to the DialectAsmParser.
PiperOrigin-RevId: 278015983
This CL adds a simple pattern for specifying producer-consumer fusion on Linalg operations.
Implementing such an extension reveals some interesting properties.
Since Linalg operates on a buffer abstraction, the output buffers are specified as in/out parameters to the ops. As a consequence, there are no SSA use-def chains and one cannot specify complex dag input patterns with the current infrastructure.
Instead this CL uses constraints based on the existing linalg dependence analysis to focus the pattern and refine patterns based on the type of op that last wrote in a buffer.
This is a very local property and is less powerful than the generic dag specification based on SSA use-def chains.
This will be generalized in the future.
PiperOrigin-RevId: 277931503
This CL added op definitions for a few cast operations:
* OpConvertFToU
* OpConvertFToS
* OpConvertSToF
* OpConvertUToF
* OpUConvert
* OpSConvert
* OpFConvert
Also moved the definition of spv.Bitcast to the new file.
Closestensorflow/mlir#208 and tensorflow/mlir#174
COPYBARA_INTEGRATE_REVIEW=https://github.com/tensorflow/mlir/pull/208 from denis0x0D:sandbox/cast_ops 79bc9b37398aafddee6cf6beb301807988fe67f9
PiperOrigin-RevId: 277587891
Linalg ops provide a good anchor for pattern matching/rewriting transformations.
This CL adds a simple example of how multi-level tiling may be specified by attaching a simple StringAttr to ops as they are transformed so we can easily specify partial lowering to control transformation application.
This is a first stab at taking advantage of higher-level information contained in Linalg ops and will evolve in the future.
PiperOrigin-RevId: 277497958
This CL fixed gen_spirv_dialect.py to support nested delimiters when
chunking existing ODS entries in .td files and to allow ops without
correspondence in the spec. This is needed to pull in the definition
of OpUnreachable.
PiperOrigin-RevId: 277486465
This CL adds another control flow instruction in SPIR-V: OpPhi.
It is modelled as block arguments to be idiomatic with MLIR.
See the rationale.md doc for "Block Arguments vs PHI nodes".
Serialization and deserialization is updated to convert between
block arguments and SPIR-V OpPhi instructions.
PiperOrigin-RevId: 277161545
Combine chained `spirv::AccessChainOp` operations into one
`spirv::AccessChainOp` operation.
Closestensorflow/mlir#198
COPYBARA_INTEGRATE_REVIEW=https://github.com/tensorflow/mlir/pull/198 from denis0x0D:sandbox/canon_access_chain 0cb87955a85511071143d62637ff939d0dabc2bd
PiperOrigin-RevId: 276609345
This allows for them to be used on other non-function, or even other function-like, operations. The algorithms are already generic, so this is simply changing the derived pass type. The majority of this change is just ensuring that the nesting of these passes remains the same, as the pass manager won't auto-nest them anymore.
PiperOrigin-RevId: 276573038
We will use block arguments as the way to model SPIR-V OpPhi in
the SPIR-V dialect.
This CL also adds a few useful helper methods to both ops to
get the block arguments.
Also added tests for branch weight (de)serialization.
PiperOrigin-RevId: 275960797
nvvm.shfl.sync.bfly optionally returns a predicate whether source lane was active. Support for this was added to clang in https://reviews.llvm.org/D68892.
Add an optional 'pred' unit attribute to the instruction to return this predicate. Specify this attribute in the partial warp reduction so we don't need to manually compute the predicate.
PiperOrigin-RevId: 275616564
This allows mixing linalg operations with vector transfer operations (with additional modifications to affine ops) and is a step towards solving tensorflow/mlir#189.
PiperOrigin-RevId: 275543361
This CL creates a new Linalg promotion pass that operates on SubViewOp and decouples it from Linalg tiling. This is mostly moving code around.
PiperOrigin-RevId: 275329213
Add a canonicalization pattern for spv.selection operation.
Convert spv.selection operation to spv.Select based on
simple pattern.
Closestensorflow/mlir#183
COPYBARA_INTEGRATE_REVIEW=https://github.com/tensorflow/mlir/pull/183 from denis0x0D:sandbox/canon_select 43d04d923272dd60b9da39f70bdbc51a5168db62
PiperOrigin-RevId: 275312748
'_' is used frequently enough as the separator of words in symbols.
We should allow it in dialect symbols when considering pretty printing.
Also updated LangRef.md regarding pretty form.
PiperOrigin-RevId: 275312494
We just need to implement a few interface hooks to DialectInlinerInterface
and CallOpInterface to gain the benefits of an inliner. :)
Right now only supports some trivial cases:
* Inlining single block with spv.Return/spv.ReturnValue
* Inlining multi block with spv.Return
* Inlining spv.selection/spv.loop without return ops
More advanced cases will require block argument and Phi support.
PiperOrigin-RevId: 275151132
In addition to specifying the type of accumulation through the 'op' attribute, the accumulation can now also be specified as arbitrary code region.
Adds a gpu.yield op to specify the result of the accumulation.
Also support more types (integers) and accumulations (mul).
PiperOrigin-RevId: 275065447
The SpecId decoration is the handle for providing external specialization.
Similar to descriptor set and binding on global variables, we directly
bake it into assembly parsing and printing.
PiperOrigin-RevId: 274893879
When the implementation of the strided memref [RFC](https://groups.google.com/a/tensorflow.org/forum/#!msg/mlir/MaL8m2nXuio/1scRqZa6AQAJ) landed, linalg started using this type instead of the now retired !linalg.view.
As static and partially static cases appear, the stride information needs to be maintained properly. In particular, the result type of the subview op was generally incorrect.
This CL fixes the issue by computing a return type that:
1. always has dynamic sizes, which is generally the only correct way to construct a subview in the absence of data padding and/or code versioning.
2. has the same strides as the base strided memref.
Point 1. above can be further refined but will needs further analysis and canonicalization to optimize the particular case where:
1. The base memref has static size along a given dimension.
2. The subview size can be statically derived (e.g. after canonicalization).
3. *And* the subview size is an even divisor of the base memref.
This 3rd constraint is well-known in the case of tiled layouts that don't assume implicit padding: the boundary tile may be only partial and has size given by `problem_size % tile_size`.
Tests are updated as appropriate.
PiperOrigin-RevId: 274578624
This fixes an omission that prevents Linalg to lower generic ops regions operating on ops in the VectorOps dialect.
To achieve this we simply need to `populateVectorToLLVMConversionPatterns` in the conversion.
Relevant tests are added.
PiperOrigin-RevId: 274577325
Similarly to `llvm.mlir.undef`, this auxiliary operation creates an SSA value
that corresponds to `null` in LLVM IR. This operation is necessary to model
sizeof(<...>) behavior when allocating memory.
PiperOrigin-RevId: 274158760
The lowering is specified as a pattern and is done only if the result
is a SPIR-V scalar type or vector type.
Handling ConstantOp with index return type needs special handling
since SPIR-V dialect does not have index types. Based on the bitwidth
of the attribute value, either i32 or i64 is chosen.
Other constant lowerings are left as a TODO.
PiperOrigin-RevId: 274056805