nvvm.shfl.sync.bfly optionally returns a predicate indicating whether the source lane was active. Support for this was added to clang in https://reviews.llvm.org/D68892.
Add an optional 'pred' unit attribute to the instruction to return this predicate. Specify this attribute in the partial warp reduction so we don't need to compute the predicate manually.
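A sketch of the intended usage, in the LLVM/NVVM dialect syntax of the time (the attribute spelling and the struct packing of the result are assumptions based on the description above):
```
%shfl = nvvm.shfl.sync.bfly %mask, %val, %offset, %clamp : !llvm.float
// With the unit attribute, the predicate is returned as well:
%shfl_pred = nvvm.shfl.sync.bfly %mask, %val, %offset, %clamp {pred} : !llvm<"{ float, i1 }">
```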
PiperOrigin-RevId: 275616564
This allows mixing linalg operations with vector transfer operations (with additional modifications to affine ops) and is a step towards solving tensorflow/mlir#189.
PiperOrigin-RevId: 275543361
This CL creates a new Linalg promotion pass that operates on SubViewOp and decouples it from Linalg tiling. This is mostly moving code around.
PiperOrigin-RevId: 275329213
Add a canonicalization pattern for the spv.selection operation.
Convert the spv.selection operation to spv.Select based on a
simple pattern.
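For example, a selection whose branches merely forward one of two values can collapse to (a sketch):
```
%result = spv.Select %condition, %true_value, %false_value : i1, f32
```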
Closes tensorflow/mlir#183
COPYBARA_INTEGRATE_REVIEW=https://github.com/tensorflow/mlir/pull/183 from denis0x0D:sandbox/canon_select 43d04d923272dd60b9da39f70bdbc51a5168db62
PiperOrigin-RevId: 275312748
'_' is used frequently enough as a word separator in symbols.
We should allow it in dialect symbols when considering pretty printing.
Also updated LangRef.md regarding pretty form.
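For example, for a hypothetical dialect type:
```
// Verbose form:
!foo<"bar_baz">
// Pretty form, which may now contain '_':
!foo.bar_baz
```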
PiperOrigin-RevId: 275312494
We just need to implement a few interface hooks to DialectInlinerInterface
and CallOpInterface to gain the benefits of an inliner. :)
Right now it only supports some trivial cases:
* Inlining single block with spv.Return/spv.ReturnValue
* Inlining multi block with spv.Return
* Inlining spv.selection/spv.loop without return ops
More advanced cases will require block argument and Phi support.
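A sketch of the first trivial case (function and call spellings follow the conventions of the time):
```
func @callee(%arg0 : i32) -> i32 {
  spv.ReturnValue %arg0 : i32
}
func @caller(%arg0 : i32) -> i32 {
  %0 = spv.FunctionCall @callee(%arg0) : (i32) -> i32
  spv.ReturnValue %0 : i32
}
// After inlining, the spv.FunctionCall is replaced by the callee body:
func @caller(%arg0 : i32) -> i32 {
  spv.ReturnValue %arg0 : i32
}
```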
PiperOrigin-RevId: 275151132
In addition to specifying the type of accumulation through the 'op' attribute, the accumulation can now also be specified as an arbitrary code region.
Adds a gpu.yield op to specify the result of the accumulation.
Also support more types (integers) and accumulations (mul).
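A sketch of the region form, in generic syntax to avoid assuming the custom assembly format:
```
%sum = "gpu.all_reduce"(%operand) ({
^bb(%lhs : f32, %rhs : f32):
  %result = addf %lhs, %rhs : f32
  "gpu.yield"(%result) : (f32) -> ()
}) : (f32) -> (f32)
```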
PiperOrigin-RevId: 275065447
The SpecId decoration is the handle for providing external specialization.
Similar to descriptor set and binding on global variables, we directly
bake it into assembly parsing and printing.
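A sketch of the resulting assembly (the spec_id spelling is an assumption):
```
spv.specConstant @spec_const spec_id(5) = 42 : i32
```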
PiperOrigin-RevId: 274893879
When the implementation of the strided memref [RFC](https://groups.google.com/a/tensorflow.org/forum/#!msg/mlir/MaL8m2nXuio/1scRqZa6AQAJ) landed, linalg started using this type instead of the now retired !linalg.view.
As static and partially static cases appear, the stride information needs to be maintained properly. In particular, the result type of the subview op was generally incorrect.
This CL fixes the issue by computing a return type that:
1. always has dynamic sizes, which is generally the only correct way to construct a subview in the absence of data padding and/or code versioning.
2. has the same strides as the base strided memref.
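For example, at the type level (operands elided):
```
// A subview of a base memref of type
memref<8x16xf32, offset: 0, strides: [16, 1]>
// now gets a result type with dynamic sizes and offset but identical strides:
memref<?x?xf32, offset: ?, strides: [16, 1]>
```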
Point 1. above can be further refined but will need further analysis and canonicalization to optimize the particular case where:
1. The base memref has static size along a given dimension.
2. The subview size can be statically derived (e.g. after canonicalization).
3. *And* the subview size is an even divisor of the base memref.
This 3rd constraint is well-known in the case of tiled layouts that don't assume implicit padding: the boundary tile may be only partial and has size given by `problem_size % tile_size`.
Tests are updated as appropriate.
PiperOrigin-RevId: 274578624
This fixes an omission that prevents Linalg from lowering generic ops whose regions operate on ops in the VectorOps dialect.
To achieve this we simply need to `populateVectorToLLVMConversionPatterns` in the conversion.
Relevant tests are added.
PiperOrigin-RevId: 274577325
Similarly to `llvm.mlir.undef`, this auxiliary operation creates an SSA value
that corresponds to `null` in LLVM IR. This operation is necessary to model
sizeof(<...>) behavior when allocating memory.
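For example, the classic null + getelementptr + ptrtoint idiom for computing sizeof(float), sketched in the LLVM dialect syntax of the time:
```
%null = llvm.mlir.null : !llvm<"float*">
%one = llvm.mlir.constant(1 : index) : !llvm.i64
// The address of element 1 relative to a null base is sizeof(float).
%gep = llvm.getelementptr %null[%one] : (!llvm<"float*">, !llvm.i64) -> !llvm<"float*">
%size = llvm.ptrtoint %gep : !llvm<"float*"> to !llvm.i64
```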
PiperOrigin-RevId: 274158760
The lowering is specified as a pattern and is done only if the result
is a SPIR-V scalar type or vector type.
ConstantOp with an index return type needs special handling,
since the SPIR-V dialect does not have an index type. Based on the
bitwidth of the attribute value, either i32 or i64 is chosen.
Other constant lowerings are left as a TODO.
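A sketch of the index case:
```
// Standard dialect:
%0 = constant 42 : index
// After lowering, assuming a 32-bit integer is chosen:
%0 = spv.constant 42 : i32
```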
PiperOrigin-RevId: 274056805
Add a pass to decorate the composite types used by
composite objects in the StorageBuffer, PhysicalStorageBuffer,
Uniform, and PushConstant storage classes with layout information.
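A sketch of the effect on a struct-typed variable:
```
// Before the pass:
spv.globalVariable @var : !spv.ptr<!spv.struct<f32, i32>, Uniform>
// After, members carry explicit offsets:
spv.globalVariable @var : !spv.ptr<!spv.struct<f32 [0], i32 [4]>, Uniform>
```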
Closes tensorflow/mlir#156
COPYBARA_INTEGRATE_REVIEW=https://github.com/tensorflow/mlir/pull/156 from denis0x0D:sandbox/layout_info_decoration 7c50840fd38ca169a2da7ce9886b52b50c868b84
PiperOrigin-RevId: 273634140
The kernel function called by gpu.launch_func is now placed into an isolated
nested module during the outlining stage to simplify separate compilation.
Until recently, modules did not have names and could not be referenced. This
limitation was circumvented by introducing a stub kernel with the same name
at the same nesting level as the module containing the actual kernel. This
relation is only effective in one direction: from actual kernel function to its
launch_func "caller".
Leverage the recently introduced symbol name attributes on modules to refer to
a specific nested module from `gpu.launch_func`. This removes the implicit
connection between the identically named stub and kernel functions. It also
enables support for `gpu.launch_func`s to call different kernels located in the
same module.
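A sketch of the resulting structure, in generic syntax (the attribute spellings are assumptions based on the description above):
```
module @kernels attributes {gpu.kernel_module} {
  func @kernel_fn(%arg0 : f32) attributes {gpu.kernel} {
    ...
  }
}
// The launch site references the kernel and its module by symbol:
"gpu.launch_func"(%gx, %gy, %gz, %bx, %by, %bz, %arg0)
    {kernel = "kernel_fn", kernel_module = @kernels}
    : (index, index, index, index, index, index, f32) -> ()
```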
PiperOrigin-RevId: 273491891
The SPIR-V spec recommends all OpUndef instructions be generated at
module level. For the SPIR-V dialect it's better for UndefOp to produce
an SSA value for use with other instructions. If UndefOp is to be used
at module level, it cannot produce an SSA value (use of this SSA value
within FuncOp would need implicit capture). To satisfy needs of the
SPIR-V spec while making it simpler to represent UndefOp in the SPIR-V
dialect, the serialization is updated to create OpUndef instruction
at module scope.
PiperOrigin-RevId: 273355526
The structured selection/loop's entry block does not have arguments.
If the function's header block is also part of the structured control
flow, we cannot just simply erase it because it may contain arguments
matching the function signature and used by the cloned blocks. Instead,
turn it into a block only containing a spv.Branch op.
Also, we can directly emit instructions for the spv.selection header
block to the block containing the spv.selection op. This eliminates
unnecessary branches in the SPIR-V blob.
Added a test for nested spv.loop.
PiperOrigin-RevId: 273351424
Now that linalg.view and strided memrefs are unified, there is no reason to
disallow AllocOp in alias analysis. This CL adds support for AllocOp, which allows writing shorter tests that do not require explicitly creating a view for
each operation.
PiperOrigin-RevId: 273303060
Use `getParentOfType<FunctionOp>()` instead of `cast<FuncOp>(getParentOp())`
to avoid crash when return ops are used inside spv.selection/spv.loop.
PiperOrigin-RevId: 273006041
Adding support for OpUndef instruction. Updating the dialect
generation script to fix a few bugs in the instruction spec
generation.
PiperOrigin-RevId: 272975685
Certain lowering patterns were reported as [missing](https://groups.google.com/a/tensorflow.org/forum/#!topic/mlir/dkdmHa77sSQ).
This CL adds them and allows Linalg/roundtrip.mlir and Linalg/loops.mlir to lower to LLVM directly. Those 2 tests are updated to additionally check that the direct lowering to LLVM does not crash.
The following points, left as TODOs, still need to be addressed for correct end-to-end execution:
1. the lowering for ConvOp needs to pass attributes such as strides and dilations; the external library call needs to support it.
2. the lowering for GenericOp needs to support lowering to loops as a DialectConversion pattern. This is blocked on the DialectConversion infrastructure accepting an OperationFolder.
PiperOrigin-RevId: 272878131
The first dim length of the axisStats attribute should equal the slice size
of the input argument when split along the axis dimension.
PiperOrigin-RevId: 272798042
This CL implements the last remaining bit of the [strided memref proposal](https://groups.google.com/a/tensorflow.org/forum/#!topic/mlir/MaL8m2nXuio).
The syntax is a bit more explicit than what was originally proposed and resembles:
`memref<?x?xf32, offset: 0, strides: [?, 1]>`
Nonnegative strides and offsets are currently supported. Future extensions will include negative strides.
This also gives a concrete example of syntactic sugar for the [RFC: Proposed Changes to MemRef and Tensor MLIR Types](https://groups.google.com/a/tensorflow.org/forum/#!topic/mlir/-wKHANzDNTg).
The underlying implementation still uses AffineMap layout.
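For example, the new form desugars to an affine layout map:
```
memref<?x?xf32, offset: 0, strides: [?, 1]>
// is equivalent to:
memref<?x?xf32, (d0, d1)[s0] -> (d0 * s0 + d1)>
```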
PiperOrigin-RevId: 272717437
Similar to spv.loop, spv.selection is another op for modelling
SPIR-V structured control flow. It covers both OpBranchConditional
and OpSwitch with OpSelectionMerge.
Instead of having a `spv.SelectionMerge` op to directly model
selection merge instruction for indicating the merge target,
we use regions to delimit the boundary of the selection: the
merge target is the next op following the `spv.selection` op.
This way it's easier to discover all blocks belonging to
the selection and it plays nicer with the MLIR system.
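A sketch of the op structure:
```
spv.selection {
  spv.BranchConditional %cond, ^then, ^merge
^then:
  // true-branch ops
  spv.Branch ^merge
^merge:
  spv._merge
}
// Control flow merges at the op following spv.selection.
```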
PiperOrigin-RevId: 272475006
This CL finishes the implementation of the Linalg + Affine type unification of the [strided memref RFC](https://groups.google.com/a/tensorflow.org/forum/#!topic/mlir/MaL8m2nXuio).
As a consequence, the !linalg.view type, linalg::DimOp, linalg::LoadOp and linalg::StoreOp can now disappear and Linalg can use standard types everywhere.
PiperOrigin-RevId: 272187165
According to the SPIR-V spec:
"Length is the number of elements in the array. It must be at least 1."
Closes tensorflow/mlir#160
COPYBARA_INTEGRATE_REVIEW=https://github.com/tensorflow/mlir/pull/160 from denis0x0D:sandbox/array_len 0840dc0986ad0088a3aa7d5d8d3e97d489377ed9
PiperOrigin-RevId: 272094669
Add operations corresponding to OpLogicalAnd, OpLogicalNot,
OpLogicalEqual, OpLogicalNotEqual and OpLogicalOr instructions in
SPIR-V dialect. This needs changes to the class hierarchy in the SPIR-V
TableGen files to split SPIRVLogicalOp into SPIRVLogicalUnaryOp and
SPIRVLogicalBinaryOp. All derived classes of SPIRVLogicalOp are
updated accordingly.
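For example:
```
%2 = spv.LogicalAnd %0, %1 : i1
%3 = spv.LogicalNot %2 : i1
```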
Update the spirv dialect generation script to
1) Allow specifying the base class to use for instruction spec
generation and, separately, the file name to generate the
specification in.
2) Use the existing descriptions for operations.
3) Update define_inst.sh to also invoke define_opcode.sh so that the
corresponding SPIR-V instruction opcode enum gets defined.
PiperOrigin-RevId: 272014876
The strided MemRef RFC discusses a normalized descriptor and interaction with library calls (https://groups.google.com/a/tensorflow.org/forum/#!topic/mlir/MaL8m2nXuio).
Lowering of nested LLVM structs as value types does not play nicely with externally compiled C/C++ functions due to ABI issues.
Solving the ABI problem in general is a very complex task that most likely involves taking
a dependency on clang, which we do not want at the moment.
A simple workaround is to pass pointers to memref descriptors at function boundaries, which this CL implements.
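A sketch of the boundary change (using roughly the descriptor layout of the time: data pointer, offset, sizes, strides):
```
// Before: the descriptor is passed by value as a nested struct:
llvm.func @external_fn(!llvm<"{ float*, i64, [2 x i64], [2 x i64] }">)
// After: a pointer to the descriptor is passed instead:
llvm.func @external_fn(!llvm<"{ float*, i64, [2 x i64], [2 x i64] }*">)
```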
PiperOrigin-RevId: 271591708
This commit introduces the ROCDL Dialect (i.e. the ROCDL ops + the code to lower those ROCDL ops to LLVM intrinsics/functions). Think of the ROCDL Dialect as analogous to the NVVM Dialect, but for AMD GPUs. This patch contains just the essentials needed to get a simple example up and running. We expect to make further additions to the ROCDL Dialect.
This is the first of 3 commits, the follow-up will be:
* add a pass that lowers GPU Dialect to ROCDL Dialect
* add a "mlir-rocm-runner" utility
Closes tensorflow/mlir#146
COPYBARA_INTEGRATE_REVIEW=https://github.com/tensorflow/mlir/pull/146 from deven-amd:deven-rocdl-dialect e78e8005c75a78912631116c78dc844fcc4b0de9
PiperOrigin-RevId: 271511259
This CL modifies the linalg-fusion pass such that it does not tile anymore as part of the pass. Tiling is a separate concern that enables linalg fusion but should happen before.
This makes fusion more composable with other decisions.
In particular the fusion pass now becomes greedy and only applies the transformation on a best-effort basis.
This should also let fusion work in a multi-hop fashion with chains of producer/consumers.
Since the fusion pass does not perform tiling anymore, tests are rewritten to be in pretiled form and make the intent of the test clearer (albeit more verbose).
PiperOrigin-RevId: 271357741
The reduction operation is currently fixed to "add", and the scope is fixed to "workgroup".
The implementation is currently limited to sizes that are multiples of 32 (the warp size) and no larger than 1024.
PiperOrigin-RevId: 271290265
Support the OpBitcast instruction of SPIR-V using the spv.Bitcast
operation. The semantics implemented in the dialect differ from the
SPIR-V spec in that the dialect does not allow conversion to/from
pointer types from/to non-pointer types.
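For example:
```
%1 = spv.Bitcast %0 : f32 to i32
```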
PiperOrigin-RevId: 271255957
A base class is added to implement all GLSL Binary operations and is
used to implement the FMax operation. The existing framework already
generates all the necessary (de)serialization code.
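For example:
```
%2 = spv.GLSL.FMax %0, %1 : f32
```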
PiperOrigin-RevId: 271037166
This CL uses the newly added -split-input-file CLI option to
mlir-translate to combine certain (de)serialization tests.
It also renames certain test filenames.
PiperOrigin-RevId: 270816324
According to SPIR-V spec, spirv::CompositeType includes
spirv::RuntimeArrayType. This allows using objects of
spirv::RuntimeArrayType with spirv::AccessChainOp.
PiperOrigin-RevId: 270809492
Add support in the deserializer for the OpMemberName instruction. For now
the name is just processed and not associated with the
spirv::StructType being built. That needs an enhancement to
spirv::StructType itself.
Add tests to check for errors reported during deserialization with
some refactoring to common out some utility functions.
PiperOrigin-RevId: 270794524
The existing logic to parse spirv::StructTypes is very brittle. This
change simplifies the parsing logic a lot. The simplification also
allows for member decorations to be separated by commas instead of
spaces (which was an artifact of the existing parsing logic). The
change also needs a modification to mlir::parseType to return the
number of chars parsed. Adding a new parseType method to do so.
Also allow specification of spirv::StructType with no members.
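A sketch of the accepted forms:
```
// Member decorations separated by commas:
!spv.struct<f32 [0, NonWritable], i32 [4]>
// A struct with no members:
!spv.struct<>
```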
PiperOrigin-RevId: 270739672
Roll forward of commit 5684a12.
When outlining GPU kernels, put the kernel function inside a nested module. Then use a nested pipeline to generate the cubins, independently per kernel. In a final pass, move the cubins back to the parent module.
PiperOrigin-RevId: 270639748
The CL adds a rounding mode flag to the class and changes the default to rmNearestTiesToAway from rmNearestTiesToEven because 1) TensorFlow QuantizeV2 ops use rmNearestTiesToAway; 2) the specialization only implements rmNearestTiesToAway.
PiperOrigin-RevId: 270600739
Make GlobalOp's value attribute an OptionalAttr. Change code that uses the value to handle 'nullopt'. Translate an uninitialized value attribute to llvm::UndefValue.
PiperOrigin-RevId: 270423646
Allow specification of decorators on SPIR-V StructType members. If the
struct has layout information, these decorations are to be specified
after the offset specification of the member. These decorations are
emitted as OpMemberDecorate instructions on the struct <id>. Update
(de)serialization to handle these decorations.
PiperOrigin-RevId: 270130136
A new converter with per axis quantization parameters is added to quantize a
dense elements attribute. For each slice along the quantization axis, it
creates a uniform quantized value converter, with a different scale and zero
point, and quantizes the values in the slice.
The current implementation doesn't handle sparse elements attributes.
PiperOrigin-RevId: 270121986
When outlining GPU kernels, put the kernel function inside a nested module. Then use a nested pipeline to generate the cubins, independently per kernel. In a final pass, move the cubins back to the parent module.
PiperOrigin-RevId: 269987720
This CL registers a new mlir-translate hook, -test-spirv-roundtrip,
for testing SPIR-V serialization and deserialization round-trip.
This CL also moves the existing -serialize-spirv and
-deserialize-spirv hooks to one source file.
PiperOrigin-RevId: 269659528
A generic mechanism for (de)serialization of extended instruction sets
is added with this CL. To facilitate this, a new class
"SPV_ExtendedInstSetOp" is added which is a base class for all
operations corresponding to extended instruction sets. The methods to
(de)serialize such ops, as well as their dispatch, are generated
automatically.
The behavior controlled by autogenSerialization and hasOpcode is also
slightly modified to enable this. They are now decoupled.
1) Setting hasOpcode=1 means the operation has a corresponding
opcode in SPIR-V binary format, and its dispatch for
(de)serialization is automatically generated.
2) Setting autogenSerialization=1 generates the function for
(de)serialization automatically.
So now it is possible to have hasOpcode=0 and autogenSerialization=1
(for example SPV_ExtendedInstSetOp).
Since the dispatch functions are also auto-generated, the input file
needs to contain all operations. To this effect, SPIRVGLSLOps.td is
included into SPIRVOps.td. This makes the previously added
SPIRVGLSLOps.h and SPIRVGLSLOps.cpp unnecessary, so they are deleted.
The SPIRVUtilsGen.cpp is also changed to make better use of
formatv, making the code more readable.
PiperOrigin-RevId: 269456263
Certain enum classes in SPIR-V, like function/loop control and memory
access, are bitmasks. This CL introduces a BitEnumAttr to properly
model this and drive auto-generation of verification code and utility
functions. We still store the attribute using a 32-bit IntegerAttr
for minimal memory footprint and easy (de)serialization. But utility
conversion functions are adjusted to inspect each bit and generate
"|"-concatenated strings for the bits, and vice versa.
Each such enum class has a "None" case that means no bit is set. We
need special handling for "None". Because of this, the logic is not
general anymore. So right now the definition is placed in the SPIR-V
dialect. If later this turns out to be useful for other dialects,
then we can see how to properly adjust it and move to OpBase.td.
Added tests for SPV_MemoryAccess to check and demonstrate.
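For example, for the memory access operand of spv.Load (a sketch):
```
// A single bit:
%0 = spv.Load "Function" %ptr ["Volatile"] : f32
// Multiple bits, "|"-concatenated, here with the Aligned literal:
%1 = spv.Load "Function" %ptr ["Volatile|Aligned", 4] : f32
```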
PiperOrigin-RevId: 269350620
Add support for specifying extended instruction sets. The operations
in SPIR-V dialect are named as 'spv.<extension-name>.<op-name>'. Use
this mechanism to define an 'Exp' operation from GLSL(450)
instructions.
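For example:
```
%1 = spv.GLSL.Exp %0 : f32
```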
Later CLs will add support for (de)serialization of these operations,
and update the dialect generation scripts to auto-generate the
specification using the spec directly.
Additional changes:
Add a Type Constraint to OpBase.td to check for vectors of specified
lengths. This is used to check that the vector types used in the
SPIR-V dialect have length 2, 3, or 4.
Update SPIRVBase.td to use this Type Constraint for vectors.
PiperOrigin-RevId: 269234377
SPIR-V recently published v1.5, which brings a bunch of symbols
into core. So the suffix "KHR"/"EXT"/etc. is removed from the
symbols. We use a script to pull information from the spec
directly.
Also changed conversion and tests to use GLSL450 instead of
VulkanKHR memory model. GLSL450 is still the main memory model
supported by Vulkan shaders and it does not require extra
capability to enable.
PiperOrigin-RevId: 268992661
This CL adds support for serializing and deserializing spv.loop ops.
This adds support for spv.Branch and spv.BranchConditional op
(de)serialization, too, because they are needed for spv.loop.
PiperOrigin-RevId: 268536962
If the composite is a constant, we can fold it away. This only
supports vector and array constants for now, given that struct
constants are not yet supported in spv.constant.
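A sketch of the fold:
```
%0 = spv.constant dense<[1, 2, 3]> : vector<3xi32>
%1 = spv.CompositeExtract %0[1 : i32] : vector<3xi32>
// folds to:
%1 = spv.constant 2 : i32
```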
PiperOrigin-RevId: 268350340
Since we apply nudging for the zero point to make sure the nudged zero points
fall in the range of [qmin, qmax], the constraint that rmin / rmax should
straddle zero isn't necessary.
This also matches the documentation of TensorFlow's FakeQuantWithMinMaxArgs op,
where min and max don't need to straddle zero:
https://www.tensorflow.org/api_docs/python/tf/quantization/fake_quant_with_min_max_args
PiperOrigin-RevId: 268296285
Compared to the existing quant.const_fake_quant op, the min and max attributes
of this new op apply to each channel of the last dimension of the input.
PiperOrigin-RevId: 268093722
View descriptors are converted to *pointer to* LLVM struct to avoid ABI issues related to C struct packing. This creates unnecessary complexity and hampers unification with memrefs.
Instead, this CL makes view descriptors convert to LLVM struct (as it was originally) and promotes all structs to pointers right before calling an external function.
PiperOrigin-RevId: 267602693
SPIR-V can explicitly declare structured control-flow constructs using merge
instructions. These explicitly declare a header block before the control
flow diverges and a merge block where control flow subsequently converges.
These blocks delimit constructs that must nest, and can only be entered
and exited in structured ways.
Instead of having a `spv.LoopMerge` op to directly model loop merge
instruction for indicating the merge and continue target, we use regions
to delimit the boundary of the loop: the merge target is the next op
following the `spv.loop` op and the continue target is the block that
has a back-edge pointing to the entry block inside the `spv.loop`'s region.
This way it's easier to discover all blocks belonging to a construct and
it plays nicer with the MLIR system.
Updated the SPIR-V.md doc.
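A sketch of the op structure:
```
spv.loop {
  spv.Branch ^header
^header:
  spv.BranchConditional %cond, ^body, ^merge
^body:
  spv.Branch ^continue
^continue:
  spv.Branch ^header  // the back edge identifies the continue target
^merge:
  spv._merge
}
// The merge target is the op following spv.loop.
```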
PiperOrigin-RevId: 267431010
This follows up on the recent restructuring that moved the dialects under
lib/Dialect and inter-dialect conversions to lib/Conversion. Originally, the
tests for both the LLVMIR dialect itself and the conversion from Standard to
LLVMIR dialect lived under test/LLVMIR. This no longer reflects the code
structure. Move the tests to either test/Dialect/LLVMIR or
test/Conversion/StandardToLLVM depending on the features they exercise.
PiperOrigin-RevId: 267159219
This will allow us to use MLIR's folding infrastructure to deduplicate
SPIR-V constants.
This CL also changed isValidSPIRVType in SPIRVDialect to a static method.
PiperOrigin-RevId: 266984403
This CL just covers the op definition, its parsing, printing,
and verification. (De)serialization is to be implemented
in a subsequent CL.
PiperOrigin-RevId: 266431077
Previously Module and Function were builtin constructs in MLIR.
Due to the structural requirements we had to wrap the SPIR-V
module inside a Function inside a Module. Now the requirement
is lifted and we can remove the wrapping function! :)
PiperOrigin-RevId: 264736051
In SPIR-V binary format, constants are placed at the module level
and referenced by instructions inside functions using their result
<id>s. To model this natively (using SSA values for result <id>s),
it means we need to have implicit capturing functions. We will
lose the ability to have function passes if going down that path.
Instead, this CL changes to materialize constants at their use
sites in deserialization. It's cheap to copy constants in MLIR
given that attributes are uniqued to the MLIRContext. By localizing
constants into functions, we can preserve isolated functions.
PiperOrigin-RevId: 264582532
Similar to global variables, specialization constants also live
in the module scope and can be referenced by instructions in
functions in native SPIR-V. A direct modelling would be to allow
functions in the SPIR-V dialect to implicitly capture, but that means
we would lose the ability to write passes over Functions. While in
SPIR-V we normally want to process the module as a whole, processing
individual functions is still desirable in some cases, so we'd like
to leave the door open for them. Therefore, similar to
global variables, we introduce spv.specConstant to model three
SPIR-V instructions: OpSpecConstantTrue, OpSpecConstantFalse,
and OpSpecConstant. They do not return SSA value results;
instead they have symbols and can only be referenced by the
symbols. To use one in a function, we need another op,
spv._reference_of, to turn the symbol into an SSA value. This
breaks the tie and keeps functions explicitly capturing.
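A sketch of the pairing (spellings approximate):
```
spv.specConstant @sc = 42 : i32
func @use_it() -> i32 {
  %0 = spv._reference_of @sc : i32
  spv.ReturnValue %0 : i32
}
```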
Previously specialization constants were handled similarly to
normal constants. That is incorrect, given that a specialization
constant actually acts more like a variable (albeit one without
loads and stores). E.g., they cannot be deduplicated like normal
constants.
This CL also refines various documents and comments.
PiperOrigin-RevId: 264455172
Change the printing/parsing of spv.globalVariable to print the type of
the variable after the ':' to be consistent with MLIR convention.
The spv._address_of should print the variable type after the ':'. It was
mistakenly printing the address of the return value. Add a (missing)
test that should have caught that.
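The printed forms after this change look like (a sketch):
```
spv.globalVariable @var : !spv.ptr<f32, Uniform>
%0 = spv._address_of @var : !spv.ptr<f32, Uniform>
```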
Also move spv.globalVariable and spv._address_of tests to
structure-ops.mlir.
PiperOrigin-RevId: 264204686
This CL adds the spv.ReturnValue op and its tests. It also adds an
InFunctionScope trait to make sure that the op stays inside
a function. To be consistent, ModuleOnly trait is changed to
InModuleScope.
PiperOrigin-RevId: 264193081
FuncOps in MLIR use explicit capture. So global variables defined in
module scope need to have a symbol name and this should be used to
refer to the variable within the function. This deviates from SPIR-V
spec, which assigns an SSA value to variables at all scopes that can
be used to refer to the variable, which requires SPIR-V functions to
allow implicit capture. To handle this add a new op,
spirv::GlobalVariableOp that can be used to define module scope
variables.
Since instructions need an SSA value, a new spirv::AddressOfOp is
added to convert a symbol reference to an SSA value for use with other
instructions.
This also means the spirv::EntryPointOp instruction needs to change
to allow initializers to be specified using a symbol reference
instead of an SSA value.
The current spirv::VariableOp which returns an SSA value (as defined
by SPIR-V spec) can still be used to define function-scope variables.
PiperOrigin-RevId: 263951109
This CL adds an optional third argument to the vector.outerproduct instruction.
When such a third argument is specified, it is added to the result of the outer product and lowered to an FMA intrinsic when the lowering supports it.
In the future, we can add an attribute on the `vector.outerproduct` instruction to modify the operations for which to emit code (e.g. "+/*", "max/+", "min/+", "log/exp" ...).
This CL additionally performs minor cleanups in the vector lowering and adds tests to improve coverage.
This has been independently verified to result in proper fma instructions for Haswell, as follows.
Input:
```
func @outerproduct_add(%arg0: vector<17xf32>, %arg1: vector<8xf32>, %arg2: vector<17x8xf32>) -> vector<17x8xf32> {
%2 = vector.outerproduct %arg0, %arg1, %arg2 : vector<17xf32>, vector<8xf32>
return %2 : vector<17x8xf32>
}
```
Command:
```
mlir-opt vector-to-llvm.mlir -vector-lower-to-llvm-dialect --disable-pass-threading | mlir-opt -lower-to-cfg -lower-to-llvm | mlir-translate --mlir-to-llvmir | opt -O3 | llc -O3 -march=x86-64 -mcpu=haswell -mattr=fma,avx2
```
Output:
```
outerproduct_add: # @outerproduct_add
# %bb.0:
...
vmovaps 112(%rbp), %ymm8
vbroadcastss %xmm0, %ymm0
...
vbroadcastss 64(%rbp), %ymm15
vfmadd213ps 144(%rbp), %ymm8, %ymm0 # ymm0 = (ymm8 * ymm0) + mem
...
vfmadd213ps 400(%rbp), %ymm8, %ymm9 # ymm9 = (ymm8 * ymm9) + mem
...
```
PiperOrigin-RevId: 263743359
Generate the EnumAttr to represent BuiltIns in SPIR-V dialect. The
builtin can be specified as a StringAttr whose value is the
name of the builtin. Extend Decoration (de)serialization to handle
BuiltIns.
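A sketch (the exact attribute placement and builtin spelling are assumptions):
```
spv.globalVariable @gid built_in("GlobalInvocationId") : !spv.ptr<vector<3xi32>, Input>
```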
Also fix an error in the SPIR-V dialect generator script.
PiperOrigin-RevId: 263596624
This CL is step 2/n towards building a simple, programmable and portable vector abstraction in MLIR that can go all the way down to generating assembly vector code via LLVM's opt and llc tools.
This CL adds the vector.outerproduct operation to the MLIR vector dialect as well as the appropriate roundtrip test. Lowering to LLVM will occur in the following CL.
PiperOrigin-RevId: 262552027
This CL is step 2/n towards building a simple, programmable and portable vector abstraction in MLIR that can go all the way down to generating assembly vector code via LLVM's opt and llc tools.
This CL adds the vector.extractelement operation to the MLIR vector dialect as well as the appropriate roundtrip test. Lowering to LLVM will occur in the following CL.
PiperOrigin-RevId: 262545089
This trait provides the ensureTerminator() utility function and
the checks to make sure a spv.module is indeed terminated with
spv._module_end.
PiperOrigin-RevId: 261664153
This CL extends the existing spv.constant op to also support
specialization constant by adding an extra unit attribute
on it.
PiperOrigin-RevId: 261194869
During serialization, the operand number must be used to get the
values associated with an operand. Using the argument number in the Op
specification was wrong since some of the elements in the arguments
list might be attributes on the operation. This resulted in a segfault
during serialization.
Add a test that exercises that path.
PiperOrigin-RevId: 260977758
All non-argument attributes specified for an operation are treated as
decorations on the result value and (de)serialized using OpDecorate
instruction. An error is generated if an attribute is not an argument
and its name doesn't correspond to a Decoration enum. Names of the
attributes that represent decorations are the snake-case-ified
version of the Decoration name.
Add utility methods to convert to snake-case and camel-case.
PiperOrigin-RevId: 260792638
This CL adds an initial implementation for translating a kernel
function in the GPU Dialect (used with a gpu.launch_kernel op) to a
spv.Module. The original function is translated into an entry
function.
Most of the heavy lifting is done by adding TypeConversion and other
utility functions/classes that provide most of the functionality to
translate from Standard Dialect to SPIR-V Dialect. These are intended
to be reusable in implementation of different dialect conversion
pipelines.
Note: Some of the files have been renamed to be consistent with
the norm used by the other Conversion frameworks.
PiperOrigin-RevId: 260759165
Automatic generation of spirv::AccessChainOp (de)serialization needs
the (de)serialization emitters to handle arguments specified as
Variadic<...>. To handle this correctly, such an argument can only be
the last entry in the arguments list.
Add a test to (de)serialize spirv::AccessChainOp.
PiperOrigin-RevId: 260532598
AccessChainOp creates a pointer into a composite object that can be used with
OpLoad and OpStore.
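For example (a sketch):
```
%ptr = spv.AccessChain %base[%index] : !spv.ptr<!spv.array<4 x f32>, Function>
%val = spv.Load "Function" %ptr : f32
```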
Closes tensorflow/mlir#52
PiperOrigin-RevId: 260035676
Per tacit agreement, individual dialects should now live in lib/Dialect/Name
with headers in include/mlir/Dialect/Name and tests in test/Dialect/Name.
PiperOrigin-RevId: 259896851
Some TensorFlow simulated quantize ops such as QuantizeAndDequantizeV2Op have
an attribute for the sign of the quantization, so a new attribute is added to
quant_ConstFakeQuant to be able to represent it.
The method for converting these attributes to a QuantizedType is updated to
handle this new argument.
PiperOrigin-RevId: 258810290
We only run the Broadcastable trait verifier and don't care about mutations, so all CHECK statements and the FileCheck invocation are removed.
PiperOrigin-RevId: 258662882
Currently, the Broadcastable trait also rejects instances where the op result has a shape other than what can be statically inferred from the operand shapes, even if the result shape is compatible with the inferred broadcasted shape.
For example,
(tensor<3x2xi32>, tensor<*xi32>) -> tensor<4x3x2xi32>
(tensor<2xi32>, tensor<2xi32>) -> tensor<*xi32>
PiperOrigin-RevId: 258647493
The current syntax separates the name and value with ':', but ':' is already overloaded by several other things (e.g. trailing types). This makes the syntax difficult to parse in some situations:
Old:
"foo: 10 : i32"
New:
"foo = 10 : i32"
PiperOrigin-RevId: 255097928
This is the standard syntax for types on operations, and is also already used by IntegerAttr and FloatAttr.
Example:
dense<5> : tensor<i32>
dense<[3]> : tensor<1xi32>
PiperOrigin-RevId: 255069157
This name has caused some confusion because it suggests that it's running op verification (and that this verification isn't getting run by default).
PiperOrigin-RevId: 254035268
Adding the additional layer of directory was discussed offline and matches the Target/ tree. The names match the de facto convention we seem to be following, where the C++ namespace is ^(.+)Ops/$ matched against the directory name.
This is in preparation for patching the Quantizer into this tree, which would have been confusing without moving the Quantization dialect to its more proper home. It is left to others to move other dialects if desired.
Tested:
ninja check-mlir
--
PiperOrigin-RevId: 248171982
TensorFlow comparison ops like tf.Less support broadcast behavior but the
result type has a different element type than the input types. Extend the
Broadcastable trait to allow such cases. Added tf.Less to demonstrate it.
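For example (a sketch):
```
// The operands broadcast, and the result element type (i1) differs
// from the operand element type (i32):
%0 = "tf.Less"(%lhs, %rhs) : (tensor<2xi32>, tensor<2xi32>) -> tensor<2xi1>
```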
PiperOrigin-RevId: 237846127
So that we can use this function to deduce broadcasted shapes elsewhere.
Also added support for unknown dimensions, by following TensorFlow behavior.
PiperOrigin-RevId: 237846065
* Add common broadcastable binary adder in TF ops and use for a few ops;
- Adding Sub, Mul here
* Change the prepare lowering to use TF variants;
* Add some more legalization patterns;
PiperOrigin-RevId: 233310952
That allows TensorFlow Add and Div ops to use Broadcastable op trait instead of
more restrictive SameValueType op trait.
That in turn allows TensorFlow ops to be registered by defining GET_OP_LIST and
including the generated ops file. Currently, tf-raise-control-flow pass tests
are using dynamic shapes in tf.Add op and AddOp can't be registered without
supporting the dynamic shapes.
TESTED with unit tests
PiperOrigin-RevId: 232927998
The operand and result types of binary ops are not necessarily the
same. For those binary ops, we cannot print in the short-form assembly.
Enhance impl::printBinaryOp to consider operand and result types
to select which assembly form to use.
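A sketch of the two forms:
```
// Matching operand and result types allow the short form:
%0 = addf %lhs, %rhs : f32
// A differing result type forces the generic form, e.g. for tf.Less:
%1 = "tf.Less"(%a, %b) : (tensor<2xi32>, tensor<2xi32>) -> tensor<2xi1>
```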
PiperOrigin-RevId: 229608142
We also need the broadcast logic in the TensorFlow dialect. Move it to a
Dialect/ directory for a broader scope. This Dialect/ directory is intended
for code not in core IR, but can potentially be shared by multiple dialects.
Apart from fixing TensorFlow op TableGen to use this trait, this CL only
contains mechanical code shuffling.
PiperOrigin-RevId: 229563911