Add builder functions for spv._address_of, spv.EntryPoint,
spv.ExecutionMode and spv.Load to make it easier to create these
operations.
Fix a minor bug in printing of spv.EntryPoint
Add a utility function to get the attribute name associated with a
decoration.
PiperOrigin-RevId: 272952846
Certain lowering patterns were reported as [missing](https://groups.google.com/a/tensorflow.org/forum/#!topic/mlir/dkdmHa77sSQ).
This CL adds them and allows Linalg/roundtrip.mlir and Linalg/loops.mlir to lower to LLVM directly. Those 2 tests are updated to additionally check that the direct lowering to LLVM does not crash.
The following points, left as TODOs still need to be addressed for correct end-to-end execution:
1. the lowering for ConvOp needs to pass attributes such as strides and dilations; the external library call needs to support it.
2. the lowering for GenericOp needs to support lowering to loops as a DialectConversion pattern. This is blocked on the DialectConversion infrastructure accepting an OperationFolder.
PiperOrigin-RevId: 272878131
The GPUIndexIntrinsicOpLowering template is currently used by the code in both the GPUToNVVM and GPUToROCDL dirs.
Moving it to a common location to remove code duplication.
Closestensorflow/mlir#163
COPYBARA_INTEGRATE_REVIEW=https://github.com/tensorflow/mlir/pull/163 from deven-amd:deven-refactor-gpu-index-ops-lowering b8dc2a5f5353df196039b6ff2ad42106028693ed
PiperOrigin-RevId: 272863297
Some dialects have implicit conversions inherent in their modeling, meaning that a call may have a different type that the type that the callable expects. To support this, a hook is added to the dialect interface that allows for materializing conversion operations during inlining when there is a mismatch. A hook is also added to the callable interface to allow for introspecting the expected result types.
PiperOrigin-RevId: 272814379
This allows for the inliner to work on arbitrary call operations. The updated inliner will also work bottom-up through the callgraph enabling support for multiple levels of inlining.
PiperOrigin-RevId: 272813876
The first dim length of the axisStats attribute should equals to the slice size
of the input argument when splitted by the axis dimension.
PiperOrigin-RevId: 272798042
This CL implements the last remaining bit of the [strided memref proposal](https://groups.google.com/a/tensorflow.org/forum/#!topic/mlir/MaL8m2nXuio).
The syntax is a bit more explicit than what was originally proposed and resembles:
`memref<?x?xf32, offset: 0 strides: [?, 1]>`
Nonnegative strides and offsets are currently supported. Future extensions will include negative strides.
This also gives a concrete example of syntactic sugar for the ([RFC] Proposed Changes to MemRef and Tensor MLIR Types)[https://groups.google.com/a/tensorflow.org/forum/#!topic/mlir/-wKHANzDNTg].
The underlying implementation still uses AffineMap layout.
PiperOrigin-RevId: 272717437
Module names are optional so it makes more sense to take and return an optional
any time the name is involved. Also update the language reference to reflect
the module names.
PiperOrigin-RevId: 272684698
Modules are now Ops and, as such, can be nested. They do not produce an SSA
value so there is no possibility to refer to them in the IR. Introduce support
for symbol names attached to the module Op so that it can be referred to using
SymbolRefAttrs. The name is optional, for example the implicit top-level module
does not have a name.
PiperOrigin-RevId: 272671600
This makes the name of the conversion pass more consistent with the naming
scheme, since it actually converts from the Loop dialect to the Standard
dialect rather than working with arbitrary control flow operations.
PiperOrigin-RevId: 272612112
As specified in the MLIR language reference and rationale documents, `memref`
types should not be allowed to have `index` as element types. As observed in
https://groups.google.com/a/tensorflow.org/forum/#!msg/mlir/P49hVWqTMNc/nW89a4i_AgAJ
this restriction was lifted when canonicalization unit tests for affine
operations were introduced, without sufficient motivation to lift the
restriction itself. The test in question can be trivially rewritten (return
the value from a function instead of storing it to prevent DCE from removing
the producer operation) and the restriction put back in place.
If `memref<...x index>` is relevant for some use cases, the relaxation of the
type system can be implemented separately with appropriate modifications to the
documentation.
PiperOrigin-RevId: 272607043
This also adds coverage with a missing test, which uncovered a bug in the conditional for testing whether an offset is dynamic or not.
PiperOrigin-RevId: 272505798
Similar to spv.loop, spv.selection is another op for modelling
SPIR-V structured control flow. It covers both OpBranchConditional
and OpSwitch with OpSelectionMerge.
Instead of having a `spv.SelectionMerge` op to directly model
selection merge instruction for indicating the merge target,
we use regions to delimit the boundary of the selection: the
merge target is the next op following the `spv.selection` op.
This way it's easier to discover all blocks belonging to
the selection and it plays nicer with the MLIR system.
PiperOrigin-RevId: 272475006
This is a follow-up to the PRtensorflow/mlir#146 which introduced the ROCDL Dialect. This PR introduces a pass to lower GPU Dialect to the ROCDL Dialect. As with the previous PR, this one builds on the work done by @whchung, and addresses most of the review comments in the original PR.
Closestensorflow/mlir#154
COPYBARA_INTEGRATE_REVIEW=https://github.com/tensorflow/mlir/pull/154 from deven-amd:deven-lower-gpu-to-rocdl 809893e08236da5ab6a38e3459692fa04247773d
PiperOrigin-RevId: 272390729
This exposes hooks for accessing internal dominance nodes, and updating the internal DFS numbers.
Closestensorflow/mlir#151
COPYBARA_INTEGRATE_REVIEW=https://github.com/tensorflow/mlir/pull/151 from schweitzpgi:dominance_hooks 69d14214a423b816cbd59feffcacdd02f3b5f921
PiperOrigin-RevId: 272287352
A recent ABI compatibility change affected the conversion from standard
CallOp/CallIndirectOp to LLVM::CallOp by changing its signature. In order to
analyze the signature, the code was looking up the callee symbol in the module.
This is incorrect since, during the conversion, the module may contain both the
original and the converted function op that have the same symbol name. There is
no strict guarantee on which of the two symbols will be found by the lookup.
The conversion was not failing because the type legalizer converts the LLVM
types to themselves making the original and the converted function signatures
ultimately produce the same type.
Instead of looking up the function signature to get the list of result types,
use the types of the CallOp/CallIndirectOp results which must match those of
the function in valid IR. These types are guaranteed to be the original,
unconverted types when converting the operation. Furthermore, this avoids the
need to perform a lookup of a symbol name in the module which may be expensive.
Finally, propagate attributes as-is from the original op to the converted op
since they share the attribute name for the callee of direct calls and the rest
of attributes are not affected by the conversion. This removes the need for
additional contorsions between direct and indirect calls to extract the name of
the optional callee attribute only to insert it back. This also prevents the
conversion from unintentionally dropping the other attributes of the op.
PiperOrigin-RevId: 272218871
This CL finishes the implementation of the Linalg + Affine type unification of the [strided memref RFC](https://groups.google.com/a/tensorflow.org/forum/#!topic/mlir/MaL8m2nXuio).
As a consequence, the !linalg.view type, linalg::DimOp, linalg::LoadOp and linalg::StoreOp can now disappear and Linalg can use standard types everywhere.
PiperOrigin-RevId: 272187165
Perform second reduce only with first warp. This requires an additional __sync_threads(), but doesn't need special handling when the last warp is small. This simplifies support for block sizes that are not multiple of 32.
Supporting partial warp reduce will be done in a separate CL.
PiperOrigin-RevId: 272168917
For the cases where there are multiple levels of nested pass managers, the parent thread ID is not enough to distinguish the parent of a given pass pipeline. Passing in the parent pass gives an exact anchor point.
PiperOrigin-RevId: 272105461
According to the SPIR-V spec:
"Length is the number of elements in the array. It must be at least 1."
Closestensorflow/mlir#160
COPYBARA_INTEGRATE_REVIEW=https://github.com/tensorflow/mlir/pull/160 from denis0x0D:sandbox/array_len 0840dc0986ad0088a3aa7d5d8d3e97d489377ed9
PiperOrigin-RevId: 272094669
Add DeclareOpInterfaceFunctions to enable specifying whether OpInterfaceMethods
for an OpInterface should be generated automatically. This avoids needing to
declare the extra methods, while also allowing adding function declaration by way of trait/inheritance.
Most of this change is mechanical/extracting classes to be reusable.
PiperOrigin-RevId: 272042739
This CL finishes the implementation of the lowering part of the [strided memref RFC](https://groups.google.com/a/tensorflow.org/forum/#!topic/mlir/MaL8m2nXuio).
Strided memrefs correspond conceptually to the following templated C++ struct:
```
template <typename Elem, size_t Rank>
struct {
Elem *ptr;
int64_t offset;
int64_t sizes[Rank];
int64_t strides[Rank];
};
```
The linearization procedure for address calculation for strided memrefs is the same as for linalg views:
`base_offset + SUM_i index_i * stride_i`.
The following CL will unify Linalg and Standard by removing !linalg.view in favor of strided memrefs.
PiperOrigin-RevId: 272033399
Add operations corresponding to OpLogicalAnd, OpLogicalNot,
OpLogicalEqual, OpLogicalNotEqual and OpLogicalOr instructions in
SPIR-V dialect. This needs changes to class hierarchy in SPIR-V
TableGen files to split SPIRVLogicalOp into SPIRVLogicalUnaryOp and
SPIRVLogicalBinaryOp. All derived classes of SPIRVLogicalOp are
updated accordingly.
Update the spirv dialect generation script to
1) Allow specifying base class to use for instruction spec generation
and file name to generate the specification in separately.
2) Use the existing descriptions for operations.
3) Update define_inst.sh to also invoke define_opcode.sh to also
define the corresponding SPIR-V instruction opcode enum.
PiperOrigin-RevId: 272014876
MemRefType::getStrides uses AffineExpr::walk which operates in post-order from the leaves. In order to compute strides properly, it needs to escape on terminal nodes and analyze binary ops only. This did not work for AffineExpr that consist of a single term (i.e. without a binary op).
This CL fixes the corner case and adds relevant tests.
PiperOrigin-RevId: 271975746
Use OpInterfaces to add an interface for ops defining a return type function.
This change does not use this trait in any meaningful way, I'll use it in a
follow up to generalize and unify some of the op type traits/constraints. Also,
currently the infer type function can only be manually specified in C++, that should rather be the fallback in future.
PiperOrigin-RevId: 271883746
The generated build methods have result type before the arguments (operands and attributes, which are also now adjacent in the explicit create method). This also results in changing the create method's ordering to match most build method's ordering.
PiperOrigin-RevId: 271755054
This is more consistent with other dump methods. Otherwise successive Value dumps are concatenated in same line, hurting readability.
PiperOrigin-RevId: 271669846
- also remove stale terminology/references in docs
Signed-off-by: Uday Bondhugula <uday@polymagelabs.com>
Closestensorflow/mlir#148
COPYBARA_INTEGRATE_REVIEW=https://github.com/tensorflow/mlir/pull/148 from bondhugula:cleanup e846b641a3c2936e874138aff480a23cdbf66591
PiperOrigin-RevId: 271618279
The strided MemRef RFC discusses a normalized descriptor and interaction with library calls (https://groups.google.com/a/tensorflow.org/forum/#!topic/mlir/MaL8m2nXuio).
Lowering of nested LLVM structs as value types does not play nicely with externally compiled C/C++ functions due to ABI issues.
Solving the ABI problem generally is a very complex problem and most likely involves taking
a dependence on clang that we do not want atm.
A simple workaround is to pass pointers to memref descriptors at function boundaries, which this CL implement.
PiperOrigin-RevId: 271591708
linalg_integration_test.mlir and simple.mlir were temporarily disabled due to an OSS-only failure.
The issue is that, once created, an llvm::Error must be explicitly checked before it can be discarded or overwritten.
This CL fixes the issue and reenable the test.
PiperOrigin-RevId: 271589651
This commit introduces the ROCDL Dialect (i.e. the ROCDL ops + the code to lower those ROCDL ops to LLWM intrinsics/functions). Think of ROCDL Dialect as analogous to the NVVM Dialect, but for AMD GPUs. This patch contains just the essentials needed to get a simple example up and running. We expect to make further additions to the ROCDL Dialect.
This is the first of 3 commits, the follow-up will be:
* add a pass that lowers GPU Dialect to ROCDL Dialect
* add a "mlir-rocm-runner" utility
Closestensorflow/mlir#146
COPYBARA_INTEGRATE_REVIEW=https://github.com/tensorflow/mlir/pull/146 from deven-amd:deven-rocdl-dialect e78e8005c75a78912631116c78dc844fcc4b0de9
PiperOrigin-RevId: 271511259
This CL modifies the linalg-fusion pass such that it does not tile anymore as part of the pass. Tiling is a separate concern that enables linalg fusion but should happen before.
This makes fusion more composable with other decisions.
In particular the fusion pass now becomes greedy and only applies the transformation on a best-effort basis.
This should also let fusion work in a multi-hop fashion with chains of producer/consumers.
Since the fusion pass does not perform tiling anymore, tests are rewritten to be in pretiled form and make the intent of the test clearer (albeit more verbose).
PiperOrigin-RevId: 271357741
The support for functions taking and returning memrefs of floats was introduced
in the first version of the runner, created before MLIR had reliable lowering
of allocation/deallocation to library calls. It forcibly runs MLIR
transformation convering affine, loop and standard dialects into the LLVM
dialect, unlike the other runner flows that accept the LLVM dialect directly.
Memref support leads to more complex layering and is generally fragile. Drop
it in favor of functions returning a scalar, or library-based function calls to
print memrefs and other data structures.
PiperOrigin-RevId: 271330839
The reduction operation is currently fixed to "add", and the scope is fixed to "workgroup".
The implementation is currently limited to sizes that are multiple 32 (warp size) and no larger than 1024.
PiperOrigin-RevId: 271290265
Support the OpBitcast instruction of SPIR-V using the spv.Bitcast
operation. The semantics implemented in the dialect differ from the
SPIR-V spec in that the dialect does not allow conversion to/from
pointer types from/to non-pointer types.
PiperOrigin-RevId: 271255957
1) Process and ignore the following debug instructions: OpSource,
OpSourceContinued, OpSourceExtension, OpString, OpModuleProcessed.
2) While processing OpTypeInt instruction, ignore the signedness
specification. Currently MLIR doesnt make a distinction between signed
and unsigned integer types.
3) Process and ignore BufferBlock decoration (similar to Buffer
decoration). StructType needs to be enhanced to track this attribute
since its needed for proper validation checks.
4) Report better error for unhandled instruction during
deserialization.
PiperOrigin-RevId: 271057060
- introduce splat op in standard dialect (currently for int/float/index input
type, output type can be vector or statically shaped tensor)
- implement LLVM lowering (when result type is 1-d vector)
- add constant folding hook for it
- while on Ops.cpp, fix some stale names
Signed-off-by: Uday Bondhugula <uday@polymagelabs.com>
Closestensorflow/mlir#141
COPYBARA_INTEGRATE_REVIEW=https://github.com/tensorflow/mlir/pull/141 from bondhugula:splat 48976a6aa0a75be6d91187db6418de989e03eb51
PiperOrigin-RevId: 270965304
The RFC for unifying Linalg and Affine compilation passes into an end-to-end flow with a predictable ABI and linkage to external function calls raised the question of why we have variable sized descriptors for memrefs depending on whether they have static or dynamic dimensions (https://groups.google.com/a/tensorflow.org/forum/#!topic/mlir/MaL8m2nXuio).
This CL standardizes the ABI on the rank of the memrefs.
The LLVM struct for a memref becomes equivalent to:
```
template <typename Elem, size_t Rank>
struct {
Elem *ptr;
int64_t sizes[Rank];
};
```
PiperOrigin-RevId: 270947276
According to SPIR-V spec, spirv::CompositeType includes
spirv::RuntimeArrayType. This allows using objects of
spirv::RuntimeArrayType with spirv::AccessChainOp.
PiperOrigin-RevId: 270809492
Similar to mlir-opt, having a -split-input-file mode is quite useful
in mlir-translate. It allows to put logically related tests in the
same test file for better organization.
PiperOrigin-RevId: 270805467
Sdd support in deserializer for OpMemberName instruction. For now
the name is just processed and not associated with the
spirv::StructType being built. That needs an enhancement to
spirv::StructTypes itself.
Add tests to check for errors reported during deserialization with
some refactoring to common out some utility functions.
PiperOrigin-RevId: 270794524
The existing logic to parse spirv::StructTypes is very brittle. This
change simplifies the parsing logic a lot. The simplification also
allows for memberdecorations to be separated by commas instead of
spaces (which was an artifact of the existing parsing logic). The
change also needs a modification to mlir::parseType to return the
number of chars parsed. Adding a new parseType method to do so.
Also allow specification of spirv::StructType with no members.
PiperOrigin-RevId: 270739672
Using the two call interfaces, CallOpInterface and CallableOpInterface, this change adds support for an initial multi-level CallGraph. This call graph builds a set of nodes for each callable region, and connects them via edges. An edge may be any of the following types:
* Abstract
- An edge not produced by a call operation, used for connecting to internal nodes from external nodes.
* Call
- A call edge is an edge defined via a call-like operation.
* Child
- This is an artificial edge connecting nested callgraph nodes.
This callgraph will be used, and improved upon, to begin supporting more interesting interprocedural analyses and transformation. In a followup, this callgraph will be used to support more complex inlining support.
PiperOrigin-RevId: 270724968
These two operation interfaces will be used in a followup to support building a callgraph:
* CallOpInterface
- Operations providing this interface are call-like, and have a "call" target. A call target may be a symbol reference, via SymbolRefAttr, or a SSA value.
* CallableOpInterface
- Operations providing this interfaces define destinations to call-like operations, e.g. FuncOp. These operations may define any number of callable regions.
PiperOrigin-RevId: 270723300
This fixes a problem with current save-restore pattern of diagnostics handlers, as there may be a thread race between when the previous handler is destroyed. For example, this occurs when using multiple ParallelDiagnosticHandlers asynchronously:
Handler A
Handler B | - LifeTime - | Restore A here.
Handler C | --- LifeTime ---| Restore B after it has been destroyed.
The new design allows for multiple handlers to be registered in a stack like fashion. Handlers can return success() to signal that they have fully processed a diagnostic, or failure to propagate otherwise.
PiperOrigin-RevId: 270720625
Roll forward of commit 5684a12.
When outlining GPU kernels, put the kernel function inside a nested module. Then use a nested pipeline to generate the cubins, independently per kernel. In a final pass, move the cubins back to the parent module.
PiperOrigin-RevId: 270639748
This adds sign- and zero-extension and truncation of integer types to the
standard dialects. This allows to perform integer type conversions without
having to go to the LLVM dialect and introduce custom type casts (between
standard and LLVM integer types).
Closestensorflow/mlir#134
COPYBARA_INTEGRATE_REVIEW=https://github.com/tensorflow/mlir/pull/134 from ombre5733:sext-zext-trunc-in-std c7657bc84c0ca66b304e53ec03797e09152e4d31
PiperOrigin-RevId: 270479722
- fix store to load forwarding for a certain set of cases (where
forwarding shouldn't have happened); use AffineValueMap difference
based MemRefAccess equality checking; utility logic is also greatly
simplified
- add missing equality/inequality operators for AffineExpr ==/!= ints
- add == != operators on MemRefAccess
Closestensorflow/mlir#136
COPYBARA_INTEGRATE_REVIEW=https://github.com/tensorflow/mlir/pull/136 from bondhugula:store-load-forwarding d79fd1add8bcfbd9fa71d841a6a9905340dcd792
PiperOrigin-RevId: 270457011
Make GlobalOp's value attribute an OptionalAttr. Change code that uses the value to handle 'nullopt'. Translate an unitialized value attribute to llvm::UndefValue.
PiperOrigin-RevId: 270423646
computeDepth calls itself recursively, which may insert into minPatternDepth. minPatternDepth is a DenseMap, which invalidates iterators on insertion, so this may lead to asan failures.
PiperOrigin-RevId: 270374203
The RFC for unifying Linalg and Affine compilation passes into an end-to-end flow discusses the notion of a strided MemRef (https://groups.google.com/a/tensorflow.org/forum/#!topic/mlir/MaL8m2nXuio).
This CL adds helper functions to extract strides from the layout map which in turn will allow converting between a strided form of the type and a layout map.
For now strides are only computed on a single affine map with a single result (i.e. the closed subset of linearization maps that are compatible with striding semantics). This restriction will be reevaluated / lifted in the future based on concrete use cases.
PiperOrigin-RevId: 270284686
Allow specification of decorators on SPIR-V StructType members. If the
struct has layout information, these decorations are to be specified
after the offset specification of the member. These decorations are
emitted as OpMemberDecorate instructions on the struct <id>. Update
(de)serialization to handle these decorations.
PiperOrigin-RevId: 270130136
A new converter with per axis quantization parameters is added to quantize a
dense elements attribute. For each slice along the quantization axis, it
creates an uniform quantized value converter, with different scale and zero
point, and quantizes the values in the slice.
The current implementation doesn't handle sparse elements attributes.
PiperOrigin-RevId: 270121986
When outlining GPU kernels, put the kernel function inside a nested module. Then use a nested pipeline to generate the cubins, independently per kernel. In a final pass, move the cubins back to the parent module.
PiperOrigin-RevId: 269987720
This modifies DominanceInfo::properlyDominates(Value *value, Operation *op) to return false if the value is defined by a parent operation of 'op'. This prevents using values defined by the parent operation from within any child regions.
PiperOrigin-RevId: 269934920
- allow symbols in index remapping provided for memref replacement
- fix memref normalize crash on cases with layout maps with symbols
Signed-off-by: Uday Bondhugula <uday@polymagelabs.com>
Reported by: Alex Zinenko
Closestensorflow/mlir#139
COPYBARA_INTEGRATE_REVIEW=https://github.com/tensorflow/mlir/pull/139 from bondhugula:memref-rep-symbols 2f48c1fdb5d4c58915bbddbd9f07b18541819233
PiperOrigin-RevId: 269851182
Introduce support for applying the stripe operator to sum expressions, as in
(x + A) # B = x + A - (x + A) mod B.
This is required to represent a combination of tiling and padding in the SDBM
framework, and is a valid SDBM construct that was not originally supported.
PiperOrigin-RevId: 269758807
Extend SDBM simplification patterns to support more cases where the addition of
two expressions each involving one or two variables would result in a sum
expression that only contains one variable and thus remains in the SDBM domain.
This is made possible by the new canonical structure of SDBM where the constant
term appears once. This simplification will be necessary to support
round-tripping of stripe expressions containing constant terms on the LHS
through affine expressions.
PiperOrigin-RevId: 269757732
This is useful in several cases, for example a user may want to sugar the syntax of a string(as we do with custom operation syntax), or avoid many nested ifs for parsing a set of known keywords.
PiperOrigin-RevId: 269695451
This CL registers a new mlir-translate hook, -test-spirv-roundtrip,
for testing SPIR-V serialization and deserialization round-trip.
This CL also moves the existing -serialize-spirv and
-deserialize-spirv hooks to one source file.
PiperOrigin-RevId: 269659528
Existing translations are either from MLIR or to MLIR. To support
cases like round-tripping some external format via MLIR, one must
chain two mlir-translate invocations together using pipes. This
can be problematic to support -split-input-file in mlir-translate
given that it won't work across pipes.
Motivated by the above, this CL adds another translation category
that allows file to file. This gives users more freedom.
PiperOrigin-RevId: 269636438
This CL changes translation functions to take MemoryBuffer
as input and raw_ostream as output. It is generally better to
avoid handling files directly in a library (unless the library
is specifically for file manipulation) and we can unify all
file handling to the mlir-translate binary itself.
PiperOrigin-RevId: 269625911
- add canonicalization pattern to compose maps into affine loads/stores;
templatize the pattern and reuse it for affine.apply as well
- rename getIndices -> getMapOperands() (getIndices is confusing since
these are no longer the indices themselves but operands to the map
whose results are the indices). This also makes the accessor uniform
across affine.apply/load/store. Change arg names on the affine
load/store builder to avoid confusion. Drop an unused confusing build
method on AffineStoreOp.
- update incomplete doc comment for canonicalizeMapAndOperands (this was
missed from a previous update).
Addresses issue tensorflow/mlir#121
Signed-off-by: Uday Bondhugula <uday@polymagelabs.com>
Closestensorflow/mlir#122
COPYBARA_INTEGRATE_REVIEW=https://github.com/tensorflow/mlir/pull/122 from bondhugula:compose-load-store e71de1771e56a85c4282c10cb43f30cef0701c4f
PiperOrigin-RevId: 269619540
A generic mechanism for (de)serialization of extended instruction sets
is added with this CL. To facilitate this, a new class
"SPV_ExtendedInstSetOp" is added which is a base class for all
operations corresponding to extended instruction sets. The methods to
(de)serialization such ops as well as its dispatch is generated
automatically.
The behavior controlled by autogenSerialization and hasOpcode is also
slightly modified to enable this. They are now decoupled.
1) Setting hasOpcode=1 means the operation has a corresponding
opcode in SPIR-V binary format, and its dispatch for
(de)serialization is automatically generated.
2) Setting autogenSerialization=1 generates the function for
(de)serialization automatically.
So now it is possible to have hasOpcode=0 and autogenSerialization=1
(for example SPV_ExtendedInstSetOp).
Since the dispatch functions is also auto-generated, the input file
needs to contain all operations. To this effect, SPIRVGLSLOps.td is
included into SPIRVOps.td. This makes the previously added
SPIRVGLSLOps.h and SPIRVGLSLOps.cpp unnecessary, and are deleted.
The SPIRVUtilsGen.cpp is also changed to make better use of
formatv,making the code more readable.
PiperOrigin-RevId: 269456263
When performing A->B->C conversion, an operation may still refer to an operand of A. This makes it necessary to unmap through multiple levels of replacement for a specific value.
PiperOrigin-RevId: 269367859
Certain enum classes in SPIR-V, like function/loop control and memory
access, are bitmasks. This CL introduces a BitEnumAttr to properly
model this and drive auto-generation of verification code and utility
functions. We still store the attribute using an 32-bit IntegerAttr
for minimal memory footprint and easy (de)serialization. But utility
conversion functions are adjusted to inspect each bit and generate
"|"-concatenated strings for the bits; vice versa.
Each such enum class has a "None" case that means no bit is set. We
need special handling for "None". Because of this, the logic is not
general anymore. So right now the definition is placed in the SPIR-V
dialect. If later this turns out to be useful for other dialects,
then we can see how to properly adjust it and move to OpBase.td.
Added tests for SPV_MemoryAccess to check and demonstrate.
PiperOrigin-RevId: 269350620
Swap the allowed nesting of sum and diff expressions: now a diff expression can
contain a sum expression, but only on the left hand side. A difference of two
expressions sum must be canonicalized by grouping their constant terms in a
single expression. This change of sturcture became possible thanks to the
introduction of the "direct" super-kind. It is necessary to enable support of
sum expressions on the left hand side of the stripe expression.
SDBM expressions are now grouped into the following structure
- expression
- varying
- direct
- sum <- (term, constant)
- term
- symbol
- dimension
- stripe <- (term, constant)
- negation <- (direct)
- difference <- (direct, term)
- constant
The notation <- (...) denotes the types of subexpressions a compound
expression can combine.
PiperOrigin-RevId: 269337222
The helper functions makePositionAttr() and positionAttr() were originally
introduced in the lowering-to-LLVM-dialect pass to construct integer array
attributes that are used for static positions in extract/insertelement.
Constructing an integer array attribute being fairly common, a utility function
Builder::getI64ArrayAttr was later introduced into the Builder API. Drop
makePositionAttr and similar homegrown functions and use that API instead.
PiperOrigin-RevId: 269295836
Add support for specifying extended instructions sets. The operations
in SPIR-V dialect are named as 'spv.<extension-name>.<op-name>'. Use
this mechanism to define a 'Exp' operation from GLSL(450)
instructions.
Later CLs will add support for (de)serialization of these operations,
and update the dialect generation scripts to auto-generate the
specification using the spec directly.
Additional changes:
Add a Type Constraint to OpBase.td to check for vector of specified
lengths. This is used to check that the vector type used in SPIR-V
dialect are of lengths 2, 3 or 4.
Update SPIRVBase.td to use this Type constraints for vectors.
PiperOrigin-RevId: 269234377
OperationPass' are defined exactly the same way as they are now:
class DerivedPass : public OperationPass<DerivedPass>;
OpPass' are now defined as OperationPass, but with an additional template parameter for the operation type:
class DerivedPass : public OperationPass<DerivedPass, FuncOp>;
PiperOrigin-RevId: 269122410
- turn copy/dma generation method into a utility in LoopUtils, allowing
it to be reused elsewhere.
- no functional/logic change to the pass/utility
- trim down header includes in files affected
Signed-off-by: Uday Bondhugula <uday@polymagelabs.com>
Closestensorflow/mlir#124
COPYBARA_INTEGRATE_REVIEW=https://github.com/tensorflow/mlir/pull/124 from bondhugula:datacopy 9f346e62e5bd9dd1986720a30a35f302eb4d3252
PiperOrigin-RevId: 269106088
- take care of symbolic operands with alloc
- add missing check for compose map failure and a test case
- add test cases on strides
- drop incorrect check for one-to-one'ness
Signed-off-by: Uday Bondhugula <uday@polymagelabs.com>
Closestensorflow/mlir#132
COPYBARA_INTEGRATE_REVIEW=https://github.com/tensorflow/mlir/pull/132 from bondhugula:normalize-memrefs 8aebf285fb0d7c19269d85255aed644657e327b7
PiperOrigin-RevId: 269105947
- NFC - on any pass/utility logic/output.
- Resolve TODO; the method building loop trip count maps was
creating and deleting affine.apply ops (transforming IR from under
analysis!, strictly speaking). Introduce AffineValueMap::difference to
do this correctly (without the need to create any IR).
- Move AffineApplyNormalizer out so that its methods are reusable from
AffineStructures.cpp; add a helper method 'normalize' to it. Fix
AffineApplyNormalize::renumberOneDim (Issue tensorflow/mlir#89).
- Trim includes on files touched.
- add test case on a scenario previously not covered
Signed-off-by: Uday Bondhugula <uday@polymagelabs.com>
Closestensorflow/mlir#133
COPYBARA_INTEGRATE_REVIEW=https://github.com/tensorflow/mlir/pull/133 from bondhugula:trip-count-build 7fc34d857f7788f98b641792cafad6f5bd50e47b
PiperOrigin-RevId: 269101118
- add missing canonicalization pattern to fold memref_cast + dim to
dim (needed to propagate constant when folding a dynamic shape to
a static one)
- also fix an outdated/inconsistent comment in StandardOps/Ops.td
Signed-off-by: Uday Bondhugula <uday@polymagelabs.com>
Closestensorflow/mlir#126
COPYBARA_INTEGRATE_REVIEW=https://github.com/tensorflow/mlir/pull/126 from bondhugula:quickfix 4566e75e49685c532faffff91d64c5d83d4da524
PiperOrigin-RevId: 269020058
This allows for users other than those on the command line to apply a textual description of a pipeline to a given pass manager.
PiperOrigin-RevId: 269017028
SPIR-V recently publishes v1.5, which brings a bunch of symbols
into core. So the suffix "KHR"/"EXT"/etc. is removed from the
symbols. We use a script to pull information from the spec
directly.
Also changed conversion and tests to use GLSL450 instead of
VulkanKHR memory model. GLSL450 is still the main memory model
supported by Vulkan shaders and it does not require extra
capability to enable.
PiperOrigin-RevId: 268992661
This allows for explicitly specifying the pipeline to add to the pass manager. This includes the nesting structure, as well as the passes/pipelines to run. A textual pipeline string is defined as a series of names, each of which may in itself recursively contain a nested pipeline description. A name is either the name of a registered pass, or pass pipeline, (e.g. "cse") or the name of an operation type (e.g. "func").
For example, the following pipeline:
$ mlir-opt foo.mlir -cse -canonicalize -lower-to-llvm
Could now be specified as:
$ mlir-opt foo.mlir -pass-pipeline='func(cse, canonicalize), lower-to-llvm'
This will allow for running pipelines on nested operations, like say spirv modules. This does not remove any of the current functionality, and in fact can be used in unison. The new option is available via 'pass-pipeline'.
PiperOrigin-RevId: 268954279
This CL adds support for serializing and deserializing spv.loop ops.
This adds support for spv.Branch and spv.BranchConditional op
(de)serialization, too, because they are needed for spv.loop.
PiperOrigin-RevId: 268536962
This better reflects how this kind of expressions is used and avoids the
potential confusion since the expression can take negative values. Term
expressions comprise dimensions, symbols and stripe expressions. In an SDBM
domain, a stripe expression always corresponds to a variable, input or
temporary. This expression can appear anywhere an input variable can,
including on the LHS of other stripe expressions.
PiperOrigin-RevId: 268486066
If the composite is a constant, we can fold it away. This only
supports vector and array constants for now, given that struct
constant is not supported in spv.constant yet.
PiperOrigin-RevId: 268350340
Since we apply nudging for the zero point to make sure the nudged zerop points
can be in the range of [qmin, qmax], the constraint that rmin / rmax should
stride zero isn't necessary.
This also matches the documentation of tensorflow's FakeQuantWithMinMaxArgs op,
where min and max don't need to stride zero:
https://www.tensorflow.org/api_docs/python/tf/quantization/fake_quant_with_min_max_args
PiperOrigin-RevId: 268296285
* Add GraphTraits that treat a block as a graph, Operation* as node and use-relationship for edges;
- Just basic graph output;
* Add use iterator to iterate over all uses of an Operation;
* Add testing pass to generate op graph;
This does not support arbitrary operations other than function nor nested regions yet.
PiperOrigin-RevId: 268121782
For per channel fake quant attributes, the returned type should be
UniformQuantizedPerAxisType. Currently, this method isn't under test because we
haven't added the quant_ConstFakeQuantPerAxis op and the convert method.
PiperOrigin-RevId: 268084017
Some compilers will try to auto-generate the destructor, instead of using the user provided destructor, when creating a default move constructor.
PiperOrigin-RevId: 268067367
This allows for parallelizing across pipelines of multiple operation types. AdaptorPasses can now hold pass managers for multiple operation types and will dispatch based upon the operation being operated on.
PiperOrigin-RevId: 268017344
This method parses an operation in its generic form, from the current parser
state. This is the symmetric of OpAsmPrinter::printGenericOp(). An immediate
use case is illustrated in the test dialect, where an operation wraps another
one in its region and makes use of a single-line pretty-print form.
PiperOrigin-RevId: 267930869
This is done via a new set of instrumentation hooks runBeforePipeline/runAfterPipeline, that signal the lifetime of a pass pipeline on a specific operation type. These hooks also provide the parent thread of the pipeline, allowing for accurate merging of timers running on different threads.
PiperOrigin-RevId: 267909193
This will allow clients to implement a different collection strategy on these
values, including collecting each uses within the region for example.
PiperOrigin-RevId: 267803978
- the JIT codegen was being run at the default -O0 level; instead,
propagate the opt level from the cmd line.
Signed-off-by: Uday Bondhugula <uday@polymagelabs.com>
Closestensorflow/mlir#123
COPYBARA_INTEGRATE_REVIEW=https://github.com/tensorflow/mlir/pull/123 from bondhugula:jit-runner 3b055e47f94c9a48bf487f6400787478738cda02
PiperOrigin-RevId: 267778586
The current restrictions on dim/symbols require a top-level symbol for the conservative case of a non-affine region. This should be relaxed in the future.
PiperOrigin-RevId: 267641838
View descriptors are converted to *pointer to* LLVM struct to avoid ABI issues related to C struct packing. This creates unnecessary complexity and hampers unification with memrefs.
Instead, this CL makes view descriptors convert to LLVM struct (as it was originally) and promotes all structs to pointers right before calling an external function.
PiperOrigin-RevId: 267602693
- turn canonicalizeMapAndOperands into a template that works on both
sets and maps, and use it to introduce a utility to canonicalize an
affine integer set and its operands
- add pattern to canonicalize affine if op's.
- rename IntegerSet::getNumOperands -> IntegerSet::getNumInputs to be
consistent with AffineMap
- add missing accessors for IntegerSet
Doesn't need extensive testing since canonicalizeSetAndOperands just
reuses canonicalizeMapAndOperands' logic, and the latter is tested on
affine.apply map + operands; the new method works the same way on an
integer set + operands of an affine if op for example.
Signed-off-by: Uday Bondhugula <uday@polymagelabs.com>
Closestensorflow/mlir#112
COPYBARA_INTEGRATE_REVIEW=https://github.com/tensorflow/mlir/pull/112 from bondhugula:set-canonicalize eff72f23250b96fa7d9f5caff3877440f5de2cec
PiperOrigin-RevId: 267532876
This commit defines an initial implementation of the DialectInlinerInterface for the AffineOps dialect. This change allows for affine operations to be inlined into any region that is not an affine region. Inlining into affine regions requires special handling for dimension/symbol identifiers that will be added in followups.
PiperOrigin-RevId: 267467078
SPIR-V can explicitly declare structured control-flow constructs using merge
instructions. These explicitly declare a header block before the control
flow diverges and a merge block where control flow subsequently converges.
These blocks delimit constructs that must nest, and can only be entered
and exited in structured ways.
Instead of having a `spv.LoopMerge` op to directly model loop merge
instruction for indicating the merge and continue target, we use regions
to delimit the boundary of the loop: the merge target is the next op
following the `spv.loop` op and the continue target is the block that
has a back-edge pointing to the entry block inside the `spv.loop`'s region.
This way it's easier to discover all blocks belonging to a construct and
it plays nicer with the MLIR system.
Updated the SPIR-V.md doc.
PiperOrigin-RevId: 267431010
This defines a set of initial utilities for inlining a region(or a FuncOp), and defines a simple inliner pass for testing purposes.
A new dialect interface is defined, DialectInlinerInterface, that allows for dialects to override hooks controlling inlining legality. The interface currently provides the following hooks, but these are just premilinary and should be changed/added to/modified as necessary:
* isLegalToInline
- Determine if a region can be inlined into one of this dialect, *or* if an operation of this dialect can be inlined into a given region.
* shouldAnalyzeRecursively
- Determine if an operation with regions should be analyzed recursively for legality. This allows for child operations to be closed off from the legality checks for operations like lambdas.
* handleTerminator
- Process a terminator that has been inlined.
This cl adds support for inlining StandardOps, but other dialects will be added in followups as necessary.
PiperOrigin-RevId: 267426759
The refactoring of ExecutionEngine dropped the usage of the irTransform function used to pass -O3 and other options to LLVM. As a consequence, the proper optimizations do not kick in in LLMV-land.
This CL makes use of the transform function and allows producing avx512 instructions, on an internal example, when using:
`mlir-cpu-runner -dump-object-file=1 -object-filename=foo.o` combined with `objdump -D foo.o`.
Assembly produced resembles:
```
2b2e: 62 72 7d 48 18 04 0e vbroadcastss (%rsi,%rcx,1),%zmm8
2b35: 62 71 7c 48 28 ce vmovaps %zmm6,%zmm9
2b3b: 62 72 3d 48 a8 c9 vfmadd213ps %zmm1,%zmm8,%zmm9
2b41: 62 f1 7c 48 28 cf vmovaps %zmm7,%zmm1
2b47: 62 f2 3d 48 a8 c8 vfmadd213ps %zmm0,%zmm8,%zmm1
2b4d: 62 f2 7d 48 18 44 0e vbroadcastss 0x4(%rsi,%rcx,1),%zmm0
2b54: 01
2b55: 62 71 7c 48 28 c6 vmovaps %zmm6,%zmm8
2b5b: 62 72 7d 48 a8 c3 vfmadd213ps %zmm3,%zmm0,%zmm8
2b61: 62 f1 7c 48 28 df vmovaps %zmm7,%zmm3
2b67: 62 f2 7d 48 a8 da vfmadd213ps %zmm2,%zmm0,%zmm3
2b6d: 62 f2 7d 48 18 44 0e vbroadcastss 0x8(%rsi,%rcx,1),%zmm0
2b74: 02
2b75: 62 f2 7d 48 a8 f5 vfmadd213ps %zmm5,%zmm0,%zmm6
2b7b: 62 f2 7d 48 a8 fc vfmadd213ps %zmm4,%zmm0,%zmm7
```
etc.
Fixestensorflow/mlir#120
PiperOrigin-RevId: 267281097
- address remaining comments from PR tensorflow/mlir#87 for better test coverage for
pipeline-data-transfer/replaceAllMemRefUsesWith
- remove dead tag allocs the same way they are removed for the replaced buffers
Signed-off-by: Uday Bondhugula <uday@polymagelabs.com>
Closestensorflow/mlir#106
COPYBARA_INTEGRATE_REVIEW=https://github.com/tensorflow/mlir/pull/106 from bondhugula:followup 9e868666d047e8d43e5f82f43e4093b838c710fa
PiperOrigin-RevId: 267144774
This CL adds support for proper cloning of Linalg ops that have regions (i.e. the generic linalg op). This is used to properly implement tiling and fusion for such ops. Adequate tests are added.
PiperOrigin-RevId: 267027176
- introduce utility to convert memrefs with non-identity layout maps to
ones with identity layout maps: convert the type and rewrite/remap all
its uses
- add this utility to -simplify-affine-structures pass for testing
purposes
Signed-off-by: Uday Bondhugula <uday@polymagelabs.com>
Closestensorflow/mlir#104
COPYBARA_INTEGRATE_REVIEW=https://github.com/tensorflow/mlir/pull/104 from bondhugula:memref-normalize f2c914aa1890e8860326c9e33f9aa160b3d65e6d
PiperOrigin-RevId: 266985317
This will allow us to use MLIR's folding infrastructure to deduplicate
SPIR-V constants.
This CL also changed isValidSPIRVType in SPIRVDialect to a static method.
PiperOrigin-RevId: 266984403
- the [begin, end) range identified for copying could end in between the
block, which makes hoisting invalid in some cases. Change the range
identification to always end with end of block.
- add test case to exercise these (with fast mem capacity set to minimal so
that single element memref buffers are generated at the innermost loop)
- the location of begin/end of the block range for data copying was
being confused with the insert points for copy in and copy out code.
In cases, where we choose to hoist transfers, these are separate.
- when copy loops are single iteration ones, promote their bodies at
the end of the pass.
- change default fast mem space to 1 (setting it to zero made it
generate DMA op's that won't verify in the default case - since the
DMA ops have a check for src/dest memref spaces being different).
Signed-off-by: Uday Bondhugula <uday@polymagelabs.com>
Co-Authored-By: Mehdi Amini <joker.eph@gmail.com>
Closestensorflow/mlir#88
COPYBARA_INTEGRATE_REVIEW=https://github.com/tensorflow/mlir/pull/88 from bondhugula:datacopy 88697267c45e850c3ced87671e16e4a930c02a42
PiperOrigin-RevId: 266980911
Remove unused variables and attributes from BaseViewConversionHelper
on mlir/lib/Dialect/Linalg/Transforms/LowerToLLVMDialect.cpp
Closestensorflow/mlir#116
COPYBARA_INTEGRATE_REVIEW=https://github.com/tensorflow/mlir/pull/116 from alexst07:fix-warnings 5f638e4677492cf71a9cc040eeb6b57427d32e06
PiperOrigin-RevId: 266972082
Some of the operations in the LLVM dialect are required to model the LLVM IR in
MLIR, for example "constant" operations are needed to declare a constant value
since MLIR, unlike LLVM, does not support immediate values as operands. To
avoid confusion with actual LLVM operations, we prefix such axuiliary
operations with "mlir.".
PiperOrigin-RevId: 266942838
This change generalizes the structure of the pass manager to allow arbitrary nesting pass managers for other operations, at any level. The only user visible change to existing code is the fact that a PassManager must now provide an MLIRContext on construction. A new class `OpPassManager` has been added that represents a pass manager on a specific operation type. `PassManager` will remain the top-level entry point into the pipeline, with OpPassManagers being nested underneath. OpPassManagers will still be implicitly nested if the operation type on the pass differs from the pass manager. To explicitly build a pipeline, the 'nest' methods on OpPassManager may be used:
// Pass manager for the top-level module.
PassManager pm(ctx);
// Nest a pipeline operating on FuncOp.
OpPassManager &fpm = pm.nest<FuncOp>();
fpm.addPass(...);
// Nest a pipeline under the FuncOp pipeline that operates on spirv::ModuleOp
OpPassManager &spvModulePM = pm.nest<spirv::ModuleOp>();
// Nest a pipeline on FuncOps inside of the spirv::ModuleOp.
OpPassManager &spvFuncPM = spvModulePM.nest<FuncOp>();
To help accomplish this a new general OperationPass is added that operates on opaque Operations. This pass can be inserted in a pass manager of any type to operate on any operation opaquely. An example of this opaque OperationPass is a VerifierPass, that simply runs the verifier opaquely on the current operation.
/// Pass to verify an operation and signal failure if necessary.
class VerifierPass : public OperationPass<VerifierPass> {
void runOnOperation() override {
Operation *op = getOperation();
if (failed(verify(op)))
signalPassFailure();
markAllAnalysesPreserved();
}
};
PiperOrigin-RevId: 266840344
This interface will allow for providing hooks to interrop with operation folding. The first hook, 'shouldMaterializeInto', will allow for controlling which region to insert materialized constants into. The folder will generally materialize constants into the top-level isolated region, this allows for materializing into a lower level ancestor region if it is more profitable/correct.
PiperOrigin-RevId: 266702972
- the list of passes run by mlir-cpu-runner included -lower-affine and
-lower-to-llvm but was missing -lower-to-cfg (because -lower-affine at
some point used to lower straight to CFG); add -lower-to-cfg in
between. IR with affine ops can now be run by mlir-cpu-runner.
- update -lower-to-cfg to be consistent with other passes (create*Pass methods
were changed to return unique ptrs, but -lower-to-cfg appears to have been
missed).
- mlir-cpu-runner was unable to parse custom form of affine op's - fix
link options
- drop unnecessary run options from test/mlir-cpu-runner/simple.mlir
(none of the test cases had loops)
- -convert-to-llvmir was changed to -lower-to-llvm at some point, but the
create pass method name wasn't updated (this pass converts/lowers to LLVM
dialect as opposed to LLVM IR). Fix this.
(If we prefer "convert", the cmd-line options could be changed to
"-convert-to-llvm/cfg" then.)
Signed-off-by: Uday Bondhugula <uday@polymagelabs.com>
Closestensorflow/mlir#115
PiperOrigin-RevId: 266666909
This pass class generalizes the current functionality between FunctionPass and ModulePass, and allows for operating on any operation type. The pass manager currently only supports OpPasses operating on FuncOp and ModuleOp, but this restriction will be relaxed in follow-up changes. A utility class OpPassBase<OpT> allows for generically referring to operation specific passes: e.g. FunctionPassBase == OpPassBase<FuncOp>.
PiperOrigin-RevId: 266442239
This commit introduces the bits to be able to dump JIT-compile
objects to external files by passing an object cache to OrcJit.
The new functionality is tested in mlir-cpu-runner under the flag
`dump-object-file`.
Closestensorflow/mlir#95
PiperOrigin-RevId: 266439265
Similar to enum, added a generator for structured data. This provide Dictionary that stores a fixed set of values and guarantees the values are valid. It is intended to store a fixed number of values by a given name.
PiperOrigin-RevId: 266437460
This is done by providing a walk callback that returns a WalkResult. This result is either `advance` or `interrupt`. `advance` means that the walk should continue, whereas `interrupt` signals that the walk should stop immediately. An example is shown below:
auto result = op->walk([](Operation *op) {
if (some_invariant)
return WalkResult::interrupt();
return WalkResult::advance();
});
if (result.wasInterrupted())
...;
PiperOrigin-RevId: 266436700
This CL just covers the op definition, its parsing, printing,
and verification. (De)serialization is to be implemented
in a subsequent CL.
PiperOrigin-RevId: 266431077
This change refactors and cleans up the implementation of the operation walk methods. After this refactoring is that the explicit template parameter for the operation type is no longer needed for the explicit op walks. For example:
op->walk<AffineForOp>([](AffineForOp op) { ... });
is now accomplished via:
op->walk([](AffineForOp op) { ... });
PiperOrigin-RevId: 266209552
- extend canonicalizeMapAndOperands to propagate constant operands into
the map's expressions (and thus drop those operands).
- canonicalizeMapAndOperands previously only dropped duplicate and
unused operands; however, operands that were constants were
retained.
This change makes IR maps/expressions generated by various
utilities/passes even simpler; also makes some of the test checks more
accurate and simpler -- for eg., 0' instead of symbol(%{{.*}}).
Signed-off-by: Uday Bondhugula <uday@polymagelabs.com>
Closestensorflow/mlir#107
COPYBARA_INTEGRATE_REVIEW=https://github.com/tensorflow/mlir/pull/107 from bondhugula:canonicalize-maps c889a51486d14fbf7db489f224f881e7e1ff7d72
PiperOrigin-RevId: 266085289
The pass manager is moving towards being able to run on operations at arbitrary nesting. An operation may have both parent and child operations, and the AnalysisManager must be able to handle this generalization. The AnalysisManager class now contains generic 'getCachedParentAnalysis' and 'getChildAnalysis/getCachedChildAnalysis' functions to query analyses on parent/child operations. This removes the hard coded nesting relationship between Module/Function.
PiperOrigin-RevId: 266003636
Tweak to the pretty type parser to recognize that `->` is a special token that
shouldn't be split into two characters. This change allows dialect
types to wrap function types as in `!my.ptr_type<(i32) -> i32>`.
Closestensorflow/mlir#105
COPYBARA_INTEGRATE_REVIEW=https://github.com/tensorflow/mlir/pull/105 from schweitzpgi:parse-arrow 8b2d768053f419daae5a1a864121a44c4319acbe
PiperOrigin-RevId: 265986240
Instead of lowering the program in two steps (Standard->LLVM followed
by GPU->NVVM), leading to invalid IR inbetween, the runner now uses
one pattern based rewrite step to go directly from Standard+GPU to
LLVM+NVVM.
PiperOrigin-RevId: 265861934
Refactor replaceAllMemRefUsesWith to split it into two methods: the new
method does the replacement on a single op, and is used by the existing
one.
- make the methods return LogicalResult instead of bool
- Earlier, when replacement failed (due to non-deferencing uses of the
memref), the set of ops that had already been processed would have
been replaced leaving the IR in an inconsistent state. Now, a
pass is made over all ops to first check for non-deferencing
uses, and then replacement is performed. No test cases were affected
because all clients of this method were first checking for
non-deferencing uses before calling this method (for other reasons).
This isn't true for a use case in another upcoming PR (scalar
replacement); clients can now bail out with consistent IR on failure
of replaceAllMemRefUsesWith. Add test case.
- multiple deferencing uses of the same memref in a single op is
possible (we have no such use cases/scenarios), and this has always
remained unsupported. Add an assertion for this.
- minor fix to another test pipeline-data-transfer case.
Signed-off-by: Uday Bondhugula <uday@polymagelabs.com>
Closestensorflow/mlir#87
PiperOrigin-RevId: 265808183
Each basic block in SPIR-V must start with an OpLabel instruction.
We don't support control flow yet, so this CL just makes sure that
the entry block follows this rule and is valid.
PiperOrigin-RevId: 265718841
To support a conversion of a simple load-compute-store kernel from GPU
dialect to SPIR-V dialect, the conversion of operations like
"gpu.block_dim", "gpu.thread_id" which allow threads to get the launch
conversion is needed. In SPIR-V these are specified as global
variables with builin attributes. This CL adds support to specify
builtin variables in SPIR-V conversion framework. This is used to
convert the relevant operations from GPU dialect to SPIR-V dialect.
Also add support for conversion of load/store operation in Standard
dialect to SPIR-V dialect.
To simplify the conversion add a method to build a spv.AccessChain
operation that automatically determines the return type based on the
base pointer type and the indices provided.
PiperOrigin-RevId: 265718525
Add an extra RewritePattern that does not convert types to rewrite a CopyOp that has non-identity permutations into a sequence of TransposeOp followed by a CopyOp without such permutations.
This RewitePattern is made to fail in the non-permutation case so that the conversion pattern can kick in to lower to LLVM.
This is an instance of A->A->B lowering where A->A is done by a RewritePattern in case_1 and A->B is done by a ConversionPatternRewriter when not(case_1).
PiperOrigin-RevId: 265171380
Add a conversion pattern that transforms a linalg.transpose op into:
1. A function entry `alloca` operation to allocate a ViewDescriptor.
2. A load of the ViewDescriptor from the pointer allocated in 1.
3. Updates to the ViewDescriptor to introduce the data ptr, offset, size
and stride. Size and stride are permutations of the original values.
4. A store of the resulting ViewDescriptor to the alloca'ed pointer.
The linalg.transpose op is replaced by the alloca'ed pointer.
PiperOrigin-RevId: 265169112
A linalg.transpose op is a pure metadata operation that takes a view + permutation map and produces
another view of the same underlying data, with a different reindexing. This is a
pure metadata operation that does not touch the underlying data.
Example:
```
%t = linalg.transpose %v (i, j) -> (j, i) : !linalg.view<?x?xf32>
```
PiperOrigin-RevId: 265139429
This CL extends support for lowering of linalg to external C++ libraries with CopyOp. Currently this can only work when the permutation maps in the copies are identity. Future support for permutations will be added later.
PiperOrigin-RevId: 265093025
This will allow iterating the values of a non-opaque ElementsAttr, with all of the types currently supported by DenseElementsAttr. This should help reduce the amount of specialization on DenseElementsAttr.
PiperOrigin-RevId: 264968151
linalg.subview used to lower to a slice with a bounded range resulting in correct bounded accesses. However linalg.slice could still index out of bounds. This CL moves the bounding to linalg.slice.
LLVM select and cmp ops gain a more idiomatic builder.
PiperOrigin-RevId: 264897125
Split out method into specialized instances + add an early exit. Should be NFC, but simplifies reading the logic slightly IMHO.
PiperOrigin-RevId: 264855529
Operation interfaces generally require a bit of boilerplate code to connect all of the pieces together. This cl introduces mechanisms in the ODS to allow for generating operation interfaces via the 'OpInterface' class.
Providing a definition of the `OpInterface` class will auto-generate the c++
classes for the interface. An `OpInterface` includes a name, for the c++ class,
along with a list of interface methods. There are two types of methods that can be used with an interface, `InterfaceMethod` and `StaticInterfaceMethod`. They are both comprised of the same core components, with the distinction that `StaticInterfaceMethod` models a static method on the derived operation.
An `InterfaceMethod` is comprised of the following components:
* ReturnType
- A string corresponding to the c++ return type of the method.
* MethodName
- A string corresponding to the desired name of the method.
* Arguments
- A dag of strings that correspond to a c++ type and variable name
respectively.
* MethodBody (Optional)
- An optional explicit implementation of the interface method.
def MyInterface : OpInterface<"MyInterface"> {
let methods = [
// A simple non-static method with no inputs.
InterfaceMethod<"unsigned", "foo">,
// A new non-static method accepting an input argument.
InterfaceMethod<"Value *", "bar", (ins "unsigned":$i)>,
// Query a static property of the derived operation.
StaticInterfaceMethod<"unsigned", "fooStatic">,
// Provide the definition of a static interface method.
// Note: `ConcreteOp` corresponds to the derived operation typename.
StaticInterfaceMethod<"Operation *", "create",
(ins "OpBuilder &":$builder, "Location":$loc), [{
return builder.create<ConcreteOp>(loc);
}]>,
// Provide a definition of the non-static method.
// Note: `op` corresponds to the derived operation variable.
InterfaceMethod<"unsigned", "getNumInputsAndOutputs", (ins), [{
return op.getNumInputs() + op.getNumOutputs();
}]>,
];
PiperOrigin-RevId: 264754898
This CL makes use of the standard LLVM LLJIT and removes the need for a custom JIT implementation within MLIR.
To achieve this, one needs to clone (i.e. serde) the produced llvm::Module into a new LLVMContext. This is currently necessary because the llvm::LLVMContext is owned by the LLVMDialect, somewhat deep in the call hierarchy.
In the future we should remove the reliance of serding the llvm::Module by allowing the injection of an LLVMContext from the top-level. Unfortunately this will require deeper API changes and impact multiple places. It is therefore left for future work.
PiperOrigin-RevId: 264737459
Previously Module and Function are builtinn constructs in MLIR.
Due to the structural requirements we must wrap the SPIR-V
module inside a Function inside a Module. Now the requirement
is lifted and we can remove the wrapping function! :)
PiperOrigin-RevId: 264736051
This will allow iterating the values of a non-opaque ElementsAttr, with all of the types currently supported by DenseElementsAttr. This should help reduce the amount of specialization on DenseElementsAttr.
PiperOrigin-RevId: 264637293
This will allow for adding more hooks for controlling parser behavior without bloating Dialect in the common case. This cl also adds iteration support to the DialectInterfaceCollection.
PiperOrigin-RevId: 264627846
This CL extends declarative rewrite rules to support matching and
generating ops with variadic operands/results. For this, the
generated `matchAndRewrite()` method for each pattern now are
changed to
* Use "range" types for the local variables used to store captured
values (`operand_range` for operands, `ArrayRef<Value *>` for
values, *Op for results). This allows us to have a unified way
of handling both single values and value ranges.
* Create local variables for each operand for op creation. If the
operand is variadic, then a `SmallVector<Value*>` will be created
to collect all values for that operand; otherwise a `Value*` will
be created.
* Use a collective result type builder. All result types are
specified via a single parameter to the builder.
We can use one result pattern to replace multiple results of the
matched root op. When that happens, it will require specifying
types for multiple results. Add a new collective-type builder.
PiperOrigin-RevId: 264588559
In SPIR-V binary format, constants are placed at the module level
and referenced by instructions inside functions using their result
<id>s. To model this natively (using SSA values for result <id>s),
it means we need to have implicit capturing functions. We will
lose the ability to have function passes if going down that path.
Instead, this CL changes to materialize constants at their use
sites in deserialization. It's cheap to copy constants in MLIR
given that attributes is uniqued to MLIRContext. By localizing
constants into functions, we can preserve isolated functions.
PiperOrigin-RevId: 264582532
Most dialects are initialized statically, which does not have a guaranteed initialization order. By keeping the dialect list sorted, we can guarantee a deterministic iteration order of dialects.
PiperOrigin-RevId: 264522875
Similar to global variables, specialization constants also live
in the module scope and can be referenced by instructions in
functions in native SPIR-V. A direct modelling would be to allow
functions in the SPIR-V dialect to implicit capture, but it means
we are losing the ability to write passes for Functions. While
in SPIR-V normally we want to process the module as a whole,
it's not common to see multiple functions get used so we'd like
to leave the door open for those cases. Therefore, similar to
global variables, we introduce spv.specConstant to model three
SPIR-V instructions: OpSpecConstantTrue, OpSpecConstantFalse,
and OpSpecConstant. They do not return SSA value results;
instead they have symbols and can only be referenced by the
symbols. To use it in a function, we need to have another op
spv._reference_of to turn the symbol into an SSA value. This
breaks the tie and makes functions still explicit capture.
Previously specialization constants were handled similarly as
normal constants. That is incorrect given that specialization
constant actually acts more like variable (without need to
load and store). E.g., they cannot be de-duplicated like normal
constants.
This CL also refines various documents and comments.
PiperOrigin-RevId: 264455172
tensorflow/mlir#58 fixed and exercised
verification of load/store ops using empty affine maps. Unfortunately,
it didn't exercise the creation of them. This PR addresses that aspect.
It removes the assumption of AffineMap having at least one result and
stores a pointer to MLIRContext as member of AffineMap.
* Add empty map support to affine.store + test
* Move MLIRContext to AffineMapStorage
Closestensorflow/mlir#74
PiperOrigin-RevId: 264416260
This conversion has been using a stack-allocated array of i8 to store the
null-terminated kernel name in order to pass it to the CUDA wrappers expecting
a C string because the LLVM dialect was missing support for globals. Now that
the suport is introduced, use a global instead.
Refactor global string construction from GenerateCubinAccessors into a common
utility function living in the LLVM namespace.
PiperOrigin-RevId: 264382489
JitRunner can use as entry points functions that produce either a single
'!llvm.f32' value or a list of memrefs. Memref support is legacy and was
introduced before MLIR could lower memref allocation and deallocation to
malloc/free calls so as to allocate the memory externally, and is likely to be
dropped in the future since it unconditionally runs affine+standard-to-llvm
lowering on the module instead of accepting the LLVM dialect. CUDA runner
relies on memref-based flow in the runner without actually returning anything.
Introduce a runner flow to use functions that return void as entry points.
PiperOrigin-RevId: 264381686
This CL allows binary operations on n-D vector types to be lowered to LLVMIR by performing an (n-1)-D extractvalue, 1-D vector operation and an (n-1)-D insertvalue.
PiperOrigin-RevId: 264339118
- fix missing check while simplifying an expression with floordiv to a
mod
- fixes issue tensorflow/mlir#82
Signed-off-by: Uday Bondhugula <uday@polymagelabs.com>
Closestensorflow/mlir#84
PiperOrigin-RevId: 264338353
This will allow for naming values the same as existing SSA values for regions attached to operations that are isolated from above. This fits in with how the system already allows separate name scopes for sibling regions. This name shadowing can be enabled in the custom parser of operations by setting the 'enableNameShadowing' flag to true when calling 'parseRegion'.
%arg = constant 10 : i32
foo.op {
%arg = constant 10 : i32
}
PiperOrigin-RevId: 264255999
This CL adds an integer attribute to linalg.buffer_alloc and lowering to LLVM.
The alignment is constrained to be a positive power of 2.
Lowering to LLVM produces the pattern:
```
%[[alloc:.*]] = llvm.call @malloc(%[[s]]) : (!llvm.i64) -> !llvm<"i8*">
%[[cast:.*]] = llvm.bitcast %[[alloc]] : !llvm<"i8*"> to !llvm.i64
%[[rem:.*]] = llvm.urem %[[cast]], %[[c16]] : !llvm.i64
%[[drem:.*]] = llvm.sub %[[c16]], %[[rem]] : !llvm.i64
%[[off:.*]] = llvm.urem %[[drem]], %[[c16]] : !llvm.i64
llvm.getelementptr %{{.*}}[%[[off]]] : (!llvm<"i8*">, !llvm.i64) -> !llvm<"i8*">
```
where `ptr` is aligned on `align` by computing the address
`ptr + (align - ptr % align) % align`.
To allow dealloc op to still be able to free memory, additional information is needed in
the buffer type. The buffer type is thus extended with an extra i8* for the base allocation address.
PiperOrigin-RevId: 264244455
Change the prining/parsing of spv.globalVariable to print the type of
the variable after the ':' to be consistent with MLIR convention.
The spv._address_of should print the variable type after the ':'. It was
mistakenly printing the address of the return value. Add a (missing)
test that should have caught that.
Also move spv.globalVariable and spv._address_of tests to
structure-ops.mlir.
PiperOrigin-RevId: 264204686
This CL adds the spv.ReturnValue op and its tests. Also adds a
InFunctionScope trait to make sure that the op stays inside
a function. To be consistent, ModuleOnly trait is changed to
InModuleScope.
PiperOrigin-RevId: 264193081
The linalg.view type used to be lowered to a struct containing a data pointer, offset, sizes/strides information. This was problematic when passing to external functions due to ABI, struct padding and alignment issues.
The linalg.view type is now lowered to LLVMIR as a *pointer* to a struct containing the data pointer, offset and sizes/strides. This simplifies the interfacing with external library functions and makes it trivial to add new functions without creating a shim that would go from a value type struct to a pointer type.
The consequences are that:
1. lowering explicitly uses llvm.alloca in lieu of llvm.undef and performs the proper llvm.load/llvm.store where relevant.
2. the shim creation function `getLLVMLibraryCallDefinition` disappears.
3. views are passed by pointer, scalars are passed by value. In the future, other structs will be passed by pointer (on a per-need basis).
PiperOrigin-RevId: 264183671
Switch to C++14 standard method as llvm::make_unique has been removed (
https://reviews.llvm.org/D66259). Also mark some targets as c++14 to ease next
integrates.
PiperOrigin-RevId: 263953918
FuncOps in MLIR use explicit capture. So global variables defined in
module scope need to have a symbol name and this should be used to
refer to the variable within the function. This deviates from SPIR-V
spec, which assigns an SSA value to variables at all scopes that can
be used to refer to the variable, which requires SPIR-V functions to
allow implicit capture. To handle this add a new op,
spirv::GlobalVariableOp that can be used to define module scope
variables.
Since instructions need an SSA value, an new spirv::AddressOfOp is
added to convert a symbol reference to an SSA value for use with other
instructions.
This also means the spirv::EntryPointOp instruction needs to change to
allow initializers to be specified using symbol reference instead of
SSA value
The current spirv::VariableOp which returns an SSA value (as defined
by SPIR-V spec) can still be used to define function-scope variables.
PiperOrigin-RevId: 263951109
These methods are currently defined 'inline' in StandardTypes.h, but this may create linker errors if StandardTypes.h isn't included at the use site.
PiperOrigin-RevId: 263850328
Often we want to ensure that block arguments are converted before operations that use them. This refactors the current implementation to be cleaner/less frequent by triggering conversion when a set of blocks are moved/inlined; or when legalization is successful.
PiperOrigin-RevId: 263795005
This CL adds an optional third argument to the vector.outerproduct instruction.
When such a third argument is specified, it is added to the result of the outerproduct and is lowered to FMA intrinsic when the lowering supports it.
In the future, we can add an attribute on the `vector.outerproduct` instruction to modify the operations for which to emit code (e.g. "+/*", "max/+", "min/+", "log/exp" ...).
This CL additionally performs minor cleanups in the vector lowering and adds tests to improve coverage.
This has been independently verified to result in proper fma instructions for haswell as follows.
Input:
```
func @outerproduct_add(%arg0: vector<17xf32>, %arg1: vector<8xf32>, %arg2: vector<17x8xf32>) -> vector<17x8xf32> {
%2 = vector.outerproduct %arg0, %arg1, %arg2 : vector<17xf32>, vector<8xf32>
return %2 : vector<17x8xf32>
}
}
```
Command:
```
mlir-opt vector-to-llvm.mlir -vector-lower-to-llvm-dialect --disable-pass-threading | mlir-opt -lower-to-cfg -lower-to-llvm | mlir-translate --mlir-to-llvmir | opt -O3 | llc -O3 -march=x86-64 -mcpu=haswell -mattr=fma,avx2
```
Output:
```
outerproduct_add: # @outerproduct_add
# %bb.0:
...
vmovaps 112(%rbp), %ymm8
vbroadcastss %xmm0, %ymm0
...
vbroadcastss 64(%rbp), %ymm15
vfmadd213ps 144(%rbp), %ymm8, %ymm0 # ymm0 = (ymm8 * ymm0) + mem
...
vfmadd213ps 400(%rbp), %ymm8, %ymm9 # ymm9 = (ymm8 * ymm9) + mem
...
```
PiperOrigin-RevId: 263743359
Modify the Type converters to have a SPIRVBasicTypeConverter which
only handles conversion from standard types to SPIRV types. Rename
SPIRVEntryFnConverter to SPIRVTypeConverter. This contains the
SPIRVBasicTypeConverter within it.
Remove SPIRVFnLowering class and have separate utility methods to
lower a function as entry function or a non-entry function. The
current setup could end with diamond inheritence that is not very
friendly to use. For example, you could define the following Op
conversion methods that lower from a dialect "Foo" which resuls in
diamond inheritance.
template<typename OpTy>
class FooDialect : public SPIRVOpLowering<OpTy> {...};
class FooFnLowering : public FooDialect, SPIRVFnLowering {...};
PiperOrigin-RevId: 263597101
Generate the EnumAttr to represent BuiltIns in SPIR-V dialect. The
builtIn can be specified as a StringAttr with value being the
name of the builtin. Extend Decoration (de)serialization to handle
BuiltIns.
Also fix an error in the SPIR-V dialect generator script.
PiperOrigin-RevId: 263596624
LLVM r368707 updated the APIs in llvm::orc::DynamicLibrarySearchGenerator to
use unique_ptr for holding the instance of the generator. Update our uses of
DynamicLibrarySearchGenerator in the ExecutionEngine to reflect that.
PiperOrigin-RevId: 263539855
Dialect interfaces are virtual apis registered to a specific dialect instance. Dialect interfaces are generally useful for transformation passes, or analyses, that want to opaquely operate on operations within a given dialect. These interfaces generally involve wide coverage over the entire dialect.
A dialect interface can be defined by inheriting from the CRTP base class DialectInterfaceBase::Base. This class provides the necessary utilities for registering an interface with the dialect so that it can be looked up later. Dialects overriding an interface may register an instance via 'Dialect::addInterfaces'. This API works very similarly to the respective addOperations/addTypes/etc. This will allow for a transformation/utility to later query the interface from an opaque dialect instance via 'getInterface<T>'.
A utility class 'DialectInterfaceCollection' is also provided that will collect all of the dialects that implement a specific interface within a given module. This allows for simplifying the API of interface lookups.
PiperOrigin-RevId: 263489015
All 'getValue' variants now require that the index is valid, queryable via 'isValidIndex'. 'getSplatValue' now requires that the attribute is a proper splat. This allows for querying these methods on DenseElementAttr with all possible value types; e.g. float, int, APInt, etc. This also allows for removing unnecessary conversions to Attribute that really want the underlying value.
PiperOrigin-RevId: 263437337
This CL moves the linalg.load/range/store ops to ODS.
Minor cleanups are performed.
Additional invalid IR tests are added for coverage.
PiperOrigin-RevId: 263432110
This CL fuses the emission of size and stride information and makes it clearer which indexings are stepped over when querying the positions. This refactor was motivated by an index calculation bug in the stride computation.
PiperOrigin-RevId: 263341610
This CL fixes the stepping through operands when emitting the view sizes of linalg.slice to LLVMIR. This is now consistent with the strides emission.
A relevant test is added.
Fix suggested by Alex Zinenko, thanks!
PiperOrigin-RevId: 263150922
The GenerateCubinAccessors was generating functions that fill
dynamically-allocated memory with the binary constant of a CUBIN attached as a
stirng attribute to the GPU kernel. This approach was taken to circumvent the
missing support for global constants in the LLVM dialect (and MLIR in general).
Global constants were recently added to the LLVM dialect. Change the
GenerateCubinAccessors pass to emit a global constant array of characters and a
function that returns a pointer to the first character in the array.
PiperOrigin-RevId: 263092052
Since raw pointers are always passed around for IR construct without
implying any ownership transfer, it can be error prone to have implicit
ownership transferred the same way.
For example this code can seem harmless:
Pass *pass = ....
pm.addPass(pass);
pm.addPass(pass);
pm.run(module);
PiperOrigin-RevId: 263053082
Prefer to enumerate all cases in the switch instead of using default to allow
compiler to flag missing cases. This also avoids -Wcovered-switch-default
warning.
PiperOrigin-RevId: 262935972
This can result in index expression overflow in "Loc.getPointer() - ColumnNo"
in SourgeMgr.
loc could also be prefixed to the message additionally in this case.
PiperOrigin-RevId: 262935408
This instruction is a local counterpart of llvm.global that takes a symbol
reference to a global and produces an SSA value containing the pointer to it.
Used in combination, these two operations allow one to use globals with other
operations expecting SSA values. At a cost of IR indirection, we make sure the
functions don't implicitly capture the surrounding SSA values and remain
suitable for parallel processing.
PiperOrigin-RevId: 262908622
This CL is step 3/n towards building a simple, programmable and portable vector abstraction in MLIR that can go all the way down to generating assembly vector code via LLVM's opt and llc tools.
This CL adds support for converting MLIR n-D vector types to (n-1)-D arrays of 1-D LLVM vectors and a conversion VectorToLLVM that lowers the `vector.extractelement` and `vector.outerproduct` instructions to the proper mix of `llvm.vectorshuffle`, `llvm.extractelement` and `llvm.mulf`.
This has been independently verified to produce proper avx2 code.
Input:
```
func @vec_1d(%arg0: vector<4xf32>, %arg1: vector<8xf32>) -> vector<8xf32> {
%2 = vector.outerproduct %arg0, %arg1 : vector<4xf32>, vector<8xf32>
%3 = vector.extractelement %2[0 : i32]: vector<4x8xf32>
return %3 : vector<8xf32>
}
```
Command:
```
mlir-opt vector-to-llvm.mlir -vector-lower-to-llvm-dialect --disable-pass-threading | mlir-opt -lower-to-cfg -lower-to-llvm | mlir-translate --mlir-to-llvmir | opt -O3 | llc -O3 -march=x86-64 -mcpu=haswell -mattr=fma,avx2
```
Output:
```
vec_1d: # @vec_1d
# %bb.0:
vbroadcastss %xmm0, %ymm0
vmulps %ymm1, %ymm0, %ymm0
retq
```
PiperOrigin-RevId: 262895929
The current implementation only returns one element for the splat case, which often comes as a surprise; leading to subtle/confusing bugs. The new behavior will include an iterate over the full range of elements, as defined by the shaped type, by providing the splat value for each iterator index.
PiperOrigin-RevId: 262756780
There are currently several different terms used to refer to a parent IR unit in 'get' methods: getParent/getEnclosing/getContaining. This cl standardizes all of these methods to use 'getParent*'.
PiperOrigin-RevId: 262680287
In declarative rewrite rules, a symbol can be bound to op arguments or
results in the source pattern, and it can be bound to op results in the
result pattern. This means given a symbol in the pattern, it can stands
for different things: op operand, op attribute, single op result,
op result pack. We need a better way to model this complexity so that
we can handle according to the specific kind a symbol corresponds to.
Created SymbolInfo class for maintaining the information regarding a
symbol. Also created a companion SymbolInfoMap class for a map of
such symbols, providing insertion and querying depending on use cases.
PiperOrigin-RevId: 262675515
This will allow for reusing the same pattern list, which may be costly to continually reconstruct, on multiple invocations.
PiperOrigin-RevId: 262664599
The translation code predates the introduction of LogicalResult and was relying
on the obsolete LLVM convention of returning false on success. Change it to
use MLIR's LogicalResult abstraction instead. NFC.
PiperOrigin-RevId: 262589432
Unlike regular constant values, strings must be placed in some memory and
referred to through a pointer to that memory. Until now, they were not
supported in function-local constant declarations with `llvm.constant`.
Introduce support for global strings using `llvm.global`, which would translate
them into global arrays in LLVM IR and thus make sure they have some memory
allocated for storage.
PiperOrigin-RevId: 262569316
This CL introduces the ability to generate the external library name for Linalg operations.
The problem is that neither mlir or C support overloading and we want a simplified form of name mangling that is still reasonable to read.
This CL creates the name of the external call that Linalg expects from the operation name and the type of its arguments.
The interface library names are updated and use new cases are added for FillOp.
PiperOrigin-RevId: 262556833
This CL adds the ability for linalg.view to act as a bitcast operation.
This will be used when promoting views into faster memory and casting to vector types.
In the process, linalg.view is moved to ODS.
PiperOrigin-RevId: 262556246
This CL is step 2/n towards building a simple, programmable and portable vector abstraction in MLIR that can go all the way down to generating assembly vector code via LLVM's opt and llc tools.
This CL adds the vector.outerproduct operation to the MLIR vector dialect as well as the appropriate roundtrip test. Lowering to LLVM will occur in the following CL.
PiperOrigin-RevId: 262552027
This CL is step 2/n towards building a simple, programmable and portable vector abstraction in MLIR that can go all the way down to generating assembly vector code via LLVM's opt and llc tools.
This CL adds the vector.extractelement operation to the MLIR vector dialect as well as the appropriate roundtrip test. Lowering to LLVM will occur in the following CL.
PiperOrigin-RevId: 262545089
This CL is step 1/n towards building a simple, programmable and portable vector abstraction in MLIR that can go all the way down to generating assembly vector code via LLVM's opt and llc tools.
This CL adds the 3 instructions `llvm.extractelement`, `llvm.insertelement` and `llvm.shufflevector` as documented in the LLVM LangRef "Vector Instructions" section.
The "Experimental Vector Reduction Intrinsics" are left out for now and can be added in the future on a per-need basis.
Appropriate roundtrip and LLVM Target tests are added.
PiperOrigin-RevId: 262542095
Introduce an operation that defines global constants and variables in the LLVM
dialect, to reflect the corresponding LLVM IR capability. This operation is
expected to live in the top-level module and behaves similarly to
llvm.constant. It currently does not model many of the attributes supported by
the LLVM IR for global values (memory space, alignment, thread-local, linkage)
and will be extended as the relevant use cases appear.
PiperOrigin-RevId: 262539445
This adds support for fcmp to the LLVM dialect and adds any necessary lowerings, as well as support for EDSCs.
Closestensorflow/mlir#69
PiperOrigin-RevId: 262475255
This commit improves JitRunner so that it creates a target machine
for the current CPU host which is used to properly initialize LLVM's
TargetTransformInfo for such a target. This will enable optimizations
such as vectorization in LLVM when using JitRunner. Please, note that,
as part of this work, JITTargetMachineBuilder::detectHost() has been
extended to include the host CPU name and sub-target features as part of
the host CPU detection (https://reviews.llvm.org/D65760).
Closestensorflow/mlir#71
PiperOrigin-RevId: 262452525
Building the symbol table upfront from module op allows for O(1)
lookup of the function while verifying duplicate EntryPointOp within
the module.
PiperOrigin-RevId: 262435697
Adding the SymbolTable trait allows looking up the name of the
functions using the symbol table while verifying EntryPointOps instead
of manually tracking the function names.
PiperOrigin-RevId: 262431220
Lexer methods were added progressively as implementation advanced. The rest of
MLIR now tends to sort methods alphabetically for better discoverability in
absence of tooling. Sort the lexer methods as well.
PiperOrigin-RevId: 262406992
This changes the type of the function type-building callback from
(ArrayRef<Type>, ArrayRef<Type>, bool, string &) to (ArrayRef<Type>,
ArrayRef<Type>, VariadicFlag, String &) to make the intended use clear from the
callback signature alone.
Also rearrange type definitions in Parser.cpp to make them more sorted
alphabetically.
PiperOrigin-RevId: 262405851
LLVM function type has first-class support for variadic functions. In the
current lowering pipeline, it is emulated using an attribute on functions of
standard function type. In LLVMFuncOp that has LLVM function type, this can be
modeled directly. Introduce parsing support for variadic arguments to the
function and use it to support variadic function declarations in LLVMFuncOp.
Function definitions are currently not supported as that would require modeling
va_start/va_end LLVM intrinsics in the dialect and we don't yet have a
consistent story for LLVM intrinsics.
PiperOrigin-RevId: 262372651
Now that modules are also operations, nothing prevents one from defining SSA
values in the module. Doing so in an implicit top-level module, i.e. outside
of a `module` operation, was leading to a crash because the implicit module was
not associated with an SSA name scope. Create a name scope before parsing the
top-level module to fix this.
PiperOrigin-RevId: 262366891
This CL introduces canonicalization patterns for linalg.dim.
This allows the dimenions of chains of view, slice and subview operations to simplify.
Down the line, when mixed with cse, this also allows better composition of linalg tiling and fusion by tracking operations that give the same result (not in this CL).
PiperOrigin-RevId: 262365865
The entry block is often used recently after insertion. This removes the need to perform an additional lookup in such cases.
PiperOrigin-RevId: 262265671
These methods will allow replacing the uses of results with an existing operation, with the same number of results, or a range of values. This removes a number of hand-rolled result replacement loops and simplifies replacement for operations with multiple results.
PiperOrigin-RevId: 262206600
Verification complained when using zero-dimensional memrefs in
affine.load, affine.store, std.load and std.store. This PR extends
verification so that those memrefs can be used.
Closestensorflow/mlir#58
COPYBARA_INTEGRATE_REVIEW=https://github.com/tensorflow/mlir/pull/58 from dcaballe:dcaballe/zero-dim 49bcdcd45c52c48beca776431328e5ce551dfa9e
PiperOrigin-RevId: 262164916
This CL extends the Linalg GenericOp with an alternative way of specifying the body of the computation based on a single block region. The "fun" attribute becomes optional.
Either a SymbolRef "fun" attribute or a single block region must be specified to describe the side-effect-free computation. Upon lowering to loops, the new region body is inlined in the innermost loop.
The parser, verifier and pretty printer are extended.
Appropriate roundtrip, negative and lowering to loop tests are added.
PiperOrigin-RevId: 261895568
This CL modifies the LowerLinalgToLoopsPass to use RewritePattern.
This will make it easier to inline Linalg generic functions and regions when emitting to loops in a subsequent CL.
PiperOrigin-RevId: 261894120
Many LLVM transformations benefits from knowing the targets. This enables optimizations,
especially in a JIT context when the target is (generally) well-known.
Closestensorflow/mlir#49
PiperOrigin-RevId: 261840617
This allows for proper forward declaration, as opposed to leaking the internal implementation via a using directive. This also allows for all pattern building to go through 'insert' methods on the OwningRewritePatternList, replacing uses of 'push_back' and 'RewriteListBuilder'.
PiperOrigin-RevId: 261816316
This trait provides the ensureTerminator() utility function and
the checks to make sure a spv.module is indeed terminated with
spv._module_end.
PiperOrigin-RevId: 261664153
Similar to all LLVM dialect operations, llvm.func needs to have the custom
syntax. Use the generic FunctionLike printer and parser to implement it.
PiperOrigin-RevId: 261641755
The includes related to the LLVM dialect are not used in this file and
introduce an implicit dependencies between the two libraries which isn't
reflected in the CMakeLists.txt, causing non-deterministic build failures.
PiperOrigin-RevId: 261576935
LLVM r367686 changed the locking scheme to avoid potential deadlocks and the
related llvm::orc::ThreadSafeModule APIs ExecutionEngine was relying upon,
breaking the MLIR build. Update our use of ThreadSafeModule to unbreak the
build.
PiperOrigin-RevId: 261566571
When inlining the declaration for llvm::DenseSet/DenseMap in the mlir
namespace from a forward declaration, clang does not take the default
for the template parameters if their are declared later.
namespace llvm {
template<typename Foo>
class DenseMap;
}
namespace mlir {
using llvm::DenseMap;
}
namespace llvm {
template<typename Foo = int>
class DenseMap {};
}
namespace mlir {
DenseMap<> map;
}
PiperOrigin-RevId: 261495612
This CL introduces a linalg.generic op to represent generic tensor contraction operations on views.
A linalg.generic operation requires a numbers of attributes that are sufficient to emit the computation in scalar form as well as compute the appropriate subviews to enable tiling and fusion.
These attributes are very similar to the attributes for existing operations such as linalg.matmul etc and existing operations can be implemented with the generic form.
In the future, most existing operations can be implemented using the generic form.
This CL starts by splitting out most of the functionality of the linalg::NInputsAndOutputs trait into a ViewTrait that queries the per-instance properties of the op. This allows using the attribute informations.
This exposes an ordering of verifiers issue where ViewTrait::verify uses attributes but the verifiers for those attributes have not been run. The desired behavior would be for the verifiers of the attributes specified in the builder to execute first but it is not the case atm. As a consequence, to emit proper error messages and avoid crashing, some of the
linalg.generic methods are defensive as such:
```
unsigned getNumInputs() {
// This is redundant with the `n_views` attribute verifier but ordering of verifiers
// may exhibit cases where we crash instead of emitting an error message.
if (!getAttr("n_views") || n_views().getValue().size() != 2)
return 0;
```
In pretty-printed form, the specific attributes required for linalg.generic are factored out in an independent dictionary named "_". When parsing its content is flattened and the "_name" is dropped. This allows using aliasing for reducing boilerplate at each linalg.generic invocation while benefiting from the Tablegen'd verifier form for each named attribute in the dictionary.
For instance, implementing linalg.matmul in terms of linalg.generic resembles:
```
func @mac(%a: f32, %b: f32, %c: f32) -> f32 {
%d = mulf %a, %b: f32
%e = addf %c, %d: f32
return %e: f32
}
#matmul_accesses = [
(m, n, k) -> (m, k),
(m, n, k) -> (k, n),
(m, n, k) -> (m, n)
]
#matmul_trait = {
doc = "C(m, n) += A(m, k) * B(k, n)",
fun = @mac,
indexing_maps = #matmul_accesses,
library_call = "linalg_matmul",
n_views = [2, 1],
n_loop_types = [2, 1, 0]
}
```
And can be used in multiple places as:
```
linalg.generic #matmul_trait %A, %B, %C [other-attributes] :
!linalg.view<?x?xf32>, !linalg.view<?x?xf32>, !linalg.view<?x?xf32>
```
In the future it would be great to have a mechanism to alias / register a new
linalg.op as a pair of linalg.generic, #trait.
Also, note that with one could theoretically only specify the `doc` string and parse all the attributes from it.
PiperOrigin-RevId: 261338740
AffineDataCopyGeneration pass relied on command line flags for internal logic
in several places, which makes it unusable in a library context (i.e. outside a
standalone mlir-opt binary that does the command line parsing). Define
configuration flags in the constructor instead, and set them up to command
line-based defaults to maintain the original behavior.
PiperOrigin-RevId: 261322364
This CL extends the existing spv.constant op to also support
specialization constant by adding an extra unit attribute
on it.
PiperOrigin-RevId: 261194869
verifyUnusedValue is a bit strange given that it is specified in a
result pattern but used to generate match statements. Now we are
able to support multi-result ops better, we can retire it and replace
it with a HasNoUseOf constraint. This reduces the number of mechanisms.
PiperOrigin-RevId: 261166863
We allow to generate more ops than what are needed for replacing
the matched root op. Only the last N static values generated are
used as replacement; the others serve as auxiliary ops/values for
building the replacement.
With the introduction of multi-result op support, an op, if used
as a whole, may be used to replace multiple static values of
the matched root op. We need to consider this when calculating
the result range an generated op is to replace.
For example, we can have the following pattern:
```tblgen
def : Pattern<(ThreeResultOp ...),
[(OneResultOp ...), (OneResultOp ...), (OneResultOp ...)]>;
// Two op to replace all three results
def : Pattern<(ThreeResultOp ...),
[(TwoResultOp ...), (OneResultOp ...)]>;
// One op to replace all three results
def : Pat<(ThreeResultOp ...), (ThreeResultOp ...)>;
def : Pattern<(ThreeResultOp ...),
[(AuxiliaryOp ...), (ThreeResultOp ...)]>;
```
PiperOrigin-RevId: 261017235
Extend the recently introduced support for hexadecimal float literals to tensor
literals, which may also contain special floating point values such as
infinities and NaNs.
Modify TensorLiteralParser to store the list of tokens representing values
until the type is parsed instead of trying to guess the tensor element type
from the token kinds (hexadecimal values can be either integers or floats, and
can be mixed with both). Maintain the error reports as close as possible to
the existing implementation to avoid disturbing the tests. They can be
improved in a separate clean-up if deemed necessary.
PiperOrigin-RevId: 260794716
All non-argument attributes specified for an operation are treated as
decorations on the result value and (de)serialized using OpDecorate
instruction. An error is generated if an attribute is not an argument,
and the name doesn't correspond to a Decoration enum. Name of the
attributes that represent decoerations are to be the snake-case-ified
version of the Decoration name.
Add utility methods to convert to snake-case and camel-case.
PiperOrigin-RevId: 260792638
MLIR does not have support for parsing special floating point values such as
infinities and NaNs. If programmatically constructed, these values are printed
as NaN and (+-)Inf and cannot be parsed back. Add parser support for
hexadecimal literals in float attributes, following LLVM IR. The literal
corresponds to the in-memory representation of the floating point value.
IEEE 754 defines a range of possible values for NaNs, storing the bitwise
representation allows MLIR to properly roundtrip NaNs with different bit values
of significands.
The initial version of this commit was missing support for float literals that
used to be printed in decimal notation as a fallback, but ended up being
printed in hexadecimal format which became the fallback for special values.
The decimal fallback behavior was not exercised by tests. It is currently
reinstated and tested by the newly added test @f32_potential_precision_loss in
parser.mlir.
PiperOrigin-RevId: 260790900
This CL adds an initial implementation for translation of kernel
function in GPU Dialect (used with a gpu.launch_kernel) op to a
spv.Module. The original function is translated into an entry
function.
Most of the heavy lifting is done by adding TypeConversion and other
utility functions/classes that provide most of the functionality to
translate from Standard Dialect to SPIR-V Dialect. These are intended
to be reusable in implementation of different dialect conversion
pipelines.
Note : Some of the files for have been renamed to be consistent with
the norm used by the other Conversion frameworks.
PiperOrigin-RevId: 260759165
We are relying on serializer to construct positive cases to drive
the test for deserializer. This leaves negative cases untested.
This CL adds a basic test fixture for covering the negative
corner cases to enforce a more robust deserializer.
Refactored common SPIR-V building methods out of serializer to
share it with the deserialization test.
PiperOrigin-RevId: 260742733
It's quite common that we want to put further constraints on the matched
multi-result op's specific results. This CL enables referencing symbols
bound to source op with the `__N` syntax.
PiperOrigin-RevId: 260122401
In the backward slice computation, BlockArgument coming from function arguments represent a natural boundary for the traversal and should not trigger llvm_unreachable.
This CL also improves the error message and adds a relevant test.
PiperOrigin-RevId: 260118630
Clipping creates non-affine memory accesses, use std_load and std_store instead of affine_load and affine_store.
In the future we may also want a fill with the neutral element rather than clip, this would make the accesses affine if we wanted more analyses and transformations to happen post lowering to pointwise copies.
PiperOrigin-RevId: 260110503
AccessChainOp creates a pointer into a composite object that can be used with
OpLoad and OpStore.
Closestensorflow/mlir#52
PiperOrigin-RevId: 260035676
Function-like operations are likely to have similar custom syntax, in
particular they all need to print function signature with argument attributes.
Transform function printer and parser so that they can be applied to any
operation with the FunctionLike trait. Move them to the trait itself. To
avoid large member functions in the class template, define a concrete base
class for the trait and implement common functionality in it. This allows
printer and parser to be implemented in a source file without templating.
PiperOrigin-RevId: 260020893
MLIR does not have support for parsing special floating point values such as
infinities and NaNs. If programmatically constructed, these values are printed
as NaN and (+-)Inf and cannot be parsed back. Add parser support for
hexadecimal literals in float attributes, following LLVM IR. The literal
corresponds to the in-memory representation of the floating point value.
IEEE 754 defines a range of possible values for NaNs, storing the bitwise
representation allows MLIR to properly roundtrip NaNs with different bit values
of significands.
PiperOrigin-RevId: 260018802
This mode analyzes which operations are legalizable to the given target if a conversion were to be applied, i.e. no rewrites are ever performed even on success. This mode is useful for device partitioning or other utilities that may want to analyze the effect of conversion to different targets before performing it.
The analysis method currently just fills a provided set with the operations that were found to be legalizable. This can be extended in the future to capture more information as necessary.
PiperOrigin-RevId: 259987105
This CL fixes an oversight with dealing with loops in slicing analysis.
The forward slice computation properly propagates through loops but not the backward slice.
Add relevant unit tests.
PiperOrigin-RevId: 259903396
Per tacit agreement, individual dialects should now live in lib/Dialect/Name
with headers in include/mlir/Dialect/Name and tests in test/Dialect/Name.
PiperOrigin-RevId: 259896851
This CL adds support for SubViewOp in the alias analysis to permit multiple Linalg fusion passes to compose. The debugging messages are also improved for better readability. The readability benefits came in handy when tracking this issue.
A 2-level fusion test is added to capture the new behavior.
PiperOrigin-RevId: 259720246
The function populateStdOpsToSPIRVPatterns appends the conversion
patterns automatically generated from StdOpsToSPIRVConversion.td to a
list of patterns
PiperOrigin-RevId: 259677890
Conversion from integers (window or input size, padding etc) to floating point is required to express many ML kernels, for example average pooling.
PiperOrigin-RevId: 259575284
The loop parallelism detection utility only collects the affine.load and
affine.store operations appearing inside the loop to analyze the access
patterns for the absence of dependences. However, any operation, including
unregistered operations, can appear in a body of an affine loop. If such
operation has side effects, the result of parallelism analysis is incorrect.
Conservatively assume affine loops are not parallel in presence of operations
other than affine.load, affine.store, affine.for, affine.terminator that may
have side effects.
This required to update the loop-fusion unit test that relies on parallelism
analysis and was exercising loop fusion in presence of an unregistered
operation.
PiperOrigin-RevId: 259560935
Originally, MLIR only supported functions of the built-in FunctionType. On the
conversion path to LLVM IR, we were creating MLIR functions that contained LLVM
dialect operations and used LLVM IR types for everything expect top-level
functions (e.g., a second-order function would have a FunctionType that consume
or produces a wrapped LLVM function pointer type). With MLIR functions
becoming operations, it is now possible to introduce non-built-in function
operations. This will let us use conversion patterns for function conversion,
simplify the MLIR-to-LLVM translation by removing the knowledge of the MLIR
built-in function types, and provide stronger correctness verifications (e.g.
LLVM functions only accept LLVM types).
Furthermore, we can currently construct a situation where the same function is
used with two different types: () -> () when its specified and called directly,
and !llvm<"void ()"> when it's passed somewhere on called indirectly. Having a
special function-op that is always of !llvm<"void ()"> type makes the function
model and the llvm dialect type system more consistent.
Introduce LLVMFuncOp to represent a function in the LLVM dialect. Unlike
standard FuncOp, this function has an LLVMType wrapping an LLVM IR function
type. Generalize the common behavior of function-defining operations
(functions live in a symbol table of a module, contain a single region, are
iterable as a list of blocks, and support argument attributes).
This only defines the operation. Custom syntax, conversion and translation
rules will be added in follow-ups.
The operation name mentions LLVM explicitly to avoid confusion with standard
FuncOp, especially in multiple files that use both `mlir` and `mlir::LLVM`
namespaces.
PiperOrigin-RevId: 259550940
- introduce parseRegionArgumentList (similar to parseOperandList) to parse a
list of region arguments with a delimiter
- allows defining custom parse for op's with multiple/variadic number of
region arguments
- use this on the gpu.launch op (although the latter has a fixed number
of region arguments)
- add a test dialect op to test region argument list parsing (with the
no delimiter case)
Signed-off-by: Uday Bondhugula <uday@polymagelabs.com>
Closestensorflow/mlir#40
PiperOrigin-RevId: 259442536
SPIR-V has multiple constant instructions covering different
constant types:
* `OpConstantTrue` and `OpConstantFalse` for boolean constants
* `OpConstant` for scalar constants
* `OpConstantComposite` for composite constants
* `OpConstantNull` for null constants
* ...
We model them all with a single spv.constant op for uniformity
and friendliness to transformations. This does mean that when
doing (de)serialization, we need to poke spv.constant's type
to determine which SPIR-V binary instruction to use.
This CL only covers the case of bool and integer spv.constant.
The rest will follow.
PiperOrigin-RevId: 259311698
* Let them return `LogicalResult` so we can chain them together
with other functions returning `LogicalResult`.
* Added "Into" as the suffix to the function name and made the
`binary` as the first parameter so that it reads more naturally.
PiperOrigin-RevId: 259311636
We already have two levels of controls in SPIRVBase.td: hasOpcode and
autogenSerialization. The former controls whether to add an entry to
the dispatch table, while the latter controls whether to autogenerate
the op's (de)serialization method specialization. This is enough for
our cases. Remove the indirection from processOp to processOpImpl
to simplify the picture.
PiperOrigin-RevId: 259308711
This cl enforces that the conversion of the type signatures for regions, and thus their entry blocks, is handled via ConversionPatterns. A new hook 'applySignatureConversion' is added to the ConversionPatternRewriter to perform the desired conversion on a region. This also means that the handling of rewriting the signature of a FuncOp is moved to a pattern. A default implementation is provided via 'mlir::populateFuncOpTypeConversionPattern'. This removes the hacky implicit 'dynamically legal' status of FuncOp that was present previously, and leaves it up to the user to decide when/how to convert the signature of a function.
PiperOrigin-RevId: 259161999
Since the serialization of EntryPointOp contains the name of the
function as well, the function serialization emits the function name
using OpName instruction, which is used during deserialization to get
the correct function name.
PiperOrigin-RevId: 259158784
This allows for the raw data to be reinterpreted as the derived c++ type, e.g. ArrayRef<uint64_t>. This fixes a ubsan error for misaligned-pointer-use.
PiperOrigin-RevId: 259128031
The TypeUtilities.{cpp,h}, currently living in {lib,include/mlir}/Support, do
not belong to the Support library. Instead, they form a separate utility
library that depends on the IR library. The operations it provides relate to
standard types (tensors, memrefs) as well as to operation manipulation, making
them a better fit for the main IR library.
PiperOrigin-RevId: 259108314
When printing the value attribute in spv.constant, OpAsmPrinter
already attaches a trailing type. So we don't need to duplicate
it again unless it's an array attribute, which does not have
type by default but we use it for spirv::ArrayType.
PiperOrigin-RevId: 258994197
It's a known bug that older GCC is not happy with method specialization in
the enclosing (global) namespace:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56480
This CL wraps the generated specialization methods in the anonymous namespace
to make sure the specialization is in the same namespace as the class.
PiperOrigin-RevId: 258983181
Version of std::equal used required C++14, switching to for-loop for now. Just a direct change from std::equal to the equivalent using for loop.
PiperOrigin-RevId: 258970366
This CL changes the Op definition of spirv::EntryPointOp and
spirv::ExecutionModeOp to be consistent with the SPIR-V spec.
1) The EntryPointOp doesn't return a value
2) The ExecutionModeOp takes as argument, the SymbolRefAttr to refer
to the function, instead of the result of the EntryPointOp.
Following this, the spirv::EntryPointType is no longer necessary, and
is removed.
PiperOrigin-RevId: 258964027
Several groups of operations in different dialects (e.g. AffineForOp,
AffineIfOp; loop::ForOp, loop::IfOp) share the requirement for their regions to
contain 0 or 1 block, and for blocks to always have a specific terminator type.
Furthermore, this terminator may be omitted from the custom syntax. Generalize
this behavior into OpTrait::SingleBlockImplicitTerminator, parameterized by the
terminator operation type. This trait provides the verifier that checks the
presence of the terminator, and utility functions adding the terminator in case
of absence.
PiperOrigin-RevId: 258957180
This CL introduces a simple loop utility function which rewrites the bounds and step of a loop so as to become mappable on a regular grid of processors whose identifiers are given by SSA values.
A corresponding unit test is added.
For example, using CUDA terminology, and assuming a 2-d grid with processorIds = [blockIdx.x, threadIdx.x] and numProcessors = [gridDim.x, blockDim.x], the loop:
```
loop.for %i = %lb to %ub step %step {
...
}
```
is rewritten into a version resembling the following pseudo-IR:
```
loop.for %i = %lb + threadIdx.x + blockIdx.x * blockDim.x to %ub
step %gridDim.x * blockDim.x {
...
}
```
PiperOrigin-RevId: 258945942
This CL adapts the recently introduced parametric tiling to have an API matching the tiling
of AffineForOp. The transformation using stripmineSink is more general and produces imperfectly nested loops.
Perfect nesting invariants of the tiled version are obtained by selectively applying hoisting of ops to isolate perfectly nested bands. Such hoisting may fail to produce a perfect loop nest in cases where ForOp transitively depend on enclosing induction variables. In such cases, the API provides a LogicalResult return but the SimpleParametricLoopTilingPass does not currently use this result.
A new unit test is added with a triangular loop for which the perfect nesting property does not hold. For this example, the old behavior was to produce IR that did not verify (some use was not dominated by its def).
PiperOrigin-RevId: 258928309
This allows for providing specific handling for dynamically legal operations/dialects without overriding the general 'isDynamicallyLegal' hook. This also means that a derived ConversionTarget class need not always be defined when some operations are dynamically legal.
Example usage:
ConversionTarget target(...);
target.addDynamicallyLegalOp<ReturnOp>([](ReturnOp op) {
return ...
};
target.addDynamicallyLegalDialect<StandardOpsDialect>([](Operation *op) {
return ...
};
PiperOrigin-RevId: 258884753
This CL groups (de)serialization methods logically and improves comments
at various places. It also sorted method implementations to follow the
order of their declarations. There is NFC.
PiperOrigin-RevId: 258843490
This specific PatternRewriter will allow for exposing hooks in the future that are only useful for the conversion framework, e.g. type conversions.
PiperOrigin-RevId: 258818122
Some TensorFlow simulated quantize ops such as QuantizeAndDequantizeV2Op have
attribute for the sign of the quantization, so quant_ConstFakeQuant should be
able to represent it with the new attribute is added.
The method for converting these attributes to an QuantizedType is updated to
handle this new argument.
PiperOrigin-RevId: 258810290
We already parse boolean "true"/"false" as ElementsAttr elements.
This CL makes it round-trippable that we are printing the same way.
PiperOrigin-RevId: 258784962
For ops in SPIR-V dialect that are a direct mirror of SPIR-V
operations, the serialization/deserialization methods can be
automatically generated from the Op specification. To enable this an
'autogenSerialization' field is added to SPV_Ops. When set to
non-zero, this will enable the automatic (de)serialization function
generation
Also adding tests that verify the spv.Load, spv.Store and spv.Variable
ops are serialized and deserialized correctly. To fully support these
tests also add serialization and deserialization of float types and
spv.ptr types
PiperOrigin-RevId: 258684764
The API on TupleType::getFlattenedTypes follows our normal conventions by accepting an output parameter, but requires callers to allocate their own storage and lends itself to use in an imperative style. This makes it difficult to use in tablegen. The current solution is to define a lambda that is immediately called, but it's cleaner to extract that into a helper.
PiperOrigin-RevId: 258672046
This cl standardizes the printing of the type of dialect attributes to work the same as other attribute kinds. The type of dialect attributes will trail the dialect specific portion:
`#` dialect-namespace `<` attr-data `>` `:` type
The attribute parsing hooks on Dialect have been updated to take an optionally null expected type for the attribute. This matches the respective parseAttribute hooks in the OpAsmParser.
PiperOrigin-RevId: 258661298
Currently, Broadcastable trait also rejects instances when the op result has shape other than what can be statically inferred based on the operand shapes even if the result shape is compatible with the inferred broadcasted shape.
For example,
(tensor<3x2xi32>, tensor<*xi32>) -> tensor<4x3x2xi32>
(tensor<2xi32>, tensor<2xi32>) -> tensor<*xi32>
PiperOrigin-RevId: 258647493
This cl begins a large refactoring over how signature types are converted in the DialectConversion infrastructure. The signatures of blocks are now converted on-demand when an operation held by that block is being converted. This allows for handling the case where a region is created as part of a pattern, something that wasn't possible previously.
This cl also generalizes the region signature conversion used by FuncOp to work on any region of any operation. This generalization allows for removing the 'apply*Conversion' functions that were specific to FuncOp/ModuleOp. The implementation currently uses a new hook on TypeConverter, 'convertRegionSignature', but this should ideally be removed in favor of using Patterns. That depends on adding support to the PatternRewriter used by ConversionPattern to allow applying signature conversions to regions, which should be coming in a followup.
PiperOrigin-RevId: 258645733
This explicit tag is useful is several ways:
*) This simplifies how to mark sub sections of a dialect as explicitly unsupported, e.g. my target supports all operations in the foo dialect except for these select few. This is useful for partial lowerings between dialects.
*) Partial conversions will now verify that operations that were explicitly marked as illegal must be converted. This provides some guarantee that the operations that need to be lowered by a specific pass will be.
PiperOrigin-RevId: 258582879
Users generally want several different modes of conversion. This cl refactors DialectConversion to provide two:
* Partial (applyPartialConversion)
- This mode allows for illegal operations to exist in the IR, and does not fail if an operation fails to be legalized.
* Full (applyFullConversion)
- This mode fails if any operation is not properly legalized to the conversion target. This allows for ensuring that the IR after a conversion only contains operations legal for the target.
PiperOrigin-RevId: 258412243
This patch added a new argument to the fakeQuantAttrsToType utility method, so
it can be used to convert min/max to quantized type with different signed
storage types.
PiperOrigin-RevId: 258382538
Unless we explicitly name a template instantiation in .td file, its def
name will be "anonymous_<number>". We typically give base-level Attr
template instantiation a name by writing `def AnAttr : Attr<...>`. But
when `AnAttr` is further wrapped in classes like OptionalAttr, the name
is lost unless explicitly def'ed again. These implicit-named template
instantiation is fairly common when writing op definitions. Those wrapper
classes are just essentially attaching more information to the attribute.
Without a proper way to trace back to the original attribute def name
can cause problems for consumers wanting to handle attributes according
to their types.
Previously we handled OptionalAttr and DefaultValuedAttr specifically,
but Confined was not supported. And they can compose together to have
Confined<OptionalAttr<...>, [...]>. So this CL moves the baseAttr field
to main Attr class (like isOptional) and set it only on the innermost
wrapper class.
PiperOrigin-RevId: 258341646
With the introduction of the Loop dialect, uses of the `linalg.for` operation can now be subsumed 1-to-1 by `loop.for`.
This CL performs the replacement and tests are updated accordingly.
PiperOrigin-RevId: 258322565
These methods don't compose well with the rest of conversion framework, and create artificial breaks in conversion. Replace these methods with two(populateAffineToStdConversionPatterns and populateLoopToStdConversionPatterns respectively) that populate a list of patterns to perform the same behavior.
PiperOrigin-RevId: 258219277
This cl changes the way that operations/blocks to convert are collected/traversed so that parent region operations can be legalized before their bodies. Most RewritePatterns for region operations assume that the entry arguments to each region are yet to be converted. Given that the bodies are currently converted first, this makes it difficult to fit these patterns into the same run as one converting types.
The operations/blocks to convert are now collected before any legalization has run, which simplifies the conversion logic itself, as legalization may insert new operations, move blocks, etc.
PiperOrigin-RevId: 258170158
This CL extends the linalg ops that can be tiled and fused to operations that take either views, scalar or vector operands.
PiperOrigin-RevId: 258159734
ConversionPattern should ideally only be used when the types of the operands are changing, which in this case they aren't. Using OpRewritePattern also lends to much simpler code.
PiperOrigin-RevId: 258158474
The dependecy on standard type is mandated by attribute verifiers on some
operations (e.g., llvm.constant) that use values of those types.
PiperOrigin-RevId: 258153134
Multiple (perfectly) nested loops with independent bounds can be combined into
a single loop and than subdivided into blocks of arbitrary size for load
balancing or more efficient parallelism exploitation. However, MLIR wants to
preserve the multi-dimensional multi-loop structure at higher levels of
abstraction. Introduce a transformation that coalesces nested loops with
independent bounds so that they can be further subdivided by tiling.
PiperOrigin-RevId: 258151016
This CL extends the linalg ops that can be tiled and fused to operations that take either views, scalar or vector operands.
PiperOrigin-RevId: 258149291
Due to the absence of ODS support for enum attributes, the implementation of
the LLVM dialect `icmp` operation was reusing the comparison predicate from the
Standard dialect, creating an avoidable library dependency. With ODS support
and ICmpPredicate attribute recently introduced, the dependency is no longer
justified. Update the Standard to LLVM convresion to also convert the
CmpIPredicate into LLVM::ICmpPredicate and remove the unnecessary includes.
Note that the MLIRLLVMIR library did not explicitly depend on MLIRStandardOps,
requiring dependees of MLIRLLVMIR to also depend on MLIRStandardOps, which
should no longer be the case.
PiperOrigin-RevId: 258148456
Use the recently introduced enum-gen functionality to define the predicate
attribute of the ICmp LLVM dialect operation directly in ODS. This removes the
need for manually-coded string-to-integer conversion functions and contributes
to the overall homogenization of the operation definitions.
PiperOrigin-RevId: 258143923
These ops should not belong to the std dialect.
This CL extracts them in their own dialect and updates the corresponding conversions and tests.
PiperOrigin-RevId: 258123853
When using a RewritePattern and replacing an operation with an existing value, that value may have already been replaced by something else. This cl ensures that only the final value is used when applying rewrites.
PiperOrigin-RevId: 258058488
following SPIRV Instructions serializaiton/deserialization are added
as well
OpFunction
OpFunctionParameter
OpFunctionEnd
OpReturn
PiperOrigin-RevId: 257869806
* Changed SPIR-V types to all use unsigned for member type count and index
* Used the same method name for getting element type and count
* Improved spv.CompositeExtract verification a bit
PiperOrigin-RevId: 257862580
The verifier passes are NO-OP and are only useful to print after in the case of failure. This removes a lot of unnecessary clutter when printing after/before all passes.
PiperOrigin-RevId: 257836310
This field wasn't updated as the insertion point changed, making it potentially dangerous given the multi-level of MLIR(e.g. 'createBlock' would always insert the new block in 'region'). This also allows for building an OpBuilder with just a context.
PiperOrigin-RevId: 257829135
The GreedyPatternRewriteDriver currently does not notify the OperationFolder when constants are removed as part of a pattern match. This materializes in a nasty bug where a different operation may be allocated to the same address. This causes an assertion in the OperationFolder when it gets notified of the new operations removal.
PiperOrigin-RevId: 257817627
This CL splits the lowering of affine to LLVM into 2 parts:
1. affine -> std
2. std -> LLVM
The conversions mostly consists of splitting concerns between the affine and non-affine worlds from existing conversions.
Short-circuiting of affine `if` conditions was never tested or exercised and is removed in the process, it can be reintroduced later if needed.
LoopParametricTiling.cpp is updated to reflect the newly added ForOp::build.
PiperOrigin-RevId: 257794436
PassRegistration with an optional constructor was introduced after the
LoopsToGPUPass, which resorted to deriving one pass from another as a means of
accepting options supplied as command-line arguments. Use PassRegistration with
constructor instead of defining a derived pass for LoopsToGPU. Also rename the
pass to better reflect its current nature.
PiperOrigin-RevId: 257786923
Linalg tiling pass was introduced before PassRegistration with an optional pass
constructor. It resorted to deriving a helper class from the origial pass
class in order to provide a default constructor with values obtained from
command line flags. Use PassRegistration with the optional pass constructor
instead, which avoids declaring an additional class.
PiperOrigin-RevId: 257786876
Remove the Function specific attribute verifier in favor of the general operation verifier. This also generalizes the function argument verifier to allow use for an argument attached to any region of any operation.
PiperOrigin-RevId: 257689962
This allows for the attribute to hold symbolic references to other operations than FuncOp. This also allows for removing the dependence on FuncOp from the base Builder.
PiperOrigin-RevId: 257650017
Operations in a block can use a value defined in a dominating block. When a
block, and therefore all its operations, is deleted, the operations are not
allowed to have any remaining uses. Drop all uses of values in all blocks
before deleting them in FuncOp::eraseBody to avoid deleting an operation before
deleting the users of its results.
PiperOrigin-RevId: 257628002
Affine load and store operations take a variadic number of arguments, most of
which are interpreted as subscripts for the multi-dimensional memref they
access. Add a verifier check that ensures the number of operands is equal to
the number affine remapping inputs if present and to the rank of the acessed
memref otherwise. Although it is impossible to obtain such operations by
parsing the custom syntax, it is possible to construct them using the generic
syntax or programmatically.
PiperOrigin-RevId: 257605902
JSON spec into the SPIRBase.td file. This is done incrementally to
only import those opcodes that are needed, through use of the script
define_opcode.sh added.
PiperOrigin-RevId: 257517343
This changes the top-level module parser to handle the case where the top-level module is defined with the module operation syntax, i.e:
module ... {
}
The printer is also updated to always print the top-level module in this form. This allows for cleanly round-tripping the location and attributes of the top-level module.
PiperOrigin-RevId: 257492069
The ModulePrinter prints the newline now for children of the top-level module. This also fixes the location printing for functions as the location used to be printed on a different line.
PiperOrigin-RevId: 257447633
Standard load and store operations are evolving to be separated from the Affine
constructs. Special affine.load/store have been introduced to uphold the
restrictions of the Affine control flow constructs on their operands.
EDSC-produced loads and stores were originally intended to uphold those
restrictions as well so they should use affine.load/store instead of
std.load/store.
PiperOrigin-RevId: 257443307
There is already a more general 'getParentOfType' method, and 'getModule' is likely to be misused as functions get placed within different regions than ModuleOp.
PiperOrigin-RevId: 257442243
AffineIfOp::build is not tested or exercised anywhere. It also perpetuates a questionable choice of encoding an optional region as an empty region which we would like to change in the future.
PiperOrigin-RevId: 257439832
This was an arbitrary restriction caused by the way that modules were printed. Now that that has been fixed, this restriction can be removed.
PiperOrigin-RevId: 257240329
Change the AsmPrinter to number values breadth-first so that values in adjacent regions can have the same name. This allows for ModuleOp to contain operations that produce results. This also standardizes the special name of region entry arguments to "arg[0-9+]" now that Functions are also operations.
PiperOrigin-RevId: 257225069
Parametric tiling can be used to extract outer loops with fixed number of
iterations. This in turn enables mapping to GPU kernels on a fixed grid
independently of the range of the original loops, which may be unknown
statically, making the kernel adaptable to different sizes. Provide a utility
function that also computes the parametric tile size given the range of the
loop. Exercise the utility function through a simple pass that applies it to
all top-level loop nests. Permutability or parallelism checks must be
performed before calling this utility function in actual passes.
Note that parametric tiling cannot be implemented in a purely affine way,
although it can be encoded using semi-affine maps. The choice to implement it
on standard loops is guided by them being the common representation between
Affine loops, Linalg and GPU kernels.
PiperOrigin-RevId: 257180251
Extend the utility that converts affine loop nests to support other types of
loops by abstracting away common behavior through templates. This also
slightly simplifies the existing Affine to GPU conversion by always passing in
the loop step as an additional kernel argument even though it is a known
constant. If it is used, it will be propagated into the loop body by the
existing canonicalization pattern and can be further constant-folded, otherwise
it will be dropped by canonicalization.
This prepares for the common loop abstraction that will be used for converting
to GPU kernels, which is conceptually close to Linalg loops, while maintaining
the existing conversion operational.
PiperOrigin-RevId: 257172216
Operations must only contain a single region. Once attached, all operations that contain a 'mlir::SymbolTable::getSymbolAttrName()' StringAttr attribute within the child region will be verified to ensure that the names are uniqued. Operations using this trait also gain access to the 'SymbolTable' class, which can be used to manage the symbol table of the operation. This class also provides constant-time lookup of symbols by name, and will automatically rename symbols on insertion.
PiperOrigin-RevId: 257123573
Modules can now contain more than just Functions, this just updates the iteration API to reflect that. The 'begin'/'end' methods have also been updated to iterate over opaque Operations.
PiperOrigin-RevId: 257099084
These methods assume that a function is a valid builtin top-level operation, and removing these methods allows for decoupling FuncOp and IR/. Utility "getParentOfType" methods have been added to Operation/OpState to allow for querying the first parent operation of a given type.
PiperOrigin-RevId: 257018913
This CL adds an "std.if" op to represent an if-then-else construct whose condition is an arbitrary value of type i1.
This is necessary to lower all the existing examples from affine and linalg to std.for + std.if.
This CL introduces the op and adds the relevant positive and negative unit test. Lowering will be done in a separate followup CL.
PiperOrigin-RevId: 256649138
Some operations need to override the default behavior of builders, in
particular region-holding operations such as affine.for or tf.graph want to
inject default terminators into the region upon construction, which default
builders won't do. Provide a flag that disables the generation of default
builders so that the custom builders could use the same function signatures.
This is an intentionally low-level and heavy-weight feature that requires the
entire builder to be implemented, and it should be used sparingly. Injecting
code into the end of a default builder would depend on the naming scheme of the
default builder arguments that is not visible in the ODS. Checking that the
signature of a custom builder conflicts with that of a default builder to
prevent emission would require teaching ODG to differentiate between types and
(optional) argument names in the generated C++ code. If this flag ends up
being used a lot, we should consider adding traits that inject specific code
into the default builder.
PiperOrigin-RevId: 256640069
This tool allows to execute MLIR IR snippets written in the GPU dialect
on a CUDA capable GPU. For this to work, a working CUDA install is required
and the build has to be configured with MLIR_CUDA_RUNNER_ENABLED set to 1.
PiperOrigin-RevId: 256551415
The equality check between the rank of a memref and the input size of the
layout affine map in AllocOp::verify is subsumed by the well-formedness check
of the memref type itself. Drop the redundant check from the verifier since it
is never exercised (the type builder does not allow one to construct a type
that would not pass the verifier check).
PiperOrigin-RevId: 256551247
Extend the LLVM lowering pass to accept callbacks that construct an instance of
(a subclass of) LLVMTypeConverter and populate a list of conversion patterns.
These callbacks will be called when the pass processes a module and their
results will be used to set up the dialect conversion infrastructure. Clients
can now provide additional conversion patterns to avoid the need of
materializing type conversions between LLVM and other types.
PiperOrigin-RevId: 256532415
This is an important step in allowing for the top-level of the IR to be extensible. FuncOp and ModuleOp contain all of the necessary functionality, while using the existing operation infrastructure. As an interim step, many of the usages of Function and Module, including the name, will remain the same. In the future, many of these will be relaxed to allow for many different types of top-level operations to co-exist.
PiperOrigin-RevId: 256427100
In most places, this is just a name change (with the exception of affine.dma_start swapping the operand positions of its tag memref and num_elements operands).
Significant code changes occur here:
*) Vectorization: LoopAnalysis.cpp, Vectorize.cpp
*) Affine Transforms: Transforms/Utils/Utils.cpp
PiperOrigin-RevId: 256395088
This CL refactors tiling to enable tiling of views that are not just specified by a simple permutation. This allows the tiling of convolutions for which a new example is added.
PiperOrigin-RevId: 256346028
This CL is the first step of a refactoring unification of the control flow abstraction used in different dialects. The `std.for` loop accepts unrestricted indices to encode min, max and step and will be used as a common abstraction on the way to lower level dialects.
PiperOrigin-RevId: 256331795
As Functions/Modules becomes operations, these methods will conflict with the 'verify' hook already on derived operation types.
PiperOrigin-RevId: 256246112
This CL uses the generic CopyOp to promote a subview (constructed during tiling) into a new buffer + copy by:
1. Creating a new buffer for the subview.
2. Taking a view into the buffer and copying into it.
3. Adapting the linalg op to operating on the view from point 2.
Tiling is extended with a boolean flag to enable promoting views (all or nothing for now).
More specifically, the current implementation creates a buffer that is always of the full size of the ranges of the subview. This produces a buffer whose size may be bigger
than the actual size of the `subView` at the boundaries and is related to the full/partial tile problem.
In practice, we introduce a `buffer`, a `fullLocalView` and a `partialLocalView` such that:
* `buffer` is always the size of the subview in the full tile case.
* `fullLocalView` is a dense contiguous view into that buffer.
* `partialLocalView` is a dense non-contiguous slice of `fullLocalView`
that corresponds to the size of `subView` and accounting for boundary
effects.
The point of the full tile buffer is that constant static tile sizes are
folded and result in a buffer type with statically known size and alignment
properties.
Padding is introduced on the boundary tiles with a `fill` op followed by a partial `copy` op.
These behaviors will be refined later, on a per-need basis.
PiperOrigin-RevId: 256237319
As with Functions, Module will soon become an operation, which are value-typed. This eases the transition from Module to ModuleOp. A new class, OwningModuleRef is provided to allow for owning a reference to a Module, and will auto-delete the held module on destruction.
PiperOrigin-RevId: 256196193
about the buffer size. This is needed to resolve the operand
correctly. Add that information to view op
serialization/deserialization
Also modify the parsing of buffer type by splitting at 'x' to
side-step issues with StringRef number parsing.
PiperOrigin-RevId: 256188319
This saves us the excessive string conversions and comparisons in
verification and transformation and scopes them only to parsing
and printing, which are meant for I/O so string conversions should
be fine.
In order to do this, changed the custom assembly format of
spv.module regarding addressing model and memory model.
PiperOrigin-RevId: 256149856
StringAttr-backed enum attribute cases changed to allow explicit values,
But this assertion was not deleted.
Fixes https://github.com/tensorflow/mlir/issues/39
PiperOrigin-RevId: 256090793
The GPU Launch operation may take constants as arguments, in particular
affine-to-GPU mapping pass automatically forwards potentially constant lower
bounds of loops into the kernel. Define a canonicalization pattern for
LaunchOp that recreates the constants inside the kernel region instead of
accepting them as kernel arguments. This is currently restricted to standard
constants but may be extended to other constant operations.
Also fix an off-by-one indexing bug in OperandStorage::eraseOperand.
PiperOrigin-RevId: 256035437
Move the data members out of Function and into a new impl storage class 'FunctionStorage'. This allows for Function to become value typed, which will greatly simplify the transition of Function to FuncOp(given that FuncOp is also value typed).
PiperOrigin-RevId: 255983022
Type conversion does not necessarily affect all types, some of them may remain
untouched. The type conversion tool from the dialect conversion framework will
unconditionally insert a temporary cast operation from the type to itself
anyway, and will try to materialize it to a real conversion operation if there
are remaining uses. Simply use the original value instead.
PiperOrigin-RevId: 255975450
In ODS, right now we use StringAttrs to emulate enum attributes. It is
suboptimal if the op actually can and wants to store the enum as a
single integer value; we are paying extra cost on storing and comparing
the attribute value.
This CL introduces a new enum attribute subclass that are backed by
IntegerAttr. The downside with IntegerAttr-backed enum attributes is
that the assembly form now uses integer values, which is less obvious
than the StringAttr-backed ones. However, that can be remedied by
defining custom assembly form with the help of the conversion utility
functions generated via EnumsGen.
Choices are given to the dialect writers to decide which one to use for
their enum attributes.
PiperOrigin-RevId: 255935542
Originally, AffineToGPUPass was created and registered in the source file
mainly for testing purposes. Provide a factory function that constructs
AffineToGPU pass to make it usable in pass pipelines.
PiperOrigin-RevId: 255902831
This functionality is now moved to a new class, ModuleManager. This class allows for inserting functions into a module, and will auto-rename them on insert to ensure a unique name. This now means that users adding new functions to a module must ensure that the function name is unique, as the Module will no longer do it automatically. This also means that Module::getNamedFunction now operates in O(N) instead of the O(c) time it did before. This simplifies the move of Modules to Operations as the ModuleOp will not be able to have this functionality.
PiperOrigin-RevId: 255846088
This CL also fixes a parsing issue in the BufferType, adds LLVM lowering support for handling the static constant buffer size and a roundtrip test.
PiperOrigin-RevId: 255834356
These ops are analogues of the current standard ops dma_start/wait, with the exception that the memref operands are affine expressions of loop IVs and symbols (analogous to affine.load/store).
The addition of these operations will enable changes to affine transformation and analysis passes which operate on memory dereferencing operations.
PiperOrigin-RevId: 255658382
During conversion, if a type conversion has dangling uses a type conversion must persist after conversion has finished to maintain valid IR. In these cases, we now query the TypeConverter to materialize a conversion for us. This allows for the default case of a full conversion to continue working as expected, but also handle the degenerate cases more robustly.
PiperOrigin-RevId: 255637171
constant then it is represented as <size x elementType>. If the size
is not a compile time constant, then it is represented as
<? x elementType>.
PiperOrigin-RevId: 255619400