(1) simplify integer printing logic by always using 64-bit print
(2) add index support (since vector<16xindex> is planned to be added)
(3) adjust naming convention print_x -> printX
Reviewed By: bkramer
Differential Revision: https://reviews.llvm.org/D88436
This generalizes printing beyond just i1,i32,i64 and also accounts
for signed and unsigned interpretation in the output.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D88290
Allow propagating optional user defined attributes during SCF to GPU conversion. Gives opportunity to use user defined attributes in the further lowering. For example setting subgroup size, or other options for GPU dispatch. This does not break backward compatibility and does not require new attributes, just allow passing optional ones.
Differential Revision: https://reviews.llvm.org/D88203
This pass converts shape.cstr_* ops to eager (side-effecting)
error-handling code. After that conversion is done, the witnesses are
trivially satisfied and are replaced with `shape.const_witness true`.
Differential Revision: https://reviews.llvm.org/D87941
Conversion to LLVM becomes confusing and incorrect if someone tries to lower
STD -> LLVM and only then GPULaunchFuncOp to LLVM separately. Although it is
technically allowed now, it works incorrectly because of the argument
promotion. The correct way to use this conversion pattern is to add to the
STD->LLVM patterns before running the pass.
Differential Revision: https://reviews.llvm.org/D88147
This revision allows representing a reduction at the level of linalg on tensors for named ops. When a structured op has a reduction and returns tensor(s), new conventions are added and documented.
As an illustration, the syntax for a `linalg.matmul` writing into a buffer is:
```
linalg.matmul ins(%a, %b : memref<?x?xf32>, tensor<?x?xf32>)
outs(%c : memref<?x?xf32>)
```
, whereas the syntax for a `linalg.matmul` returning a new tensor is:
```
%d = linalg.matmul ins(%a, %b : tensor<?x?xf32>, memref<?x?xf32>)
init(%c : memref<?x?xf32>)
-> tensor<?x?xf32>
```
Other parts of linalg will be extended accordingly to allow mixed buffer/tensor semantics in the presence of reductions.
ConvOp vectorization supports now only convolutions of static shapes with dimensions
of size either 3(vectorized) or 1(not) as underlying vectors have to be of static
shape as well. In this commit we add support for convolutions of any size as well as
dynamic shapes by leveraging existing matmul infrastructure for tiling of both input
and kernel to sizes accepted by the previous version of ConvOp vectorization.
In the future this pass can be extended to take "tiling mask" as a user input which
will enable vectorization of user specified dimensions.
Differential Revision: https://reviews.llvm.org/D87676
When packing function results into a structure during the standard-to-llvm
dialect conversion, do not assume the conversion was successful and propagate
nullptr as error state.
Fixes PR45184.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D87605
Added support to the Std dialect cast operations to do casts in vector types when feasible.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D87410
Type converter may fail and return nullptr on unconvertible types. The function
conversion did not include a check and was attempting to use a nullptr type to
construct an LLVM function, leading to a crash. Add a check and return early.
The rest of the call stack propagates errors properly.
Fixes PR47403.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D87075
This introduces a builder for the more general case that supports zero
elements (where the element type can't be inferred from the ValueRange,
since it might be empty).
Also, fix up some cases in ShapeToStandard lowering that hit this. It
happens very easily when dealing with shapes of 0-D tensors.
The SameOperandsAndResultElementType is redundant with the new
TypesMatchWith and prevented having zero elements.
Differential Revision: https://reviews.llvm.org/D87492
Rationale:
After some discussion we decided that it is safe to assume 32-bit
indices for all subscripting in the vector dialect (it is unlikely
the dialect will be used; or even work; for such long vectors).
So rather than detecting specific situations that can exploit
32-bit indices with higher parallel SIMD, we just optimize it
by default, and let users that don't want it opt-out.
Reviewed By: nicolasvasilache, bkramer
Differential Revision: https://reviews.llvm.org/D87404
Take advantage of the new `dynamic_tensor_from_elements` operation in `std`.
Instead of stack-allocated memory, we can now lower directly to a single `std`
operation.
Differential Revision: https://reviews.llvm.org/D86935
This replaces the select chain for edge-padding with an scf.if that
performs the memory operation when the index is in bounds and uses the
pad value when it's not. For transfer_write the same mechanism is used,
skipping the store when the index is out of bounds.
The integration test has a bunch of cases of how I believe this should
work.
Differential Revision: https://reviews.llvm.org/D87241
In this commit a new way of convolution ops lowering is introduced.
The conv op vectorization pass lowers linalg convolution ops
into vector contractions. This lowering is possible when conv op
is first tiled by 1 along specific dimensions which transforms
it into dot product between input and kernel subview memory buffers.
This pass converts such conv op into vector contraction and does
all necessary vector transfers that make it work.
Differential Revision: https://reviews.llvm.org/D86619
Vector to SCF conversion still had issues due to the interaction with the natural alignment derived by the LLVM data layout. One traditional workaround is to allocate aligned. However, this does not always work for vector sizes that are non-powers of 2.
This revision implements a more portable mechanism where the intermediate allocation is always a memref of elemental vector type. AllocOp is extended to use the natural LLVM DataLayout alignment for non-scalar types, when the alignment is not specified in the first place.
An integration test is added that exercises the transfer to scf.for + scalar lowering with a 5x5 transposition.
Differential Revision: https://reviews.llvm.org/D87150
When allowed, use 32-bit indices rather than 64-bit indices in the
SIMD computation of masks. This runs up to 2x and 4x faster on
a number of AVX2 and AVX512 microbenchmarks.
Reviewed By: bkramer
Differential Revision: https://reviews.llvm.org/D87116
Added 128 byte alignment to alloc ops created in VectorToSCF pass.
128b alignment was already introduced to this pass but not to all alloc
ops. This commit changes that by adding 128b alignment to the remaining ops.
The point of specifying alignment is to prevent possible memory alignment errors
on weakly tested architectures.
Differential Revision: https://reviews.llvm.org/D86454
Adding a conversion pattern for the parallel Operation. This will
help the conversion of parallel operation with standard dialect to
parallel operation with llvm dialect. The type conversion of the block
arguments in a parallel region are controlled by the pattern for the
parallel Operation. Without this pattern, a parallel Operation with
block arguments cannot be converted from standard to LLVM dialect.
Other OpenMP operations without regions are marked as legal. When
translation of OpenMP operations with regions are added then patterns
for these operations can also be added.
Also uses all the standard to llvm patterns. Patterns of other dialects
can be added later if needed.
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D86273
This patch allows to pass the gpu module name to SPIR-V
module during conversion. This has many benefits as we can lookup
converted to SPIR-V kernel in the symbol table.
In order to avoid symbol conflicts, `"__spv__"` is added to the
gpu module name to form the new one.
Reviewed By: mravishankar
Differential Revision: https://reviews.llvm.org/D86384
This patch introduces a hook to encode descriptor set
and binding number into `spv.globalVariable`'s symbolic name. This
allows to preserve this information, and at the same time legalize
the global variable for the conversion to LLVM dialect.
This is required for `mlir-spirv-cpu-runner` to convert kernel
arguments into LLVM.
Also, a couple of some nits added:
- removed unused comment
- changed to a capital letter in the comment
Reviewed By: mravishankar
Differential Revision: https://reviews.llvm.org/D86515
Binding MemRefs of f16 needs special handling as the type is not supported on
CPU. There was a bug in the type used.
Differential Revision: https://reviews.llvm.org/D86328
Removed the Standard to LLVM conversion patterns that were previously
pulled in for testing purposes. This helps to separate the conversion
to LLVM dialect of the MLIR module with both SPIR-V and Standard
dialects in it (particularly helpful for SPIR-V cpu runner). Also,
tests were changed accordingly.
Reviewed By: mravishankar
Differential Revision: https://reviews.llvm.org/D86285
Add the unsigned complements to the existing FPToSI and SIToFP operations in the
standard dialect, with one-to-one lowerings to the corresponding LLVM operations.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D85557
If Memref has rank > 1 this pass emits N-1 loops around
TransferRead op and transforms the op itself to 1D read. Since vectors
must have static shape while memrefs don't the pass emits if condition
to prevent out of bounds accesses in case some memref dimension is smaller
than the corresponding dimension of targeted vector. This logic is fine
but authors forgot to apply `permutation_map` on loops upper bounds and
thus if condition compares induction variable to incorrect loop upper bound
(dimension of the memref) in case `permutation_map` is not identity map.
This commit aims to fix that.
There should be an equivalent std.floor op to std.ceil. This includes
matching lowerings for SPIRV, NVVM, ROCDL, and LLVM.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D85940
This patch adds more op/type conversion support
necessary for `spirv-runner`:
- EntryPoint/ExecutionMode: currently removed since we assume
having only one kernel function in the kernel module.
- StorageBuffer storage class is now supported. We are not
concerned with multithreading so this is fine for now.
- Type conversion enhanced, now regular offsets and strides
for structs and arrays are supported (based on
`VulkanLayoutUtils`).
- Support of `spc.AccessChain` that is modelled with GEP op
in LLVM dialect.
Reviewed By: mravishankar
Differential Revision: https://reviews.llvm.org/D86109
According to the LLVM Language Reference, 'cmpxchg' accepts integer or pointer
types. Several MLIR tests were using it with floats as it appears possible to
programmatically construct and print such an instruction, but it cannot be
parsed back. Use integers instead.
Depends On D85899
Reviewed By: flaub, rriddle
Differential Revision: https://reviews.llvm.org/D85900
Legacy implementation of the LLVM dialect in MLIR contained an instance of
llvm::Module as it was required to parse LLVM IR types. The access to the data
layout of this module was exposed to the users for convenience, but in practice
this layout has always been the default one obtained by parsing an empty layout
description string. Current implementation of the dialect no longer relies on
wrapping LLVM IR types, but it kept an instance of DataLayout for
compatibility. This effectively forces a single data layout to be used across
all modules in a given MLIR context, which is not desirable. Remove DataLayout
from the LLVM dialect and attach it as a module attribute instead. Since MLIR
does not yet have support for data layouts, use the LLVM DataLayout in string
form with verification inside MLIR. Introduce the layout when converting a
module to the LLVM dialect and keep the default "" description for
compatibility.
This approach should be replaced with a proper MLIR-based data layout when it
becomes available, but provides an immediate solution to compiling modules with
different layouts, e.g. for GPUs.
This removes the need for LLVMDialectImpl, which is also removed.
Depends On D85650
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D85652
The convresion of memref cast operaitons from the Standard dialect to the LLVM
dialect has been emitting bitcasts from a struct type to itself. Beyond being
useless, such casts are invalid as bitcast does not operate on aggregate types.
This kept working by accident because LLVM IR bitcast construction API skips
the construction if types are equal before it verifies that the types are
acceptable in a bitcast. Do not emit such bitcasts, the memref cast that only
adds/erases size information is in fact a noop on the current descriptor as it
always contains dynamic values for all sizes.
Reviewed By: pifon2a
Differential Revision: https://reviews.llvm.org/D85899
Inital conversion of `spv._address_of` and `spv.globalVariable`.
In SPIR-V, the global returns a pointer, whereas in LLVM dialect
the global holds an actual value. This difference is handled by
`spv._address_of` and `llvm.mlir.addressof`ops that both return
a pointer. Moreover, only current invocation is in conversion's
scope.
Reviewed By: antiagainst, mravishankar
Differential Revision: https://reviews.llvm.org/D84626
This change adds initial support needed to generate OpenCL compliant SPIRV.
If Kernel capability is declared then memory model becomes OpenCL.
If Addresses capability is declared then addressing model becomes Physical64.
Additionally for Kernel capability interface variable ABI attributes are not
generated as entry point function is expected to have normal arguments.
Differential Revision: https://reviews.llvm.org/D85196
Using a shuffle for the last recursive step in progressive lowering not only
results in much more compact IR, but also more efficient code (since the
backend is no longer confused on subvector aliasing for longer vectors).
E.g. the following
%f = vector.shape_cast %v0: vector<1024xf32> to vector<32x32xf32>
yields much better x86-64 code that runs 3x faster than the original.
Reviewed By: bkramer, nicolasvasilache
Differential Revision: https://reviews.llvm.org/D85482
The intrinsics were already supported and vector.transfer_read/write lowered
direclty into these operations. By providing them as individual ops, however,
clients can used them directly, and it opens up progressively lowering transfer
operations at higher levels (rather than direct lowering to LLVM IR as done now).
Reviewed By: bkramer
Differential Revision: https://reviews.llvm.org/D85357
Per Vulkan's SPIR-V environment spec: "While the OpSRem and OpSMod
instructions are supported by the Vulkan environment, they require
non-negative values and thus do not enable additional functionality
beyond what OpUMod provides."
The `getOffsetForBitwidth` function is used for lowering std.load
and std.store, whose indices are of `index` type and cannot be
negative. So we should be okay to use spv.UMod directly here to
be exact. Also made the comment explicit about the assumption.
Differential Revision: https://reviews.llvm.org/D83714
If Int16 is not available, 16-bit integers inside Workgroup storage
class should be emulated via 32-bit integers. This was previously
broken because the capability querying logic was incorrectly
intercepting all storage classes where it meant to only handle
interface storage classes. Adjusted where we return to fix this.
Differential Revision: https://reviews.llvm.org/D85308
Handle the case where the ViewOp takes in a memref that has
an memory space.
Reviewed By: ftynse, bondhugula, nicolasvasilache
Differential Revision: https://reviews.llvm.org/D85048
This patch introduces a conversion of `spv.loop` to LLVM dialect.
Similarly to `spv.selection`, op's control attributes are not mapped
to LLVM yet and therefore the conversion fails if the loop control is
not `None`. Also, all blocks within the loop should be reachable in
order for conversion to succeed.
Reviewed By: mravishankar
Differential Revision: https://reviews.llvm.org/D84245
Introduces the expand and compress operations to the Vector dialect
(important memory operations for sparse computations), together
with a first reference implementation that lowers to the LLVM IR
dialect to enable running on CPU (and other targets that support
the corresponding LLVM IR intrinsics).
Reviewed By: reidtatge
Differential Revision: https://reviews.llvm.org/D84888
A new first-party modeling for LLVM IR types in the LLVM dialect has been
developed in parallel to the existing modeling based on wrapping LLVM `Type *`
instances. It resolves the long-standing problem of modeling identified
structure types, including recursive structures, and enables future removal of
LLVMContext and related locking mechanisms from LLVMDialect.
This commit only switches the modeling by (a) renaming LLVMTypeNew to LLVMType,
(b) removing the old implementaiton of LLVMType, and (c) updating the tests. It
is intentionally minimal. Separate commits will remove the infrastructure built
for the transition and update API uses where appropriate.
Depends On D85020
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D85021
This is a first patch that sweeps over tests to fix
indentation (tabs to spaces). It also adds label checks and
removes redundant matching of `%{{.*}} = `.
The following tests have been fixed:
- arithmetic-ops-to-llvm
- bitwise-ops-to-llvm
- cast-ops-to-llvm
- comparison-ops-to-llvm
- logical-ops-to-llvm (renamed to match the rest)
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D85181
Now that we can have a memref of index type, we no longer need to materialize shapes in i64 and then index_cast.
Differential Revision: https://reviews.llvm.org/D84938
When lowering to the standard dialect, we currently support only the extent
tensor variant of the shape.rank operation. This change lets the conversion
pattern fail in a well-defined manner.
Differential Revision: https://reviews.llvm.org/D84852
This patch introduces new intrinsics in LLVM dialect:
- `llvm.intr.floor`
- `llvm.intr.maxnum`
- `llvm.intr.minnum`
- `llvm.intr.smax`
- `llvm.intr.smin`
These intrinsics correspond to SPIR-V ops from GLSL
extended instruction set (`spv.GLSL.Floor`, `spv.GLSL.FMax`,
`spv.GLSL.FMin`, `spv.GLSL.SMax` and `spv.GLSL.SMin`
respectively). Also conversion patterns for them were added.
Reviewed By: antiagainst
Differential Revision: https://reviews.llvm.org/D84661
This is a second patch on conversion of GLSL ops to LLVM dialect.
It introduces patterns to convert `spv.InverseSqrt` and `spv.Tanh`.
Reviewed By: antiagainst
Differential Revision: https://reviews.llvm.org/D84633
This is the first patch that adds support for GLSL extended
instruction set ops. These are direct conversions, apart from `spv.Tan`
that is lowered to `sin() / cos()`.
Reviewed By: antiagainst
Differential Revision: https://reviews.llvm.org/D84627
The lowering does not support all types for its source operations. This change
makes the patterns fail in a well-defined manner.
Differential Revision: https://reviews.llvm.org/D84443
Operating on indices and extent tensors directly, the type conversion is no
longer needed for the supported cases.
Differential Revision: https://reviews.llvm.org/D84442
This adds conversions for const_size and to_extent_tensor. Also, cast-like operations are now folded away if the source and target types are the same.
Differential Revision: https://reviews.llvm.org/D84745
Conversion of `spv.BranchConditional` now supports branch weights
that are mapped to weights vector in `llvm.cond_br`.
Reviewed By: antiagainst
Differential Revision: https://reviews.llvm.org/D84657
This patch adds support of Volatile and Nontemporal
memory accesses to `spv.Load` and `spv.Store`. These attributes are
modelled with a `volatile` and `nontemporal` flags.
Reviewed By: antiagainst
Differential Revision: https://reviews.llvm.org/D84739
functions.
This allows using command line flags to lowere from GPU to SPIR-V. The
pass added is only for testing/example purposes. Most uses cases will
need more fine-grained control on setting workgroup sizes for kernel
functions.
Differential Revision: https://reviews.llvm.org/D84619
Do not return error code, instead return created resource handles or void. Error reporting is done by the library function.
Reviewed By: herhut
Differential Revision: https://reviews.llvm.org/D84660
The LowerAffine psas was a FunctionPass only for legacy
reasons. Making this Op-agnostic allows it to be used from command
line when affine expressions are within operations other than
`std.func`.
Differential Revision: https://reviews.llvm.org/D84590
Previous changes generalized some of the operands and results. Complete
a larger group of those to simplify progressive lowering. Also update
some of the declarative asm form due to generalization. Tried to keep it
mostly mechanical.
This patch introduces conversion pattern for `spv.Store` and `spv.Load`.
Only op with `Function` Storage Class is supported at the moment
because `spv.GlobalVariable` has not been introduced yet. If the op
has memory access attribute, then there are the following cases.
If the access is `Aligned`, add alignment to the op builder. Otherwise
the conversion fails as other cases are not supported yet because they
require additional attributes for `llvm.store`/`llvm.load` ops: e.g.
`volatile` and `nontemporal`.
Reviewed By: antiagainst
Differential Revision: https://reviews.llvm.org/D84236
The patch introduces the conversion pattern for function-level
`spv.Variable`. It is modelled as `llvm.alloca` op. If initialized, then
additional store instruction is used. Note that there is no initialization
for arrays and structs since constants of these types are not supported in
LLVM dialect yet. Also, at the moment initialisation is only possible via
`spv.constant` (since `spv.GlobalVariable` conversion is not implemented
yet).
The input code has some scoping is not taken into account and will be
addressed in a different patch.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D84224
This concerns `from/to_extent_tensor`, `size_to_index`, `index_to_size`, and
`const_size` conversion patterns. The new lowering will work directly on indices
and extent tensors. The shape and size values will allow for error values but
are not yet supported by the dialect conversion.
Differential Revision: https://reviews.llvm.org/D84436
The operation `shape.shape_of` now returns an extent tensor `tensor<?xindex>` in
cases when no error are possible. All consuming operation will eventually accept
both, shapes and extent tensors.
Differential Revision: https://reviews.llvm.org/D84160
The default lowering of `assert` calls `abort` in case the assertion is
violated. The failure message is ignored but should be used by custom lowerings
that can assume more about their environment.
Differential Revision: https://reviews.llvm.org/D83886
This revision adds support for much deeper type conversion integration into the conversion process, and enables auto-generating cast operations when necessary. Type conversions are now largely automatically managed by the conversion infra when using a ConversionPattern with a provided TypeConverter. This removes the need for patterns to do type cast wrapping themselves and moves the burden to the infra. This makes it much easier to perform partial lowerings when type conversions are involved, as any lingering type conversions will be automatically resolved/legalized by the conversion infra.
To support this new integration, a few changes have been made to the type materialization API on TypeConverter. Materialization has been split into three separate categories:
* Argument Materialization: This type of materialization is used when converting the type of block arguments when calling `convertRegionTypes`. This is useful for contextually inserting additional conversion operations when converting a block argument type, such as when converting the types of a function signature.
* Source Materialization: This type of materialization is used to convert a legal type of the converter into a non-legal type, generally a source type. This may be called when uses of a non-legal type persist after the conversion process has finished.
* Target Materialization: This type of materialization is used to convert a non-legal, or source, type into a legal, or target, type. This type of materialization is used when applying a pattern on an operation, but the types of the operands have not yet been converted.
Differential Revision: https://reviews.llvm.org/D82831
Introduces the scatter/gather operations to the Vector dialect
(important memory operations for sparse computations), together
with a first reference implementation that lowers to the LLVM IR
dialect to enable running on CPU (and other targets that support
the corresponding LLVM IR intrinsics).
The operations can be used directly where applicable, or can be used
during progressively lowering to bring other memory operations closer to
hardware ISA support for a gather/scatter. The semantics of the operation
closely correspond to those of the corresponding llvm intrinsics.
Note that the operation allows for a dynamic index vector (which is
important for sparse computations). However, this first reference
lowering implementation "serializes" the address computation when
base + index_vector is converted to a vector of pointers. Exploring
how to use SIMD properly during these step is TBD. More general
memrefs and idiomatic versions of striding are also TBD.
Reviewed By: arpith-jacob
Differential Revision: https://reviews.llvm.org/D84039
This patch introduces conversion pattern for `spv.selection` op.
The conversion can only be applied to selection with all blocks being
reachable. Moreover, selection with control attributes "Flatten" and
"DontFlatten" is not supported.
Since the `PatternRewriter` hook for block merging has not been implemented
for `ConversionPatternRewriter`, merge and continue blocks are kept
separately.
Reviewed By: antiagainst, ftynse
Differential Revision: https://reviews.llvm.org/D83860
This patch introduces conversion for `spv.Branch` and `spv.BranchConditional`
ops. Branch weigths for `spv.BranchConditional` are not supported at the
moment, and conversion in this case fails.
Reviewed By: antiagainst
Differential Revision: https://reviews.llvm.org/D83784
Summary: The logic was conservative but inverted: cases that should remain unmasked became 1-D masked.
Differential Revision: https://reviews.llvm.org/D84051
Lower `shape.shape_eq` to the `scf` (and `std`) dialect. For now, this lowering
is limited to extent tensor operands.
Differential Revision: https://reviews.llvm.org/D82530
To make it clear when shape error values cannot occur the shape operations can
operate on extent tensors. This change updates the lowering for `shape.reduce`
accordingly.
Differential Revision: https://reviews.llvm.org/D83944
Summary: The native alignment may generally not be used when lowering a vector.transfer to the underlying load/store operation. This revision fixes the unmasked load/store alignment to match that of the masked path.
Differential Revision: https://reviews.llvm.org/D83684
Per the Vulkan's SPIR-V environment spec, "for the OpSRem and OpSMod
instructions, if either operand is negative the result is undefined."
So we cannot directly use spv.SRem/spv.SMod if either operand can be
negative. Emulate it via spv.UMod.
Because the emulation uses spv.SNegate, this commit also defines
spv.SNegate.
Differential Revision: https://reviews.llvm.org/D83679
Summary:
These are semantically equivalent, but fmuladd allows decaying the op
into fmul+fadd if there is no fma instruction available. llvm.fma lowers
to scalar calls to libm fmaf, which is a lot slower.
Reviewers: nicolasvasilache, aartbik, ftynse
Subscribers: mehdi_amini, rriddle, jpienaar, shauheen, antiagainst, arpith-jacob, mgester, lucyrfox, liufengdb, stephenneuendorffer, Joonsoo, grosul1, Kayjukh, jurahul, msifontes
Tags: #mlir
Differential Revision: https://reviews.llvm.org/D83666
This patch introduces type conversion for SPIR-V structs. Since
handling offset case requires thorough testing, it was left out
for now. Hence, only structs with no offset are currently
supported. Also, structs containing member decorations cannot
be translated.
Reviewed By: antiagainst
Differential Revision: https://reviews.llvm.org/D83403
This patch adds type conversion for 4 SPIR-V types: array, runtime array, pointer
and struct. This conversion is integrated using a separate function
`populateSPIRVToLLVMTypeConversion()` that adds new type conversions. At the moment,
this is a basic skeleton that allows to perfom conversion from SPIR-V array,
runtime array and pointer types to LLVM typesystem. There is no support of array
strides or storage classes. These will be supported on the case by case basis.
Reviewed By: antiagainst
Differential Revision: https://reviews.llvm.org/D83399
This patch adds conversion patterns for `spv.BitFieldSExtract` and `spv.BitFieldUExtract`.
As in the patch for `spv.BitFieldInsert`, `offset` and `count` have to be broadcasted in
vector case and casted to match the type of the base.
Reviewed By: antiagainst
Differential Revision: https://reviews.llvm.org/D82640
This patch introduces 3 new direct conversions for SPIR-V ops:
- `spv.Select`
- `spv.Undef`
- `spv.FMul` that was skipped in the patch with arithmetic ops
Differential Revision: https://reviews.llvm.org/D83291
scf.if currently lacks folding on true / false conditionals.
Such foldings are a bit more involved than can be addressed immediately.
This revision introduces an eager folding for lowering vector.transfer operations in the presence of unrolling.
Differential revision: https://reviews.llvm.org/D83146
This patch introduces conversion pattern for `spv.constant` with scalar
and vector types. There is a special case when the constant value is a
signed/unsigned integer (vector of integers). Since LLVM dialect does not
have signedness semantics, the types had to be converted to signless ints.
Reviewed By: antiagainst
Differential Revision: https://reviews.llvm.org/D82936
Added conversion pattern for SPIR-V `FunctionCallOp`. Based on
specification, it returns no results or a single result, so
can be mapped directly to LLVM dialect's `llvm.call`.
Reviewed By: antiagainst, ftynse
Differential Revision: https://reviews.llvm.org/D83030
This patch introduces conversion pattern for `spv.BitFiledInsert` op,
as well as some utility functions to facilitate code reading.
Since `spv.BitFiledInsert` may take both vector and integer operands,
this case was specifically handled by broadcasting values (`count`
and `offset` here) to vectors. Moreover, the types had to be converted
to same bitwidth in order to conform with LLVM dialect rules.
This was done with `zext` when extending (Note that `count` and
`offset` are treated as unsigned) and `trunc` in the opposite case.
For the latter one, truncation is safe since the op is defined only when
`count`/`offset`/their sum is less than the bitwidth of the result.
This introduces a natural bound of the value of 64, which can be
expressed as `i8`.
Reviewed By: antiagainst, ftynse
Differential Revision: https://reviews.llvm.org/D82639
This allow lowering to support scf.for and scf.if with results. As right now
spv region operations don't have return value the results are demoted to
Function memory. We create one allocation per result right before the region
and store the yield values in it. Then we can load back the value from
allocation to be able to use the results.
Differential Revision: https://reviews.llvm.org/D82246
Added conversion pattern and tests for `spv.Bitcast` op. This one has
a direct mapping in LLVM dialect so `DirectConversionPattern` was used.
Differential Revision: https://reviews.llvm.org/D82748
This patch introduces new conversion patterns for bit and logical
negation op: `spv.Not` and `spv.LogicalNot`. They are implemented
by applying xor on the operand and mask with all bits set.
Differential Revision: https://reviews.llvm.org/D82637
Summary:
The patch makes the index type lowering of the GPU to NVVM/ROCDL conversion configurable. It introduces a pass option that controls the bitwidth used when lowering index computations and uses the LowerToLLVMOptions structure to control the Standard to LLVM lowering.
This commit fixes a use-after-free bug introduced by the reverted commit d10b1a3. It implements the following changes:
- Added a getDefaultOptions method to the LowerToLLVMOptions struct that returns a reference to statically allocated default options.
- Use the getDefaultOptions method to provide default LowerToLLVMOptions (instead of an initializer list).
- Added comments to clarify the required lifetime of the LowerToLLVMOptions
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D82475
`llvm.mlir.constant` was originally introduced as an LLVM dialect counterpart
to `std.constant`. As such, it was supporting "function pointer" constants
derived from the symbol name. This is different from `std.constant` that allows
for creation of a "function" constant since MLIR, unlike LLVM IR, supports
this. Later, `llvm.mlir.addressof` was introduced as an Op that obtains a
constant pointer to a global in the LLVM dialect. It naturally extends to
functions (in LLVM IR, functions are globals) and should be used for defining
"function pointer" values instead.
Fixes PR46344.
Differential Revision: https://reviews.llvm.org/D82667
When the origin of a shape is an extent tensor the operation `get_extent` can be
lowered directly to `extract_element`.
This choice circumvents the necessity to materialize the shape in memory.
Differential Revision: https://reviews.llvm.org/D82645
When the shape is derived from a tensor argument the shape extent can be derived
directly from that tensor with `std.dim`.
This lowering pattern circumvents the necessity to materialize the shape in
memory.
Differential Revision: https://reviews.llvm.org/D82644
Rationale:
In general, passing "fastmath" from MLIR to LLVM backend is not supported, and even just providing such a feature for experimentation is under debate. However, passing fine-grained fastmath related attributes on individual operations is generally accepted. This CL introduces an option to instruct the vector-to-llvm lowering phase to annotate floating-point reductions with the "reassociate" fastmath attribute, which allows the LLVM backend to use SIMD implementations for such constructs. Oher lowering passes can start using this mechanism right away in cases where reassociation is allowed.
Benefit:
For some microbenchmarks on x86-avx2, speedups over 20 were observed for longer vector (due to cleaner, spill-free and SIMD exploiting code).
Usage:
mlir-opt --convert-vector-to-llvm="reassociate-fp-reductions"
Reviewed By: ftynse, mehdi_amini
Differential Revision: https://reviews.llvm.org/D82624
Implemented conversion for `spv.BitReverse` and `spv.BitCount`. Since ODS
generates builders in a different way for LLVM dialect intrinsics, I
added attributes to build method in `DirectConversionPattern` class. The
tests for these ops are in `bitwise-ops-to-llvm.mlir`.
Differential Revision: https://reviews.llvm.org/D82286
Initially, unranked memref descriptors in the LLVM dialect were designed only
to be passed into functions. An assertion was guarding against returning
unranked memrefs from functions in the standard-to-LLVM conversion. This is
insufficient for functions that wish to return an unranked memref such that the
caller does not know the rank in advance, and hence cannot allocate the
descriptor and pass it in as an argument.
Introduce a calling convention for returning unranked memref descriptors as
follows. An unranked memref descriptor always points to a ranked memref
descriptor stored on stack of the current function. When an unranked memref
descriptor is returned from a function, the ranked memref descriptor it points
to is copied to dynamically allocated memory, the ownership of which is
transferred to the caller. The caller is responsible for deallocating the
dynamically allocated memory and for copying the pointed-to ranked memref
descriptor onto its stack.
Provide default lowerings for std.return, std.call and std.indirect_call that
maintain the conversion defined above.
This convention is additionally exercised by a runtime test to guard against
memory errors.
Differential Revision: https://reviews.llvm.org/D82647
Lower `shape.rank` to standard dialect.
A shape's size is the same as the extent of the first and only dimension of the
`tensor<?xindex>` it is represented by.
Differential Revision: https://reviews.llvm.org/D82080
This patch introduces conversion patterns for `spv.module` and `spv._module_end`.
SPIR-V module is converted into `ModuleOp`. This will play a role of enclosing
scope to LLVM ops. At the moment, SPIR-V module attributes (such as memory model,
etc) are ignored.
Differential Revision: https://reviews.llvm.org/D82468
This patch provides an implementation for `spv.func` conversion. The pattern
is populated in a separate method added to the pass. At the moment, the type
signature conversion only includes the supported types. The conversion pattern
also matches SPIR-V function control attributes to LLVM function attributes.
Those are modelled as `passthrough` attributes in LLVM dialect. The following
mapping are used:
- None: no attributes passed
- Inline: `alwaysinline` seems to be the right equivalent (`inlinehint` is
semantically weaker in my opinion)
- DontInline: `noinline`
- Pure and Const: I think those can be modelled as `readonly` and `readnone`
attributes respectively.
Also, 2 patterns added for return ops conversion (`spv.Return` for void return
and `spv.ReturnValue` for a single value return).
Differential Revision: https://reviews.llvm.org/D81931
This patch extends the AccessChainOp index type handling to be able to deal with
all Integer type indices (i.e., all bit-widths and signedness symantics).
There were two ways of achieving this:
1- Backward compatible: The new way of handling the indices will assume that
an index type is i32 by default if not specified in the assembly format,
this way all the old tests would pass correctly.
2- Enforce the format: This unifies the spv.AccessChain Op format and all the old
tests had to be updated to reflect this change or else they fail.
I picked option-2 to unify the Op format and avoid having optional index-type fields
that can lead to somewhat confusing tests format and multiple representations for
the same Op with undocumented assumption that an index is i32 unless stated.
Nonetheless, reverting to option-1 should be straightforward if preferred or needed.
Differential Revision: https://reviews.llvm.org/D81763
The patch makes the index type lowering of the GPU to NVVM/ROCDL
conversion configurable. It introduces a pass option that controls the
bitwidth used when lowering index computations.
Differential Revision: https://reviews.llvm.org/D80285
Subview operations are not natively supported downstream in the spirv path.
This change allows removing subview when used by vector transfer the same way
we already do it when they are used by LoadOp/StoreOp
Differential Revision: https://reviews.llvm.org/D82106
Use direct vector constants for the 1-D case. This approach
scales much better than generating elaborate insertion operations
that are eventually folded into a constant. We could of course
generalize the 1-D case to higher ranks, but this simplification
already helps in scaling some microbenchmarks that would formerly
crash on the intermediate IR length.
Reviewed By: reidtatge
Differential Revision: https://reviews.llvm.org/D82144
Lower `shape.shape_of` to standard dialect.
This lowering supports statically and dynamically shaped tensors.
Support for unranked tensors will be added as part of the lowering to `scf`.
Differential Revision: https://reviews.llvm.org/D82098
Summary:
The "i1" (viz. bool) type does not have a proper equivalent on the "C"
size. So, to avoid any ABIs issues, we simply use print_i32 on an i32
value of one or zero for true and false. This has the added advantage
that one less function needs to be implemented when porting the runtime
support library.
Reviewers: ftynse, bkramer, nicolasvasilache
Reviewed By: ftynse
Subscribers: mehdi_amini, rriddle, jpienaar, shauheen, antiagainst, nicolasvasilache, arpith-jacob, mgester, lucyrfox, liufengdb, stephenneuendorffer, Joonsoo, grosul1, frgossen, Kayjukh, jurahul, msifontes
Tags: #mlir
Differential Revision: https://reviews.llvm.org/D82048
Added support of simple logical ops: `LogicalAnd`, `LogicalOr`,
`LogicalEqual` and `LogicalNotEqual`. Added a missing conversion
for `UMod` op.
Also, implemented SPIR-V cast ops conversion. There are 4 simple
case where there is a clear equivalent in LLVM (e.g. `ConvertFToS`
is `fptosi`). For `FConvert`, `SConvert` and `UConvert` we
distinguish between truncation and extension based on the bit
width of the operand.
Differential Revision: https://reviews.llvm.org/D81812
Implement the missing lowering from `std.dim` to the LLVM dialect in case of a
dynamic dimension.
Differential Revision: https://reviews.llvm.org/D81834
This patch has shift ops conversion implementation. In SPIR-V dialect,
`Shift` and `Base` may have different bit width. On the contrary,
in LLVM dialect both `Base` and `Shift` have to be of the same bit width.
This leads to the following cases:
- if `Base` has the same bit width as `Shift`, the conversion is
straightforward.
- if `Base` has a greater bit width than `Shift`, shift is sign/zero
extended first. Then the extended value is passed to the shift.
- otherwise the conversion is considered to be illegal.
Differential Revision: https://reviews.llvm.org/D81546
This option avoids to accidentally reuse variable across -LABEL match,
it can be explicitly opted-in by prefixing the variable name with $
Differential Revision: https://reviews.llvm.org/D81531
Implemented `FComparePattern` and `IComparePattern` classes
that provide conversion of SPIR-V comparison ops (such as
`spv.FOrdGreaterThanEqual` and others) to LLVM dialect.
Also added tests in `comparison-ops-to-llvm.mlir`.
Differential Revision: https://reviews.llvm.org/D81487
Following the previous revision `D81100`, this commit implements a templated class
that would provide conversion patterns for “straightforward” SPIR-V ops into
LLVM dialect. Templating allows to abstract away from concrete implementation
for each specific op. Those are mainly binary operations. Currently supported
and tested ops are:
- Arithmetic ops: `IAdd`, `ISub`, `IMul`, `FAdd`, `FSub`, `FMul`, `FDiv`, `FNegate`,
`SDiv`, `SRem` and `UDiv`
- Bitwise ops: `BitwiseAnd`, `BitwiseOr`, `BitwiseXor`
The implementation relies on `SPIRVToLLVMConversion` class that makes use of
`OpConversionPattern`.
Differential Revision: https://reviews.llvm.org/D81305
Allow for dynamic indices in the `dim` operation.
Rather than an attribute, the index is now an operand of type `index`.
This allows to apply the operation to dynamically ranked tensors.
The correct lowering of dynamic indices remains to be implemented.
Differential Revision: https://reviews.llvm.org/D81551
Having the input dumped on failure seems like a better
default: I debugged FileCheck tests for a while without knowing
about this option, which really helps to understand failures.
Remove `-dump-input-on-failure` and the environment variable
FILECHECK_DUMP_INPUT_ON_FAILURE which are now obsolete.
Differential Revision: https://reviews.llvm.org/D81422
Summary:
The NVVM target only provides implementations for tanh etc. on f32 and
f64 operands. To also support f16, we now insert operations to extend to f32
and truncate back to f16 around the intrinsic call.
Differential Revision: https://reviews.llvm.org/D81473
These commits set up the skeleton for SPIR-V to LLVM dialect conversion.
I created SPIR-V to LLVM pass, registered it in Passes.td, InitAllPasses.h.
Added a pattern for `spv.BitwiseAndOp` and tests for it. Integer, float
and vector types are converted through LLVMTypeConverter.
Differential Revision: https://reviews.llvm.org/D81100
The operations `to_extent_tensor` and `from_extent_tensor` become no-ops when
lowered to the standard dialect.
This is possible with a lowering from `shape.shape` to `tensor<?xindex>`.
Differential Revision: https://reviews.llvm.org/D81162
Recently introduced allocation hoisting is quite conservative on the cases when it triggers.
This revision makes it such that the allocations for vector transfer lowerings are hoisted
to the top of the function.
This should be revisited in the context of parallelism and is a temporary workaround.
Differential Revision: https://reviews.llvm.org/D81253
This simplifies a lot of handling of BoolAttr/IntegerAttr. For example, a lot of places currently have to handle both IntegerAttr and BoolAttr. In other places, a decision is made to pick one which can lead to surprising results for users. For example, DenseElementsAttr currently uses BoolAttr for i1 even if the user initialized it with an Array of i1 IntegerAttrs.
Differential Revision: https://reviews.llvm.org/D81047
Add SubgroupId, SubgroupSize and NumSubgroups to GPU dialect ops and add the
lowering of those ops to SPIRV.
Differential Revision: https://reviews.llvm.org/D81042
Add a new pass to lower operations from the `shape` to the `std` dialect.
The conversion applies only to the `size_to_index` and `index_to_size`
operations and affected types.
Other patterns will be added as needed.
Differential Revision: https://reviews.llvm.org/D81091
Keeping in the affine.for to gpu.launch conversions, which should
probably be the affine.parallel to gpu.launch conversion as well.
Differential Revision: https://reviews.llvm.org/D80747
https://reviews.llvm.org/D79246 introduces alignment propagation for vector transfer operations. Unfortunately, the alignment calculation is incorrect and can result in crashes.
This revision fixes the calculation by using the natural alignment of the memref elemental type, instead of the resulting vector type.
If more alignment is desired, it can be done in 2 ways:
1. use a proper vector.type_cast to transform a memref<axbxcxdxf32> into a memref<axbxvector<cxdxf32>> giving a natural alignment of vector<cxdxf32>
2. add an alignment attribute to vector transfer operations and propagate it.
With this change the alignment in the relevant tests goes down from 128 to 4.
Lastly, a few minor cleanups are performed and the custom `isMinorIdentityMap` is deprecated.
Differential Revision: https://reviews.llvm.org/D80734
Make ConvertKernelFuncToCubin pass to be generic:
- Rename to ConvertKernelFuncToBlob.
- Allow specifying triple, target chip, target features.
- Initializing LLVM backend is supplied by a callback function.
- Lowering process from MLIR module to LLVM module is via another callback.
- Change mlir-cuda-runner to adopt the revised pass.
- Add new tests for lowering to ROCm HSA code object (HSACO).
- Tests for CUDA and ROCm are kept in separate directories.
Differential Revision: https://reviews.llvm.org/D80142
This allocation of a workgroup memory is lowered to a
spv.globalVariable. Only static size allocation with element type
being int or float is handled. The lowering does account for the
element type that are not supported in the lowered spv.module based on
the extensions/capabilities and adjusts the number of elements to get
the same byte length.
Differential Revision: https://reviews.llvm.org/D80411
Due to similar APIs between CUDA and ROCm (HIP),
ConvertGpuLaunchFuncToCudaCalls pass could be used on both platforms with some
refactoring.
In this commit:
- Migrate ConvertLaunchFuncToCudaCalls from GPUToCUDA to GPUCommon, and rename.
- Rename runtime wrapper APIs be platform-neutral.
- Let GPU binary annotation attribute be specifiable as a PassOption.
- Naming changes within the implementation and tests.
Subsequent patches would introduce ROCm-specific tests and runtime wrapper
APIs.
Differential Revision: https://reviews.llvm.org/D80167
This reverts commit cdb6f05e2d.
The build is broken with:
You have called ADD_LIBRARY for library obj.MLIRGPUtoCUDATransforms without any source files. This typically indicates a problem with your CMakeLists.txt file
Due to similar APIs between CUDA and ROCm (HIP),
ConvertGpuLaunchFuncToCudaCalls pass could be used on both platforms with some
refactoring.
In this commit:
- Migrate ConvertLaunchFuncToCudaCalls from GPUToCUDA to GPUCommon, and rename.
- Rename runtime wrapper APIs be platform-neutral.
- Let GPU binary annotation attribute be specifiable as a PassOption.
- Naming changes within the implementation and tests.
Subsequent patches would introduce ROCm-specific tests and runtime wrapper
APIs.
Differential Revision: https://reviews.llvm.org/D80167
The subview semantics changes recently to allow for more natural
representation of constant offsets and strides. The legalization of
subview op for lowering to SPIR-V needs to account for this.
Also change the linearization to use the strides from the affine map
of a memref.
Differential Revision: https://reviews.llvm.org/D80270
Summary:
Previously, the only support partial lowering from vector transfers to SCF was
going through loops. This requires a dedicated allocation and extra memory
roundtrips because LLVM aggregates cannot be indexed dynamically (for more
details see the [deep-dive](https://mlir.llvm.org/docs/Dialects/Vector/#deeperdive)).
This revision allows specifying full unrolling which removes this additional roundtrip.
This should be used carefully though because full unrolling will spill, negating the
benefits of removing the interim alloc in the first place.
Proper heuristics are left for a later time.
Differential Revision: https://reviews.llvm.org/D80100
Originally, the SCFToStandard conversion only declared Ops from the Standard
dialect as legal after conversion. This is undesirable as it would fail the
conversion if the SCF ops contained ops from any other dialect. Furthermore,
this would be problematic for progressive lowering of `scf.parallel` to
`scf.for` after `ensureRegionTerminator` is made aware of the pattern rewriting
infrastructure because it creates temporary `scf.yield` operations declared
illegal. Change the legalization target to declare any op other than `scf.for`,
`scf.if` and `scf.parallel` legal.
Differential Revision: https://reviews.llvm.org/D80137
Summary:
Previously, after applying the mask, a negative number would convert to a
positive number because the sign flag was forgotten. This patch adds two more
shift operations to do the sign extension. This assumes that we're using two's
complement.
This patch applies sign extension unconditionally when loading a unspported integer width, and it relies the pattern to do the casting because the signedness semantic is carried by operator itself.
Differential Revision: https://reviews.llvm.org/D79753
Summary:
Vector transfer ops semantic is extended to allow specifying a per-dimension `masked`
attribute. When the attribute is false on a particular dimension, lowering to LLVM emits
unmasked load and store operations.
Differential Revision: https://reviews.llvm.org/D80098
Summary:
This revision makes the use of vector transfer operatons more idiomatic by
allowing to omit and inferring the permutation_map.
Differential Revision: https://reviews.llvm.org/D80092
The following Conversions are affected: LoopToStandard -> SCFToStandard,
LoopsToGPU -> SCFToGPU, VectorToLoops -> VectorToSCF. Full file paths are
affected. Additionally, drop the 'Convert' prefix from filenames living under
lib/Conversion where applicable.
API names and CLI options for pass testing are also renamed when applicable. In
particular, LoopsToGPU contains several passes that apply to different kinds of
loops (`for` or `parallel`), for which the original names are preserved.
Differential Revision: https://reviews.llvm.org/D79940
have abi attributes.
To ensure there is no conflict, use the default ABI only when none of
the arguments have the spv.interface_var_abi attribute. This also
implies that if one of the arguments has a spv.interface_var_abi
attribute, all of them should have it as well.
Differential Revision: https://reviews.llvm.org/D77232
This revision starts decoupling the include the kitchen sink behavior of Linalg to LLVM lowering by inserting a -convert-linalg-to-std pass.
The lowering of linalg ops to function calls was previously lowering to memref descriptors by having both linalg -> std and std -> LLVM patterns in the same rewrite.
When separating this step, a new issue occurred: the layout is automatically type-erased by this process. This revision therefore introduces memref casts to perform these type erasures explicitly. To connect everything end-to-end, the LLVM lowering of MemRefCastOp is relaxed because it is artificially more restricted than the op semantics. The op semantics already guarantee that source and target MemRefTypes are cast-compatible. An invalid lowering test now becomes valid and is removed.
Differential Revision: https://reviews.llvm.org/D79468
This patch adds `affine.vector_load` and `affine.vector_store` ops to
the Affine dialect and lowers them to `vector.transfer_read` and
`vector.transfer_write`, respectively, in the Vector dialect.
Reviewed By: bondhugula, nicolasvasilache
Differential Revision: https://reviews.llvm.org/D79658
All ops of the SCF dialect now use the `scf.` prefix instead of `loop.`. This
is a part of dialect renaming.
Differential Revision: https://reviews.llvm.org/D79844
Summary:
Makes this operation runnable on CPU by generating MLIR instructions
that are eventually folded into an LLVM IR constant for the mask.
Reviewers: nicolasvasilache, ftynse, reidtatge, bkramer, andydavis1
Reviewed By: nicolasvasilache, ftynse, andydavis1
Subscribers: mehdi_amini, rriddle, jpienaar, shauheen, antiagainst, nicolasvasilache, arpith-jacob, mgester, lucyrfox, liufengdb, stephenneuendorffer, Joonsoo, grosul1, frgossen, Kayjukh, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D79815
The main objective of this revision is to change the way static information is represented, propagated and canonicalized in the SubViewOp.
In the current implementation the issue is that canonicalization may strictly lose information because static offsets are combined in irrecoverable ways into the result type, in order to fit the strided memref representation.
The core semantics of the op do not change but the parser and printer do: the op always requires `rank` offsets, sizes and strides. These quantities can now be either SSA values or static integer attributes.
The result type is automatically deduced from the static information and more powerful canonicalizations (as powerful as the representation with sentinel `?` values allows). Previously static information was inferred on a best-effort basis from looking at the source and destination type.
Relevant tests are rewritten to use the idiomatic `offset: x, strides : [...]`-form. Bugs are corrected along the way that were not trivially visible in flattened strided memref form.
Lowering to LLVM is updated, simplified and now supports all cases.
A mixed static-dynamic mode test that wouldn't previously lower is added.
It is an open question, and a longer discussion, whether a better result type representation would be a nicer alternative. For now, the subview op carries the required semantic.
Differential Revision: https://reviews.llvm.org/D79662
This reverts commit 80d133b24f.
Per Stephan Herhut: The canonicalizer pattern that was added creates
forms of the subview op that cannot be lowered.
This is shown by failing Tensorflow XLA tests such as:
tensorflow/compiler/xla/service/mlir_gpu/tests:abs.hlo.test
Will provide more details offline, they rely on logs from private CI.
Summary:
The main objective of this revision is to change the way static information is represented, propagated and canonicalized in the SubViewOp.
In the current implementation the issue is that canonicalization may strictly lose information because static offsets are combined in irrecoverable ways into the result type, in order to fit the strided memref representation.
The core semantics of the op do not change but the parser and printer do: the op always requires `rank` offsets, sizes and strides. These quantities can now be either SSA values or static integer attributes.
The result type is automatically deduced from the static information and more powerful canonicalizations (as powerful as the representation with sentinel `?` values allows). Previously static information was inferred on a best-effort basis from looking at the source and destination type.
Relevant tests are rewritten to use the idiomatic `offset: x, strides : [...]`-form. Bugs are corrected along the way that were not trivially visible in flattened strided memref form.
It is an open question, and a longer discussion, whether a better result type representation would be a nicer alternative. For now, the subview op carries the required semantic.
Reviewers: ftynse, mravishankar, antiagainst, rriddle!, andydavis1, timshen, asaadaldien, stellaraccident
Reviewed By: mravishankar
Subscribers: aartbik, bondhugula, mehdi_amini, rriddle, jpienaar, shauheen, antiagainst, arpith-jacob, mgester, lucyrfox, liufengdb, stephenneuendorffer, Joonsoo, bader, grosul1, frgossen, Kayjukh, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D79662
This [discussion](https://llvm.discourse.group/t/viewop-isnt-expressive-enough/991/2) raised some concerns with ViewOp.
In particular, the handling of offsets is incorrect and does not match the op description.
Note that with an elemental type change, offsets cannot be part of the type in general because sizeof(srcType) != sizeof(dstType).
Howerver, offset is a poorly chosen term for this purpose and is renamed to byte_shift.
Additionally, for all intended purposes, trying to support non-identity layouts for this op does not bring expressive power but rather increases code complexity.
This revision simplifies the existing semantics and implementation.
This simplification effort is voluntarily restrictive and acts as a stepping stone towards supporting richer semantics: treat the non-common cases as YAGNI for now and reevaluate based on concrete use cases once a round of simplification occurred.
Differential revision: https://reviews.llvm.org/D79541
Complex addition and substraction are the first two binary operations on complex
numbers.
Remaining operations will follow the same pattern.
Differential Revision: https://reviews.llvm.org/D79479
Adding this pattern reduces code duplication. There is no need to have a
custom implementation for lowering to llvm.cmpxchg.
Differential Revision: https://reviews.llvm.org/D78753
Summary:
As D78974, this patch implements the emulation for store op. The emulation is
done with atomic operations. E.g., if the storing value is i8, rewrite the
StoreOp to:
1) load a 32-bit integer
2) clear 8 bits in the loading value
3) store 32-bit value back
4) load a 32-bit integer
5) modify 8 bits in the loading value
6) store 32-bit value back
The step 1 to step 3 are done by AtomicAnd as one atomic step, and the step 4
to step 6 are done by AtomicOr as another atomic step.
Differential Revision: https://reviews.llvm.org/D79272
Add `CreateComplexOp`, `ReOp`, and `ImOp` to the standard dialect.
This is the first step to support complex numbers.
Differential Revision: https://reviews.llvm.org/D79159