A memref argument is converted into a pointer-to-struct argument
of type `{T*, T*, i64, i64[N], i64[N]}*` in the wrapper function,
where T is the converted element type and N is the memref rank.
Differential Revision: https://reviews.llvm.org/D76059
This commits changes the definition of spv.module to use the #spv.vce
attribute for specifying (version, capabilities, extensions) triple
so that we can have better API and custom assembly form. Since now
we have proper modelling of the triple, (de)serialization is wired up
to use them.
With the new UpdateVCEPass, we don't need to manually specify the
required extensions and capabilities anymore when creating a spv.module.
One just need to call UpdateVCEPass before serialization to get the
needed version/extensions/capabilities.
Differential Revision: https://reviews.llvm.org/D75872
We also need the (version, capabilities, extensions) triple on the
spv.module op. Thus far we have been using separate 'extensions'
and 'capabilities' attributes there and 'version' is missing. Creating
a separate attribute for the trip allows us to reuse the assembly
form and verification.
Differential Revision: https://reviews.llvm.org/D75868
Summary:
This replaces the direct lowering of vector.outerproduct to LLVM with progressive lowering into elementary vectors ops to avoid having the similar lowering logic at several places.
NOTE1: with the new progressive rule, the lowered llvm is slightly more elaborate than with the direct lowering, but the generated assembly is just as optimized; still if we want to stay closer to the original, we should add a "broadcast on extract" to shuffle rewrite (rather than special cases all the lowering steps)
NOTE2: the original outerproduct lowering code should now be removed but some linalg test work directly on vector and contain some dead code, so this requires another CL
Reviewers: nicolasvasilache, andydavis1
Reviewed By: nicolasvasilache, andydavis1
Subscribers: mehdi_amini, rriddle, jpienaar, burmako, shauheen, antiagainst, nicolasvasilache, arpith-jacob, mgester, lucyrfox, liufengdb, Joonsoo, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75956
* Adds GpuLaunchFuncToVulkanLaunchFunc conversion pass.
* Moves a serialization of the `spirv::Module` from LaunchFuncToVulkanCalls pass to newly created pass.
* Updates LaunchFuncToVulkanCalls instrumentation pass, adds `initVulkan` and `deinitVulkan` runtime calls.
* Adds `bindResource` call to bind specifc resource by the given descriptor set and descriptor binding.
* Eliminates static construction and desctruction of `VulkanRuntimeManager`.
Differential Revision: https://reviews.llvm.org/D75192
Summary: This op mirrors the llvm.intr counterpart and allows lowering + type conversions in a progressive fashion.
Differential Revision: https://reviews.llvm.org/D75775
The existing API for successor operands on operations is in the process of being removed. This revision simplifies a later one that completely removes the existing API.
Differential Revision: https://reviews.llvm.org/D75316
This attribute details the segment sizes for operand groups within the operation. This revision add support for automatically populating this attribute in the declarative parser.
Differential Revision: https://reviews.llvm.org/D75315
Recently introduced support for converting sequential reduction loops to
CFG of basic blocks in the Standard dialect makes it possible to perform
a staged conversion of parallel reduction loops into a similar CFG by
using sequential loops as an intermediate step. This is already the case
for parallel loops without reduction, so extend the pattern to support
an additional use case.
Differential Revision: https://reviews.llvm.org/D75599
Summary:
This adds an rsqrt op to the standard dialect, and lowers
it as 1 / sqrt to the LLVM dialect.
Differential Revision: https://reviews.llvm.org/D75353
Summary:
Introduce support for converting loop.for operations with loop-carried values
to a CFG in the standard dialect. This is achieved by passing loop-carried
values as block arguments to the loop condition block. This block dominates
both the loop body and the block immediately following the loop, so the
arguments of this block are remain visible there.
Differential Revision: https://reviews.llvm.org/D75513
Summary: Added brackets to fix the loop trip count computation.
The brackets ensure the bounds are subtracted before we divide
the result by the step of the loop.
Differential Revision: https://reviews.llvm.org/D75449
Summary:
The original patch had TODOs to add support for step computations,
which this commit addresses. The computations are expressed using
affine expressions so that the affine canonicalizers can simplify
the full bound and index computations.
Also cleans up the code a little and exposes the pass in the
header file.
Differential Revision: https://reviews.llvm.org/D75052
Summary:
The RFC for this op is here: https://llvm.discourse.group/t/rfc-add-std-atomic-rmw-op/489
The std.atmomic_rmw op provides a way to support read-modify-write
sequences with data race freedom. It is intended to be used in the lowering
of an upcoming affine.atomic_rmw op which can be used for reductions.
A lowering to LLVM is provided with 2 paths:
- Simple patterns: llvm.atomicrmw
- Everything else: llvm.cmpxchg
Differential Revision: https://reviews.llvm.org/D74401
This patch implements the RFCs proposed here:
https://llvm.discourse.group/t/rfc-modify-ifop-in-loop-dialect-to-yield-values/463https://llvm.discourse.group/t/rfc-adding-operands-and-results-to-loop-for/459/19.
It introduces the following changes:
- All Loop Ops region, except for ReduceOp, terminate with a YieldOp.
- YieldOp can have variadice operands that is used to return values out of IfOp and ForOp regions.
- Change IfOp and ForOp syntax and representation to define values.
- Add unit-tests and update .td documentation.
- YieldOp is a terminator to loop.for/if/parallel
- YieldOp custom parser and printer
Lowering is not supported at the moment, and will be in a follow-up PR.
Thanks.
Reviewed By: bondhugula, nicolasvasilache, rriddle
Differential Revision: https://reviews.llvm.org/D74174
The commit switching the calling convention for memrefs (5a1778057)
inadvertently introduced a bug in the function argument attribute conversion:
due to incorrect indexing of function arguments it was not assigning the
attributes to the arguments beyond those generated from the first original
argument. This was not caught in the commit since the test suite does have a
test for converting multi-argument functions with argument attributes. Fix the
bug and add relevant tests.
Implement a pass to convert gpu.launch_func op into a sequence of
Vulkan runtime calls. The Vulkan runtime API surface is huge so currently we
don't expose separate external functions in IR for each of them, instead we
expose a few external functions to wrapper libraries which manages
Vulkan runtime.
Differential Revision: https://reviews.llvm.org/D74549
Summary:
To unblock other work, this implements basic lowering based on mapping
attributes that have to be provided on all loop.parallel. The lowering
does not yet support reduce.
Differential Revision: https://reviews.llvm.org/D73893
This patch adapts the method MemRefDescriptor::fromStaticShape to
support static non-zero offsets. The updated method uses the
getStridesAndOffset method to extract strides and offset. The patch also
adapts the test cases since sizes and strides are now set in forward
instead of reverse order.
Differential Revision: https://reviews.llvm.org/D74474
A memref_cast casting to a memref with a non identity map can't be
lowered to llvm. Take the following case:
```
func @invalid_memref_cast(%arg0: memref<?x?xf64>) {
%c1 = constant 1 : index
%c0 = constant 0 : index
%5 = memref_cast %arg0 : memref<?x?xf64> to memref<?x?xf64, #map1>
%25 = std.subview %5[%c0, %c0][%c1, %c1][] : memref<?x?xf64, #map1> to memref<?x?xf64, #map1>
return
}
```
When lowering the subview mlir was assuming `%5` to have an llvm type
(which is not the case as mlir failed to lower the memref_cast).
Differential Revision: https://reviews.llvm.org/D74466
Thus far we have been using builtin func op to model SPIR-V functions.
It was because builtin func op used to have special treatment in
various parts of the core codebase (e.g., pass pipelines, etc.) and
it's easy to bootstrap the development of the SPIR-V dialect. But
nowadays with general op concepts and region support we don't have
such limitations and it's time to tighten the SPIR-V dialect for
completeness.
This commits introduces a spv.func op to properly model SPIR-V
functions. Compared to builtin func op, it can provide the following
benefits:
* We can control the full op so we can integrate SPIR-V information
bits (e.g., function control) in a more integrated way and define
our own assembly form and enforcing better verification.
* We can have a better dialect and library boundary. At the current
moment only functions are modelled with an external op. With this
change, all ops modelling SPIR-V concpets will be spv.* ops and
registered to the SPIR-V dialect.
* We don't need to special-case func op anymore when creating
ConversionTarget declaring SPIR-V dialect as legal. This is quite
important given we'll see more and more conversions in the future.
In the process, bumps a few FuncOp methods to the FunctionLike trait.
Differential Revision: https://reviews.llvm.org/D74226
Follow-up on D72802. Turn -convert-std-to-llvm-use-alloca and
-convert-std-to-llvm-bare-ptr-memref-call-conv into pass flags
of LLVMLoweringPass.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D73912
We have spv.entry_point_abi for specifying the local workgroup size.
It should be decorated onto input gpu.func ops to drive the SPIR-V
CodeGen to generate the proper SPIR-V module execution mode. Compared
to using command-line options for specifying the configuration, using
attributes also has the benefits that 1) we are now able to use
different local workgroup for different entry points and 2) the
tests contains the configuration directly.
Differential Revision: https://reviews.llvm.org/D74012
The current standard to llvm conversion pass lowers subview ops only if
dynamic offsets are provided. This commit extends the lowering with a
code path that uses the constant offset of the target memref for the
subview op lowering (see Example 3 of the subview op definition for an
example) if no dynamic offsets are provided.
Differential Revision: https://reviews.llvm.org/D74280
The existing (default) calling convention for memrefs in standard-to-LLVM
conversion was motivated by interfacing with LLVM IR produced from C sources.
In particular, it passes a pointer to the memref descriptor structure when
calling the function. Therefore, the descriptor is allocated on stack before
the call. This convention leads to several problems. PR44644 indicates a
problem with stack exhaustion when calling functions with memref-typed
arguments in a loop. Allocating outside of the loop may lead to concurrent
access problems in case the loop is parallel. When targeting GPUs, the contents
of the stack-allocated memory for the descriptor (passed by pointer) needs to
be explicitly copied to the device. Using an aggregate type makes it impossible
to attach pointer-specific argument attributes pertaining to alignment and
aliasing in the LLVM dialect.
Change the default calling convention for memrefs in standard-to-LLVM
conversion to transform a memref into a list of arguments, each of primitive
type, that are comprised in the memref descriptor. This avoids stack allocation
for ranked memrefs (and thus stack exhaustion and potential concurrent access
problems) and simplifies the device function invocation on GPUs.
Provide an option in the standard-to-LLVM conversion to generate auxiliary
wrapper function with the same interface as the previous calling convention,
compatible with LLVM IR porduced from C sources. These auxiliary functions
pack the individual values into a descriptor structure or unpack it. They also
handle descriptor stack allocation if necessary, serving as an allocation
scope: the memory reserved by `alloca` will be freed on exiting the auxiliary
function.
The effect of this change on MLIR-generated only LLVM IR is minimal. When
interfacing MLIR-generated LLVM IR with C-generated LLVM IR, the integration
only needs to require auxiliary functions and change the function name to call
the wrapper function instead of the original function.
This also opens the door to forwarding aliasing and alignment information from
memrefs to LLVM IR pointers in the standrd-to-LLVM conversion.
The existing lowering of gpu.block_dim added a global variable with
the WorkGroupSize decoration. This raises an error within
Vulkan/SPIR-V validation since Vulkan requires this to have a constant
initializer. This is not yet supported in SPIR-V dialect. Changing the
lowering to return the workgroup size as a constant value instead,
obtained from spv.entry_point_abi attribute gets around the issue for
now. The validation goes through since the workgroup size is specified
using spv.execution_mode operation.
Summary:
The `vector.fma` operation is portable enough across targets that we do not want
to keep it wrapped under `vector.outerproduct` and `llvm.intrin.fmuladd`.
This revision lifts the op into the vector dialect and implements the lowering to LLVM by using two patterns:
1. a pattern that lowers from n-D to (n-1)-D by unrolling when n > 2
2. a pattern that converts from 1-D to the proper LLVM representation
Reviewers: ftynse, stellaraccident, aartbik, dcaballe, jsetoain, tetuante
Reviewed By: aartbik
Subscribers: fhahn, dcaballe, merge_guards_bot, mehdi_amini, rriddle, jpienaar, burmako, shauheen, antiagainst, arpith-jacob, mgester, lucyrfox, aartbik, liufengdb, Joonsoo, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D74075
Summary:
This revision exposes the portable `llvm.fma` intrinsic in LLVMOps and uses it
in lieu of `llvm.fmuladd` when lowering the `vector.outerproduct` op to LLVM.
This guarantees proper `fma` instructions will be emitted if the target ISA
supports it.
`llvm.fmuladd` does not have this guarantee in its semantics, despite evidence
that the proper x86 instructions are emitted.
For more details, see https://llvm.org/docs/LangRef.html#llvm-fmuladd-intrinsic.
Reviewers: ftynse, aartbik, dcaballe, fhahn
Reviewed By: aartbik
Subscribers: mehdi_amini, rriddle, jpienaar, burmako, shauheen, antiagainst, arpith-jacob, mgester, lucyrfox, liufengdb, Joonsoo, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D74219
We were using normal dictionary attribute for target environment
specification. It becomes cumbersome with more and more fields.
This commit changes the modelling to a dialect-specific attribute,
where we can have control over its storage and assembly form.
Differential Revision: https://reviews.llvm.org/D73959
This commit adds two resource limits, max_compute_workgroup_size
and max_compute_workgroup_invocations as resource limits to
the target environment. They are not used at the current moment,
but they will affect the SPIR-V CodeGen. Adding for now to have
a proper target environment modelling.
Differential Revision: https://reviews.llvm.org/D73905
Summary:
In the original design, gpu.launch required explicit capture of uses
and passing them as operands to the gpu.launch operation. This was
motivated by infrastructure restrictions rather than design. This
change lifts the requirement and removes the concept of kernel
arguments from gpu.launch. Instead, the kernel outlining
transformation now does the explicit capturing.
This is a breaking change for users of gpu.launch.
Differential Revision: https://reviews.llvm.org/D73769
Summary:
This patch introduces an alternative calling convention for
MemRef function arguments in LLVM dialect. It converts MemRef
function arguments to LLVM bare pointers to the MemRef element
type instead of creating a MemRef descriptor. Bare pointers are
then promoted to a MemRef descriptors at the beginning of the
function. This calling convention is only enabled with a flag.
Reviewers: ftynse, bondhugula, nicolasvasilache, rriddle, mehdi_amini
Reviewed By: ftynse, rriddle, mehdi_amini
Subscribers: Joonsoo, flaub, merge_guards_bot, jholewinski, mehdi_amini, rriddle, jpienaar, burmako, shauheen, antiagainst, csigg, arpith-jacob, mgester, lucyrfox, herhut, aartbik, liufengdb, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72802
Summary:
Rationale:
When lowering to LLVM for different rank insert (n vs k), the offset
arrays needs to drop one dimension (becomes n-1), but the strides
array needs to be preserved (remains k). With regression test.
Note that this example was actually in the documentation, so
extra important to do it right :-)
Reviewers: nicolasvasilache, andydavis1, ftynse
Reviewed By: nicolasvasilache, ftynse
Subscribers: Joonsoo, merge_guards_bot, mehdi_amini, rriddle, jpienaar, burmako, shauheen, antiagainst, arpith-jacob, mgester, lucyrfox, liufengdb, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73733
This commit adds a pattern to lower linalg.generic for reduction
to spv.GroupNonUniform* ops. Right now this only supports integer
reduction on 1-D input memref. Shader entry point ABI is queried
to make sure that the input memref's shape matches the local
workgroup's invocation configuration. This makes sure that the
workload fits in one local workgroup so that we can leverage
SPIR-V group non-uniform operations.
linglg.generic is a structured op that preserves the right level
of information. It is easier to recognize reduction at this level
than performing analysis on loops.
This commit also exposes `getElementPtr` in SPIRVLowering.h given
that it's a generally useful utility function.
Differential Revision: https://reviews.llvm.org/D73437
Summary:
The current code assumes that one always maps at least one loop to block
dimensions and at least one loop to thread dimensions. If either is not
the case, a loop would get mapped twice.
Differential Revision: https://reviews.llvm.org/D73685
Summary:
In the scope of the lowering phase from GPU to ROCDL, the intructions for the conversion patterns seems to be wrong.
According to https://github.com/ROCm-Developer-Tools/HIP/blob/master/include/hip/hcc_detail/math_fwd.h the instructions need two underscores in the beginning instead of one.
Reviewers: nicolasvasilache, herhut, rriddle
Reviewed By: herhut, rriddle
Subscribers: merge_guards_bot, mehdi_amini, rriddle, jpienaar, burmako, shauheen, antiagainst, csigg, arpith-jacob, mgester, lucyrfox, herhut, liufengdb, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73535
Summary:
The 'gpu.terminator' operation is used as the terminator for the
regions of gpu.launch. This is to disambugaute them from the
return operation on 'gpu.func' functions.
This is a breaking change and users of the gpu dialect will need
to adapt their code when producting 'gpu.launch' operations.
Reviewers: nicolasvasilache
Subscribers: mehdi_amini, rriddle, jpienaar, burmako, shauheen, antiagainst, csigg, arpith-jacob, mgester, lucyrfox, liufengdb, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73620
Summary:
This also removes the explicit pattern for loop.terminator to ensure
that the terminator is only erased if the parent op is rewritten.
Reductions are not yet supported.
Reviewers: nicolasvasilache
Subscribers: mehdi_amini, rriddle, jpienaar, burmako, shauheen, antiagainst, arpith-jacob, mgester, lucyrfox, aartbik, liufengdb, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73348
Summary:
The tanh lowering from Standard dialect to NVVM and ROCDL was not working.
The conversion pattern are inserted in the lowering files.
The test cases for the lowerings were added in the test files.
Reviewers: nicolasvasilache, ftynse, herhut
Reviewed By: ftynse, herhut
Subscribers: merge_guards_bot, ftynse, jholewinski, mehdi_amini, rriddle, jpienaar, burmako, shauheen, antiagainst, csigg, arpith-jacob, mgester, lucyrfox, herhut, liufengdb, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73471
Add lowering for constant operation with ranked tensor type to
spv.constant with spv.array type.
Differential Revision: https://reviews.llvm.org/D73022
Summary:
This is based on the use of code constantly checking for an attribute on
a model and instead represents the distinct operaion with a different
op. Instead, this op can be used to provide better filtering.
Reverts "Revert "[mlir] Create a gpu.module operation for the GPU Dialect.""
This reverts commit ac446302ca4145cdc89f377c0c364c29ee303be5 after
fixing internal Google issues.
This additionally updates ROCDL lowering to use the new gpu.module.
Reviewers: herhut, mravishankar, antiagainst, nicolasvasilache
Subscribers: jholewinski, mgorny, mehdi_amini, jpienaar, burmako, shauheen, csigg, arpith-jacob, mgester, lucyrfox, aartbik, liufengdb, llvm-commits, mravishankar, rriddle, antiagainst, bkramer
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72921
When lowering `loop.if` to `spv.selection` we explicitly create
a selection header block before the control flow diverges and a
merge block where control flow subsequently converges.
Differential Revision: https://reviews.llvm.org/D72836
Summary:
This is based on the use of code constantly checking for an attribute on
a model and instead represents the distinct operaion with a different
op. Instead, this op can be used to provide better filtering.
Reviewers: herhut, mravishankar, antiagainst, rriddle
Reviewed By: herhut, antiagainst, rriddle
Subscribers: liufengdb, aartbik, jholewinski, mgorny, mehdi_amini, rriddle, jpienaar, burmako, shauheen, antiagainst, nicolasvasilache, csigg, arpith-jacob, mgester, lucyrfox, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72336
Summary: The current syntax for AffineMapAttr and IntegerSetAttr conflict with function types, making it currently impossible to round-trip function types(and e.g. FuncOp) in the IR. This revision changes the syntax for the attributes by wrapping them in a keyword. AffineMapAttr is wrapped with `affine_map<>` and IntegerSetAttr is wrapped with `affine_set<>`.
Reviewed By: nicolasvasilache, ftynse
Differential Revision: https://reviews.llvm.org/D72429
Summary:
This diff implements the progressive lowering of insert_strided_slice.
Two cases appear:
1. when the source and dest vectors have different ranks, extract the dest
subvector at the proper offset and reduce to case 2.
2. when they have the same rank N:
a. if the source and dest type are the same, the insertion is trivial:
just forward the source
b. otherwise, iterate over all N-1 D subvectors and create an
extract/insert_strided_slice/insert replacement, reducing the problem
to vecotrs of the same N-1 rank.
This combines properly with the other conversion patterns to lower all the way to LLVM.
Reviewers: ftynse, rriddle, AlexEichenberger, andydavis1, tetuante, nicolasvasilache
Reviewed By: andydavis1
Subscribers: merge_guards_bot, mehdi_amini, jpienaar, burmako, shauheen, antiagainst, arpith-jacob, mgester, lucyrfox, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72317
Summary:
This diff implements the progressive lowering of strided_slice to either:
1. extractelement + insertelement for the 1-D case
2. extract + optional strided_slice + insert for the n-D case.
This combines properly with the other conversion patterns to lower all the way to LLVM.
Appropriate tests are added.
Reviewers: ftynse, rriddle, AlexEichenberger, andydavis1, tetuante
Reviewed By: andydavis1
Subscribers: merge_guards_bot, mehdi_amini, jpienaar, burmako, shauheen, antiagainst, arpith-jacob, mgester, lucyrfox, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72310
This commit fixes shader ABI attributes to use `spv.` as the prefix
so that they match the dialect's namespace. This enables us to add
verification hooks in the SPIR-V dialect to verify them.
Reviewed By: mravishankar
Differential Revision: https://reviews.llvm.org/D72062
The conversion from std.and/std.or to spv.LogicalAnd/spv.LogicalOr is
only valid for boolean (i1) types. Modify BinaryOpPattern in
StandardToSPIRV.td to allow limiting the type of the operands for
which the pattern is applied.
Differential Revision: https://reviews.llvm.org/D71881
Rename the 'shlis' operation in the standard dialect to 'shift_left'. Add tests
for this operation (these have been missing so far) and add a lowering to the
'shl' operation in the LLVM dialect.
Add also 'shift_right_signed' (lowered to LLVM's 'ashr') and 'shift_right_unsigned'
(lowered to 'lshr').
The original plan was to name these operations 'shift.left', 'shift.right.signed'
and 'shift.right.unsigned'. This works if the operations are prefixed with 'std.'
in MLIR assembly. Unfortunately during import the short form is ambigous with
operations from a hypothetical 'shift' dialect. The best solution seems to omit
dots in standard operations for now.
Closestensorflow/mlir#226
PiperOrigin-RevId: 286803388
This will allow us to lower most of gpu.all_reduce (when all_reduce
doesn't exist in the target dialect) within the GPU dialect, and only do
target-specific lowering for the shuffle op.
PiperOrigin-RevId: 286548256
Introduces some centralized methods to move towards
consistent use of i32 as vector subscripts.
Note: sizes/strides/offsets attributes are still i64
PiperOrigin-RevId: 286434133
Introduce affine.prefetch: op to prefetch using a multi-dimensional
subscript on a memref; similar to affine.load but has no effect on
semantics, but only on performance.
Provide lowering through std.prefetch, llvm.prefetch and map to llvm's
prefetch instrinsic. All attributes reflected through the lowering -
locality hint, rw, and instr/data cache.
affine.prefetch %0[%i, %j + 5], false, 3, true : memref<400x400xi32>
Signed-off-by: Uday Bondhugula <uday@polymagelabs.com>
Closestensorflow/mlir#225
COPYBARA_INTEGRATE_REVIEW=https://github.com/tensorflow/mlir/pull/225 from bondhugula:prefetch 4c3b4e93bc64d9a5719504e6d6e1657818a2ead0
PiperOrigin-RevId: 286212997
When memory attributions are present in `gpu.func`, require that they are of
memref type and live in memoryspaces 3 and 5 for workgroup and private memory
attributions, respectively. Adapt the conversion from the GPU dialect to the
NVVM dialect to drop the private memory space from attributions as NVVM is able
to model them as local `llvm.alloca`s in the default memory space.
PiperOrigin-RevId: 286161763
This updates the lowering pipelines from the GPU dialect to lower-level
dialects (NVVM, SPIRV) to use the recently introduced gpu.func operation
instead of a standard function annotated with an attribute. In particular, the
kernel outlining is updated to produce gpu.func instead of std.func and the
individual conversions are updated to consume gpu.funcs and disallow standard
funcs after legalization, if necessary. The attribute "gpu.kernel" is preserved
in the generic syntax, but can also be used with the custom syntax on
gpu.funcs. The special kind of function for GPU allows one to use additional
features such as memory attribution.
PiperOrigin-RevId: 285822272
Similar to insert/extract vector instructions but
(1) work on 1-D vectors only
(2) allow for a dynamic index
%c3 = constant 3 : index
%0 = vector.insertelement %arg0, %arg1[%c : index] : vector<4xf32>
%1 = vector.extractelement %arg0[%c3 : index] : vector<4xf32>
PiperOrigin-RevId: 285792205
During the conversion from the standard dialect to the LLVM dialect,
memref-typed arguments are promoted from registers to memory and passed into
functions by pointer. This had been introduced into the lowering to work around
the abesnce of calling convention modeling in MLIR to enable better
interoperability with LLVM IR generated from C, and has been exerciced for
several months. Make this promotion the default calling covention when
converting to the LLVM dialect. This adds the documentation, simplifies the
code and makes the conversion consistent across function operations and
function types used in other places, e.g. in high-order functions or
attributes, which would not follow the same rule previously.
PiperOrigin-RevId: 285751280
This change allows for DialectConversion to attempt folding as a mechanism to legalize illegal operations. This also expands folding support in OpBuilder::createOrFold to generate new constants when folding, and also enables it to work in the context of a PatternRewriter.
PiperOrigin-RevId: 285448440
This type is not used anymore now that Linalg view and subview have graduated to std and that alignment is supported on alloc.
PiperOrigin-RevId: 285213424
Both work for the current use case, but the latter allows implementing
prefix sums and is a little easier to understand for partial warps.
PiperOrigin-RevId: 285145287
This reorganizes the vector transformations to be more easily testable as patterns and more easily composable into fused passes in the future.
PiperOrigin-RevId: 284817474
For example
%0 = vector.shuffle %x, %y [3 : i32, 2 : i32, 1 : i32, 0 : i32] : vector<2xf32>, vector<2xf32>
yields a vector<4xf32> result with a permutation of the elements of %x and %y
PiperOrigin-RevId: 284657191
The existing GPU to SPIR-V lowering created a spv.module for every
function with gpu.kernel attribute. A better approach is to lower the
module that the function lives in (which has the attribute
gpu.kernel_module) to a spv.module operation. This better captures the
host-device separation modeled by GPU dialect and simplifies the
lowering as well.
PiperOrigin-RevId: 284574688
Unifies vector op unrolling transformation, by using the same unrolling implementation for contraction and elementwise operations.
Removes fakefork/join operations which are non longer needed now that we have the InsertStridedSlice operation.
PiperOrigin-RevId: 284570784
Since these operations lower to [insert|extract][element|value] at LLVM
dialect level, neither element nor value would correctly reflect the meaning.
PiperOrigin-RevId: 284240727
GPU functions use memory attributions, a combination of Op attributes and
region arguments, to specify function-wide buffers placed in workgroup or
private memory spaces. Introduce a lowering pattern for GPU functions to be
converted to LLVM functions taking into account memory attributions. Workgroup
attributions get transformed into module-level globals with unique names
derived from function names. Private attributions get converted into
llvm.allocas inside the function body. In both cases, we inject at the
beginning of the function the IR that obtains the raw pointer to the data and
populates a MemRef descriptor based on the MemRef type of buffer, making
attributions compose with the rest of the MemRef lowering and transparent for
use with std.load and std.store. While using raw pointers instead of
descriptors might have been more efficient, it is better implemented as a
canonicalization or a separate transformation so that non-attribution memrefs
could also benefit from it.
PiperOrigin-RevId: 284208396
Updates vector ContractionOp to use proper vector masks (produced by CreateMaskOp/ConstantMaskOp).
Leverages the following canonicalizations in unrolling unit test: CreateMaskOp -> ConstantMaskOp, StridedSliceOp(ConstantMaskOp) -> ConstantMaskOp
Removes IndexTupleOp (no longer needed now that we have vector mask ops).
Updates all unit tests.
PiperOrigin-RevId: 284182168
SPIR-V/Vulkan spec requires the workgroups size to be specified with
the spv.ExecutionMode operation. This was hard-wired to be set to a
particular value. It is now changed to be configurable by clients of
the pass or of the patterns that implement the lowering from GPU to
SPIRV.
PiperOrigin-RevId: 284017482
In the future, a more configurable malloc and free interface should be used and exposed via
extra parameters to the `createLowerToLLVMPass`. Until requirements are gathered, a simple CL flag allows generating code that runs successfully on hardware that cannot use the stdlib.
PiperOrigin-RevId: 283833424
Now that we have unrolling as a declarative pattern, we can drop a full pass that has gone stale. In the future we may want to add specific unrolling patterns for VectorTransferReadOp.
PiperOrigin-RevId: 283806880
This CL refactors some of the MLIR vector dependencies to allow decoupling VectorOps, vector analysis, vector transformations and vector conversions from each other.
This makes the system more modular and allows extracting VectorToVector into VectorTransforms that do not depend on vector conversions.
This refactoring exhibited a bunch of cyclic library dependencies that have been cleaned up.
PiperOrigin-RevId: 283660308
Not all StandardOps can be lowered to SPIR-V. For example, subview op
implementation requires use of pointer bitcasts which is not valid
according to SPIR-V spec (or at least is ambiguous about it). Such ops
need to be removed/transformed before lowering to SPIR-V. The
SPIRVLegalizationPass is added a place where such legalizations can be
added. Current implementation folds the subview ops with load/stores
so that the lowering itself does not have to convert a subview op.
PiperOrigin-RevId: 283642981
The SPIR-V lowering used nested !spv.arrays to represented
multi-dimensional arrays, with the hope that in-conjunction with the
layout annotations, the shape and layout of memref can be represented
directly. It is unclear though how portable this representation will
end up being. It will rely on driver compilers implementing complex
index computations faithfully. A more portable approach is to use
linearized arrays to represent memrefs and explicitly instantiate all
the index computation in SPIR-V. This gives added benefit that we can
further optimize the generated code in MLIR before generating the
SPIR-V binary.
PiperOrigin-RevId: 283571167
As described in the documentation, ViewOp is expected to take an optional
dynamic offset followed by a list of dynamic sizes. However, the ViewOp parser
did not include a check for the offset being a single value and accepeted a
list of values instead.
Furthermore, several tests have been exercising the wrong syntax of a ViewOp,
passing multiple values to the dyanmic stride list, which was not caught by the
parser. The trailing values could have been erronously interpreted as dynamic
sizes. This is likely due to resyntaxing of the ViewOp, with the previous
syntax taking the list of sizes before the offset. Update the tests to use the
syntax with the offset preceding the sizes.
Worse, the conversion of ViewOp to the LLVM dialect assumed the wrong order of
operands with offset in the trailing position, and erronously relied on the
permissive parsing that interpreted trailing dynamic offset values as leading
dynamic sizes. Fix the lowering to use the correct order of operands.
PiperOrigin-RevId: 283532506
are constant (i.e., there are no size and stride operands).
We recently added canonicalization that rewrites constant size and stride operands to
SubViewOp into static information in the type, so these patterns now occur during code
generation.
PiperOrigin-RevId: 283524688
LLVM IR supports linkage on global objects such as global variables and
functions. Introduce the Linkage attribute into the LLVM dialect, backed by an
integer storage. Use this attribute on LLVM::GlobalOp and make it mandatory.
Implement parsing/printing of the attribute and conversion to LLVM IR.
See tensorflow/mlir#277.
PiperOrigin-RevId: 283309328
These changes to SPIR-V lowering while adding support for lowering
SUbViewOp, but are not directly related.
- Change the lowering of MemRefType to
!spv.ptr<!spv.struct<!spv.array<...>[offset]>, ..>
This is consistent with the Vulkan spec.
- To enable testing a simple pattern of lowering functions is added to
ConvertStandardToSPIRVPass. This is just used to convert the type of
the arguments of the function. The added function lowering itself is
not meant to be the way functions are eventually lowered into SPIR-V
dialect.
PiperOrigin-RevId: 282589644
This CL uses the recently added op to finish the implementation of Vector -> Vector unrolling by replacing the "fake join op" by a series of InsertStridedSliceOp.
Test is updated accordingly
PiperOrigin-RevId: 282451126
To simplify the lowering into SPIR-V, while still respecting the ABI
requirements of SPIR-V/Vulkan, split the process into two
1) While lowering a function to SPIR-V (when the function is an entry
point function), allow specifying attributes on arguments and
function itself that describe the ABI of the function.
2) Add a pass that materializes the ABI described in the function.
Two attributes are needed.
1) Attribute on arguments of the entry point function that describe
the descriptor_set, binding, storage class, etc, of the
spv.globalVariable this argument will be replaced by
2) Attribute on function that specifies workgroup size, etc. (for now
only workgroup size).
Add the pass -spirv-lower-abi-attrs to materialize the ABI described
by the attributes.
This change makes the SPIRVBasicTypeConverter class unnecessary and is
removed, further simplifying the SPIR-V lowering path.
PiperOrigin-RevId: 282387587
The current SubViewOp specification allows for either all offsets,
shape and stride to be dynamic or all of them to be static. There are
opportunities for more fine-grained canonicalization based on which of
these are static. For example, if the sizes are static, the result
memref is of static shape. The specification of SubViewOp is modified
to allow on or more of offsets, shapes and strides to be statically
specified. The verification is updated to ensure that the result type
of the subview op is consistent with which of these are static and
which are dynamic.
PiperOrigin-RevId: 281560457
This CL uses the pattern rewrite infrastructure to implement a simple VectorOps -> VectorOps legalization strategy to unroll coarse-grained vector operations into finer grained ones.
The transformation is written using local pattern rewrites to allow composition with other rewrites. It proceeds by iteratively introducing fake cast ops and cleaning canonicalizing or lowering them away where appropriate.
This is an example of writing transformations as compositions of local pattern rewrites that should enable us to make them significantly more declarative.
PiperOrigin-RevId: 281555100
The command-line flag name `lower-to-llvm` for the pass performing dialect
conversion from the Standard dialect to the LLVM dialect is misleading and
inconsistent with most of the conversion passses. It leads the user to believe
that there are no restrictions on what can be converted, while in fact only a
subset of the Standard dialect can be converted (with operations from other
dialects converted by separate passes). Use `convert-std-to-llvm` that better
reflects what the pass does and is consistent with most other conversions.
PiperOrigin-RevId: 281238797
This is step 1/n in refactoring infrastructure along the Vector dialect to make it ready for retargetability and composable progressive lowering.
PiperOrigin-RevId: 280529784
Following up on the consolidation of MemRef descriptor conversion, update
Vector-to-LLVM conversion to use the helper class that abstracts away the
implementation details of the MemRef descriptor. This also makes the types of
the attributes in emitted llvm.insert/extractelement operations consistently
i64 instead of a mix of index and i64.
PiperOrigin-RevId: 280441451
This CL moves VectorOps to Tablegen and cleans up the implementation.
This is almost NFC but 2 changes occur:
1. an interface change occurs in the padding value specification in vector_transfer_read:
the value becomes non-optional. As a shortcut we currently use %f0 for all paddings.
This should become an OpInterface for vectorization in the future.
2. the return type of vector.type_cast is trivial and simplified to `memref<vector<...>>`
Relevant roundtrip and invalid tests that used to sit in core are moved to the vector dialect.
The op documentation is moved to the .td file.
PiperOrigin-RevId: 280430869
This CL uses the now standard std.subview in linalg.
Two shortcuts are currently taken to allow this port:
1. the type resulting from a view is currently degraded to fully dynamic to pass the SubViewOp verifier.
2. indexing into SubViewOp may access out of bounds since lowering to LLVM does not currently enforce it by construction.
These will be fixed in subsequent commits after discussions.
PiperOrigin-RevId: 280250129
Lowering of CmpIOp, DivISOp, RemISOp, SubIOp and SelectOp to SPIR-V
dialect enables the lowering of operations generated by AffineExpr ->
StandardOps conversion into the SPIR-V dialect.
PiperOrigin-RevId: 280039204
loop::ForOp can be lowered to the structured control flow represented
by spirv::LoopOp by making the continue block of the spirv::LoopOp the
loop latch and the merge block the exit block. The resulting
spirv::LoopOp has a single back edge from the continue to header
block, and a single exit from header to merge.
PiperOrigin-RevId: 280015614
This CL adds an extra pointer to the memref descriptor to allow specifying alignment.
In a previous implementation, we used 2 types: `linalg.buffer` and `view` where the buffer type was the unit of allocation/deallocation/alignment and `view` was the unit of indexing.
After multiple discussions it was decided to use a single type, which conflates both, so the memref descriptor now needs to carry both pointers.
This is consistent with the [RFC-Proposed Changes to MemRef and Tensor MLIR Types](https://groups.google.com/a/tensorflow.org/forum/#!searchin/mlir/std.view%7Csort:date/mlir/-wKHANzDNTg/4K6nUAp8AAAJ).
PiperOrigin-RevId: 279959463
This CL ports the lowering of linalg.view to the newly introduced std.view.
Differences in implementation relate to std.view having slightly different semantics:
1. a static or dynamic offset can be specified.
2. the size of the (contiguous) shape is passed instead of a range.
3. static size and stride information is extracted from the memref type rather than the range.
Besides these differences, lowering behaves the same.
A future CL will update Linalg to use this unified infrastructure.
PiperOrigin-RevId: 278948853
The current lowering of loops to GPU only supports lowering of loop
nests where the loops mapped to workgroups and workitems are perfectly
nested. Here a new lowering is added to handle lowering of imperfectly
nested loop body with the following properties
1) The loops partitioned to workgroups are perfectly nested.
2) The loop body of the inner most loop partitioned to workgroups can
contain one or more loop nests that are to be partitioned across
workitems. Each individual loops nests partitioned to workitems should
also be perfectly nested.
3) The number of workgroups and workitems are not deduced from the
loop bounds but are passed in by the caller of the lowering as values.
4) For statements within the perfectly nested loop nest partitioned
across workgroups that are not loops, it is valid to have all threads
execute that statement. This is NOT verified.
PiperOrigin-RevId: 277958868
The type constraint had to be relaxed due to the order of lowering passes in
the examples, that since has been fixed. The relaxed version was still used by
the CUDA lowering for launch sizes of `index` type. This is not necessary since
the GPU dialect does not restrict the type of the launch size operands. Use an
LLVM type instead and restore the check in the LLVM_CallOp definition.
PiperOrigin-RevId: 275920109
A VectorTypeCastOp can only be used to lower between statically sized contiguous memrefs of scalar and matching vector type. The sizes and strides are thus fully static and easy to determine.
A relevant test is added.
This is a step towards solving tensorflow/mlir#189.
PiperOrigin-RevId: 275538981
Makes the spv.module generated by the GPU to SPIR-V conversion SPIR-V
spec compliant (validated using spirv-val from Vulkan tools).
1) Separate out the VulkanLayoutUtils from
DecorateSPIRVCompositeTypeLayoutPass to make it reusable within the
Type converter in SPIR-V lowering infrastructure. This is used to
compute the layout of the !spv.struct used in global variable type
description.
2) Set the capabilities of the spv.module to Shader (needed for use of
Logical Memory Model, and the extensions to
SPV_KHR_storage_buffer_storage_class for use of Storage Buffer)
PiperOrigin-RevId: 275081486
In addition to specifying the type of accumulation through the 'op' attribute, the accumulation can now also be specified as arbitrary code region.
Adds a gpu.yield op to specify the result of the accumulation.
Also support more types (integers) and accumulations (mul).
PiperOrigin-RevId: 275065447
Originally, the lowering of `alloc` operations has been computing the number of
bytes to allocate when lowering based on the properties of MLIR type. This does
not take into account type legalization that happens when compiling LLVM IR
down to target assembly. This legalization can widen the type, potentially
leading to out-of-bounds accesses to `alloc`ed data due to mismatches between
address computation that takes the widening into account and allocation that
does not. Use the LLVM IR's equivalent of `sizeof` to compute the number of
bytes to be allocated:
%0 = getelementptr %type* null, %indexType 0
%1 = ptrtoint %type* %0 to %indexType
adapted from
http://nondot.org/sabre/LLVMNotes/SizeOf-OffsetOf-VariableSizedStructs.txt
PiperOrigin-RevId: 274159900
This test was not updated in the original commit that switched to using LLVM
functions since it wasn't broken by that change. FileCheck was able to match
the `func` part of `llvm.func` to the expected pattern and continue as usual.
Make sure the `llvm.` dialect prefix is included in the expected output.
PiperOrigin-RevId: 274127281
In Standard to LLVM dialect conversion, the binary op conversion pattern
implicitly assumed some operands were of LLVM IR dialect type. This is not
necessarily true, for example if the Ops that produce those operands did not
match the existing convresion patterns. Check if all operands are of LLVM IR
dialect type and if not, fail to patch the binary op pattern.
Closestensorflow/mlir#168
PiperOrigin-RevId: 274063207
The lowering is specified as a pattern and is done only if the result
is a SPIR-V scalar type or vector type.
Handling ConstantOp with index return type needs special handling
since SPIR-V dialect does not have index types. Based on the bitwidth
of the attribute value, either i32 or i64 is chosen.
Other constant lowerings are left as a TODO.
PiperOrigin-RevId: 274056805
This function-like operation allows one to define functions that have wrapped
LLVM IR function type, in particular variadic functions. The operation was
added in parallel to the existing lowering flow, this commit only switches the
flow to use it.
Using a custom function type makes the LLVM IR dialect type system more
consistent and avoids complex conversion rules for functions that previously
had to use the built-in function type instead of a wrapped LLVM IR dialect type
and perform conversions during the analysis.
PiperOrigin-RevId: 273910855
The lowering infrastructure needs to be enhanced to lower into a
spv.Module that is consistent with the SPIR-V spec. The following
changes are needed
1) The Vulkan/SPIR-V validation rules dictates entry functions to have
signature of void(void). This requires changes to the function
signature conversion infrastructure within the dialect conversion
framework. When an argument is dropped from the original function
signature, a function can be specified that when invoked will return
the value to use as a replacement for the argument from the original
function.
2) Some changes to the type converter to make the converted type
consistent with the Vulkan/SPIR-V validation rules,
a) Add support for converting dynamically shaped tensors to
spv.rtarray type.
b) Make the global variable of type !spv.ptr<!spv.struct<...>>
3) Generate the entry point operation for the kernel functions and
automatically compute all the interface variables needed
PiperOrigin-RevId: 273784229
Originally, we were attaching attributes containing CUBIN blobs to the kernel
function called by `gpu.launch_func`. This kernel is now contained in a nested
module that is used as a compilation unit. Attach compiled CUBIN blobs to the
module rather than to the function since we were compiling the module. This
also avoids duplication of the attribute on multiple kernels within the same
module.
PiperOrigin-RevId: 273497303
Originally, the CUBIN getter function was introduced as a mechanism to
circumvent the absence of globals in the LLVM dialect. It would allocate memory
and populate it with the CUBIN data. LLVM dialect now supports globals and they
are already used to store CUBIN data, making the getter function a trivial
address computation of a global. Emit the address computation directly at the
place of `gpu.launch_func` instead of putting it in a function and calling it.
This simplifies the conversion flow and prepares it for using the
DialectConversion infrastructure.
PiperOrigin-RevId: 273496221
Now that the accessor function is a trivial getter of the global variable, it
makes less sense to have the getter generation as a separate pass. Move the
getter generation into the lowering of `gpu.launch_func` to CUDA calls. This
change is mostly code motion, but the process can be simplified further by
generating the addressof inplace instead of using a call. This is will be done
in a follow-up.
PiperOrigin-RevId: 273492517
The kernel function called by gpu.launch_func is now placed into an isolated
nested module during the outlining stage to simplify separate compilation.
Until recently, modules did not have names and could not be referenced. This
limitation was circumvented by introducing a stub kernel at the same name at
the same nesting level as the module containing the actual kernel. This
relation is only effective in one direction: from actual kernel function to its
launch_func "caller".
Leverage the recently introduced symbol name attributes on modules to refer to
a specific nested module from `gpu.launch_func`. This removes the implicit
connection between the identically named stub and kernel functions. It also
enables support for `gpu.launch_func`s to call different kernels located in the
same module.
PiperOrigin-RevId: 273491891
This makes the name of the conversion pass more consistent with the naming
scheme, since it actually converts from the Loop dialect to the Standard
dialect rather than working with arbitrary control flow operations.
PiperOrigin-RevId: 272612112
This also adds coverage with a missing test, which uncovered a bug in the conditional for testing whether an offset is dynamic or not.
PiperOrigin-RevId: 272505798