The commit switching the calling convention for memrefs (5a1778057)
inadvertently introduced a bug in the function argument attribute conversion:
due to incorrect indexing of function arguments it was not assigning the
attributes to the arguments beyond those generated from the first original
argument. This was not caught in the commit since the test suite does have a
test for converting multi-argument functions with argument attributes. Fix the
bug and add relevant tests.
Implement a pass to convert gpu.launch_func op into a sequence of
Vulkan runtime calls. The Vulkan runtime API surface is huge so currently we
don't expose separate external functions in IR for each of them, instead we
expose a few external functions to wrapper libraries which manages
Vulkan runtime.
Differential Revision: https://reviews.llvm.org/D74549
Summary:
To unblock other work, this implements basic lowering based on mapping
attributes that have to be provided on all loop.parallel. The lowering
does not yet support reduce.
Differential Revision: https://reviews.llvm.org/D73893
This patch adapts the method MemRefDescriptor::fromStaticShape to
support static non-zero offsets. The updated method uses the
getStridesAndOffset method to extract strides and offset. The patch also
adapts the test cases since sizes and strides are now set in forward
instead of reverse order.
Differential Revision: https://reviews.llvm.org/D74474
A memref_cast casting to a memref with a non identity map can't be
lowered to llvm. Take the following case:
```
func @invalid_memref_cast(%arg0: memref<?x?xf64>) {
%c1 = constant 1 : index
%c0 = constant 0 : index
%5 = memref_cast %arg0 : memref<?x?xf64> to memref<?x?xf64, #map1>
%25 = std.subview %5[%c0, %c0][%c1, %c1][] : memref<?x?xf64, #map1> to memref<?x?xf64, #map1>
return
}
```
When lowering the subview mlir was assuming `%5` to have an llvm type
(which is not the case as mlir failed to lower the memref_cast).
Differential Revision: https://reviews.llvm.org/D74466
Thus far we have been using builtin func op to model SPIR-V functions.
It was because builtin func op used to have special treatment in
various parts of the core codebase (e.g., pass pipelines, etc.) and
it's easy to bootstrap the development of the SPIR-V dialect. But
nowadays with general op concepts and region support we don't have
such limitations and it's time to tighten the SPIR-V dialect for
completeness.
This commits introduces a spv.func op to properly model SPIR-V
functions. Compared to builtin func op, it can provide the following
benefits:
* We can control the full op so we can integrate SPIR-V information
bits (e.g., function control) in a more integrated way and define
our own assembly form and enforcing better verification.
* We can have a better dialect and library boundary. At the current
moment only functions are modelled with an external op. With this
change, all ops modelling SPIR-V concpets will be spv.* ops and
registered to the SPIR-V dialect.
* We don't need to special-case func op anymore when creating
ConversionTarget declaring SPIR-V dialect as legal. This is quite
important given we'll see more and more conversions in the future.
In the process, bumps a few FuncOp methods to the FunctionLike trait.
Differential Revision: https://reviews.llvm.org/D74226
Follow-up on D72802. Turn -convert-std-to-llvm-use-alloca and
-convert-std-to-llvm-bare-ptr-memref-call-conv into pass flags
of LLVMLoweringPass.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D73912
We have spv.entry_point_abi for specifying the local workgroup size.
It should be decorated onto input gpu.func ops to drive the SPIR-V
CodeGen to generate the proper SPIR-V module execution mode. Compared
to using command-line options for specifying the configuration, using
attributes also has the benefits that 1) we are now able to use
different local workgroup for different entry points and 2) the
tests contains the configuration directly.
Differential Revision: https://reviews.llvm.org/D74012
The current standard to llvm conversion pass lowers subview ops only if
dynamic offsets are provided. This commit extends the lowering with a
code path that uses the constant offset of the target memref for the
subview op lowering (see Example 3 of the subview op definition for an
example) if no dynamic offsets are provided.
Differential Revision: https://reviews.llvm.org/D74280
The existing (default) calling convention for memrefs in standard-to-LLVM
conversion was motivated by interfacing with LLVM IR produced from C sources.
In particular, it passes a pointer to the memref descriptor structure when
calling the function. Therefore, the descriptor is allocated on stack before
the call. This convention leads to several problems. PR44644 indicates a
problem with stack exhaustion when calling functions with memref-typed
arguments in a loop. Allocating outside of the loop may lead to concurrent
access problems in case the loop is parallel. When targeting GPUs, the contents
of the stack-allocated memory for the descriptor (passed by pointer) needs to
be explicitly copied to the device. Using an aggregate type makes it impossible
to attach pointer-specific argument attributes pertaining to alignment and
aliasing in the LLVM dialect.
Change the default calling convention for memrefs in standard-to-LLVM
conversion to transform a memref into a list of arguments, each of primitive
type, that are comprised in the memref descriptor. This avoids stack allocation
for ranked memrefs (and thus stack exhaustion and potential concurrent access
problems) and simplifies the device function invocation on GPUs.
Provide an option in the standard-to-LLVM conversion to generate auxiliary
wrapper function with the same interface as the previous calling convention,
compatible with LLVM IR porduced from C sources. These auxiliary functions
pack the individual values into a descriptor structure or unpack it. They also
handle descriptor stack allocation if necessary, serving as an allocation
scope: the memory reserved by `alloca` will be freed on exiting the auxiliary
function.
The effect of this change on MLIR-generated only LLVM IR is minimal. When
interfacing MLIR-generated LLVM IR with C-generated LLVM IR, the integration
only needs to require auxiliary functions and change the function name to call
the wrapper function instead of the original function.
This also opens the door to forwarding aliasing and alignment information from
memrefs to LLVM IR pointers in the standrd-to-LLVM conversion.
The existing lowering of gpu.block_dim added a global variable with
the WorkGroupSize decoration. This raises an error within
Vulkan/SPIR-V validation since Vulkan requires this to have a constant
initializer. This is not yet supported in SPIR-V dialect. Changing the
lowering to return the workgroup size as a constant value instead,
obtained from spv.entry_point_abi attribute gets around the issue for
now. The validation goes through since the workgroup size is specified
using spv.execution_mode operation.
Summary:
The `vector.fma` operation is portable enough across targets that we do not want
to keep it wrapped under `vector.outerproduct` and `llvm.intrin.fmuladd`.
This revision lifts the op into the vector dialect and implements the lowering to LLVM by using two patterns:
1. a pattern that lowers from n-D to (n-1)-D by unrolling when n > 2
2. a pattern that converts from 1-D to the proper LLVM representation
Reviewers: ftynse, stellaraccident, aartbik, dcaballe, jsetoain, tetuante
Reviewed By: aartbik
Subscribers: fhahn, dcaballe, merge_guards_bot, mehdi_amini, rriddle, jpienaar, burmako, shauheen, antiagainst, arpith-jacob, mgester, lucyrfox, aartbik, liufengdb, Joonsoo, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D74075
Summary:
This revision exposes the portable `llvm.fma` intrinsic in LLVMOps and uses it
in lieu of `llvm.fmuladd` when lowering the `vector.outerproduct` op to LLVM.
This guarantees proper `fma` instructions will be emitted if the target ISA
supports it.
`llvm.fmuladd` does not have this guarantee in its semantics, despite evidence
that the proper x86 instructions are emitted.
For more details, see https://llvm.org/docs/LangRef.html#llvm-fmuladd-intrinsic.
Reviewers: ftynse, aartbik, dcaballe, fhahn
Reviewed By: aartbik
Subscribers: mehdi_amini, rriddle, jpienaar, burmako, shauheen, antiagainst, arpith-jacob, mgester, lucyrfox, liufengdb, Joonsoo, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D74219
We were using normal dictionary attribute for target environment
specification. It becomes cumbersome with more and more fields.
This commit changes the modelling to a dialect-specific attribute,
where we can have control over its storage and assembly form.
Differential Revision: https://reviews.llvm.org/D73959
This commit adds two resource limits, max_compute_workgroup_size
and max_compute_workgroup_invocations as resource limits to
the target environment. They are not used at the current moment,
but they will affect the SPIR-V CodeGen. Adding for now to have
a proper target environment modelling.
Differential Revision: https://reviews.llvm.org/D73905
Summary:
In the original design, gpu.launch required explicit capture of uses
and passing them as operands to the gpu.launch operation. This was
motivated by infrastructure restrictions rather than design. This
change lifts the requirement and removes the concept of kernel
arguments from gpu.launch. Instead, the kernel outlining
transformation now does the explicit capturing.
This is a breaking change for users of gpu.launch.
Differential Revision: https://reviews.llvm.org/D73769
Summary:
This patch introduces an alternative calling convention for
MemRef function arguments in LLVM dialect. It converts MemRef
function arguments to LLVM bare pointers to the MemRef element
type instead of creating a MemRef descriptor. Bare pointers are
then promoted to a MemRef descriptors at the beginning of the
function. This calling convention is only enabled with a flag.
Reviewers: ftynse, bondhugula, nicolasvasilache, rriddle, mehdi_amini
Reviewed By: ftynse, rriddle, mehdi_amini
Subscribers: Joonsoo, flaub, merge_guards_bot, jholewinski, mehdi_amini, rriddle, jpienaar, burmako, shauheen, antiagainst, csigg, arpith-jacob, mgester, lucyrfox, herhut, aartbik, liufengdb, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72802
Summary:
Rationale:
When lowering to LLVM for different rank insert (n vs k), the offset
arrays needs to drop one dimension (becomes n-1), but the strides
array needs to be preserved (remains k). With regression test.
Note that this example was actually in the documentation, so
extra important to do it right :-)
Reviewers: nicolasvasilache, andydavis1, ftynse
Reviewed By: nicolasvasilache, ftynse
Subscribers: Joonsoo, merge_guards_bot, mehdi_amini, rriddle, jpienaar, burmako, shauheen, antiagainst, arpith-jacob, mgester, lucyrfox, liufengdb, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73733
This commit adds a pattern to lower linalg.generic for reduction
to spv.GroupNonUniform* ops. Right now this only supports integer
reduction on 1-D input memref. Shader entry point ABI is queried
to make sure that the input memref's shape matches the local
workgroup's invocation configuration. This makes sure that the
workload fits in one local workgroup so that we can leverage
SPIR-V group non-uniform operations.
linglg.generic is a structured op that preserves the right level
of information. It is easier to recognize reduction at this level
than performing analysis on loops.
This commit also exposes `getElementPtr` in SPIRVLowering.h given
that it's a generally useful utility function.
Differential Revision: https://reviews.llvm.org/D73437
Summary:
The current code assumes that one always maps at least one loop to block
dimensions and at least one loop to thread dimensions. If either is not
the case, a loop would get mapped twice.
Differential Revision: https://reviews.llvm.org/D73685
Summary:
In the scope of the lowering phase from GPU to ROCDL, the intructions for the conversion patterns seems to be wrong.
According to https://github.com/ROCm-Developer-Tools/HIP/blob/master/include/hip/hcc_detail/math_fwd.h the instructions need two underscores in the beginning instead of one.
Reviewers: nicolasvasilache, herhut, rriddle
Reviewed By: herhut, rriddle
Subscribers: merge_guards_bot, mehdi_amini, rriddle, jpienaar, burmako, shauheen, antiagainst, csigg, arpith-jacob, mgester, lucyrfox, herhut, liufengdb, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73535
Summary:
The 'gpu.terminator' operation is used as the terminator for the
regions of gpu.launch. This is to disambugaute them from the
return operation on 'gpu.func' functions.
This is a breaking change and users of the gpu dialect will need
to adapt their code when producting 'gpu.launch' operations.
Reviewers: nicolasvasilache
Subscribers: mehdi_amini, rriddle, jpienaar, burmako, shauheen, antiagainst, csigg, arpith-jacob, mgester, lucyrfox, liufengdb, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73620
Summary:
This also removes the explicit pattern for loop.terminator to ensure
that the terminator is only erased if the parent op is rewritten.
Reductions are not yet supported.
Reviewers: nicolasvasilache
Subscribers: mehdi_amini, rriddle, jpienaar, burmako, shauheen, antiagainst, arpith-jacob, mgester, lucyrfox, aartbik, liufengdb, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73348
Summary:
The tanh lowering from Standard dialect to NVVM and ROCDL was not working.
The conversion pattern are inserted in the lowering files.
The test cases for the lowerings were added in the test files.
Reviewers: nicolasvasilache, ftynse, herhut
Reviewed By: ftynse, herhut
Subscribers: merge_guards_bot, ftynse, jholewinski, mehdi_amini, rriddle, jpienaar, burmako, shauheen, antiagainst, csigg, arpith-jacob, mgester, lucyrfox, herhut, liufengdb, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73471
Add lowering for constant operation with ranked tensor type to
spv.constant with spv.array type.
Differential Revision: https://reviews.llvm.org/D73022
Summary:
This is based on the use of code constantly checking for an attribute on
a model and instead represents the distinct operaion with a different
op. Instead, this op can be used to provide better filtering.
Reverts "Revert "[mlir] Create a gpu.module operation for the GPU Dialect.""
This reverts commit ac446302ca4145cdc89f377c0c364c29ee303be5 after
fixing internal Google issues.
This additionally updates ROCDL lowering to use the new gpu.module.
Reviewers: herhut, mravishankar, antiagainst, nicolasvasilache
Subscribers: jholewinski, mgorny, mehdi_amini, jpienaar, burmako, shauheen, csigg, arpith-jacob, mgester, lucyrfox, aartbik, liufengdb, llvm-commits, mravishankar, rriddle, antiagainst, bkramer
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72921
When lowering `loop.if` to `spv.selection` we explicitly create
a selection header block before the control flow diverges and a
merge block where control flow subsequently converges.
Differential Revision: https://reviews.llvm.org/D72836
Summary:
This is based on the use of code constantly checking for an attribute on
a model and instead represents the distinct operaion with a different
op. Instead, this op can be used to provide better filtering.
Reviewers: herhut, mravishankar, antiagainst, rriddle
Reviewed By: herhut, antiagainst, rriddle
Subscribers: liufengdb, aartbik, jholewinski, mgorny, mehdi_amini, rriddle, jpienaar, burmako, shauheen, antiagainst, nicolasvasilache, csigg, arpith-jacob, mgester, lucyrfox, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72336
Summary: The current syntax for AffineMapAttr and IntegerSetAttr conflict with function types, making it currently impossible to round-trip function types(and e.g. FuncOp) in the IR. This revision changes the syntax for the attributes by wrapping them in a keyword. AffineMapAttr is wrapped with `affine_map<>` and IntegerSetAttr is wrapped with `affine_set<>`.
Reviewed By: nicolasvasilache, ftynse
Differential Revision: https://reviews.llvm.org/D72429
Summary:
This diff implements the progressive lowering of insert_strided_slice.
Two cases appear:
1. when the source and dest vectors have different ranks, extract the dest
subvector at the proper offset and reduce to case 2.
2. when they have the same rank N:
a. if the source and dest type are the same, the insertion is trivial:
just forward the source
b. otherwise, iterate over all N-1 D subvectors and create an
extract/insert_strided_slice/insert replacement, reducing the problem
to vecotrs of the same N-1 rank.
This combines properly with the other conversion patterns to lower all the way to LLVM.
Reviewers: ftynse, rriddle, AlexEichenberger, andydavis1, tetuante, nicolasvasilache
Reviewed By: andydavis1
Subscribers: merge_guards_bot, mehdi_amini, jpienaar, burmako, shauheen, antiagainst, arpith-jacob, mgester, lucyrfox, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72317
Summary:
This diff implements the progressive lowering of strided_slice to either:
1. extractelement + insertelement for the 1-D case
2. extract + optional strided_slice + insert for the n-D case.
This combines properly with the other conversion patterns to lower all the way to LLVM.
Appropriate tests are added.
Reviewers: ftynse, rriddle, AlexEichenberger, andydavis1, tetuante
Reviewed By: andydavis1
Subscribers: merge_guards_bot, mehdi_amini, jpienaar, burmako, shauheen, antiagainst, arpith-jacob, mgester, lucyrfox, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72310
This commit fixes shader ABI attributes to use `spv.` as the prefix
so that they match the dialect's namespace. This enables us to add
verification hooks in the SPIR-V dialect to verify them.
Reviewed By: mravishankar
Differential Revision: https://reviews.llvm.org/D72062
The conversion from std.and/std.or to spv.LogicalAnd/spv.LogicalOr is
only valid for boolean (i1) types. Modify BinaryOpPattern in
StandardToSPIRV.td to allow limiting the type of the operands for
which the pattern is applied.
Differential Revision: https://reviews.llvm.org/D71881
Rename the 'shlis' operation in the standard dialect to 'shift_left'. Add tests
for this operation (these have been missing so far) and add a lowering to the
'shl' operation in the LLVM dialect.
Add also 'shift_right_signed' (lowered to LLVM's 'ashr') and 'shift_right_unsigned'
(lowered to 'lshr').
The original plan was to name these operations 'shift.left', 'shift.right.signed'
and 'shift.right.unsigned'. This works if the operations are prefixed with 'std.'
in MLIR assembly. Unfortunately during import the short form is ambigous with
operations from a hypothetical 'shift' dialect. The best solution seems to omit
dots in standard operations for now.
Closestensorflow/mlir#226
PiperOrigin-RevId: 286803388
This will allow us to lower most of gpu.all_reduce (when all_reduce
doesn't exist in the target dialect) within the GPU dialect, and only do
target-specific lowering for the shuffle op.
PiperOrigin-RevId: 286548256