Ensure that `gpu.func` is only used within the dedicated `gpu.module`.
Add this constraint to the GPU dialect and adapt the test cases accordingly.
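For illustration, a minimal sketch of the accepted nesting (the module and function names are hypothetical):
```
gpu.module @kernels {
  gpu.func @kernel(%arg0: f32) kernel {
    gpu.return
  }
}
```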
Differential Revision: https://reviews.llvm.org/D78541
The `select` operation currently requires that the condition match the shape of the selected values, which is only really useful for masks. This revision allows an `i1` condition to mean that the entire vector/tensor is selected, which also matches the behavior of LLVM's `select`. A benefit of this change is that transformations that want to generate selects, such as those on the CFG, no longer have to special-case vector/tensor values. Previously the only way to generate a select from an `i1` was to use a splat, but that does not support dynamically shaped or unranked tensors.
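A hedged sketch of the two forms this enables (the exact assembly of that era may differ slightly):
```
// Scalar i1 condition selects the entire vector value:
%0 = select %cond, %a, %b : vector<4xf32>
// Shaped i1 condition still acts as an elementwise mask:
%1 = select %mask, %a, %b : vector<4xi1>, vector<4xf32>
```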
Differential Revision: https://reviews.llvm.org/D78690
This revision adds support for canonicalizing the following:
```
  br ^bb1
^bb1:
  br ^bbN(...)

// canonicalizes to:
  br ^bbN(...)
```
Differential Revision: https://reviews.llvm.org/D78683
This revision adds support for canonicalizing the following:
```
cond_br %cond, ^bb1(A, ..., N), ^bb1(A, ..., N)

// canonicalizes to:
br ^bb1(A, ..., N)
```
If the operands to the successor are different and the cond_br is the only predecessor, we emit selects for the branch operands.
```
cond_br %cond, ^bb1(A), ^bb1(B)

// canonicalizes to:
%select = select %cond, A, B
br ^bb1(%select)
```
Differential Revision: https://reviews.llvm.org/D78682
Summary:
Use a nested symbol to identify the kernel to be invoked by a `LaunchFuncOp` in the GPU dialect.
This replaces the two attributes that were previously used to identify the kernel module and the kernel within it separately.
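A rough sketch of the attribute change (names hypothetical):
```
// Before: kernel function and kernel module referenced separately:
//   kernel = "kernel_1", kernel_module = @kernels
// After: a single nested symbol reference:
//   kernel = @kernels::@kernel_1
```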
Differential Revision: https://reviews.llvm.org/D78551
Summary:
Use the shortcut `kernel` for the `gpu.kernel` attribute of `gpu.func`.
The parser already supports this form, and it makes test cases easier to read.
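A sketch of the two equivalent spellings (function names are illustrative):
```
// Shorthand keyword accepted by the parser:
gpu.func @kernel_1() kernel {
  gpu.return
}
// Equivalent explicit attribute form:
gpu.func @kernel_2() attributes {gpu.kernel} {
  gpu.return
}
```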
Differential Revision: https://reviews.llvm.org/D78542
Summary:
Fix a broken test case in the `invalid.mlir` lit test file.
`expect` was missing its `e`.
Differential Revision: https://reviews.llvm.org/D78540
The buffer allocated by a promotion can be subject to other transformations afterward. For example, it could be vectorized, in which case we need to ensure that this buffer is memory-aligned.
Differential Revision: https://reviews.llvm.org/D78556
This revision is the first in a set of improvements that aim at allowing
more generalized named Linalg op generation from a mathematical
specification.
This revision allows creating a new op and checks that the parser,
printer and verifier are hooked up properly.
This opened up a few design points that will be addressed in the future:
1. A named linalg op has a static region builder instead of an explicitly
parsed region. This is not currently compatible with assemblyFormat, so a
custom parser/printer is needed.
2. The convention for structured ops and tensor return values needs to
evolve to allow tensor-land and buffer-land specifications to agree.
3. ReferenceIndexingMaps and referenceIterators will need to become
static to allow building attributes at parse time.
4. Error messages will be improved once we have 3. and pretty-print in
the custom form.
Differential Revision: https://reviews.llvm.org/D78327
Unfortunately, FileCheck ignores directives with whitespace between the directive and the colon (`CHECK :`, for example), so most of the directives in this test were silently ignored.
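For example, in a lit test:
```
// CHECK : this line is silently ignored because of the space before the colon
// CHECK: only this directive is actually enforced by FileCheck
```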
Differential Revision: https://reviews.llvm.org/D78548
The promotion transformation currently promotes all input and output buffers of the transformed op. The user might want to promote only some of these buffers.
Differential Revision: https://reviews.llvm.org/D78498
Fix intra-tile upper bound setting in a scenario where the tile size was
larger than the trip count.
Differential Revision: https://reviews.llvm.org/D78505
Summary:
Rather than having a full, recursive lowering of vector.broadcast
to LLVM IR, it is much more elegant to progressively lower
each vector.broadcast into a lower-dimensional vector.broadcast,
until only elementary vector operations remain. This results
in more elegant, step-wise code that is easier to understand.
It also enables some optimizations in the generated code.
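As a rough sketch of the idea (the `%init` destination value is assumed here; the actual lowering details may differ), a 2-D broadcast is peeled into a 1-D broadcast plus inserts:
```
// Original:
%b = vector.broadcast %s : f32 to vector<3x4xf32>

// Progressively lowered (sketch):
%0 = vector.broadcast %s : f32 to vector<4xf32>
%1 = vector.insert %0, %init [0] : vector<4xf32> into vector<3x4xf32>
%2 = vector.insert %0, %1 [1] : vector<4xf32> into vector<3x4xf32>
%3 = vector.insert %0, %2 [2] : vector<4xf32> into vector<3x4xf32>
```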
Reviewers: nicolasvasilache, mehdi_amini, andydavis1, grosul1
Reviewed By: nicolasvasilache
Subscribers: mehdi_amini, rriddle, jpienaar, burmako, shauheen, antiagainst, nicolasvasilache, arpith-jacob, mgester, lucyrfox, liufengdb, Joonsoo, grosul1, frgossen, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78071
The function attribute in generic ops is not paying for itself.
A region is the more standardized way of specifying a custom computation.
If needed this region can call a function directly.
This is deemed more natural than managing a dedicated function attribute.
This also simplifies named ops generation by trimming unnecessary complexity.
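A hedged sketch of a generic op carrying its computation as a region (attribute names and the exact assembly of that era may differ):
```
#pointwise = {
  args_in = 2,
  args_out = 1,
  indexing_maps = [affine_map<(i) -> (i)>,
                   affine_map<(i) -> (i)>,
                   affine_map<(i) -> (i)>],
  iterator_types = ["parallel"]
}
linalg.generic #pointwise %A, %B, %C {
^bb0(%a: f32, %b: f32, %c: f32):
  %sum = addf %a, %b : f32
  linalg.yield %sum : f32
} : memref<?xf32>, memref<?xf32>, memref<?xf32>
```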
Differential Revision: https://reviews.llvm.org/D78266
Summary:
Modified AffineMap::get to remove support for the overload which allowed
an ArrayRef of AffineExpr but no context (and gathered the context from a
presumed first entry, resulting in bugs when there were 0 results).
Instead, we support only an ArrayRef plus a context, and a version which
takes a single AffineExpr.
Additionally, removed some now-needless case logic that previously
special-cased which AffineMap::get overload to call.
Reviewers: flaub, bondhugula, rriddle!, nicolasvasilache, ftynse, ulysseB, mravishankar, antiagainst, aartbik
Subscribers: mehdi_amini, jpienaar, burmako, shauheen, antiagainst, arpith-jacob, mgester, lucyrfox, liufengdb, Joonsoo, bader, grosul1, frgossen, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78226
This revision introduces a utility to unswitch affine.for/parallel loops
by hoisting affine.if operations past the surrounding affine.for/parallel ops.
The hoisting works for both perfect and imperfect nests and in the presence
of else blocks. The hoisting currently goes to as outermost a level as
possible. A test pass is used to exercise the utility.
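A sketch of the unswitching, where the `affine.if` condition is invariant to the surrounding loop (the set and op names are illustrative):
```
#cond = affine_set<(d0) : (d0 - 10 >= 0)>

affine.for %i = 0 to 100 {
  affine.if #cond(%n) {
    "then_work"(%i) : (index) -> ()
  } else {
    "else_work"(%i) : (index) -> ()
  }
}

// hoisted / unswitched into:
affine.if #cond(%n) {
  affine.for %i = 0 to 100 {
    "then_work"(%i) : (index) -> ()
  }
} else {
  affine.for %i = 0 to 100 {
    "else_work"(%i) : (index) -> ()
  }
}
```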
Add convenience method Operation::getParentWithTrait<Trait>.
Depends on D77487.
Differential Revision: https://reviews.llvm.org/D77870
Similarly to actual LLVM IR, and to `llvm.func`, allow the custom syntax
of `llvm.mlir.global` to omit the linkage keyword. If omitted, the linkage is
assumed to be external. This makes the modeling of globals in the LLVM dialect
more consistent, both within the dialect and with LLVM IR.
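For instance, the following two declarations become equivalent (a sketch using the LLVM dialect type spelling of that era; names are illustrative):
```
llvm.mlir.global external @with_linkage(42 : i32) : !llvm.i32
llvm.mlir.global @without_linkage(42 : i32) : !llvm.i32   // linkage defaults to external
```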
Differential Revision: https://reviews.llvm.org/D78096
Introduce mlir::applyOpPatternsAndFold, which applies patterns as well as
folding only to a specified op (in contrast to
applyPatternsAndFoldGreedily, which applies patterns only to the regions
of an op that is isolated from above). The caller is made aware of the op
being folded away or erased.
Depends on D77485.
Differential Revision: https://reviews.llvm.org/D77487
The inversePermutation method returns a null map on failure. Update
uses of this method within Linalg to handle this. In LinalgToLoops the
null return value was used to emit scalar code; modify that to return
failure instead, and emit the scalar implementation when the affine map is
"empty", i.e., 1 dim, 0 symbols, and no result exprs.
Differential Revision: https://reviews.llvm.org/D77964
The inversePermutation method does not return a null map anymore, but
rather returns an empty map for the scalar case. Update the check in
LinalgToLoops to reflect this.
Also add test case for generating scalar code.
The outer parallel loops of a linalg operation are lowered to
loop.parallel, with the remaining loops lowered to loop.for. This brings
the loop.parallel lowering on par with the loop.for lowering. In the future
the reduction loop could also be lowered to loop.parallel.
Also add a utility function that returns the loops that are
created.
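A sketch of the resulting loop structure for an op with two outer parallel dimensions and one reduction dimension (bounds and names are illustrative):
```
loop.parallel (%i, %j) = (%c0, %c0) to (%M, %N) step (%c1, %c1) {
  loop.for %k = %c0 to %K step %c1 {
    // indexed payload of the linalg op at (%i, %j, %k)
  }
}
```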
Differential Revision: https://reviews.llvm.org/D77678
NFC cleanup of the simplify-affine-structures test cases. Rename sets
more meaningfully; avoid numeric suffixes; move outlined definitions close to their uses.
This is in preparation for other functionality updates.
Differential Revision: https://reviews.llvm.org/D78017
This commit adds stride support to runtime array types. It also
adjusts the assembly form for the stride from `[N]` to `stride=N`.
This makes the IR more readable, especially for cases where
one mixes array types and struct types.
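A sketch of the adjusted forms (exact spellings may differ):
```
// Runtime array with an explicit stride:
!spv.rtarray<vector<4xf32>, stride=16>
// Array stride spelling adjusted from `[N]` to `stride=N`:
!spv.array<128 x f32, stride=4>
```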
Differential Revision: https://reviews.llvm.org/D78034
This patch adds support for the taskwait and taskyield operations in the OpenMP dialect, along with translation of these constructs to LLVM IR. The OpenMP IRBuilder is used for this translation.
The patch includes code changes and testcase modifications.
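A minimal sketch of the new operations (the enclosing function is illustrative):
```
func @task_sync_ops() {
  omp.taskyield
  omp.taskwait
  return
}
```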
Differential Revision: https://reviews.llvm.org/D77634
Summary:
LLVM matrix intrinsics recently introduced an option to support row-major mode.
This matches the MLIR vector model, so this revision switches to row-major.
A corner case related to degenerate sizes was also fixed upstream;
this revision removes the guard against that corner case.
A bug was uncovered in the output vector construction, which this revision also fixes.
Lastly, this has been tested on a small size and benchmarked independently: no visible performance regression is observed.
In the future, when the matrix intrinsics support per-op attributes, we can translate to them more aggressively and avoid inserting MLIR-level transposes.
This has been tested independently to work on small matrices.
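For reference, a hedged sketch of the vector-dialect op whose lowering is affected; with this change its flattened operands map onto the LLVM matrix intrinsics in row-major order:
```
// 2x3 times 3x4 matrix multiply on flattened vectors (sketch):
%C = vector.matrix_multiply %A, %B
    {lhs_rows = 2 : i32, lhs_columns = 3 : i32, rhs_columns = 4 : i32}
    : (vector<6xf32>, vector<12xf32>) -> vector<8xf32>
```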
Differential Revision: https://reviews.llvm.org/D77761
This revision builds a simple "fused pass" consisting of 2 levels of tiling, memory promotion and vectorization using linalg transformations written as composable pattern rewrites.
Summary: Pass options are a better choice for various reasons and avoid the need for static constructors.
Differential Revision: https://reviews.llvm.org/D77707
Summary:
Update ShapeCastOp folder to use producer-consumer value forwarding.
Support is added for tracking sub-vectors through trivial shape cast operations,
where the sub-vector shape is preserved across shape cast operations and only
leading ones are added or removed.
Support is preserved for cancelling shape cast operations.
One unit test is added and two are updated.
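A sketch of the folded patterns:
```
// Cancelling casts still fold back to the original value:
%0 = vector.shape_cast %v : vector<4x4xf32> to vector<16xf32>
%1 = vector.shape_cast %0 : vector<16xf32> to vector<4x4xf32>   // folds to %v

// Trivial casts that only add or drop leading ones are now tracked
// for producer-consumer sub-vector forwarding:
%2 = vector.shape_cast %v : vector<4x4xf32> to vector<1x4x4xf32>
```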
Reviewers: aartbik, nicolasvasilache
Reviewed By: aartbik, nicolasvasilache
Subscribers: frgossen, mehdi_amini, rriddle, jpienaar, burmako, shauheen, antiagainst, arpith-jacob, mgester, lucyrfox, liufengdb, Joonsoo, grosul1, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77253
This revision removes the reliance of Promotion on `linalg.slice` which is meant
for the rank-reducing case.
Differential Revision: https://reviews.llvm.org/D77676
Summary:
* Removal of FxpMathOps was discussed on the mailing list.
* Will send a courtesy note about also removing the Quantizer (which had some dependencies on FxpMathOps).
* These were only ever used for experimental purposes and we know how to get them back from history as needed.
* There is a new proposal for more generalized quantization tooling, so moving these older experiments out of the way helps clean things up.
Subscribers: mgorny, mehdi_amini, rriddle, jpienaar, burmako, shauheen, antiagainst, nicolasvasilache, arpith-jacob, mgester, lucyrfox, liufengdb, Joonsoo, grosul1, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77479
If we have two back-to-back loops with block arguments, the OpPhi
instructions generated for the second loop's block arguments should
use the merge block of the first SPIR-V loop structure as
their incoming parent block.
Differential Revision: https://reviews.llvm.org/D77543
Fix point-wise copy generation to work with bounds that have max/min.
Change the structure of the copy loop nest to use absolute loop indices and
to subtract the base from the indices of the fast buffers. Update supporting
utilities: fix FlatAffineConstraints::getLowerAndUpperBound to also look at
equalities and to account for a missing division; update unionBoundingBox
to not discard common constraints (leads to a tighter system); update
MemRefRegion::getConstantBoundingSizeAndShape to add memref dimension
constraints; run removeTrivialRedundancy at the end of
MemRefRegion::compute; run single-iteration loop promotion and
load/store canonicalization after affine data copy (in its test pass as
well).
Differential Revision: https://reviews.llvm.org/D77320
Summary:
This revision adds a tensor_reshape operation that operates on tensors.
In the tensor world the constraints are less stringent and we can allow more
arbitrary dynamic reshapes, as long as they are contractions.
The expansion of a dynamic dimension into multiple dynamic dimensions is under-specified and is punted on for now.
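A hedged sketch of the new op collapsing the two leading dimensions (the reassociation spelling of that era may differ):
```
%r = linalg.tensor_reshape %t [affine_map<(i, j, k) -> (i, j)>,
                               affine_map<(i, j, k) -> (k)>]
    : tensor<?x?x4xf32> into tensor<?x4xf32>
```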
Differential Revision: https://reviews.llvm.org/D77360
Summary: This revision adds support for marking the last region as variadic in the ODS region list with the VariadicRegion directive.
Differential Revision: https://reviews.llvm.org/D77455
Two back-to-back transpose operations are combined into a single transpose, which uses a combination of their permutation vectors.
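The commit does not name the op; purely as an illustration with `vector.transpose`, the permutations compose into one:
```
%0 = vector.transpose %arg, [1, 0, 2] : vector<2x3x4xf32> to vector<3x2x4xf32>
%1 = vector.transpose %0, [0, 2, 1] : vector<3x2x4xf32> to vector<3x4x2xf32>

// combines into a single transpose with the composed permutation:
%1 = vector.transpose %arg, [1, 2, 0] : vector<2x3x4xf32> to vector<3x4x2xf32>
```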
Differential Revision: https://reviews.llvm.org/D77331
Add a method that, given an affine map, returns another map with just its unique
results. Use this to drop redundant bounds in max/min for affine.for. Update
affine.for's canonicalization pattern and createCanonicalizedForOp to use
this.
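For example, a duplicated result in an upper-bound map is dropped (a sketch):
```
#ub = affine_map<(d0) -> (d0, d0, 128)>
affine.for %i = 0 to min #ub(%n) {
  // body
}

// canonicalizes to:
#ub_unique = affine_map<(d0) -> (d0, 128)>
affine.for %i = 0 to min #ub_unique(%n) {
  // body
}
```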
Differential Revision: https://reviews.llvm.org/D77237
Summary:
The RAW fusion happens only if the producer block dominates the consumer block.
The WAW pattern also works under the same precondition, i.e., if the producer
dominates the consumer, they can safely be fused together.
Since both ops are tilable, we can think of the pattern this way:
Input:
```
linalg_op1 view

tile_loop
  subview_2
  linalg_op2 subview_2
```
Tile the first Linalg op in the same way as the second one.
```
tile_loop
  subview_1
  linalg_op1 subview_1

tile_loop
  subview_2
  linalg_op2 subview_2
```
Since the first Linalg op is tilable in the same way and the computations are
independent, it is safe to fuse it with the second Linalg op.
```
tile_loop
  subview_1
  linalg_op1 subview_1
  linalg_op2 subview_2
```
In short, this patch includes:
- Handling both the RAW and WAW patterns.
- Adding an interface method to get input and output buffers.
- Exposing a method to get a StringRef of a dependency type.
- Fixing existing WAW tests and adding one more use case: initializing the buffer
before a conv op.
Differential Revision: https://reviews.llvm.org/D76897
Summary:
Performs an N-D pooling operation similar to the description in the TF
documentation:
https://www.tensorflow.org/api_docs/python/tf/nn/pool
Unlike that description, this operation does not operate on the batch and
channel dimensions. It only takes tensors of rank `N`.
```
output[x[0], ..., x[N-1]] =
  REDUCE_{z[0], ..., z[N-1]}
    input[
      x[0] * strides[0] - pad_before[0] + dilation_rate[0] * z[0],
      ...
      x[N-1] * strides[N-1] - pad_before[N-1] + dilation_rate[N-1] * z[N-1]
    ]
```
The optional arguments are:
- strides: an i64 array specifying the stride (i.e., step) for window loops.
- dilations: an i64 array specifying the filter upsampling/input downsampling rate.
- padding: an i64 array of pairs (low, high) specifying the number of elements to pad along a dimension.
If the strides or dilations attributes are missing, the default value is
one for each of the input dimensions. Similarly, padding values default to zero
for both low and high in each of the dimensions, if not specified.
Differential Revision: https://reviews.llvm.org/D76414