This reverts commit 0d48d265db.
This reapplies the following commit, with a fix for CAPI/ir.c:
[mlir] Start splitting the `tensor` dialect out of `std`.
This starts by moving `std.extract_element` to `tensor.extract` (this
mirrors the naming of `vector.extract`).
Curiously, `std.extract_element` supposedly works on vectors as well,
and this patch removes that functionality. I would tend to do that in
separate patch, but I couldn't find any downstream users relying on
this, and the fact that we have `vector.extract` made it seem safe
enough to lump in here.
This also sets up the `tensor` dialect as a dependency of the `std`
dialect, as some ops that currently live in `std` depend on
`tensor.extract` via their canonicalization patterns.
Part of RFC: https://llvm.discourse.group/t/rfc-split-the-tensor-dialect-from-std/2347/2
Differential Revision: https://reviews.llvm.org/D92991
This starts by moving `std.extract_element` to `tensor.extract` (this
mirrors the naming of `vector.extract`).
Curiously, `std.extract_element` supposedly works on vectors as well,
and this patch removes that functionality. I would tend to do that in
separate patch, but I couldn't find any downstream users relying on
this, and the fact that we have `vector.extract` made it seem safe
enough to lump in here.
This also sets up the `tensor` dialect as a dependency of the `std`
dialect, as some ops that currently live in `std` depend on
`tensor.extract` via their canonicalization patterns.
Part of RFC: https://llvm.discourse.group/t/rfc-split-the-tensor-dialect-from-std/2347/2
Differential Revision: https://reviews.llvm.org/D92991
A separate AVX512 lowering pass does not compose well with the regular
vector lowering pass. As such, it is at risk of code duplication and
lowering inconsistencies. This change removes the separate AVX512 lowering
pass and makes it an "option" in the regular vector lowering pass
(viz. vector dialect "augmented" with AVX512 dialect).
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D92614
The ops are very similar to the std variants, but support async GPU execution.
gpu.alloc does not currently support an alignment attribute, and the new ops do not have
canonicalizers/folders like their std siblings do.
Reviewed By: herhut
Differential Revision: https://reviews.llvm.org/D91698
Introduce a conversion pass from SCF parallel loops to OpenMP dialect
constructs - parallel region and workshare loop. Loops with reductions are not
supported because the OpenMP dialect cannot model them yet.
The conversion currently targets only one level of parallelism, i.e. only
one top-level `omp.parallel` operation is produced even if there are nested
`scf.parallel` operations that could be mapped to `omp.wsloop`. Nested
parallelism support is left for future work.
Reviewed By: kiranchandramohan
Differential Revision: https://reviews.llvm.org/D91982
It is a simple conversion that only requires to change the region argument
types, generalize it from ParallelOp.
Reviewed By: kiranchandramohan
Differential Revision: https://reviews.llvm.org/D91989
The existing implementation of the conversion from SCF Parallel operation to
SCF "for" loops in order to further convert those loops to branch-based CFG has
been cloning the loop and reduction body operations into the new loop because
ConversionPatternRewriter was missing support for moving blocks while replacing
their arguments. This functionality now available, use it to implement the
conversion and avoid cloning operations, which may lead to doubling of the IR
size during the conversion.
In addition, this fixes an issue with converting nested SCF "if" conditionals
present in "parallel" operations that would cause the conversion infrastructure
to stop because of the repeated application of the pattern converting "newly"
created "if"s (which were in fact just moved). Arguably, this should be fixed
at the infrastructure level and this fix is a workaround.
Reviewed By: herhut
Differential Revision: https://reviews.llvm.org/D91955
Depends On D89963
**Automatic reference counting algorithm outline:**
1. `ReturnLike` operations forward the reference counted values without
modifying the reference count.
2. Use liveness analysis to find blocks in the CFG where the lifetime of
reference counted values ends, and insert `drop_ref` operations after
the last use of the value.
3. Insert `add_ref` before the `async.execute` operation capturing the
value, and pairing `drop_ref` before the async body region terminator,
to release the captured reference counted value when execution
completes.
4. If the reference counted value is passed only to some of the block
successors, insert `drop_ref` operations in the beginning of the blocks
that do not have reference coutned value uses.
Reviewed By: silvas
Differential Revision: https://reviews.llvm.org/D90716
Null types are commonly used as an error marker. Catch them in the constructor
of Operation if they are present in the result type list, as otherwise this
could lead to further surprising behavior when querying op result types.
Fix AsyncToLLVM and StandardToLLVM that were using null types when constructing
operations.
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D91770
This commit does the renaming mentioned in the title in order to bring
'spv' dialect closer to the MLIR naming conventions.
Reviewed By: antiagainst
Differential Revision: https://reviews.llvm.org/D91792
This commit does the renaming mentioned in the title in order to bring
'spv' dialect closer to the MLIR naming conventions.
Reviewed By: antiagainst
Differential Revision: https://reviews.llvm.org/D91797
Make the interface match the one of ConvertToLLVMPattern::getDataPtr() (to be removed in a separate change).
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D91599
std.alloc only supports memrefs with identity layout, which means we can simplify the lowering to LLVM and compute strides only from (static and dynamic) sizes.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D91549
This commit does the renaming mentioned in the title in order to bring
`spv` dialect closer to the MLIR naming conventions.
Reviewed By: antiagainst
Differential Revision: https://reviews.llvm.org/D91609
The current code allows strided layouts, but the number of elements allocated is ambiguous. It could be either the number of elements in the shape (the current implementation), or the amount of elements required to not index out-of-bounds with the given maps (which would require evaluating the layout map).
If we require the canonical layouts, the two will be the same.
Reviewed By: nicolasvasilache, ftynse
Differential Revision: https://reviews.llvm.org/D91523
The logic of vector on boolean was missed. This patch adds the logic and test on
it.
Reviewed By: mravishankar
Differential Revision: https://reviews.llvm.org/D91403
Depends On D89958
1. Adds `async.group`/`async.awaitall` to group together multiple async tokens/values
2. Rewrite scf.parallel operation into multiple concurrent async.execute operations over non overlapping subranges of the original loop.
Example:
```
scf.for (%i, %j) = (%lbi, %lbj) to (%ubi, %ubj) step (%si, %sj) {
"do_some_compute"(%i, %j): () -> ()
}
```
Converted to:
```
%c0 = constant 0 : index
%c1 = constant 1 : index
// Compute blocks sizes for each induction variable.
%num_blocks_i = ... : index
%num_blocks_j = ... : index
%block_size_i = ... : index
%block_size_j = ... : index
// Create an async group to track async execute ops.
%group = async.create_group
scf.for %bi = %c0 to %num_blocks_i step %c1 {
%block_start_i = ... : index
%block_end_i = ... : index
scf.for %bj = %c0 t0 %num_blocks_j step %c1 {
%block_start_j = ... : index
%block_end_j = ... : index
// Execute the body of original parallel operation for the current
// block.
%token = async.execute {
scf.for %i = %block_start_i to %block_end_i step %si {
scf.for %j = %block_start_j to %block_end_j step %sj {
"do_some_compute"(%i, %j): () -> ()
}
}
}
// Add produced async token to the group.
async.add_to_group %token, %group
}
}
// Await completion of all async.execute operations.
async.await_all %group
```
In this example outer loop launches inner block level loops as separate async
execute operations which will be executed concurrently.
At the end it waits for the completiom of all async execute operations.
Reviewed By: ftynse, mehdi_amini
Differential Revision: https://reviews.llvm.org/D89963
This exposes a hook to configure legality of operations such that only
`scf.parallel` operations that have mapping attributes are marked as
illegal. Consequently, the transformation can now also be applied to
mixed forms.
Differential Revision: https://reviews.llvm.org/D91340
This patch introduces a new conversion pattern for `spv.ExecutionMode`.
`spv.ExecutionMode` may contain important information about the entry
point, which we want to preserve. For example, `LocalSize` provides
information about the work-group size that can be reused. Hence, the
pattern for entry-point ops changes to the following:
- `spv.EntryPoint` is still simply removed
- Info from `spv.ExecutionMode` is used to create a global struct variable,
which looks like:
```
struct {
int32_t executionMode;
int32_t values[]; // optional values
};
```
Reviewed By: mravishankar
Differential Revision: https://reviews.llvm.org/D89989
VectorInsertDynamicOp in SPIRV dialect
conversion from vector.insertelement to spirv VectorInsertDynamicOp
Differential Revision: https://reviews.llvm.org/D90927
This revision refactors the way that attributes/types are considered when generating aliases. Instead of considering all of the attributes/types of every operation, we perform a "fake" print step that prints the operations using a dummy printer to collect the attributes and types that would actually be printed during the real process. This removes a lot of attributes/types from consideration that generally won't end up in the final output, e.g. affine map attributes in an `affine.apply`/`affine.for`.
This resolves a long standing TODO w.r.t aliases, and helps to have a much cleaner textual output format. As a datapoint to the latter, as part of this change several tests were identified as testing for the presence of attributes aliases that weren't actually referenced by the custom form of any operation.
To ensure that this wouldn't cause a large degradation in compile time due to the second full print, I benchmarked this change on a very large module with a lot of operations(The file is ~673M/~4.7 million lines long). This file before this change take ~6.9 seconds to print in the custom form, and ~7 seconds after this change. In the custom assembly case, this added an average of a little over ~100 miliseconds to the compile time. This increase was due to the way that argument attributes on functions are structured and how they get printed; i.e. with a better representation the negative impact here can be greatly decreased. When printing in the generic form, this revision had no observable impact on the compile time. This benchmarking leads me to believe that the impact of this change on compile time w.r.t printing is closely related to `print` methods that perform a lot of additional/complex processing outside of the OpAsmPrinter.
Differential Revision: https://reviews.llvm.org/D90512
- Change syntax for FuncOp to be `func <visibility>? @name` instead of printing the
visibility in the attribute dictionary.
- Since printFunctionLikeOp() and parseFunctionLikeOp() are also used by other
operations, make the "inline visibility" an opt-in feature.
- Updated unit test to use and check the new syntax.
Differential Revision: https://reviews.llvm.org/D90859
- Convert `global_memref` to LLVM::GlobalOp.
- Convert `get_global_memref` to a memref descriptor with a pointer to the first element
of the global stashed in it.
- Extend unit test and a mlir-cpu-runner test to validate the generated LLVM IR.
Differential Revision: https://reviews.llvm.org/D90803
Since SPIR-V module has an optional name, this patch
makes a change to pass it to `ModuleOp` during conversion.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D90904
VectorExtractDynamicOp in SPIRV dialect
conversion from vector.extractelement to spirv VectorExtractDynamicOp
Differential Revision: https://reviews.llvm.org/D90679
When the "after" region of a WhileOp is merely forwarding its arguments back to
the "before" region, i.e. WhileOp is a canonical do-while loop, a simpler CFG
subgraph that omits the "after" region with its extra branch operation can be
produced. Loop rotation from general "while" to "if { do-while }" is left for a
future canonicalization pattern when it becomes necessary.
Differential Revision: https://reviews.llvm.org/D90604
The lowering is a straightforward inlining of the "before" and "after" regions
connected by (conditional) branches. This plugs the WhileOp into the
progressive lowering scheme. Future commits may choose to target WhileOp
instead of CFG when lowering ForOp.
Differential Revision: https://reviews.llvm.org/D90603
Because cstr operations allow more instruction reordering than asserts, we only
lower cstr_broadcastable to std ops with cstr_require. This ensures that the
more drastic lowering to asserts can happen specifically with the user's desire.
Differential Revision: https://reviews.llvm.org/D89325
This should fix the reason for the failures after ec7780ebda. I will roll forward in a separate change.
Reviewed By: antiagainst
Differential Revision: https://reviews.llvm.org/D90410
The conversion between PDL and the interpreter is split into several different parts.
** The Matcher:
The matching section of all incoming pdl.pattern operations is converted into a predicate tree and merged. Each pattern is first converted into an ordered list of predicates starting from the root operation. A predicate is composed of three distinct parts:
* Position
- A position refers to a specific location on the input DAG, i.e. an
existing MLIR entity being matched. These can be attributes, operands,
operations, results, and types. Each position also defines a relation to
its parent. For example, the operand `[0] -> 1` has a parent operation
position `[0]` (the root).
* Question
- A question refers to a query on a specific positional value. For
example, an operation name question checks the name of an operation
position.
* Answer
- An answer is the expected result of a question. For example, when
matching an operation with the name "foo.op". The question would be an
operation name question, with an expected answer of "foo.op".
After the predicate lists have been created and ordered(based on occurrence of common predicates and other factors), they are formed into a tree of nodes that represent the branching flow of a pattern match. This structure allows for efficient construction and merging of the input patterns. There are currently only 4 simple nodes in the tree:
* ExitNode: Represents the termination of a match
* SuccessNode: Represents a successful match of a specific pattern
* BoolNode/SwitchNode: Branch to a specific child node based on the expected answer to a predicate question.
Once the matcher tree has been generated, this tree is walked to generate the corresponding interpreter operations.
** The Rewriter:
The rewriter portion of a pattern is generated in a very straightforward manor, similarly to lowerings in other dialects. Each PDL operation that may exist within a rewrite has a mapping into the interpreter dialect. The code for the rewriter is generated within a FuncOp, that is invoked by the interpreter on a successful pattern match. Referenced values defined in the matcher become inputs the generated rewriter function.
An example lowering is shown below:
```mlir
// The following high level PDL pattern:
pdl.pattern : benefit(1) {
%resultType = pdl.type
%inputOperand = pdl.input
%root, %results = pdl.operation "foo.op"(%inputOperand) -> %resultType
pdl.rewrite %root {
pdl.replace %root with (%inputOperand)
}
}
// is lowered to the following:
module {
// The matcher function takes the root operation as an input.
func @matcher(%arg0: !pdl.operation) {
pdl_interp.check_operation_name of %arg0 is "foo.op" -> ^bb2, ^bb1
^bb1:
pdl_interp.return
^bb2:
pdl_interp.check_operand_count of %arg0 is 1 -> ^bb3, ^bb1
^bb3:
pdl_interp.check_result_count of %arg0 is 1 -> ^bb4, ^bb1
^bb4:
%0 = pdl_interp.get_operand 0 of %arg0
pdl_interp.is_not_null %0 : !pdl.value -> ^bb5, ^bb1
^bb5:
%1 = pdl_interp.get_result 0 of %arg0
pdl_interp.is_not_null %1 : !pdl.value -> ^bb6, ^bb1
^bb6:
// This operation corresponds to a successful pattern match.
pdl_interp.record_match @rewriters::@rewriter(%0, %arg0 : !pdl.value, !pdl.operation) : benefit(1), loc([%arg0]), root("foo.op") -> ^bb1
}
module @rewriters {
// The inputs to the rewriter from the matcher are passed as arguments.
func @rewriter(%arg0: !pdl.value, %arg1: !pdl.operation) {
pdl_interp.replace %arg1 with(%arg0)
pdl_interp.return
}
}
}
```
Differential Revision: https://reviews.llvm.org/D84580
This patch introduces a pass for running
`mlir-spirv-cpu-runner` - LowerHostCodeToLLVMPass.
This pass emulates `gpu.launch_func` call in LLVM dialect and lowers
the host module code to LLVM. It removes the `gpu.module`, creates a
sequence of global variables that are later linked to the varables
in the kernel module, as well as a series of copies to/from
them to emulate the memory transfer to/from the host or to/from the
device sides. It also converts the remaining Standard dialect into
LLVM dialect, emitting C wrappers.
Reviewed By: mravishankar
Differential Revision: https://reviews.llvm.org/D86112
This reverts commit 4986d5eaff with
proper patches to CMakeLists.txt:
- Add MLIRAsync as a dependency to MLIRAsyncToLLVM
- Add Coroutines as a dependency to MLIRExecutionEngine
Lower from Async dialect to LLVM by converting async regions attached to `async.execute` operations into LLVM coroutines (https://llvm.org/docs/Coroutines.html):
1. Outline all async regions to functions
2. Add LLVM coro intrinsics to mark coroutine begin/end
3. Use MLIR conversion framework to convert all remaining async types and ops to LLVM + Async runtime function calls
All `async.await` operations inside async regions converted to coroutine suspension points. Await operation outside of a coroutine converted to the blocking wait operations.
Implement simple runtime to support concurrent execution of coroutines.
Reviewed By: herhut
Differential Revision: https://reviews.llvm.org/D89292
Forward missing attributes when creating the new transfer op otherwise the
builder would use default values.
Differential Revision: https://reviews.llvm.org/D89907
This still satisfies the constraints required by the affine dialect and
gives more flexibility in what iteration bounds can be used when
loewring to the GPU dialect.
Differential Revision: https://reviews.llvm.org/D89782
Now, convert-shape-to-std doesn't internally create memrefs, which was
previously a bit of a layering violation. The conversion to memrefs
should logically happen as part of bufferization.
Differential Revision: https://reviews.llvm.org/D89669
This PR adds support for identified and recursive structs.
This includes: parsing, printing, serializing, and
deserializing such structs.
The following C struct:
```C
struct A {
A* next;
};
```
which is translated to the following MLIR code as:
```mlir
!spv.struct<A, (!spv.ptr<!spv.struct<A>, Generic>)>
```
would be represented in the SPIR-V module as:
```spirv
OpName %A "A"
OpTypeForwardPointer %APtr Generic
%A = OpTypeStruct %APtr
%APtr = OpTypePointer Generic %A
```
In particular the following changes are included:
- SPIR-V structs can now be either identified or literal
(i.e. non-identified).
- All structs now have their members surrounded by a ()-pair.
- For recursive references,
(1) an OpTypeForwardPointer instruction is emitted before
the OpTypeStruct instruction defining the recursive struct
(2) an OpTypePointer instruction is emitted after the
OpTypeStruct instruction which actually defines the recursive
pointer to struct type.
Reviewed By: antiagainst, rriddle, ftynse
Differential Revision: https://reviews.llvm.org/D87206
This is required or broadcasting with operands of different ranks will lead to
failures as the select op requires both possible outputs and its output type to
be the same.
Differential Revision: https://reviews.llvm.org/D89134
Add conversion pass for Vector dialect to SPIR-V dialect and add some simple
conversion pattern for vector.broadcast, vector.insert, vector.extract.
Differential Revision: https://reviews.llvm.org/D88761
A pattern to convert `spv.CompositeInsert` and `spv.CompositeExtract`.
In LLVM, there are 2 ops that correspond to each instruction depending
on the container type. If the container type is a vector type, then
the result of conversion is `llvm.insertelement` or `llvm.extractelement`.
If the container type is an aggregate type (i.e. struct, array), the
result of conversion is `llvm.insertvalue` or `llvm.extractvalue`.
Reviewed By: mravishankar
Differential Revision: https://reviews.llvm.org/D88205
The previous code did the lowering to alloca, malloc, and aligned_malloc
in a single class with different code paths that are somewhat difficult to
follow.
This change moves the common code to a base class and has a separte
derived class per lowering target that contains the specifics.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D88696
While affine maps are part of the builtin memref type, there is very
limited support for manipulating them in the standard dialect. Add
transpose to the set of ops to complement the existing view/subview ops.
This is a metadata transformation that encodes the transpose into the
strides of a memref.
I'm planning to use this when lowering operations on strided memrefs,
using the transpose to remove the stride without adding a dependency on
linalg dialect.
Differential Revision: https://reviews.llvm.org/D88651
We hit an llvm_unreachable related to unranked memrefs for call ops
with scalar types. Removing the llvm_unreachable since the conversion
should gracefully bail out in the presence of unranked memrefs. Adding
tests to verify that.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D88709
Current setup for conv op vectorization does not enable user to specify tile
sizes as well as dimensions for vectorization. In this commit we change that by
adding tile sizes as pass arguments. Every dimension with corresponding tile
size > 1 is automatically vectorized.
Differential Revision: https://reviews.llvm.org/D88533
This patch adds support for the 'return' and 'call' ops to the bare-ptr
calling convention. These changes also align the bare-ptr calling
convention code with the latest changes in the default calling convention
and reduce the amount of customization code needed.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D87724
- use select-ops to make the lowering simpler
- change style of FileCheck variables names to be consistent
- change some variable names in the code to be more explicit
Differential Revision: https://reviews.llvm.org/D88258
(1) simplify integer printing logic by always using 64-bit print
(2) add index support (since vector<16xindex> is planned to be added)
(3) adjust naming convention print_x -> printX
Reviewed By: bkramer
Differential Revision: https://reviews.llvm.org/D88436
This generalizes printing beyond just i1,i32,i64 and also accounts
for signed and unsigned interpretation in the output.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D88290
Allow propagating optional user defined attributes during SCF to GPU conversion. Gives opportunity to use user defined attributes in the further lowering. For example setting subgroup size, or other options for GPU dispatch. This does not break backward compatibility and does not require new attributes, just allow passing optional ones.
Differential Revision: https://reviews.llvm.org/D88203
This pass converts shape.cstr_* ops to eager (side-effecting)
error-handling code. After that conversion is done, the witnesses are
trivially satisfied and are replaced with `shape.const_witness true`.
Differential Revision: https://reviews.llvm.org/D87941
Conversion to LLVM becomes confusing and incorrect if someone tries to lower
STD -> LLVM and only then GPULaunchFuncOp to LLVM separately. Although it is
technically allowed now, it works incorrectly because of the argument
promotion. The correct way to use this conversion pattern is to add to the
STD->LLVM patterns before running the pass.
Differential Revision: https://reviews.llvm.org/D88147
This revision allows representing a reduction at the level of linalg on tensors for named ops. When a structured op has a reduction and returns tensor(s), new conventions are added and documented.
As an illustration, the syntax for a `linalg.matmul` writing into a buffer is:
```
linalg.matmul ins(%a, %b : memref<?x?xf32>, tensor<?x?xf32>)
outs(%c : memref<?x?xf32>)
```
, whereas the syntax for a `linalg.matmul` returning a new tensor is:
```
%d = linalg.matmul ins(%a, %b : tensor<?x?xf32>, memref<?x?xf32>)
init(%c : memref<?x?xf32>)
-> tensor<?x?xf32>
```
Other parts of linalg will be extended accordingly to allow mixed buffer/tensor semantics in the presence of reductions.
ConvOp vectorization supports now only convolutions of static shapes with dimensions
of size either 3(vectorized) or 1(not) as underlying vectors have to be of static
shape as well. In this commit we add support for convolutions of any size as well as
dynamic shapes by leveraging existing matmul infrastructure for tiling of both input
and kernel to sizes accepted by the previous version of ConvOp vectorization.
In the future this pass can be extended to take "tiling mask" as a user input which
will enable vectorization of user specified dimensions.
Differential Revision: https://reviews.llvm.org/D87676
When packing function results into a structure during the standard-to-llvm
dialect conversion, do not assume the conversion was successful and propagate
nullptr as error state.
Fixes PR45184.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D87605
Added support to the Std dialect cast operations to do casts in vector types when feasible.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D87410
Type converter may fail and return nullptr on unconvertible types. The function
conversion did not include a check and was attempting to use a nullptr type to
construct an LLVM function, leading to a crash. Add a check and return early.
The rest of the call stack propagates errors properly.
Fixes PR47403.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D87075
This introduces a builder for the more general case that supports zero
elements (where the element type can't be inferred from the ValueRange,
since it might be empty).
Also, fix up some cases in ShapeToStandard lowering that hit this. It
happens very easily when dealing with shapes of 0-D tensors.
The SameOperandsAndResultElementType is redundant with the new
TypesMatchWith and prevented having zero elements.
Differential Revision: https://reviews.llvm.org/D87492
Rationale:
After some discussion we decided that it is safe to assume 32-bit
indices for all subscripting in the vector dialect (it is unlikely
the dialect will be used; or even work; for such long vectors).
So rather than detecting specific situations that can exploit
32-bit indices with higher parallel SIMD, we just optimize it
by default, and let users that don't want it opt-out.
Reviewed By: nicolasvasilache, bkramer
Differential Revision: https://reviews.llvm.org/D87404
Take advantage of the new `dynamic_tensor_from_elements` operation in `std`.
Instead of stack-allocated memory, we can now lower directly to a single `std`
operation.
Differential Revision: https://reviews.llvm.org/D86935
This replaces the select chain for edge-padding with an scf.if that
performs the memory operation when the index is in bounds and uses the
pad value when it's not. For transfer_write the same mechanism is used,
skipping the store when the index is out of bounds.
The integration test has a bunch of cases of how I believe this should
work.
Differential Revision: https://reviews.llvm.org/D87241
In this commit a new way of convolution ops lowering is introduced.
The conv op vectorization pass lowers linalg convolution ops
into vector contractions. This lowering is possible when conv op
is first tiled by 1 along specific dimensions which transforms
it into dot product between input and kernel subview memory buffers.
This pass converts such conv op into vector contraction and does
all necessary vector transfers that make it work.
Differential Revision: https://reviews.llvm.org/D86619
Vector to SCF conversion still had issues due to the interaction with the natural alignment derived by the LLVM data layout. One traditional workaround is to allocate aligned. However, this does not always work for vector sizes that are non-powers of 2.
This revision implements a more portable mechanism where the intermediate allocation is always a memref of elemental vector type. AllocOp is extended to use the natural LLVM DataLayout alignment for non-scalar types, when the alignment is not specified in the first place.
An integration test is added that exercises the transfer to scf.for + scalar lowering with a 5x5 transposition.
Differential Revision: https://reviews.llvm.org/D87150
When allowed, use 32-bit indices rather than 64-bit indices in the
SIMD computation of masks. This runs up to 2x and 4x faster on
a number of AVX2 and AVX512 microbenchmarks.
Reviewed By: bkramer
Differential Revision: https://reviews.llvm.org/D87116
Added 128 byte alignment to alloc ops created in VectorToSCF pass.
128b alignment was already introduced to this pass but not to all alloc
ops. This commit changes that by adding 128b alignment to the remaining ops.
The point of specifying alignment is to prevent possible memory alignment errors
on weakly tested architectures.
Differential Revision: https://reviews.llvm.org/D86454
Adding a conversion pattern for the parallel Operation. This will
help the conversion of parallel operation with standard dialect to
parallel operation with llvm dialect. The type conversion of the block
arguments in a parallel region are controlled by the pattern for the
parallel Operation. Without this pattern, a parallel Operation with
block arguments cannot be converted from standard to LLVM dialect.
Other OpenMP operations without regions are marked as legal. When
translation of OpenMP operations with regions are added then patterns
for these operations can also be added.
Also uses all the standard to llvm patterns. Patterns of other dialects
can be added later if needed.
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D86273
This patch allows to pass the gpu module name to SPIR-V
module during conversion. This has many benefits as we can lookup
converted to SPIR-V kernel in the symbol table.
In order to avoid symbol conflicts, `"__spv__"` is added to the
gpu module name to form the new one.
Reviewed By: mravishankar
Differential Revision: https://reviews.llvm.org/D86384