This is a follow-up to D128847. The `AffineMap::getPermutedPosition` method performs a linear scan of the map, so the previous implementation had an asymptotic complexity of `O(|topSort| * |m|)`. This change reduces that to `O(|topSort| + |m|)`.
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D129011
The TOSA specification doesn't have a dilation attribute for transpose_conv2d,
and the padding array is of size 4: (top, bottom, left, right).
This change updates the dialect to match the specification, and updates the lit
tests to match the dialect changes.
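For illustration, a hypothetical example of the updated op (all values, shapes, and SSA names here are made up; only the 4-element padding array and the absence of a dilation attribute reflect the change):

```mlir
// Sketch: transpose_conv2d with a 4-element out_pad array and no
// dilation attribute. Shapes and values are hypothetical.
%0 = "tosa.transpose_conv2d"(%input, %weight, %bias) {
  out_pad = [0, 0, 0, 0],   // (top, bottom, left, right)
  stride = [1, 1],
  out_shape = [1, 18, 18, 8]
} : (tensor<1x16x16x4xf32>, tensor<8x3x3x4xf32>, tensor<8xf32>)
    -> tensor<1x18x18x8xf32>
```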
Differential Revision: https://reviews.llvm.org/D127332
This also changes the space of the returned lexmin for `IntegerPolyhedron`s;
the symbols in the polyhedron now correspond to symbols in the result rather than dims.
Reviewed By: Groverkss
Differential Revision: https://reviews.llvm.org/D128933
This allows purging references of scf.ForeachThreadOp and scf.PerformConcurrentlyOp from
ParallelInsertSliceOp.
This will allow moving the op closer to tensor::InsertSliceOp, with which it should share much more
code.
In the future, the decoupling will also allow extending the type of ops that can be used in the
parallel combinator as well as semantics related to multiple concurrent inserts to the same
result.
Differential Revision: https://reviews.llvm.org/D128857
This revision avoids a crash in the 0-D case of distributing vector.transfer ops out of
vector.warp_execute_on_lane_0.
Due to the code complexity and lack of documentation, the implementation had to be
untangled before it became clear that the simple fix was to fail in the 0-D case.
The rewrite is still very useful to understand this code better.
Differential Revision: https://reviews.llvm.org/D128793
This differential improves the handling of two error conditions, by detecting them earlier and by providing better messages to help users understand what went wrong.
Reviewed By: jpienaar
Differential Revision: https://reviews.llvm.org/D128555
This patch implements the analysis state classes needed for sparse data-flow analysis and implements a dead-code analysis using those states to determine liveness of blocks, control-flow edges, region predecessors, and function callsites.
Depends on D126751
Reviewed By: rriddle, phisiart
Differential Revision: https://reviews.llvm.org/D127064
For the conversion to nvgpu `mma.sync` and `ldmatrix` pathways, the code
was missing support for the `i4` data type. While fixing this, another
bug was discovered that caused the number of ldmatrix tiles calculated for
certain operand types and configurations to be incorrect. This change
fixes both issues and adds additional tests.
Differential Revision: https://reviews.llvm.org/D128074
This revision merges the 2 split_reduction transforms and adds extra control by using attributes.
SplitReduction is known to require a concrete additional buffer to store temporary information.
Add an option to introduce a `bufferization.alloc_tensor` instead of `linalg.init_tensor`.
This behaves better with subset-based tiling and bufferization.
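A rough sketch of the difference (shapes hypothetical):

```mlir
// Default: the temporary buffer for the split reduction is created with
// linalg.init_tensor.
%0 = linalg.init_tensor [4, 64] : tensor<4x64xf32>
// With the new option: an explicit allocation that composes better with
// subset-based tiling and bufferization.
%1 = bufferization.alloc_tensor() : tensor<4x64xf32>
```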
Differential Revision: https://reviews.llvm.org/D128722
When the iteration graph is cyclic (even after several attempts using fewer and fewer constraints), the current sparse compiler bails out, and no rewriting happens. However, this revision adds some new logic where the sparse compiler tries to find a single input sparse tensor that breaks the cycle, and then adds a proper sparse conversion operation. This way, more incoming kernels can be handled!
Note, the resulting code is not optimal (although it keeps more or less proper "sparse" complexity), and more improvements should be added (especially when the kernel directly yields without computation, such as the transpose example). However, handling is better than not handling ;-)
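A minimal sketch of the idea (the #CSR/#CSC encoding aliases are hypothetical): when a kernel's iteration graph is only cyclic because one sparse input must be traversed in an incompatible order, the compiler can materialize a conversion of that single input and generate the kernel from the converted value:

```mlir
// Convert the offending sparse input to an encoding whose dimension
// ordering breaks the cycle; the kernel then reads %a2 instead of %a.
%a2 = sparse_tensor.convert %a
    : tensor<8x8xf64, #CSR> to tensor<8x8xf64, #CSC>
```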
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D128847
This feature is tested by a unit test, since not many places in the codebase
use SubElementTypeInterface.
Differential Revision: https://reviews.llvm.org/D127539
Since only mutable types and attributes can go into infinite recursion
inside SubElementInterface::walkSubElement, and there are only a few of
them, we introduce new traits for Type and Attribute:
TypeTrait::IsMutable and AttributeTrait::IsMutable, respectively. They
indicate whether a type or attribute is mutable. Such traits are
required if the ImplType defines a `mutate` function.
Then, inside SubElementInterface, we use a set to record the mutable
types and attributes that have been visited before.
Differential Revision: https://reviews.llvm.org/D127537
We currently generate reproducer configurations using a comment placed at
the top of the generated .mlir file. This is kind of hacky given that comments
have no semantic context in the source file and can easily be dropped. This
strategy also wouldn't work if/when we have a bitcode format. This commit
switches to using an external assembly resource, which is verifiable, can
work naturally with a hypothetical bitcode format, and removes the awkward
processing from mlir-opt for splicing comments and re-applying command line options. With
the removal of command line munging, this opens up new possibilities for
executing reproducers in memory.
Differential Revision: https://reviews.llvm.org/D126447
This commit enables support for providing and processing external
resources within MLIR assembly formats. This is a mechanism with which
dialects, and external clients, may attach additional information when
printing IR without that information being encoded in the IR itself.
External resources are not uniqued within the MLIR context, are not
attached directly to any operation, and are solely intended to live and be
processed outside of the immediate IR. There are many potential uses of this
functionality, for example MLIR's pass crash reproducer could utilize this to
attach the pipeline being executed when a crash occurs. Other types of
uses may be embedding large amounts of binary data, such as weights in ML
applications, that shouldn't be copied directly into the MLIR context, but
need to be kept adjacent to the IR.
External resources are encoded using a key-value pair nested within a
dictionary anchored by name either on a dialect, or an externally registered
entity. The key is an identifier used to disambiguate the data. The value
may be stored in various limited forms, but general encodings use a string
(human readable) or blob format (binary). Within the textual format, an
example may be of the form:
```mlir
{-#
  // The `dialect_resources` section within the file-level metadata
  // dictionary is used to contain any dialect resource entries.
  dialect_resources: {
    // Here is a dictionary anchored on "foo_dialect", which is a dialect
    // namespace.
    foo_dialect: {
      // `some_dialect_resource` is a key to be interpreted by the dialect,
      // and used to initialize/configure/etc.
      some_dialect_resource: "Some important resource value"
    }
  },
  // The `external_resources` section within the file-level metadata
  // dictionary is used to contain any non-dialect resource entries.
  external_resources: {
    // Here is a dictionary anchored on "mlir_reproducer", which is an
    // external entity representing MLIR's crash reproducer functionality.
    mlir_reproducer: {
      // `pipeline` is an entry that holds a crash reproducer pipeline
      // resource.
      pipeline: "func.func(canonicalize,cse)"
    }
  }
#-}
```
Differential Revision: https://reviews.llvm.org/D126446
The original code is more readable because the goal is to check if the given
value does *not* lie in the range. It is harder to understand this by
reading the rewritten code.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D128753
Should be NFC. We can just do the base conversion manually and avoid
warnings about it. Clang before Clang 13 didn't implement P1825 and
complains:
```
mlir/lib/Analysis/Presburger/IntegerRelation.cpp:226:10: warning: local variable 'result' will be copied despite being returned by name [-Wreturn-std-move]
  return result;
         ^~~~~~
mlir/lib/Analysis/Presburger/IntegerRelation.cpp:226:10: note: call 'std::move' explicitly to avoid copying
  return result;
         ^~~~~~
         std::move(result)
```
Consecutive complex.neg ops are redundant, so we can canonicalize them to the original operands.
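A minimal sketch of the fold:

```mlir
// Before canonicalization: two consecutive negations.
%0 = complex.neg %arg0 : complex<f32>
%1 = complex.neg %0 : complex<f32>
// After canonicalization, uses of %1 are replaced by %arg0 directly.
```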
Reviewed By: pifon2a
Differential Revision: https://reviews.llvm.org/D128781
1. Remove the redundant collapse clause in MLIR OpenMP worksharing-loop
operation.
2. Fix several typos.
3. Refactor the chunk size type conversion since CreateSExtOrTrunc has
both type check and type conversion.
Reviewed By: kiranchandramohan
Differential Revision: https://reviews.llvm.org/D128338
`enableSplitting` simply controls whether we should split
or use the full buffer. `insertMarkerInOutput` toggles whether split markers
should be inserted in between processed output chunks.
These options allow for merging the duplicate code paths we have
when splitting is optional.
Differential Revision: https://reviews.llvm.org/D128764
Also add test cases, and extend support for `computeReprWithOnlyDivLocals` from `IntegerPolyhedron` to `IntegerRelation` and `PresburgerRelation`.
Depends on D128736.
Reviewed By: Groverkss
Differential Revision: https://reviews.llvm.org/D128737
Also add test cases for this. Both IntegerRelation::addLocalFloorDiv and the fixed implementation of subtraction need to compute division inequalities from the dividend and divisor, so this also adds helper util functions to avoid duplicating this logic.
Reviewed By: Groverkss
Differential Revision: https://reviews.llvm.org/D128736
Currently, in the Presburger library, we use the words "variables" and
"identifiers" interchangeably. This patch changes this to only use "variables" to
refer to the variables of PresburgerSpace.
The reasoning behind this change is that the current usage of the word "identifier"
is misleading. Variables do not "identify" anything; the information attached to them
is the actual "identifier" for the variable. The word "identifier" will later be used
to refer to the information attached to each variable in the space.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D128585
This is already partially the case, but we can rely more heavily on interface libraries and how they are imported/exported in order to simplify the implementation of the MLIR Python functions in CMake.
This change also makes a couple of other changes:
1) Add a new CMake function which handles "pure" sources. This was previously done inline.
2) Moves the headers associated with CAPI libraries to the libraries themselves. These were previously managed in a separate source target. They can now be added directly to the CAPI libraries using DECLARED_HEADERS.
3) Cleanup some dependencies that showed up as an issue during the refactor
This is a big CMake change that should produce no impact on the build of mlir and on the produced *build tree*. However, this change fixes an issue with the *install tree* of mlir which was previously unusable for projects like torch-mlir because both the "pure" and "extension" targets were pointing to either the build or source trees.
Reviewed By: stellaraccident
Differential Revision: https://reviews.llvm.org/D128230
This pattern can kick in when the source of the broadcast has a shape
that is a prefix/suffix of the result of the shape_cast.
Differential Revision: https://reviews.llvm.org/D128734
Enforce the assumption made on tensor buffers explicitly. When in-place,
reuse the buffer, but fill with all zeroes for the non-update case, since
the kernel assumes all elements are written to. When not in-place, zero
out the new buffer when materializing or when no-updates occur. Copy the
original tensor value when updates occur. This prepares migrating to the
new bufferization strategy, where these assumptions must be made explicit.
Reviewed By: springerm
Differential Revision: https://reviews.llvm.org/D128691
This attribute is similar to DenseElementsAttr but does not support
splat. As such it has a much simpler API and does not need any smart
iterator: it exposes direct ArrayRef access.
A new syntax is introduced so that the generic printing/parsing looks
like:
[:i64 1, -2, 3]
This attribute begins like an ArrayAttr but has a `:` token after the
opening square brace to introduce the element type (supported are I8,
I16, I32, I64, F32, F64) and the comma separated list for the data.
This is particularly convenient for attributes intended to be small,
like those referring to shapes.
For example a `transpose` operation with a `dims` attribute could be
defined as such:
let arguments = (ins AnyTensor:$input, DenseI64ArrayAttr:$dims);
let assemblyFormat = "$input `dims` `=` $dims attr-dict : type($input)";
And printed this way (the element type is elided in this case):
transpose %input dims = [0, 2, 1] : tensor<2x3x4xf32>
The C++ API for dims would just directly return an ArrayRef<int64_t>.
RFC: https://discourse.llvm.org/t/rfc-introduce-a-new-dense-array-attribute/63279
Recommit with a custom DenseArrayBaseAttrStorage class to ensure
over-alignment of the storage to the largest type.
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D123774
This was previously implemented as part of the BufferizableOpInterface of ForEachThreadOp. Moving the implementation to ParallelInsertSliceOp to be consistent with the remaining ops and to have a nice example op that can serve as a blueprint for other ops.
Differential Revision: https://reviews.llvm.org/D128666
Add basic canonicalization for consecutive complex.add and sub operations.
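One example of the kind of fold this enables (a sketch; the exact pattern set is in the revision):

```mlir
// An add followed by a sub of the same value cancels out.
%0 = complex.add %arg0, %arg1 : complex<f32>
%1 = complex.sub %0, %arg1 : complex<f32>
// After canonicalization, uses of %1 become %arg0.
```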
Reviewed By: pifon2a
Differential Revision: https://reviews.llvm.org/D128702
Adding the accumulator value after the `vector.contract` changes the
precision of the operation. This makes sure the accumulator is carried
through to `vector.reduce` (and down to LLVM).
Differential Revision: https://reviews.llvm.org/D128674
This patch adds a `convertFromStorage` field to attribute or type parameters that can implement more complex logic for converting from the parameter's C++ storage type (e.g. `Optional<SmallVector<T>>`) to its C++ type (e.g. `Optional<ArrayRef<T>>`).
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D128293
The parser currently can't parse bare identifiers like 'i0' in affine
maps and sets, and similarly ids like f16/f32. But these bare ids are
part of the grammar, even though they also name primitive types.
```
error: expected bare identifier
set = affine_set<(i0, i1) : ()>
^
```
This patch allows the parser for AffineMap/IntegerSet to parse bare
identifiers as defined by the grammar.
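With this change, inputs like the following parse (illustrative example):

```mlir
// `i0`, `i1`, `f16`, and `f32` also name primitive types, but are valid
// bare identifiers in the affine map/set grammar.
#set = affine_set<(i0, i1) : (i0 - 2 >= 0, i1 >= 0)>
#map = affine_map<(f16, f32) -> (f16 + f32)>
```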
Reviewed By: bondhugula, rriddle
Differential Revision: https://reviews.llvm.org/D127076
Adding more test cases for sparse_tensor.BinaryOp, including different cases where the overlap/left/right region is implemented/empty/identity.
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D128383
Previously, the sparse_tensor.unary integration test did not contain cases using `linalg.index` (previously unsupported); this commit adds test cases that use `linalg.index` operators.
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D128460
This patch memoizes compatible LLVM types in `LLVM::isCompatibleType` in
order to avoid redundant work.
This is especially useful when the program is large and there are
multiple occurrences of some deeply nested LLVM struct types, in which
case we can gain quite some speedup with this patch.
Differential Revision: https://reviews.llvm.org/D127918
This patch adds three new LLVM intrinsic operations: llvm.intr.vastart/copy/end,
and their translation from LLVM IR.
This effectively removes a restriction, imposed by 0126dcf1f0, where
non-external functions in LLVM dialect cannot be variadic. At that time
it was not clear how LLVM intrinsics are going to be modeled, which
indirectly affects va_start/copy/end, the core intrinsics used in
variadic functions. But since we have LLVM intrinsics as normal
MLIR operations, it's not a problem anymore.
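A generic-form sketch of the new operations inside a variadic `llvm.func` (SSA names are illustrative, and the typed-pointer syntax of the time is assumed):

```mlir
// Initialize, copy, and tear down a va_list, mirroring the LLVM IR
// intrinsics llvm.va_start / llvm.va_copy / llvm.va_end.
"llvm.intr.vastart"(%ap) : (!llvm.ptr<i8>) -> ()
"llvm.intr.vacopy"(%dst, %src) : (!llvm.ptr<i8>, !llvm.ptr<i8>) -> ()
"llvm.intr.vaend"(%ap) : (!llvm.ptr<i8>) -> ()
```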
Differential Revision: https://reviews.llvm.org/D127540
This change updates all remaining bufferization patterns (except for scf.while) and the remaining bufferization infrastructure to infer the memory space whenever possible instead of falling back to "0". (If a default memory space is set in the bufferization options, we still fall back to that value if the memory space could not be inferred.)
Differential Revision: https://reviews.llvm.org/D128423
Add a failure return value and bufferization options argument. This is to keep a subsequent change smaller.
Differential Revision: https://reviews.llvm.org/D128278
These intrinsics will be needed to convert between fixed-length vectors
and scalable vectors.
This operation will be needed for VLS (vector-length specific)
vectorization, when interfacing with vector functions or intrinsics that
take scalable vectors as operands in a context where the length of our
vectors is known or assumed at compile time, but we still want to
generate scalable vector instructions.
Differential Revision: https://reviews.llvm.org/D127100
An optional thread_dim_mapping index array attribute specifies for each
virtual thread dimension, how it remaps 1-1 to a set of concrete processing
element resources (e.g. a CUDA grid dimension or a level of concrete nested
async parallelism). At this time, the specification is backend-dependent and
is not verified by the op, beyond being an index array attribute.
It is the responsibility of the lowering to interpret the index array in the
context of the concrete target the op is lowered to, or to ignore it when
the specification is ill-formed or unsupported for a particular target.
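A sketch of what this could look like (syntax abbreviated; the attribute values and their interpretation are entirely up to the lowering):

```mlir
// Map virtual thread dimension 0 to concrete resource 1 and dimension 1
// to resource 0 (e.g. swapping CUDA block dimensions).
scf.foreach_thread (%i, %j) in (%c8, %c4) {
  // ... body using %i and %j ...
} {thread_dim_mapping = [1, 0]}
```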
Differential Revision: https://reviews.llvm.org/D128633
This allows for better type inference during bufferization and is in preparation for supporting memory spaces.
Differential Revision: https://reviews.llvm.org/D128422
This allows for better type inference during bufferization and is in preparation for supporting memory spaces.
Differential Revision: https://reviews.llvm.org/D128581
This allows for better type inference during bufferization and is in preparation for supporting memory spaces.
Differential Revision: https://reviews.llvm.org/D128580
This allows for better type inference during bufferization and is in preparation for supporting memory spaces.
Differential Revision: https://reviews.llvm.org/D128579
This is useful because the result type of an op can sometimes be inferred from its body (e.g., `scf.if`). This will be utilized in subsequent changes.
Also introduces a new `getBufferType` interface method on BufferizableOpInterface. This method is useful for computing a bufferized block argument type with respect to OpOperand types of the parent op.
Differential Revision: https://reviews.llvm.org/D128420
This attribute is currently supported on AllocTensorOp only. Future changes will add support to other ops. Furthermore, the memory space is not propagated properly in all bufferization patterns and some of the core bufferization infrastructure. This will be addressed in a subsequent change.
Differential Revision: https://reviews.llvm.org/D128274
Seems to have been an accident of history and none of these had any reason to be restricted to FuncOp.
Differential Revision: https://reviews.llvm.org/D128614
This patch fixes:
```
llvm-project/mlir/lib/Dialect/Linalg/Transforms/SplitReduction.cpp:300:26:
error: comparison of integers of different signs: 'int64_t' (aka 'long') and 'size_t' (aka 'unsigned long') [-Werror,-Wsign-compare]
```
This paves the way for integer-exact projection, and for supporting
non-division locals in subtraction, complement, and equality checks.
Reviewed By: Groverkss
Differential Revision: https://reviews.llvm.org/D127463
Static loop unrolling does not change the operation type, so we can rigorously check for affine.store.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D128237
Add a validation test for quant.region with an incompatible output spec.
Reviewed By: stellaraccident
Differential Revision: https://reviews.llvm.org/D128245
When creating an scf.for without arguments, an scf.yield is automatically
created. Make sure we don't create a second one.
Differential Revision: https://reviews.llvm.org/D128405
Putting some direct use restrictions on tensor allocations in the
sparse case enables the use of simplifying assumptions in the
bufferization analysis.
Reviewed By: springerm
Differential Revision: https://reviews.llvm.org/D128463
Only the analysis part of the interface is implemented. The bufferization itself is performed by the SparseTensorConversion pass.
Differential Revision: https://reviews.llvm.org/D128138
spv.bitcast from a vector to a scalar expects the lower-numbered
components of the vector to map to the lower-order bits of
the scalar. That actually already matches how little endian stores
data in the memory. So we just need to read and push to the back
of the vector sequentially.
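For instance (a sketch of the semantics):

```mlir
// With little-endian layout, element 0 of the vector lands in the least
// significant bits of the scalar:
//   [0x01, 0x02, 0x03, 0x04] : vector<4xi8>  ->  0x04030201 : i32
%0 = spv.Bitcast %v : vector<4xi8> to i32
```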
Reviewed By: hanchung
Differential Revision: https://reviews.llvm.org/D128473
Add the reverse of the ViewLikeInterface functions
`getMixedStrides`, `getMixedSizes`, and `getMixedOffsets`. The new functions
are useful to build view-like operations from an array of mixed static/dynamic
values.
Differential Revision: https://reviews.llvm.org/D128376
Such situations manifest themselves with an empty payload, which ends up producing empty results.
In such cases, we still want to match the transform op contract and return as many empty SmallVector<Operation*>
as the op requires.
Differential Revision: https://reviews.llvm.org/D128456
All bufferizable ops that bufferize to an allocation receive a `bufferization.escape` attribute during TensorCopyInsertion.
Differential Revision: https://reviews.llvm.org/D128137
The command line option injected by the tablegen rule cannot be respected by
PDLL here, so add a new helper function that is a copy of the original without
any additional flags injected. This avoids a compilation failure when
compiler warnings are disabled.
Kept it as a mechanical copy.
Fixes #55716
The result of applying an N-result-producing transformation to M payload ops
is an M-wide result, each entry containing N result operations.
This requires a transposition of the results obtained by calling `applyToOne`.
This revision fixes the issue and adds more advanced tests that exercise the behavior.
Differential Revision: https://reviews.llvm.org/D128414
This revision proposes a different implementation of the SplitReduction transformation that does
not rely on tensor::ExpandShapeOp.
Previously, a dimension `[k]` would be split into `[k][kk]` via an ExpandShapeOp.
Instead, this revision proposes to rewrite `[k]` into `[factor * k + kk]`.
There are different tradeoffs involved but the proposed implementation is more general because
the affine rewrite is well-defined. In particular, it works naturally with `?` parallel dimensions and
non-trivial indexing maps.
A further rewrite of `[factor * k + kk]` + ExpandShapeOp is possible as a followup.
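A sketch of the indexing-map rewrite for a reduction dimension split with factor 4 (names hypothetical):

```mlir
// Before: the reduction dimension k addresses elements directly.
#before = affine_map<(i, k) -> (k)>
// After: k is split into (k, kk) and recombined affinely, with no
// reshape op needed.
#after = affine_map<(i, k, kk) -> (k * 4 + kk)>
```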
Differential Revision: https://reviews.llvm.org/D128266
The binary size impact on `clang` is trivial; namely, the numerical value
doesn't change when measured in MiB, and the `.data` section increases from
139Ki to 173Ki.
Differential Revision: https://reviews.llvm.org/D128070
This revision makes sure we accept sparse tensors as arguments
of the expand/collapse reshaping operations in the tensor dialect.
Note that the actual lowering to runnable IR is still TBD.
Reviewed By: springerm
Differential Revision: https://reviews.llvm.org/D128311
The Presburger library currently uses int64_t throughout for its integers.
This runs the risk of silently producing incorrect results when overflows occur.
Fixing this issue requires some sort of multiprecision integer
that transparently supports arbitrary arithmetic computations.
The class SlowMPInt provides this functionality, and is intended to be used
as the slow path fallback for a more optimized upcoming class, MPInt, that optimizes
for the Presburger library's workloads.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D123758
The erf and round ops can be lowered to libm calls, with vector type support, like other math operations.
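A sketch of an input that now lowers (the lowering unrolls vector operands into scalar libm calls; `@erff` is the standard libm name for the f32 variant):

```mlir
// Elementwise erf on a vector operand...
%0 = math.erf %arg0 : vector<4xf32>
// ...becomes, per extracted lane, a call of the form:
%1 = func.call @erff(%lane) : (f32) -> f32
```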
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D127934
Marking the bufferization allocation operation as invalid
during sparse lowering is too strict, since dense and
sparse allocation can co-exist. This revision refines
the lowering with a dynamic type check.
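A sketch of the distinction now made (the #SparseVector encoding alias is hypothetical):

```mlir
// Dense allocation: accepted and handled by regular bufferization.
%d = bufferization.alloc_tensor() : tensor<16xf64>
// Sparse allocation: routed through the sparse conversion path instead
// of being rejected outright.
%s = bufferization.alloc_tensor() : tensor<16xf64, #SparseVector>
```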
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D128305
This patch implements tile and fuse transformation for ops that
implement the tiling interface. To do so,
- `TilingInterface` needs a new method that generates a tiled
implementation of the operation based on the tile of the result
needed.
- A pattern is added that replaces a `tensor.extract_slice` whose
source is defined by an operation that implements the
`TilingInterface` with a tiled implementation that produces the
extracted slice in-place (using the method added to
`TilingInterface`); see the sketch after this list.
- A pattern is added that takes a sequence of operations that
implement the `TilingInterface` (for now `LinalgOp`s), tiles the
consumer, and greedily fuses its producers iteratively.
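A sketch of the slice-replacement pattern (shapes hypothetical):

```mlir
// Only a 32x32 tile of the matmul result is consumed below; the pattern
// replaces the extract_slice with a tiled matmul computing just that
// tile in place, into which producers can then be fused.
%0 = linalg.matmul ins(%a, %b : tensor<128x128xf32>, tensor<128x128xf32>)
                   outs(%c : tensor<128x128xf32>) -> tensor<128x128xf32>
%1 = tensor.extract_slice %0[0, 0] [32, 32] [1, 1]
    : tensor<128x128xf32> to tensor<32x32xf32>
```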
Differential Revision: https://reviews.llvm.org/D127809
Some GDB versions require all pretty-printer classes to define to_string.
This commit adds these definitions.
Reviewed By: csigg
Differential Revision: https://reviews.llvm.org/D127969
This revision separates the `LinalgSplitReduction` pattern, whose application is based on attributes,
from its implementation.
A transform dialect op extension is added to control the application of the transformation at a finer granularity.
Differential Revision: https://reviews.llvm.org/D128165
This revision adds the necessary plumbing for canonicalizing scf::ForeachThread with the
`AffineOpSCFCanonicalizationPattern`.
In the process the `loopMatcher` helper is updated to take OpFoldResult instead of just values.
This allows composing various scenarios without the need for an artificial builder.
Differential Revision: https://reviews.llvm.org/D128244
This patch adds the omp.taskgroup operation according to OpenMP 5.0 section 2.17.6.
Also added tests for the same.
Reviewed By: kiranchandramohan, peixin
Differential Revision: https://reviews.llvm.org/D127250
In order to support newer hardware, define wrappers around MFMA
intrinsics that have not previously been exposed in the ROCDL dialect.
An `amdgpu.mfma` wrapper around these instructions is in development
and will provide a more user-friendly interface to them.
Reviewed By: ThomasRaoux
Differential Revision: https://reviews.llvm.org/D128079
This aligns the SCF dialect file layout with the majority of the dialects.
Reviewed By: jpienaar
Differential Revision: https://reviews.llvm.org/D128049
Patch created by running:
rg -l parallelForEachN | xargs sed -i '' -c 's/parallelForEachN/parallelFor/'
No behavior change.
Differential Revision: https://reviews.llvm.org/D128140
Following up on flipping dialects to _Both, this flips the accessors used to the
prefixed variant ahead of flipping from _Both to _Prefixed. It just flips to
the accessors introduced in the preceding change, which are prefixed
forms of the existing accessors.
Mechanical change using helper script
https://github.com/jpienaar/llvm-project/blob/main/clang-tools-extra/clang-tidy/misc/AddGetterCheck.cpp and clang-format.
Marked all dialects that could be (reasonably) easily flipped to the _Both
prefix. Updating the accessors to the prefixed form will happen in a
follow-up; this was to flush out conflicts and to mark all dialects
explicitly, as I plan to flip the OpBase default to _Prefixed to avoid
needing to migrate new dialects.
The exception is the Standalone example, which got flipped to _Prefixed.
Differential Revision: https://reviews.llvm.org/D128027
Support complex types of float and double. See the added test for an example.
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D128076
This adds weak versions of the truncation libcalls in case the runtime
environment doesn't have them.
Differential Revision: https://reviews.llvm.org/D128091
This fixes all sorts of ABI issues due to passing by-value
(using by-reference with memrefs exclusively).
Reviewed By: bkramer
Differential Revision: https://reviews.llvm.org/D128018
This change adds a transformation and pass to the NvGPU dialect that
attempts to optimize reads/writes from a memref representing GPU shared
memory in order to avoid bank conflicts. Given a value representing a
shared memory memref, it traverses all reads/writes within the parent op
and, subject to suitable conditions, rewrites all last dimension index
values such that element locations in the final (col) dimension are
given by
`newColIdx = col % vecSize + perm[row](col/vecSize,row)`
where `perm` is a permutation function indexed by `row` and `vecSize`
is the vector access size in elements (currently assumes 128-bit
vectorized accesses, but this can be made a parameter). This specific
transformation can help optimize typical distributed & vectorized accesses
common to loading matrix multiplication operands to/from shared memory.
Differential Revision: https://reviews.llvm.org/D127457
This resolves problems reported in commit 1a20252978:
1. Promote-to-float lowering for XINT_TO_FP nodes.
2. Bail out f16 from shuffle combine, since the vector type is not legal in this version.
With the recent refactorings, this class is no longer needed. We can use BufferizationOptions in all places where BufferizationState was used.
Differential Revision: https://reviews.llvm.org/D127653
This change changes the bufferization so that it utilizes the new TensorCopyInsertion pass. One-Shot Bufferize no longer calls the One-Shot Analysis. Instead, it relies on the TensorCopyInsertion pass to make the entire IR fully inplacable. The `bufferize` implementations of all ops are simplified; they no longer have to account for out-of-place bufferization decisions. These were already materialized in the IR in the form of `bufferization.alloc_tensor` ops during the TensorCopyInsertion pass.
Differential Revision: https://reviews.llvm.org/D127652
The 'emit_c_wrappers' option in the FuncToLLVM conversion requests C interface
wrappers to be emitted for every builtin function in the module. While this has
been useful to bootstrap the interface, it is problematic in the longer term as
it may unintentionally affect the functions that should retain their existing
interface, e.g., libm functions obtained by lowering math operations (see
D126964 for an example). Since D77314, we have a finer-grain control over
interface generation via an attribute that avoids the problem entirely. Remove
the 'emit_c_wrappers' option. Introduce the '-llvm-request-c-wrappers' pass
that can be run in any pipeline that needs blanket emission of functions to
annotate all builtin functions with the attribute before performing the usual
lowering that accounts for the attribute.
Reviewed By: chelini
Differential Revision: https://reviews.llvm.org/D127952
* Split ops into X_graph variants as discussed;
* Remove tokens from non-Graph region variants and rely on side-effect
modelling there while removing side-effect modelling from Graph
variants and relying on explicit ordering there;
* Make tokens required to be produced by Graph variants - but keep the
explicit token type specification, given previous discussion on this
potentially being configurable in the future;
This results in duplicating some code. I considered adding helper
functions but decided against adding an abstraction there early, given the
size of the duplication and the risk of creating accidental coupling.
Differential Revision: https://reviews.llvm.org/D127813
Where a constraint also has a def, emit the def only to avoid duplicate
output (and def has more complete info). Also move attributes and types
to the end rather than some on top and some at the end.
Differential Revision: https://reviews.llvm.org/D127823
The semi-ring blocks were simply "inlined" by the sparse compiler but
without any filtering or patching. This revision improves the analysis
(rejecting blocks that use non-invariant computations from outside
their blocks, except for linalg.index) and also improves the codegen
by properly patching up index computations (previous version crashed).
With a regression test. Also updated the documentation now that the
example code is properly working.
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D128000
The LinalgElementwiseOpFusion pass has become smarter, and converts
the simple conversion linalg operation into a sparse dialect convert
operation. However, since our current bufferization does not take the
new semantics into consideration, we leak the memory of the allocation.
For now, this has been fixed by making the operation less trivial.
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D128002
The contraction op can have mixed types; add support for this case to the pattern
lowering the contraction op to an outer product.
Differential Revision: https://reviews.llvm.org/D127926
The previous approach does not work as the Adreno driver is
clever at optimizing away the selection. So now check two
inputs together.
Reviewed By: ThomasRaoux
Differential Revision: https://reviews.llvm.org/D127930
The useLocalScope printing flag has been passed around between pybind methods, but doesn't actually enable the corresponding printing flag.
Reviewed By: stellaraccident
Differential Revision: https://reviews.llvm.org/D127907
If any of the operands of ICmpOp is a vector, return a vector<Nxi1>
result rather than an i1.
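A sketch in the LLVM dialect:

```mlir
// Scalar comparison: i1 result.
%0 = llvm.icmp "slt" %a, %b : i32
// Vector comparison: one i1 per lane, i.e. vector<4xi1> here.
%1 = llvm.icmp "slt" %v0, %v1 : vector<4xi32>
```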
Differential Revision: https://reviews.llvm.org/D127536
Instead of casting the incoming operand into VectorType to check whether
it's scalable or not.
This is the place I missed when fixing f088b99eac.
Differential Revision: https://reviews.llvm.org/D127535
When translating from a llvm::ConstantAggregate with vector type, we
should lower to insertelement operations (if needed) rather than using
insertvalue.
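A sketch of the import output for a constant `<2 x float>` aggregate (SSA names illustrative):

```mlir
// Build the vector constant with insertelement instead of insertvalue.
%undef = llvm.mlir.undef : vector<2xf32>
%i0 = llvm.mlir.constant(0 : i32) : i32
%e0 = llvm.mlir.constant(1.0 : f32) : f32
%v0 = llvm.insertelement %e0, %undef[%i0 : i32] : vector<2xf32>
%i1 = llvm.mlir.constant(1 : i32) : i32
%e1 = llvm.mlir.constant(2.0 : f32) : f32
%v1 = llvm.insertelement %e1, %v0[%i1 : i32] : vector<2xf32>
```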
Differential Revision: https://reviews.llvm.org/D127534
The maxf implementation of the wmma elementwise op was incorrect, as the
operands of the select checking for NaN were swapped.
Differential Revision: https://reviews.llvm.org/D127879