Patch created by running:
rg -l parallelForEachN | xargs sed -i '' -c 's/parallelForEachN/parallelFor/'
No behavior change.
Differential Revision: https://reviews.llvm.org/D128140
Follow up from flipping dialects to both, flip accessor used to prefixed
variant ahead to flipping from _Both to _Prefixed. This just flips to
the accessors introduced in the preceding change which are just prefixed
forms of the existing accessor changed from.
Mechanical change using helper script
https://github.com/jpienaar/llvm-project/blob/main/clang-tools-extra/clang-tidy/misc/AddGetterCheck.cpp and clang-format.
Marked all dialects that could be (reasonably) easily flipped to _Both
prefix. Updating the accessors to prefixed form will happen in follow
up, this was to flush out conflicts and to mark all dialects explicitly
as I plan to flip OpBase default to _Prefixed to avoid needing to
migrate new dialects.
Except for Standalone example which got flipped to _Prefixed.
Differential Revision: https://reviews.llvm.org/D128027
Support complex types of float and double. See the added test for an example.
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D128076
This adds weak versions of the truncation libcalls in case the runtime
environment doesn't have them.
Differential Revision: https://reviews.llvm.org/D128091
This fixes all sorts of ABI issues due to passing by-value
(using by-reference with memref's exclusively).
Reviewed By: bkramer
Differential Revision: https://reviews.llvm.org/D128018
This change adds a transformation and pass to the NvGPU dialect that
attempts to optimize reads/writes from a memref representing GPU shared
memory in order to avoid bank conflicts. Given a value representing a
shared memory memref, it traverses all reads/writes within the parent op
and, subject to suitable conditions, rewrites all last dimension index
values such that element locations in the final (col) dimension are
given by
`newColIdx = col % vecSize + perm[row](col/vecSize,row)`
where `perm` is a permutation function indexed by `row` and `vecSize`
is the vector access size in elements (currently assumes 128bit
vectorized accesses, but this can be made a parameter). This specific
transformation can help optimize typical distributed & vectorized accesses
common to loading matrix multiplication operands to/from shared memory.
Differential Revision: https://reviews.llvm.org/D127457
This resolves problems reported in commit 1a20252978.
1. Promote to float lowering for nodes XINT_TO_FP
2. Bail out f16 from shuffle combine due to vector type is not legal in the version
With the recent refactorings, this class is no longer needed. We can use BufferizationOptions in all places were BufferizationState was used.
Differential Revision: https://reviews.llvm.org/D127653
This change changes the bufferization so that it utilizes the new TensorCopyInsertion pass. One-Shot Bufferize no longer calls the One-Shot Analysis. Instead, it relies on the TensorCopyInsertion pass to make the entire IR fully inplacable. The `bufferize` implementations of all ops are simplified; they no longer have to account for out-of-place bufferization decisions. These were already materialized in the IR in the form of `bufferization.alloc_tensor` ops during the TensorCopyInsertion pass.
Differential Revision: https://reviews.llvm.org/D127652
The 'emit_c_wrappers' option in the FuncToLLVM conversion requests C interface
wrappers to be emitted for every builtin function in the module. While this has
been useful to bootstrap the interface, it is problematic in the longer term as
it may unintentionally affect the functions that should retain their existing
interface, e.g., libm functions obtained by lowering math operations (see
D126964 for an example). Since D77314, we have a finer-grain control over
interface generation via an attribute that avoids the problem entirely. Remove
the 'emit_c_wrappers' option. Introduce the '-llvm-request-c-wrappers' pass
that can be run in any pipeline that needs blanket emission of functions to
annotate all builtin functions with the attribute before performing the usual
lowering that accounts for the attribute.
Reviewed By: chelini
Differential Revision: https://reviews.llvm.org/D127952
* Split ops into X_graph variants as discussed;
* Remove tokens from non-Graph region variants and rely on side-effect
modelling there while removing side-effect modelling from Graph
variants and relying on explicit ordering there;
* Make tokens required to be produced by Graph variants - but kept
explicit token type specification given previous discussion on this
potentially being configurable in future;
This results in duplicating some code. I considered adding helper
functions but decided against adding an abstraction there early given
size of duplication and creating accidental coupling.
Differential Revision: https://reviews.llvm.org/D127813
Where a constraint also has a def, emit the def only to avoid duplicate
output (and def has more complete info). Also move attributes and types
to the end rather than some on top and some at end.
Differential Revision: https://reviews.llvm.org/D127823
The semi-ring blocks were simply "inlined" by the sparse compiler but
without any filtering or patching. This revision improves the analysis
(rejecting blocks that use non-invariant computations from outside
their blocks, except for linalg.index) and also improves the codegen
by properly patching up index computations (previous version crashed).
With a regression test. Also updated the documentation now that the
example code is properly working.
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D128000
The LinalgElementwiseOpFusion pass has become smarter, and converts
the simple conversion linalg operation into a sparse dialect convert
operation. However, since our current bufferization does not take the
new semantics into consideration, we leak memory of the allocation.
For now, this has been fixed by making the operation less trivial.
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D128002
contraction op can have mixed type, add support for this case to the pattern
lowering contraction op to outerproduct.
Differential Revision: https://reviews.llvm.org/D127926
The previous approach does not work as the Adreno driver is
clever at optimizing away the selection. So now check two
inputs together.
Reviewed By: ThomasRaoux
Differential Revision: https://reviews.llvm.org/D127930
The useLocalScope printing flag has been passed around between pybind methods, but doesn't actually enable the corresponding printing flag.
Reviewed By: stellaraccident
Differential Revision: https://reviews.llvm.org/D127907
If any of the operands for ICmpOp is a vector, returns a vector<Nxi1>
, rather than an i1 type result.
Differential Revision: https://reviews.llvm.org/D127536
Instead of casting the incoming operand into VectorType to check if it's
scalable or not.
This is the place I missed to fix in f088b99eac.
Differential Revision: https://reviews.llvm.org/D127535
When translating from a llvm::ConstantAggregate with vector type, we
should lower to insertelement operations (if needed) rather than using
insertvalue.
Differential Revision: https://reviews.llvm.org/D127534
The maxf implementation of wmma elementwise op was incorrect as the
operands of the select to check for Nan were swapped.
Differential Revision: https://reviews.llvm.org/D127879
Make the reduction distribution pattern more generic and remove layering
problem. The new pattern to distribute reduction is now independent of
GPU and takes a lamdba to decide how the distributed reduction should be
generated.
Differential Revision: https://reviews.llvm.org/D127867
When specifying an op attribute with a default value (via DefaultValuedAttr), the default value is a string of C++ code. In the general case, the default value of such an attribute cannot be translated to Python when generating the bindings. However, we can hard-code default Python values for frequently-used C++ default values.
This change adds a Python default value for empty ArrayAttrs.
Differential Revision: https://reviews.llvm.org/D127750
Add static assertions into the various `attachInterface` methods, which are
used for adding external interface implementations to attributes, operations
and types, that ensure `ExternalModel` interface classes are instantiated for
the same concrete operation for the concrete base (potentially self) attribute
or type as they are attached to. `FallbackModel`s remain usable for generic
interface models that should support more than one kind of entities.
Reviewed By: springerm
Differential Revision: https://reviews.llvm.org/D127850
Just short-circuit when a change was made, the erased value is invalid
after that. Found by asan.
This pass looks like it could use rewrite patterns instead which don't
have this issue, but let's fix the asan build first.
If `create-deallocs=0`, mark all bufferization.alloc_tensor ops as escaping. (Unless they already have an `escape` attribute.) In the absence of analysis information, check SSA use-def chains to see if the value may be yielded.
Differential Revision: https://reviews.llvm.org/D127302
Bufferization of the func dialect must go through `OneShotModuleBufferize`. With this change, the analysis interface methods of the BufferizableOpInterface of func dialect ops can be used together with the normal `OneShotBufferize`. (In the absence of analysis information, they will return conservative results.)
Differential Revision: https://reviews.llvm.org/D127299
scf::ForOp and scf::WhileOp must insert buffer copies not only for out-of-place bufferizations, but also to enforce additional invariants wrt. to buffer aliasing behavior. This is currently happening in the respective `bufferize` methods. With this change, the tensor copy insertion pass will also enforce these invariants by inserting copies. The `bufferize` methods can then be simplified and made independent of the `AnalysisState` data structure in a subsequent change.
Differential Revision: https://reviews.llvm.org/D126822
Per GLSL Pow extended instruction spec: "Result is undefined if
x < 0. Result is undefined if x = 0 and y <= 0." So we need to
handle negative `x` values specifically.
Reviewed By: ThomasRaoux
Differential Revision: https://reviews.llvm.org/D127816
Make default loop tiling options explicit from CLI options. We can also set default value for separate option which is declared implicitly.
Reviewed By: ayzhuang
Differential Revision: https://reviews.llvm.org/D127711
Removes one element of the pointer union to make it work on 32-bit
systems.
This patch introduces a generic data-flow analysis framework to MLIR. The framework implements a fixed-point iteration algorithm and a dependency graph between lattice states and analysis. Lattice states and points are fully extensible to support highly-customizable analyses.
Reviewed By: phisiart, rriddle
Differential Revision: https://reviews.llvm.org/D126751
If all the arguments to and results of an operation are known to be
non-negative when interpreted as signed (which also implies that all
computations producing those values did not experience signed
overflow), we can replace that operation with an equivalent one that
operates on unsigned values.
Such a replacement, when it is possible, can provide useful hints to
backends, such as by allowing LLVM to replace remainder with bitwise
operations in more cases.
Depends on D124022
Depends on D124023
Reviewed By: Mogball
Differential Revision: https://reviews.llvm.org/D124024
This patch allows attaching user information, called "values" to each
identifier. The values are used to carry information along with variables and
are also used to determine if two variables are identical.
This patch is part of a series of patches to allow attaching user information
with variables in Presburger library.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D127347
This patch introduces a generic data-flow analysis framework to MLIR. The framework implements a fixed-point iteration algorithm and a dependency graph between lattice states and analysis. Lattice states and points are fully extensible to support highly-customizable analyses.
Reviewed By: phisiart, rriddle
Differential Revision: https://reviews.llvm.org/D126751
Also complete the set by adding a variant of depthwise 1d convolution
with the multiplier != 1.
Differential Revision: https://reviews.llvm.org/D127687
Introduce a transform dialect op that allows one to attempt different
transformation sequences on the same piece of payload IR until one of them
succeeds. This op fundamentally expands the scope of possibilities in the
transform dialect that, until now, could only propagate transformation failure,
at least using in-tree operations. This requires a more detailed specification
of the execution model for the transform dialect that now indicates how failure
is handled and propagated.
Transformations described by transform operations now have tri-state results,
with some errors being fundamentally irrecoverable (e.g., generating malformed
IR) and some others being recoverable by containing ops. Existing transform ops
directly implementing the `apply` interface method are updated to produce this
directly. Transform ops with the `TransformEachTransformOpTrait` are currently
considered to produce only irrecoverable failures and will be updated
separately.
Reviewed By: springerm
Differential Revision: https://reviews.llvm.org/D127724
Add a pattern to do ad hoc lowering of vector.reduction to a sequence of
warp shuffles. This allow distributing reduction on a warp for GPU targets.
Also add an execution test for warp reduction.
co-authored with @springerm
Differential Revision: https://reviews.llvm.org/D127176
Add patterns to propagate vector distribution and remove dead
arguments. This handles propagation for several vector operations.
recommit after minor bug fix.
Differential Revision: https://reviews.llvm.org/D127167
The generated attribute and type def accessors are changed to match the setting on the dialect. Most importantly, "prefixed" will now correctly convert snake case to camel case (e.g. `weight_zp` -> `getWeightZp`)
Reviewed By: jpienaar
Differential Revision: https://reviews.llvm.org/D127688
Ops that implement `RegionBranchOpInterface` are allowed to indicate that they can branch back to themselves in `getSuccessorRegions`, but there is no API that allows them to specify the forwarded operands. This patch enables that by changing `getSuccessorEntryOperands` to accept `None`.
Fixes#54928
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D127239
This patch adds support for tiling operations that implement the
TilingInterface.
- It separates the loop constructs that are used to iterate over tile
from the implementation of the tiling itself. For example, the use
of destructive updates is more related to use of scf.for for
iterating over tiles that are tensors.
- To test the transformation, TilingInterface is implemented for
LinalgOps. The separation of the looping constructs used from the
implementation of tile code generation greatly simplifies the
latter.
- The implementation of TilingInterface for LinalgOp is kept as an
external model for now till this approach can be fully flushed out
to replace the existing tiling + fusion approaches in Linalg.
Differential Revision: https://reviews.llvm.org/D127133
We cannot directly use the original result type; instead we need
to deduce it from the converted operand type. This addresses
invalid ops generated from converting single element vectors.
Reviewed By: ThomasRaoux
Differential Revision: https://reviews.llvm.org/D127574
This avoids pulling in function converion patterns, which is not
part of what we want to test in ArithmeticToSPIRV. It also allows
using ConvertArithmeticToSPIRVPass as a standalone step.
Reviewed By: ThomasRaoux
Differential Revision: https://reviews.llvm.org/D127573
Add patterns to propagate vector distribution and remove dead
arguments. This handles propagation for several vector operations.
Differential Revision: https://reviews.llvm.org/D127167
The function creates dim ops for each dynamic dimension of the raked tensor
argument and returns these as values.
Differential Revision: https://reviews.llvm.org/D127533
When convertEndianOfCharForBEmachine is called with elementBitWidth
smaller than CHAR_BIT, the default case is invoked, but this does
nothing at all and leaves the output array unchanged.
Fix DenseIntOrFPElementsAttr::convertEndianOfArrayRefForBEmachine
by not calling convertEndianOfCharForBEmachine in this case, and
instead simply copying the input to the output (for sub-byte types,
endian conversion is in fact a no-op).
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D125676