This fixes various ABI issues caused by passing memrefs by value;
memrefs are now passed exclusively by reference.
Reviewed By: bkramer
Differential Revision: https://reviews.llvm.org/D128018
This change adds a transformation and pass to the NvGPU dialect that
attempts to optimize reads/writes from a memref representing GPU shared
memory in order to avoid bank conflicts. Given a value representing a
shared memory memref, it traverses all reads/writes within the parent op
and, subject to suitable conditions, rewrites all last dimension index
values such that element locations in the final (col) dimension are
given by
`newColIdx = col % vecSize + perm[row](col / vecSize, row)`
where `perm` is a permutation function indexed by `row` and `vecSize`
is the vector access size in elements (this currently assumes 128-bit
vectorized accesses, but it can be made a parameter). This specific
transformation can help optimize typical distributed & vectorized accesses
common to loading matrix multiplication operands to/from shared memory.
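As a rough sketch (hypothetical IR, not the exact pass output, with a simple
XOR-with-row permutation standing in for `perm`), the remapping of a shared
memory load might look like:
```
// vecSize = 8 elements here (128-bit accesses of f16); perm[row] is
// modeled as XOR with the row index for illustration only.
%c8      = arith.constant 8 : index
%grp     = arith.divui %col, %c8 : index
%permGrp = arith.xori %grp, %row : index
%base    = arith.muli %permGrp, %c8 : index
%off     = arith.remui %col, %c8 : index
%newCol  = arith.addi %base, %off : index
%v = memref.load %shm[%row, %newCol] : memref<64x64xf16, 3>
```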
Differential Revision: https://reviews.llvm.org/D127457
This resolves problems reported in commit 1a20252978.
1. Promote-to-float lowering for XINT_TO_FP nodes.
2. Bail out of the shuffle combine for f16, since the vector type is not legal in this case.
With the recent refactorings, this class is no longer needed. We can use BufferizationOptions in all places where BufferizationState was used.
Differential Revision: https://reviews.llvm.org/D127653
This change modifies the bufferization so that it utilizes the new TensorCopyInsertion pass. One-Shot Bufferize no longer calls the One-Shot Analysis. Instead, it relies on the TensorCopyInsertion pass to make the entire IR fully inplaceable. The `bufferize` implementations of all ops are simplified; they no longer have to account for out-of-place bufferization decisions. These decisions were already materialized in the IR in the form of `bufferization.alloc_tensor` ops during the TensorCopyInsertion pass.
Differential Revision: https://reviews.llvm.org/D127652
The 'emit_c_wrappers' option in the FuncToLLVM conversion requests C interface
wrappers to be emitted for every builtin function in the module. While this has
been useful to bootstrap the interface, it is problematic in the longer term as
it may unintentionally affect the functions that should retain their existing
interface, e.g., libm functions obtained by lowering math operations (see
D126964 for an example). Since D77314, we have finer-grained control over
interface generation via an attribute that avoids the problem entirely. Remove
the 'emit_c_wrappers' option. Introduce the '-llvm-request-c-wrappers' pass
that can be run in any pipeline that needs blanket emission of functions to
annotate all builtin functions with the attribute before performing the usual
lowering that accounts for the attribute.
Reviewed By: chelini
Differential Revision: https://reviews.llvm.org/D127952
* Split ops into X_graph variants as discussed;
* Remove tokens from non-Graph region variants and rely on side-effect
modelling there while removing side-effect modelling from Graph
variants and relying on explicit ordering there;
* Make tokens required to be produced by Graph variants, but keep the
explicit token type specification given the previous discussion on this
potentially being configurable in the future.
This results in duplicating some code. I considered adding helper
functions but decided against introducing an abstraction this early,
given the size of the duplication and the risk of creating accidental
coupling.
Differential Revision: https://reviews.llvm.org/D127813
Where a constraint also has a def, emit only the def to avoid duplicate
output (the def also has more complete info). Also move attributes and
types to the end rather than having some at the top and some at the end.
Differential Revision: https://reviews.llvm.org/D127823
The semi-ring blocks were simply "inlined" by the sparse compiler but
without any filtering or patching. This revision improves the analysis
(rejecting blocks that use non-invariant computations from outside
their blocks, except for linalg.index) and also improves the codegen
by properly patching up index computations (previous version crashed).
A regression test is included. Also updated the documentation now that the
example code is properly working.
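For reference, a minimal semi-ring block of the kind now accepted by the
analysis (a sketch; the block may use values it defines itself, plus
linalg.index, but no other computations from outside the block):
```
%r = sparse_tensor.binary %a, %b : f64, f64 to f64
  overlap = {
    ^bb0(%x: f64, %y: f64):
      %sum = arith.addf %x, %y : f64
      sparse_tensor.yield %sum : f64
  }
  left = identity
  right = identity
```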
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D128000
The LinalgElementwiseOpFusion pass has become smarter and now converts
a simple conversion linalg operation into a sparse dialect convert
operation. However, since our current bufferization does not take the
new semantics into consideration, we leak the memory of the allocation.
For now, this has been fixed by making the operation less trivial.
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D128002
The contraction op can have mixed operand types; add support for this case
to the pattern lowering the contraction op to outer products.
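For instance (a sketch of the kind of op now handled), a dot-product-style
contraction with `f16` operands accumulating into `f32`:
```
%r = vector.contract {
       indexing_maps = [affine_map<(d0) -> (d0)>,
                        affine_map<(d0) -> (d0)>,
                        affine_map<(d0) -> ()>],
       iterator_types = ["reduction"],
       kind = #vector.kind<add>}
     %a, %b, %acc : vector<8xf16>, vector<8xf16> into f32
```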
Differential Revision: https://reviews.llvm.org/D127926
The previous approach does not work as the Adreno driver is
clever at optimizing away the selection. So now check two
inputs together.
Reviewed By: ThomasRaoux
Differential Revision: https://reviews.llvm.org/D127930
The useLocalScope printing flag has been passed around between pybind methods, but it was never actually set on the printing flags.
Reviewed By: stellaraccident
Differential Revision: https://reviews.llvm.org/D127907
If any of the operands of ICmpOp is a vector, the op returns a
vector<Nxi1> rather than an i1 result.
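A sketch of the behavior:
```
%s = llvm.icmp "slt" %a, %b : i64            // result: i1
%v = llvm.icmp "slt" %x, %y : vector<4xi64>  // result: vector<4xi1>
```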
Differential Revision: https://reviews.llvm.org/D127536
Determine whether a vector is scalable directly, instead of casting the
incoming operand to VectorType to check.
This is the place I missed in f088b99eac.
Differential Revision: https://reviews.llvm.org/D127535
When translating from a llvm::ConstantAggregate with vector type, we
should lower to insertelement operations (if needed) rather than using
insertvalue.
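A sketch of the target IR for a two-element vector constant (hypothetical
values):
```
// Build <1.0, 2.0> : vector<2xf32> with insertelement, not insertvalue.
%p0  = llvm.mlir.undef : vector<2xf32>
%i0  = llvm.mlir.constant(0 : i32) : i32
%e0  = llvm.mlir.constant(1.0 : f32) : f32
%p1  = llvm.insertelement %e0, %p0[%i0 : i32] : vector<2xf32>
%i1  = llvm.mlir.constant(1 : i32) : i32
%e1  = llvm.mlir.constant(2.0 : f32) : f32
%vec = llvm.insertelement %e1, %p1[%i1 : i32] : vector<2xf32>
```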
Differential Revision: https://reviews.llvm.org/D127534
The maxf implementation of the wmma elementwise op was incorrect, as the
operands of the select used to check for NaN were swapped.
Differential Revision: https://reviews.llvm.org/D127879
Make the reduction distribution pattern more generic and remove layering
problem. The new pattern to distribute reduction is now independent of
GPU and takes a lambda to decide how the distributed reduction should be
generated.
Differential Revision: https://reviews.llvm.org/D127867
When specifying an op attribute with a default value (via DefaultValuedAttr), the default value is a string of C++ code. In the general case, the default value of such an attribute cannot be translated to Python when generating the bindings. However, we can hard-code default Python values for frequently-used C++ default values.
This change adds a Python default value for empty ArrayAttrs.
Differential Revision: https://reviews.llvm.org/D127750
Add static assertions into the various `attachInterface` methods, which are
used for adding external interface implementations to attributes, operations
and types, that ensure `ExternalModel` interface classes are instantiated for
the same concrete operation for the concrete base (potentially self) attribute
or type as they are attached to. `FallbackModel`s remain usable for generic
interface models that should support more than one kind of entity.
Reviewed By: springerm
Differential Revision: https://reviews.llvm.org/D127850
Just short-circuit when a change was made; the erased value is invalid
after that. Found by ASan.
This pass looks like it could use rewrite patterns instead, which don't
have this issue, but let's fix the ASan build first.
If `create-deallocs=0`, mark all bufferization.alloc_tensor ops as escaping. (Unless they already have an `escape` attribute.) In the absence of analysis information, check SSA use-def chains to see if the value may be yielded.
Differential Revision: https://reviews.llvm.org/D127302
Bufferization of the func dialect must go through `OneShotModuleBufferize`. With this change, the analysis interface methods of the BufferizableOpInterface of func dialect ops can be used together with the normal `OneShotBufferize`. (In the absence of analysis information, they will return conservative results.)
Differential Revision: https://reviews.llvm.org/D127299
scf::ForOp and scf::WhileOp must insert buffer copies not only for out-of-place bufferizations, but also to enforce additional invariants w.r.t. buffer aliasing behavior. This is currently happening in the respective `bufferize` methods. With this change, the tensor copy insertion pass will also enforce these invariants by inserting copies. The `bufferize` methods can then be simplified and made independent of the `AnalysisState` data structure in a subsequent change.
Differential Revision: https://reviews.llvm.org/D126822
Per GLSL Pow extended instruction spec: "Result is undefined if
x < 0. Result is undefined if x = 0 and y <= 0." So we need to
handle negative `x` values specifically.
Reviewed By: ThomasRaoux
Differential Revision: https://reviews.llvm.org/D127816
Make the default loop tiling options explicit in the CLI options. We can also set a default value for the separate option, which is declared implicitly.
Reviewed By: ayzhuang
Differential Revision: https://reviews.llvm.org/D127711
Removes one element of the pointer union to make it work on 32-bit
systems.
This patch introduces a generic data-flow analysis framework to MLIR. The framework implements a fixed-point iteration algorithm and a dependency graph between lattice states and analysis. Lattice states and points are fully extensible to support highly-customizable analyses.
Reviewed By: phisiart, rriddle
Differential Revision: https://reviews.llvm.org/D126751
If all the arguments to and results of an operation are known to be
non-negative when interpreted as signed (which also implies that all
computations producing those values did not experience signed
overflow), we can replace that operation with an equivalent one that
operates on unsigned values.
Such a replacement, when it is possible, can provide useful hints to
backends, such as by allowing LLVM to replace remainder with bitwise
operations in more cases.
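For example, a sketch of the kind of rewrite this enables:
```
// With %a and %b known non-negative, the signed remainder can be
// replaced by its unsigned equivalent.
%r = arith.remsi %a, %b : i32
// ... becomes ...
%r = arith.remui %a, %b : i32
```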
Depends on D124022
Depends on D124023
Reviewed By: Mogball
Differential Revision: https://reviews.llvm.org/D124024
This patch allows attaching user information, called "values" to each
identifier. The values are used to carry information along with variables and
are also used to determine if two variables are identical.
This patch is part of a series of patches to allow attaching user information
with variables in Presburger library.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D127347
This patch introduces a generic data-flow analysis framework to MLIR. The framework implements a fixed-point iteration algorithm and a dependency graph between lattice states and analysis. Lattice states and points are fully extensible to support highly-customizable analyses.
Reviewed By: phisiart, rriddle
Differential Revision: https://reviews.llvm.org/D126751
Also complete the set by adding a variant of depthwise 1d convolution
with the multiplier != 1.
Differential Revision: https://reviews.llvm.org/D127687
Introduce a transform dialect op that allows one to attempt different
transformation sequences on the same piece of payload IR until one of them
succeeds. This op fundamentally expands the scope of possibilities in the
transform dialect that, until now, could only propagate transformation failure,
at least using in-tree operations. This requires a more detailed specification
of the execution model for the transform dialect that now indicates how failure
is handled and propagated.
Transformations described by transform operations now have tri-state results,
with some errors being fundamentally irrecoverable (e.g., generating malformed
IR) and some others being recoverable by containing ops. Existing transform ops
directly implementing the `apply` interface method are updated to produce this
directly. Transform ops with the `TransformEachTransformOpTrait` are currently
considered to produce only irrecoverable failures and will be updated
separately.
Reviewed By: springerm
Differential Revision: https://reviews.llvm.org/D127724
Add a pattern to do ad hoc lowering of vector.reduction to a sequence of
warp shuffles. This allows distributing reductions on a warp for GPU targets.
Also add an execution test for warp reduction.
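A sketch of one step of the emitted sequence (op syntax sketched; a
butterfly reduction over a 32-lane warp, combining partial results with
addf):
```
%width = arith.constant 32 : i32
%o16   = arith.constant 16 : i32
%y, %valid = gpu.shuffle xor %x, %o16, %width : f32
%x1 = arith.addf %x, %y : f32
// ... repeated with offsets 8, 4, 2, 1 to finish the reduction ...
```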
co-authored with @springerm
Differential Revision: https://reviews.llvm.org/D127176
Add patterns to propagate vector distribution and remove dead
arguments. This handles propagation for several vector operations.
Recommit after a minor bug fix.
Differential Revision: https://reviews.llvm.org/D127167
The generated attribute and type def accessors are changed to match the setting on the dialect. Most importantly, "prefixed" will now correctly convert snake case to camel case (e.g. `weight_zp` -> `getWeightZp`)
Reviewed By: jpienaar
Differential Revision: https://reviews.llvm.org/D127688
Ops that implement `RegionBranchOpInterface` are allowed to indicate that they can branch back to themselves in `getSuccessorRegions`, but there is no API that allows them to specify the forwarded operands. This patch enables that by changing `getSuccessorEntryOperands` to accept `None`.
Fixes #54928
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D127239
This patch adds support for tiling operations that implement the
TilingInterface.
- It separates the loop constructs that are used to iterate over tiles
from the implementation of the tiling itself. For example, the use
of destructive updates is more related to the use of scf.for for
iterating over tiles that are tensors.
- To test the transformation, TilingInterface is implemented for
LinalgOps. The separation of the looping constructs used from the
implementation of tile code generation greatly simplifies the
latter.
- The implementation of TilingInterface for LinalgOp is kept as an
external model for now, till this approach can be fully fleshed out
to replace the existing tiling + fusion approaches in Linalg.
Differential Revision: https://reviews.llvm.org/D127133
We cannot directly use the original result type; instead we need
to deduce it from the converted operand type. This addresses
invalid ops generated from converting single element vectors.
Reviewed By: ThomasRaoux
Differential Revision: https://reviews.llvm.org/D127574
This avoids pulling in function conversion patterns, which are not
part of what we want to test in ArithmeticToSPIRV. It also allows
using ConvertArithmeticToSPIRVPass as a standalone step.
Reviewed By: ThomasRaoux
Differential Revision: https://reviews.llvm.org/D127573
Add patterns to propagate vector distribution and remove dead
arguments. This handles propagation for several vector operations.
Differential Revision: https://reviews.llvm.org/D127167
The function creates dim ops for each dynamic dimension of the ranked
tensor argument and returns these as values.
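A sketch for a `tensor<?x4x?xf32>` argument (only the dynamic dimensions
get dim ops):
```
%c0 = arith.constant 0 : index
%d0 = tensor.dim %t, %c0 : tensor<?x4x?xf32>
%c2 = arith.constant 2 : index
%d2 = tensor.dim %t, %c2 : tensor<?x4x?xf32>
// %d0 and %d2 are returned; the static dimension 4 needs no dim op.
```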
Differential Revision: https://reviews.llvm.org/D127533
When convertEndianOfCharForBEmachine is called with elementBitWidth
smaller than CHAR_BIT, the default case is invoked, but this does
nothing at all and leaves the output array unchanged.
Fix DenseIntOrFPElementsAttr::convertEndianOfArrayRefForBEmachine
by not calling convertEndianOfCharForBEmachine in this case, and
instead simply copying the input to the output (for sub-byte types,
endian conversion is in fact a no-op).
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D125676
An operand's defining op may not be valid for adding to the worklist under
strict mode.
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D127180
This commit extends the UnifyAliasedResourcePass to handle scalar
types of different bitwidths. It requires choosing the smaller-bitwidth
resource as the canonical resource so that we can avoid subcomponent
loads/stores. Instead, we load/store multiple smaller-bitwidth values.
Reviewed By: hanchung
Differential Revision: https://reviews.llvm.org/D127266
Currently, linking the device libraries requires setting a constant
that indicates the code object ABI version the compilation is
targeting.
This fixes the MLIR linking process by setting this constant to 400,
which is the value corresponding to the current code object ABI
default, version 4.
Reviewed By: Mogball
Differential Revision: https://reviews.llvm.org/D126913
Add pattern to hoist scalar code outside of warp distribute region as
those cannot be distributed and we would want to execute them on all
the lanes.
Add patterns to distribute transfer_write ops. Those operations can be
distributed in different ways, and this is controlled by the user.
Differential Revision: https://reviews.llvm.org/D127152
First of all, `LLVM_TOOLS_INSTALL_DIR` put there breaks our NixOS
builds, because `LLVM_TOOLS_INSTALL_DIR` defined the same as
`CMAKE_INSTALL_BINDIR` becomes an *absolute* path, and then when
downstream projects try to install there too this breaks because our
builds always install to fresh directories for isolation's sake.
Second of all, note that `LLVM_TOOLS_INSTALL_DIR` stands out against the
other specially crafted `LLVM_CONFIG_*` variables substituted in
`llvm/cmake/modules/LLVMConfig.cmake.in`.
@beanz added it in d0e1c2a550 to fix a
dangling reference in `AddLLVM`, but I am suspicious of how this
variable doesn't follow the pattern.
Those other ones are carefully made to be build-time vs install-time
variables depending on which `LLVMConfig.cmake` is being generated, are
carefully made relative as appropriate, etc. etc. For my NixOS use-case
they are also fine because they are never used as downstream install
variables, only for reading not writing.
To avoid the problems I face, and to restore symmetry, I deleted the
exported variable and arranged to have many `${project}_TOOLS_INSTALL_DIR`s.
`AddLLVM` now instead expects each project to define its own, and they
do so based on `CMAKE_INSTALL_BINDIR`. `LLVMConfig` still exports
`LLVM_TOOLS_BINARY_DIR` which is the location for the tools defined in
the usual way, matching the other remaining exported variables.
For the `AddLLVM` changes, I tried to copy the existing pattern of
internal vs non-internal or for LLVM vs for downstream function/macro
names, but it would be good to confirm I did that correctly.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D117977
In the transform dialect, a transform IR handle may be pointing to a payload IR
operation that is an ancestor of another payload IR operation pointed to by
another handle. If such a "parent" handle is consumed by a transformation, this
indicates that the associated operation is likely rewritten, which in turn
means that the "child" handle may now be associated with a dangling pointer or
a pointer to a different operation than originally. Add a handle invalidation
mechanism to guard against such situations by reporting errors at runtime.
Reviewed By: springerm
Differential Revision: https://reviews.llvm.org/D127480
There are various shortcuts in `BufferizationState::getBuffer` that avoid a buffer copy when we just need an allocation (and no initialization). This change adds those shortcuts to the TensorCopyInsertion pass, so that `getBuffer` can be simplified in a subsequent change.
Differential Revision: https://reviews.llvm.org/D126821
The constructor already supports passing an ostream as argument,
so let's make the create function support it too.
Differential Revision: https://reviews.llvm.org/D127449
It was a StructAttr. Also adds a FieldParser for AffineMap.
Depends on D127348
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D127350
It is sometimes better to make a copy of the OpResult instead of making a copy of the OpOperand. E.g., when bufferizing tensor.extract_slice.
This implementation will eventually make parts of extract_slice's `bufferize` implementation obsolete (and simplify it). It will only need to handle in-place OpOperands.
Differential Revision: https://reviews.llvm.org/D126819
The TensorCopyInsertion pass resolves out-of-place bufferization decisions by inserting explicit `bufferization.alloc_tensor` ops. This change moves that functionality into a new BufferizableOpInterface method, so that it can be overridden by op implementations. Some op bufferizations must insert additional `alloc_tensor` ops to make sure that certain aliasing invariants are not violated (e.g., scf::ForOp). This will be addressed in a subsequent change.
Differential Revision: https://reviews.llvm.org/D126817
Fixed issue with vector.contract default unroll permutation.
Adds support for vector unroll transformations to unroll in different
orders. For example, the vector.contract can be unrolled into a
smaller set of contractions. There is a choice of how to unroll the
decomposition based on the traversal order of (dim0, dim1, dim2).
The choice of traversal order can now be specified by a callback that is
given by the caller of the transform. For now, only the
vector.contract, vector.transfer_read/transfer_write operations
support the callback.
Differential Revision: https://reviews.llvm.org/D127004
This pass runs the One-Shot Analysis to find out which tensor OpOperands must bufferize out-of-place. It then rewrites those tensor OpOperands to explicit allocations with a copy in the form of `bufferization.alloc_tensor`. The resulting IR can then be bufferized without having to care about read-after-write conflicts.
This change makes it possible to connect One-Shot Analysis to other bufferizations such as the sparse compiler.
Differential Revision: https://reviews.llvm.org/D126573
If `copy` is specified, the newly allocated buffer is initialized with the given contents. Also add an optional `escape` attribute to indicate whether the buffer of the tensor may be returned from the parent block (aka. "escape") after bufferization.
This change is in preparation of connecting One-Shot Bufferize to the sparse compiler.
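A sketch of the extended op (attribute spelling assumed from the
description):
```
// Allocate a buffer sized like %src, initialize it with %src's contents,
// and mark it as possibly escaping the parent block.
%t = bufferization.alloc_tensor() copy(%src) {escape = true} : tensor<?xf32>
```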
Differential Revision: https://reviews.llvm.org/D126570
Using "default:" in the switch statemements that handle all our
merger ops has become a bit cumbersome since it is easy to overlook
parts of the code that need to handle ops specifically. By enforcing
full switch statements without "default:", we get a compiler warning
when cases are overlooked.
Reviewed By: wrengr
Differential Revision: https://reviews.llvm.org/D127263
This simplifies the bufferization itself and is in preparation of connecting with the sparse compiler.
Differential Revision: https://reviews.llvm.org/D126814
Users should explicitly run `-buffer-results-to-out-params` instead.
The purpose of this change is to remove `finalizeBuffers`, which made it difficult to extend the bufferization to custom buffer types.
Differential Revision: https://reviews.llvm.org/D126253
The buffer deallocation pass must now be run explicitly when `allow-return-alloc` is set.
This results in a few extra buffer copies in unoptimized test cases. The proper way to avoid such copies is to relax the OpOperand/OpResult aliasing contract on ops such as scf.for. Some of these copies can also be avoided by improving the buffer deallocation pass.
Differential Revision: https://reviews.llvm.org/D126252
The operation `shape.concat` previously supported type shapes only.
We now enable it for extent tensors.
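For example (a sketch; extent-tensor syntax assumed):
```
// Concatenating extent tensors, not just !shape.shape values.
%c = shape.concat %a, %b : tensor<2xindex>, tensor<3xindex> -> tensor<5xindex>
```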
Reviewed By: jpienaar
Differential Revision: https://reviews.llvm.org/D127321
This commit allows for One-Shot Bufferize to be used through the transform dialect. No op handle is currently returned for the bufferized IR.
Differential Revision: https://reviews.llvm.org/D125098
This relies on the existing TileAndFuse pattern for tensor-based structured
ops. It complements pure tiling, from which some utilities are generalized.
Depends On D127300
Reviewed By: springerm
Differential Revision: https://reviews.llvm.org/D127319
Introduce transform ops for "for" loops, in particular for peeling, software
pipelining and unrolling, along with a couple of "IR navigation" ops. These ops
are intended to be generalized to different kinds of loops when possible and
therefore use the "loop" prefix. They currently live in the SCF dialect as
there is no clear place to put transform ops that may span across several
dialects, this decision is postponed until the ops actually need to handle
non-SCF loops.
Additionally refactor some common utilities for transform ops into trait or
interface methods, and change the loop pipelining to be a returning pattern.
Reviewed By: springerm
Differential Revision: https://reviews.llvm.org/D127300
Support complex numbers for Matrix Market Exchange Formats. Add a test case.
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D127138
When constraints in the two operands make each other redundant, prefer constraints of the second because this affects the number of sets in the output at each level; reducing these can help prevent exponential blowup.
This is accomplished by adding extra overloads to Simplex::detectRedundant that only scan a subrange of the constraints for redundancy.
Reviewed By: Groverkss
Differential Revision: https://reviews.llvm.org/D127237
Reduces repetition in tablegen files for defining various tensor types. In particular the goal is to reduce the repetition when defining new tensor types (e.g., D126994).
Reviewed By: aartbik, rriddle
Differential Revision: https://reviews.llvm.org/D127039
Implement the C-API and Python bindings for the builtin opaque type, which was previously missing.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D127303
When `RegionBranchOpInterface::getSuccessorRegions` is called for anything other than the parent op, it expects the operands of the terminator of the source region to be passed, not the operands of the parent op. This was not always respected.
This fixes a bug in integer range inference and ForwardDataFlowSolver and changes `scf.while` to allow narrowing of successors using constant inputs.
Fixes #55873
Reviewed By: mehdi_amini, krzysz00
Differential Revision: https://reviews.llvm.org/D127261
This is the first PR to add `F16` and `BF16` support to the sparse codegen. There are still problems in supporting these two data types, e.g., `BF16` is not quite working yet.
Add test cases.
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D127010
Introduce RoundOp in the math dialect. The operation rounds the operand to the
nearest integer value in floating-point format. RoundOp lowers to LLVM
intrinsics 'llvm.intr.round' or as a function call to libm (round or roundf).
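Usage sketch:
```
%y = math.round %x : f32  // rounds to the nearest integer value, e.g. 2.5 -> 3.0
```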
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D127286
Find writability conflicts (writes to buffers that are not allowed to be written to) by checking SSA use-def chains. This is better than the current writability analysis, which is too conservative and finds false positives.
Differential Revision: https://reviews.llvm.org/D127256
This is required for the distribution system for installing the
mlir-libraries component. This is copied from clang's equivalent
feature.
Differential Revision: https://reviews.llvm.org/D126837
The `init` and `tensor` ops are renamed (and one moved to another dialect).
Reviewed By: springerm
Differential Revision: https://reviews.llvm.org/D127169
Reverts commit 1469ebf838 (original commit)
Reverts commit a392a39f75 (build fix for above commit)
The commit broke tests in out-of-tree projects, indicating that some logical
error was made in the previous change that is not covered by current tests.
This patch fixes a bug in PresburgerRelation::subtract that made it process
the inequality at index 0 multiple times. This was caused by allocating
memory instead of reserving memory in llvm::SmallVector.
Reviewed By: arjunp
Differential Revision: https://reviews.llvm.org/D127228
A few OpenMP tests were retaining the FIR operands even after running
the LLVM conversion pass. To fix these tests, the legality checks for
OpenMP conversion are made stricter to include operands and results.
The Flush, Single and Sections operations are added to conversions or
legality checks. The RegionLessOpConversion is appropriately renamed
to clarify that it works only for operations with Variable operands.
The operands of the flush operation are changed to match those of
Variable Operands.
Fix for an OpenMP issue mentioned in
https://github.com/llvm/llvm-project/issues/55210.
Reviewed By: shraiysh, peixin, awarzynski
Differential Revision: https://reviews.llvm.org/D127092
Four leading spaces are interpreted as a code block in markdown. Unless
used consistently in ODS op description, they cannot be stripped away by
the tablegen backend, which results in malformed markdown being
generated.
Add the complex.conj op to calculate the complex conjugate, which is widely used in mathematical operations on the complex plane.
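Usage sketch:
```
%c = complex.conj %z : complex<f32>  // (a + bi) -> (a - bi)
```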
Reviewed By: pifon2a
Differential Revision: https://reviews.llvm.org/D127181
These allow for displaying additional inline information,
such as the types of variables, names of operands/results,
constraint/rewrite arguments, etc. This requires a bump in the
vscode extension to a newer version, as inlay hints are a new LSP feature.
Differential Revision: https://reviews.llvm.org/D126033
This is much more efficient than the full mode, as it only requires sending
small chunks of files. It also works around a weird command ordering
issue (full document updates are being sent after other commands like
code completion) in newer versions of vscode.
Differential Revision: https://reviews.llvm.org/D126032
This commit beefs up the documentation for MLIR language servers by
adding proper documentations/examples/etc for the provided TableGen
language server capabilities. Given that this documentation is also used
for the vscode extension, this commit also updates the user facing vscode
extension documentation.
Note that the images referenced in the new documentation are hosted on
the website, and will be commited to mlir-www shortly after this commit
lands.
Transpose operations on constant data were getting folded during the
canonicalization process. This has compile-time cost proportional to
the constant size. Move this to a separate pass to enable optionality
and flexibility in how such scenarios can be handled.
Reviewed By: rsuderman, jpienaar, stellaraccident
Differential Revision: https://reviews.llvm.org/D124685
Adds support for vector unroll transformations to unroll in different
orders. For example, the `vector.contract` can be unrolled into a
smaller set of contractions. There is a choice of how to unroll the
decomposition based on the traversal order of (dim0, dim1, dim2).
The choice of traversal order can now be specified by a callback that is
given by the caller of the transform. For now, only the
`vector.contract`, `vector.transfer_read/transfer_write` operations
support the callback.
Differential Revision: https://reviews.llvm.org/D127004
This commit beefs up the documentation for MLIR language servers by
adding proper documentations/examples/etc for the provided PDLL
language server capabilities. Given that this documentation is also used
for the vscode extension, this commit also updates the user facing vscode
extension documentation.
Note that the images referenced in the new documentation are hosted on
the website, and will be committed to mlir-www shortly after this commit
lands.
Differential Revision: https://reviews.llvm.org/D125650
This operation should be supported as a named op because
when the operands are viewed as having canonical layouts
with decreasing strides, then the "reduction" dimensions
of the filter (h, w, and c) are contiguous relative to each
output channel. When lowered to a matrix multiplication,
this layout is the simplest to deal with, and thus future
transforms/vectorizations of `conv2d` may find using this
named op convenient.
Differential Revision: https://reviews.llvm.org/D126995
This change adds support for promoting `linalg` operation operands that
are produced by rank-reducing `memref.subview` ops.
Differential Revision: https://reviews.llvm.org/D127086
When building in debug mode, the link time of the standalone sample is excessive, taking upwards of a minute if using BFD. This at least allows lld to be used if the main invocation was configured that way. On my machine, this gets a standalone test that requires a relink to run in ~13s for Debug mode. This is still a lot, but better than it was. I think we may want to do something about this test: it adds a lot of latency to a normal compile/test cycle and requires a bunch of arg fiddling to exclude.
I think we may end up wanting a `check-mlir-heavy` target that can be used just prior to submit, and then make `check-mlir` just run unit/lite tests. More just thoughts for the future (none of that is done here).
Reviewed By: bondhugula, mehdi_amini
Differential Revision: https://reviews.llvm.org/D126585
This is correct for all values, i.e. the same as promoting the division to fp32 in the NVPTX backend. But it is faster (~10% on average, sometimes more) because:
- it performs less Newton iterations
- it avoids the slow path for e.g. denormals
- it allows reuse of the reciprocal for multiple divisions by the same divisor
Test program:
```
#include <stdio.h>
#include "cuda_fp16.h"

// This is a variant of CUDA's own __hdiv which is faster than hdiv_promote
// below and doesn't suffer from the perf cliff of div.rn.fp32 with
// 'special' values.
__device__ half hdiv_newton(half a, half b) {
  float fa = __half2float(a);
  float fb = __half2float(b);

  float rcp;
  asm("{rcp.approx.ftz.f32 %0, %1;\n}" : "=f"(rcp) : "f"(fb));

  float result = fa * rcp;
  auto exponent = reinterpret_cast<const unsigned&>(result) & 0x7f800000;
  if (exponent != 0 && exponent != 0x7f800000) {
    float err = __fmaf_rn(-fb, result, fa);
    result = __fmaf_rn(rcp, err, result);
  }

  return __float2half(result);
}

// Surprisingly, this is faster than CUDA's own __hdiv.
__device__ half hdiv_promote(half a, half b) {
  return __float2half(__half2float(a) / __half2float(b));
}

// This is an approximation that is accurate up to 1 ulp.
__device__ half hdiv_approx(half a, half b) {
  float fa = __half2float(a);
  float fb = __half2float(b);

  float result;
  asm("{div.approx.ftz.f32 %0, %1, %2;\n}" : "=f"(result) : "f"(fa), "f"(fb));
  return __float2half(result);
}

__global__ void CheckCorrectness() {
  int i = threadIdx.x + blockIdx.x * blockDim.x;
  half x = reinterpret_cast<const half&>(i);
  for (int j = 0; j < 65536; ++j) {
    half y = reinterpret_cast<const half&>(j);
    half d1 = hdiv_newton(x, y);
    half d2 = hdiv_promote(x, y);
    auto s1 = reinterpret_cast<const short&>(d1);
    auto s2 = reinterpret_cast<const short&>(d2);
    if (s1 != s2) {
      printf("%f (%u) / %f (%u), got %f (%hu), expected: %f (%hu)\n",
             __half2float(x), i, __half2float(y), j, __half2float(d1), s1,
             __half2float(d2), s2);
      //__trap();
    }
  }
}

__device__ half dst;

__global__ void ProfileBuiltin(half x) {
#pragma unroll 1
  for (int i = 0; i < 10000000; ++i) {
    x = x / x;
  }
  dst = x;
}

__global__ void ProfilePromote(half x) {
#pragma unroll 1
  for (int i = 0; i < 10000000; ++i) {
    x = hdiv_promote(x, x);
  }
  dst = x;
}

__global__ void ProfileNewton(half x) {
#pragma unroll 1
  for (int i = 0; i < 10000000; ++i) {
    x = hdiv_newton(x, x);
  }
  dst = x;
}

__global__ void ProfileApprox(half x) {
#pragma unroll 1
  for (int i = 0; i < 10000000; ++i) {
    x = hdiv_approx(x, x);
  }
  dst = x;
}

int main() {
  CheckCorrectness<<<256, 256>>>();
  half one = __float2half(1.0f);
  ProfileBuiltin<<<1, 1>>>(one);  // 1.001s
  ProfilePromote<<<1, 1>>>(one);  // 0.560s
  ProfileNewton<<<1, 1>>>(one);   // 0.508s
  ProfileApprox<<<1, 1>>>(one);   // 0.304s
  auto status = cudaDeviceSynchronize();
  printf("%s\n", cudaGetErrorString(status));
}
```
Reviewed By: herhut
Differential Revision: https://reviews.llvm.org/D126158
The current vectorization logic implicitly expects "elementwise"
linalg ops to have projected permutations for indexing maps, but
the precondition logic misses this check. This can result in a
crash when executing the generic vectorization transform on an op
with a non-projected permutation input indexing map. This change
fixes the logic and adds a test (which crashes without this fix).
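For example (sketch), an input indexing map such as `(d0, d1) -> (d0 + d1)`
is not a projected permutation, so an op like the following is now rejected
by the precondition instead of crashing the vectorizer:
```
#map0 = affine_map<(d0, d1) -> (d0 + d1)>
#map1 = affine_map<(d0, d1) -> (d0, d1)>
%r = linalg.generic
       {indexing_maps = [#map0, #map1],
        iterator_types = ["parallel", "parallel"]}
       ins(%in : tensor<?xf32>) outs(%out : tensor<?x?xf32>) {
  ^bb0(%a: f32, %b: f32):
    linalg.yield %a : f32
} -> tensor<?x?xf32>
```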
Differential Revision: https://reviews.llvm.org/D127000
This patch adds the knobs to use peeling in the codegen strategy
infrastructure.
Reviewed By: springerm
Differential Revision: https://reviews.llvm.org/D126842
This patch adds tests for memory_order clause for atomic update and
capture operations. This patch also adds a check for making sure that
the operations inside an omp.atomic.capture region do not specify the
memory_order clause.
Reviewed By: kiranchandramohan, peixin
Differential Revision: https://reviews.llvm.org/D126195
`scf.foreach_thread` results alias with the underlying `scf.foreach_thread.parallel_insert_slice` destination operands
and they bufferize to equivalent buffers in the absence of other conflicts.
`scf.foreach_thread.parallel_insert_slice` conflict detection is similar to `tensor.insert_slice` conflict detection.
Reviewed By: springerm
Differential Revision: https://reviews.llvm.org/D126769
Add an option to predicate the epilogue within the kernel instead of
peeling the epilogue. This is a useful option to prevent generating
large amounts of code for deep pipelines. This currently requires a user
lambda to implement operation predication.
Differential Revision: https://reviews.llvm.org/D126753
Similar to complex128/complex64, float16 has no direct support
in the ctypes implementation. This fixes the issue by using a
custom F16 type to change the view in and out of MLIR code.
Reviewed By: wrengr
Differential Revision: https://reviews.llvm.org/D126928
Since version 0.7 we've added:
* Initial language support for TableGen
* Tweaked syntax highlighting for PDLL
* Added a new command to view intermediate PDLL output
This commit enables providing long-form documentation more seamlessly to the LSP
by revamping decl documentation. For ODS imported constructs, we now also import
descriptions and attach them to decls when possible. For PDLL constructs, the LSP will
now try to provide documentation by parsing the comments directly above the decls
location within the source file. This commit also adds a new parser flag
`enableDocumentation` that gates the import and attachment of ODS documentation,
which is unnecessary in the normal build process (i.e. it should only be used/consumed
by tools).
Differential Revision: https://reviews.llvm.org/D124881
This commit defines a dataflow analysis for integer ranges, which
uses a newly-added InferIntRangeInterface to compute the lower and
upper bounds on the results of an operation from the bounds on the
arguments. The range inference is a flow-insensitive dataflow analysis
that can be used to simplify code, such as by statically identifying
bounds checks that cannot fail in order to eliminate them.
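For example (sketch), with `%i` inferred to lie in `[0, 16)`:
```
%c16 = arith.constant 16 : index
// Range inference proves this comparison is always true, so the bounds
// check and any branch it guards can be folded away.
%ok = arith.cmpi ult, %i, %c16 : index
```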
The InferIntRangeInterface has one method, inferResultRanges(), which
takes a vector of inferred ranges for each argument to an op
implementing the interface and a callback allowing the implementation
to define the ranges for each result. These ranges are stored as
ConstantIntRanges, which hold the lower and upper bounds for a
value. Bounds are tracked separately for the signed and unsigned
interpretations of a value, which ensures that the impact of
arithmetic overflows is correctly tracked during the analysis.
The commit also adds a -test-int-range-inference pass to test the
analysis until it is integrated into SCCP or otherwise exposed.
Finally, this commit fixes some bugs relating to the handling of
region iteration arguments and terminators in the data flow analysis
framework.
Depends on D124020
Depends on D124021
Reviewed By: rriddle, Mogball
Differential Revision: https://reviews.llvm.org/D124023
The example was still using the now-removed sparse_tensor.init_tensor.
Also, I made the input operands of the matrix multiplication sparse too
(since it looks a bit strange to multiply two dense matrices into a sparse one).
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D126897
In strict mode, only newly inserted operations are allowed to be added to
the worklist. Before this change, the driver would add the users of a
replaced op without checking whether those users are allowed to be pushed
onto the worklist.
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D126899
Prior to this patch, the lowering of memref.reshape operations to the
LLVM dialect failed if the shape argument had a static shape with
dynamic dimensions. This patch adds the necessary support so that when
the shape argument has dynamic values, the lowering probes the dimension
at runtime to set the size in the `MemRefDescriptor` type. This patch
also computes the stride for dynamic dimensions by deriving it from the
sizes of the inner dimensions.
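A sketch of a now-supported case, where the shape operand has a static
shape but dynamic (runtime) values:
```
// The lowering probes %shape's values at runtime to fill in the sizes
// and strides of the resulting MemRefDescriptor.
%dst = memref.reshape %src(%shape)
    : (memref<?xf32>, memref<2xindex>) -> memref<?x?xf32>
```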
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D126604
These ops complement the tiling/padding transformations by transforming
higher-level named structured operations such as depthwise convolutions into
lower-level and/or generic equivalents that are better handled by some
downstream transformations.
Differential Revision: https://reviews.llvm.org/D126698
This enables opaque pointers by default in LLVM. The effect of this
is twofold:
* If IR that contains *neither* explicit ptr nor %T* types is passed
to tools, we will now use opaque pointer mode, unless
-opaque-pointers=0 has been explicitly passed.
* Users of LLVM as a library will now default to opaque pointers.
It is possible to opt-out by calling setOpaquePointers(false) on
LLVMContext.
A cmake option to toggle this default will not be provided. Frontends
or other tools that want to (temporarily) keep using typed pointers
should disable opaque pointers via LLVMContext.
Differential Revision: https://reviews.llvm.org/D126689
Now that we have an AllocTensorOp (previously InitTensorOp) in the bufferization dialect, the InitOp in the sparse dialect is no longer needed.
Differential Revision: https://reviews.llvm.org/D126180
The trick of using an empty token in the `FOREVERY_O` x-macro relies on preprocessor behavior which is only standard since C99 6.10.3/4 and C++11 N3290 16.3/4 (whereas it was undefined behavior up through C++03 16.3/10). Since the `ExecutionEngine/SparseTensorUtils.cpp` file is required to be compile-able under C++98 compatibility mode (unlike the C++11 used elsewhere in MLIR), we shouldn't rely on that behavior.
Also, using a non-empty suffix helps improve uniformity of the API, since all other primary/overhead suffixes are also non-empty. I'm using the suffix `0` since that's the value used by the `SparseTensorEncoding` attribute for indicating the index overhead-type.
Depends On D126720
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D126724
Ctlz is an intrinsic in LLVM but does not have equivalent operations in SPIR-V.
Including a decomposition gives an alternative path for these platforms.
Reviewed By: NatashaKnk
Differential Revision: https://reviews.llvm.org/D126261
There is no direct ctypes support for MLIR's complex (and thus np.complex128
and np.complex64) yet, causing the MLIR python binding methods for
memrefs to crash. This revision fixes this by passing complex arrays
as tuples of floats, correcting at the boundaries for the proper view.
NOTE: some of these changes (4 -> 2) were forced by the new "linting"
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D126422
This fillRow(..., 0) is redundant because when the size of the tableau is
consistent, the resize always creates a new row, which is zero-initialized.
Also added asserts throughout to ensure the dimensions of the tableau remain
consistent.
Reviewed By: Groverkss
Differential Revision: https://reviews.llvm.org/D126709