Note, this is a re-submission of D125894 with `features = ["-header_modules"]`
added to the main BUILD.bazel file.
Some functions like `stpncpy` are implemented in terms of `memset` but are not
currently using `-fno-builtin-memset`. This is somewhat hidden by the fact that
we use `-ffreestanding` globally and that `-ffreestanding` implies
`-fno-builtin` for Clang.
This patch also removes `-mllvm -combiner-global-alias-analysis` that is Clang
specific and that does not bring substantial gains on modern processors.
Also we keep `-mllvm --tail-merge-threshold=0` for aarch64 in CMakeLists.txt
but we omit it in the Bazel config. This is because Bazel consumes the source
files directly and so it can use PGO to take optimal decisions locally.
Differential Revision: https://reviews.llvm.org/D126773
Currently, the Bazel build uses static, checked in [llvm-]config.h files
in combination with global macro definitions to mimic CMake's generated
headers. This change reuses the write_cmake_config.py script from the GN
build to generate the headers from source in the same way. The purpose
is to ensure that the Bazel build stays up to date with any changes to
the CMake config files. The write_cmake_config.py script has good error
checking to ensure that unneeded, stale variables are not passed, and
that any missing variables are reported as errors.
I tried to closely follow the logic in the GN build here:
llvm/utils/gn/secondary/llvm/include/Config/BUILD.gn
The duplication between this file and config.bzl is significant, and we
could consider going further, but I'd like to hold off on it for now.
The GN build changes are to move the write_cmake_config.py script up to
//llvm/utils/write_cmake_config.py, and update the paths accordingly.
The next logical change is to generate Clang's config.h header.
Differential Revision: https://reviews.llvm.org/D126581
Python bindings for extensions of the Transform dialect are defined in separate
Python source files that can be imported on-demand, i.e., that are not imported
with the "main" transform dialect. This requires a minor addition to the
ODS-based bindings generator. This approach is consistent with the current
model for downstream projects that are expected to bundle MLIR Python bindings:
such projects can include their custom extensions into the bundle similarly to
how they include their dialects.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D126208
Vectorization is a key transformation to achieve high performance on most
architectures. In the transform dialect, vectorization is implemented as a
parameterizable transform op. It currently applies to a scope of payload IR
delimited by some isolated-from-above op, mainly because several enabling
transformations (such as affine simplification) are needed to perform
vectorization and these transformation would apply to ops other than the "main"
computational payload op. A separate "navigation" transform op that obtains the
isolated-from-above ancestor of an op is introduced in the core transform
dialect. Even though it is currently only useful for vectorization,
isolated-from-above ops are a common anchor for transformations (usually
implemented as passes) that is likely to be reused in the future.
Depends On D126374
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D126542
This change makes the public API of SparseTensorUtils.cpp explicit, whereas before the publicity of these functions was only implicit. Implicit publicity is sufficient for mlir-opt to generate calls to these functions, but it's not enough to enable C/C++ code to call them directly in the usual way (i.e., without going through codegen). Thus, leaving the publicity implicit prevents development of other tools (e.g., microbenchmarks).
In addition this change also marks the functions MLIR_CRUNNERUTILS_EXPORT, which is required by the JIT under certain configurations (albeit not for anything in our test suite).
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D126105
The Transform dialect uses the side effect modeling mechanism to record the
effects of the transform ops on the mapping between Transform IR values and
Payload IR ops. Introduce a checker pass that warns if a Transform IR value is
used after it has been freed (consumed). This pass is mostly intended as a
debugging aid in addition to the verification/assertion mechanisms in the
transform interpreter. It reports all potential use-after-free situations.
The implementation makes a series of simplifying assumptions to be simple and
conservative. A more advanced implementation would rely on the data flow-like
analysis associated with a side-effect resource rather than a value, which is
currently not supported by the analysis infrastructure.
Reviewed By: springerm
Differential Revision: https://reviews.llvm.org/D126381
This change adds a new op `alloc_tensor` to the bufferization dialect. During bufferization, this op is always lowered to a buffer allocation (unless it is "eliminated" by a pre-processing pass). It is useful to have such an op in tensor land, because it allows users to model tensor SSA use-def chains (which drive bufferization decisions) and because tensor SSA use-def chains can be analyzed by One-Shot Bufferize, while memref values cannot.
This change also replaces all uses of linalg.init_tensor in bufferization-related code with bufferization.alloc_tensor.
linalg.init_tensor and bufferization.alloc_tensor are similar, but the purpose of the former one is just to carry a shape. It does not indicate a memory allocation.
linalg.init_tensor is not suitable for modelling SSA use-def chains for bufferization purposes, because linalg.init_tensor is marked as not having side effects (in contrast to alloc_tensor). As such, it is legal to move linalg.init_tensor ops around/CSE them/etc. This is not desirable for alloc_tensor; it represents an explicit buffer allocation while still in tensor land and such allocations should not suddenly disappear or get moved around when running the canonicalizer/CSE/etc.
BEGIN_PUBLIC
No public commit message needed for presubmit.
END_PUBLIC
Differential Revision: https://reviews.llvm.org/D126003
Some functions like `stpncpy` are implemented in terms of `memset` but are not
currently using `-fno-builtin-memset`. This is somewhat hidden by the fact that
we use `-ffreestanding` globally and that `-ffreestanding` implies
`-fno-builtin` for Clang.
This patch also removes `-mllvm -combiner-global-alias-analysis` that is Clang
specific and that does not bring substantial gains on modern processors.
Also we keep `-mllvm --tail-merge-threshold=0` for aarch64 in CMakeLists.txt
but we omit it in the Bazel config. This is because Bazel consumes the source
files directly and so it can use PGO to take optimal decisions locally.
Differential Revision: https://reviews.llvm.org/D125894
The approach I took was to define a dialect 'extern' attribute that a GlobalOp can take as a value to signify external linkage. I think this approach should compose well and should also work with wherever the OpaqueElements work goes in the future (since that is just another kind of attribute). I special cased the GlobalOp parser/printer for this case because it is significantly easier on the eyes.
In the discussion, Jeff Niu had proposed an alternative syntax for GlobalOp that I ended up not taking. I did try to implement it but a) I don't think it made anything easier to read in the common case, and b) it made the parsing/printing logic a lot more complicated (I think I would need a completely custom parser/printer to do it well). Please have a look at the common cases where the global type and initial value type match: I don't think how I have it is too bad. The less common cases seem ok to me.
I chose to only implement the direct, constant load op since that is non side effecting and there was still discussion pending on that.
Differential Revision: https://reviews.llvm.org/D124318
Lowering through libm gives us a baseline version, even though it's not
going to be particularly fast. This is similar to what we do for some
math dialect ops.
Differential Revision: https://reviews.llvm.org/D125550
This is the first implementation of complex (f64 and f32) support
in the sparse compiler, with complex add/mul as first operations.
Note that various features are still TBD, such as other ops, and
reading in complex values from file. Also, note that the
std::complex<float> had a bit of an ABI issue when passed as
single argument. It is still TBD if better solutions are possible.
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D125596
This pass is to handle computationally complex operations like
tensor.pad which are not simply lowered to the exact same operation in
the memref dialect.
Differential Revision: https://reviews.llvm.org/D125384
This patch fixes the includes for the new UInt class so that the api
test now passes, additionally it fixes the bazel files to account for
the new dependencies.
Differential Revision: https://reviews.llvm.org/D125490
Add lowering of the vector.warp_execute_on_lane_0 into scf.if plus memory
transfer for the operands and yield values.
This also add an integration test running on GPU warp. The same tests can be
later re-used with different comment lines to tests distribution
transformations.
This is mostly from @springerm contribution.
Differential Revision: https://reviews.llvm.org/D125430
While executing the test suite for Tensorflow(v2.8.0), we encountered multiple TC failures with the below error
```
'z14' is not a recognized processor for this target
```
This patch adds the s390x target to the build target list. It fixes TC failures in multiple modules of Tensorflow on s390x arch. It is also tested to have no effect on x86 machines.
Reviewed By: GMNGeoffrey
Differential Revision: https://reviews.llvm.org/D125096
Move async copy operations to NVGPU as they only exist on NV target and are
designed to match ptx semantic. This allows us to also add more fine grain
caching hint attribute to the op.
Add hint to bypass L1 and hook it up to NVVM op.
Differential Revision: https://reviews.llvm.org/D125244
This ensures that attributes such as the index bitwidth propagate
correctly to the AMDGPUToROCDL patterns.
Differential Revision: https://reviews.llvm.org/D125320
By analogy with the NVGPU dialect, introduce an AMDGPU dialect for
AMD-specific intrinsic wrappers.
The dialect initially includes wrappers around the raw buffer intrinsics.
On AMD GPUs, a memref can be converted to a "buffer descriptor" that
allows more precise control of memory access, such as by allowing for
out of bounds loads/stores to be replaced by 0/ignored without adding
additional conditional logic, which is important for performance.
The repository currently contains a limited conversion from
transfer_read/transfer_write to Mubuf intrinsics, which are an older,
deprecated intrinsic for the same functionality.
The new amdgpu.raw_buffer_* ops allow these operations to be used
explicitly and for including metadata such as whether the target
chipset is an RDNA chip or not (which impacts the interpretation of
some bits in the buffer descriptor), while still maintaining an
MLIR-like interface.
(This change also exposes the floating-point atomic add intrinsic.)
Reviewed By: ThomasRaoux
Differential Revision: https://reviews.llvm.org/D122765