-Remove feq, fle, flt tests from *-arith.ll in favor of *-fcmp.ll which tests all predicates.
Reviewed By: asb
Differential Revision: https://reviews.llvm.org/D113703
The sparse tensor code generator allocates memory for the output tensor. As
such, we only need to allocate a MemRefDescriptor to receive the output tensor
and do not need to allocate and initialize the storage for the tensor.
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D115292
This patch adds the documentation for the operations `omp.atomic.read`,
`omp.atomic.write` and `omp.atomic.update`.
Reviewed By: peixin
Differential Revision: https://reviews.llvm.org/D115445
This patch adds on an overhead cost for gathers and scatters, which
is a rough estimate based on performance investigations I have
performed on SVE hardware for various micro-benchmarks.
Differential Revision: https://reviews.llvm.org/D115143
- Define a gpu.printf op, which can be lowered to any GPU printf() support (which is present in CUDA, HIP, and OpenCL). This op only supports constant format strings and scalar arguments
- Define the lowering of gpu.pirntf to a call to printf() (which is what is required for AMD GPUs when using OpenCL) as well as to the hostcall interface present in the AMD Open Compute device library, which is the interface present when kernels are running under HIP.
- Add a "runtime" enum that allows specifying which of the possible runtimes a ROCDL kernel will be executed under or that the runtime is unknown. This enum controls how gpu.printf is lowered
This change does not enable lowering for Nvidia GPUs, but such a lowering should be possible in principle.
And:
[MLIR][AMDGPU] Always set amdgpu-implicitarg-num-bytes=56 on kernels
This is something that Clang always sets on both OpenCL and HIP kernels, and failing to include it causes mysterious crashes with printf() support.
In addition, revert the max-flat-work-group-size to (1, 256) to avoid triggering bugs in the AMDGPU backend.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D110448
I've added some tests that were previously missing for the gather-scatter costs
being calculated by the vectorizer for AArch64:
Transforms/LoopVectorize/AArch64/sve-gather-scatter-cost.ll
The costs are sometimes different to the ones in
Analysis/CostModel/AArch64/sve-gather.ll
because the vectorizer also adds on the address computation cost.
Depends On D115263
By aligning block size to inner loop iterations parallel_compute_fn LLVM can later unroll and vectorize some of the inner loops with small number of trip counts. Up to 2x speedup in multiple benchmarks.
Reviewed By: bkramer
Differential Revision: https://reviews.llvm.org/D115436
With complex recursive structure of async dispatch function LLVM can't always propagate constants to the parallel_compute_fn and it often prevents optimizations like loop unrolling and vectorization. We help LLVM by pushing known constants into the parallel_compute_fn explicitly.
Reviewed By: bkramer
Differential Revision: https://reviews.llvm.org/D115263
This patch adds allows the LIT runs within test-release.sh to obey the width that
is passed into the script. This is accomplished by adding the width in the LLVM_LIT_ARGS
CMake configuration.
Differential Revision: https://reviews.llvm.org/D115350
The following tests are failing due to missing DWARF sections: `DwarfAccelNamesSection` and `DwarfAddrSection`. This patch sets these tests as `XFAIL` until the sections can be implemented for AIX.
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D114681
LinalgOp results usually bufferize inplace with output args. With this change, they may buffer inplace with input args if the value of the output arg is not used in the computation.
Differential Revision: https://reviews.llvm.org/D115022
This patch adds 4 options for specifying functions, aliases, globals and
structs name prefixes hat don't need to be renamed by MetaRenamer pass.
This is useful if one has some downstream logic that depends directly
on an entity name. MetaRenamer can break this logic, but with the patch
you can tell it not to rename certain names.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D115323
This patch factors out math functionality that is a subset of Presburger arithmetic and moves it from FlatAffineConstraints to Presburger/IntegerPolyhedron. This patch only moves some parts of the functionality planned to be moved, with subsequent patches moving more functionality. There are three main reasons for this:
1. This split makes the Presburger Library easier and more flexible to use
across MLIR, by not depending on IR.
2. This split allows the Presburger library to be developed independently from
Affine Analysis, with Affine Analysis using this library.
3. With more functionality being upstreamed to the Presburger Library, the
mlir/Analysis directory will be cluttered with Presburger library components
since they depend on math functionality from FlatAffineConstraints. Moving this
functionality to the Presburger directory allows keeping the new functionality
in the Presburger directory.
This patch is part of an ongoing effort to make the Presburger Library easier to use. The motivation for this effort is the feedback received at the LLVM conference from Mehdi and others.
Reviewed By: bondhugula
Differential Revision: https://reviews.llvm.org/D114674
This reverts commit 317dc31e53.
After that change, OpenMP doesn't find dependencies in the host
system (it fails do find e.g. /usr/lib/x86_64-linux-gnu/libelf.so
which it found before), which causes some OpenMP target offloading
plugins to not be found. This doesn't break the build, but just
causes the AMDGPU OpenMP target plugin to be omitted. See
https://reviews.llvm.org/D113253#3181934 for the report of this
issue.
This patch avoids unnecessarily copying contents of `mmap`-ed files into `CachedFileSystemEntry` by storing `MemoryBuffer` instead. The change leads to ~50% reduction of peak memory footprint when scanning LLVM+Clang via `clang-scan-deps`.
Depends on D115331.
Reviewed By: dexonsmith
Differential Revision: https://reviews.llvm.org/D115043
Most of `MemoryBuffer` interfaces expose a `RequiresNullTerminator` parameter that's being used to:
* determine how to open a file (`mmap` vs `open`),
* assert newly initialized buffer indeed has an implicit null terminator.
This patch adds the paramater to the `SmallVectorMemoryBuffer` constructors, meaning:
* null terminator can now be added to `SmallVector`s that didn't have one before,
* `SmallVectors` that had a null terminator before keep it even after the move.
In line with existing code, the new parameter is defaulted to `true`. This patch makes sure all calls to the `SmallVectorMemoryBuffer` constructor set it to `false` to preserve the current semantics.
Reviewed By: dexonsmith
Differential Revision: https://reviews.llvm.org/D115331
This patch provides functionality for simplifying `PresburgerSet`s by checking if any `FlatAffineConstraints` in the set is contained in another, and removing such redundant FACs.
This is part of a series of patches to provide functionality for [integer set coalescing](http://impact.gforge.inria.fr/impact2015/papers/impact2015-verdoolaege.pdf) in MLIR.
Reviewed By: arjunp
Differential Revision: https://reviews.llvm.org/D110617
This patch supports the atomic construct (update) following section 2.17.7 of OpenMP 5.0 standard. Also added tests and verifier for the same.
Reviewed By: kiranchandramohan, peixin
Differential Revision: https://reviews.llvm.org/D112982
This patch changes uses of `std::unique_lock` to `std::lock_guard`.
The `std::unique_lock` template provides some advanced capabilities (deferred locking, time-constrained locking attempts, etc.) we don't use in the caching filesystem. Plain `std::lock_guard` will do here.
Reviewed By: dexonsmith
Differential Revision: https://reviews.llvm.org/D115332
This change switches tsan to the new runtime which features:
- 2x smaller shadow memory (2x of app memory)
- faster fully vectorized race detection
- small fixed-size vector clocks (512b)
- fast vectorized vector clock operations
- unlimited number of alive threads/goroutimes
Depends on D112602.
Reviewed By: melver
Differential Revision: https://reviews.llvm.org/D112603
According to [dcl.fct.def.coroutine]/p14:
> If the evaluation of the expression promise.unhandled_exception()
> exits via an exception, the coroutine is considered suspended at the
> final suspend point.
But this is not implemented in clang before. This patch would implement
this feature by marking the coroutine as done at the place of
coro.end(frame, /*InUnwindPath=*/true ).
Reviewed By: rjmccall
Differential Revision: https://reviews.llvm.org/D115219
- `vm` constraint is used for masking operand, which always v0.
- Update testcase, only masking operand should use `vm`, vector mask operations
should just use `vr` for any vector register.
- Revise the description of `vm` constraint.
- This patch also fix issue on RISCVRegisterInfo.td and RISCVISelLowering.cpp.
RISCVRegisterInfo.td:
- The first VT in the list must be the largest total size since the
SelectionDAGBuilder uses the first register in the list as the canonical
type for the register.
RISCVISelLowering.cpp:
- Fix RISCVTargetLowering::splitValueIntoRegisterParts and
RISCVTargetLowering::joinRegisterPartsIntoValue for handling vectors
with different total size, that will happened on fractional LMUL since
fractional LMUL is always occupy one vector register.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D112599
I found that the coroutine intrinsic llvm.coro.param in documentation
(https://llvm.org/docs/Coroutines.html#id101) didn't get used actually
since there isn't lowering codes in LLVM. I also checked the
implementation of libstdc++ and libc++. Both of them didn't use
llvm.coro.param. So I am pretty sure that the llvm.coro.param intrinsic
is unused. I think it would be better t to remove it to avoid possible
misleading understandings.
Note: according to [class.copy.elision]/p1.3, this optimization is
allowed by the C++ language specification. Let's make it someday.
Reviewed By: rjmccall
Differential Revision: https://reviews.llvm.org/D115222
The region of `linalg.generic` might contain `tensor` operations. For
example, current lowering of `gather` uses a `tensor.extract` in the
body of the `LinalgOp`. Bufferize the ops within a `LinalgOp` region
as well to catch such cases.
Differential Revision: https://reviews.llvm.org/D115322
Fork the current version of tsan runtime before commiting
rewrite of the runtime (D112603). The old runtime can be
enabled with TSAN_USE_OLD_RUNTIME option.
This is a temporal measure for emergencies and is required
for Chromium rollout (for context see http://crbug.com/1275581).
The old runtime is supposed to be deleted soon.
Reviewed By: thakis
Differential Revision: https://reviews.llvm.org/D115223
Previously we would create global module fragment for extern linkage
declaration which is alreday in global module fragment. However, it is
clearly redundant to do so. This patch would check if the extern linkage
declaration are already in GMF before we create a GMF for it.
This prepares it for the regalloc work. Part of it is making model
evaluation accross 'development' and 'release' scenarios more reusable.
This patch:
- extends support to tensors of any shape (not just scalars, like we had
in the inliner -Oz case). While the tensor shape can be anything, we
assume row-major layout and expose the tensor as a buffer.
- exposes the NoInferenceModelRunner, which we use in the 'development'
mode to keep the evaluation code path consistent and simplify logging,
as we'll want to reuse it in the regalloc case.
Differential Revision: https://reviews.llvm.org/D115306
Complete basic arithmetic operations such as add/sub/mul/div, and it also includes converions
and some specific operations such as bswap.Add load/store patterns to generate different addressing mode instructions.
Also enable some infra such as copy physical register and eliminate frame index.