Two DebugInfo tests currently `FAIL` on Sparc:
LLVM :: DebugInfo/Generic/2010-06-29-InlinedFnLocalVar.ll
LLVM :: DebugInfo/Generic/array.ll
Both fail in a similar way, e.g.
: 'RUN: at line 1'; /var/llvm/local-sparcv9-A/bin/llc -O2 /vol/llvm/src/llvm-project/local/llvm/test/DebugInfo/Generic/2010-06-29-InlinedFnLocalVar.ll -o - | /var/llvm/local-sparcv9-A/bin/FileCheck /vol/llvm/src/llvm-project/local/llvm/test/DebugInfo/Generic/2010-06-29-InlinedFnLocalVar.ll
/vol/llvm/src/llvm-project/local/llvm/test/DebugInfo/Generic/2010-06-29-InlinedFnLocalVar.ll:4:10: error: CHECK: expected string not found in input
; CHECK: debug_info,
^
On `amd64-pc-solaris2.11`, the corresponding line is
.section .debug_info,"",@progbits
while on `sparcv9-sun-solaris2.11` we have only
.section .debug_info
This happens because Sparc currently emits `.section` directives using the
style of the Solaris/SPARC assembler (controlled by `SunStyleELFSectionSwitchSyntax`).
This patch takes the easy way out and allows both forms while tightening the
check to only match the `.section` directive.
Tested on `sparcv9-sun-solaris2.11`, `amd64-pc-solaris2.11`,
`x86_64-pc-linux-gnu`, and `x86_64-apple-darwin20.0.0`.
Differential Revision: https://reviews.llvm.org/D85414
This eliminates UnitTest's dependency on FPUtil and hence prevents
non-math tests from depending indirectly on FPUtil. The patch
essentially moves some of the existing pieces into a separate library.
Along the way, add_math_unittest was renamed to add_fp_unittest.
Reviewed By: lntue
Differential Revision: https://reviews.llvm.org/D85486
This patch also fixes a minor issue: shape.rank should be allowed to
return !shape.size. The dialect doc already has such an example for
shape.rank.
Differential Revision: https://reviews.llvm.org/D85556
This patch adds an optimization that folds select(freeze(icmp eq/ne x, y), x, y)
to x or y.
This is needed to resolve a slowdown after D84940 is applied.
I tried to bake this logic into foldSelectInstWithICmp, but it wasn't clear how to do so cleanly.
This patch conservatively writes the pattern in a separate function,
foldSelectWithFrozenICmp.
The result does not need a freeze; https://alive2.llvm.org/ce/z/X49hNE (from @nikic)
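For illustration only, here is a minimal standalone C++ check (not the InstCombine code; the helper names are made up) of why the frozen compare can be dropped: with an eq condition the select always yields a value equal to y, and with ne it always yields x.
```
#include <cassert>
#include <cstdint>

// select(x == y, x, y): if x == y, returning x is the same as returning y.
static int64_t selectEq(int64_t x, int64_t y) { return x == y ? x : y; }
// select(x != y, x, y): if x == y, returning y is the same as returning x.
static int64_t selectNe(int64_t x, int64_t y) { return x != y ? x : y; }

int main() {
  for (int64_t x = -4; x <= 4; ++x)
    for (int64_t y = -4; y <= 4; ++y) {
      assert(selectEq(x, y) == y); // folds to y
      assert(selectNe(x, y) == x); // folds to x
    }
  return 0;
}
```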
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D85533
The newly added code is still very x86_64-specific. AArch64 support will
be added soon, and the loader code will be refactored as part of the
patches adding it.
Reviewed By: asteinhauser
Differential Revision: https://reviews.llvm.org/D82700
Previously the transform performed these two canonicalizations:
(x > y) ? x : y -> (x >= y) ? x : y
(x < y) ? x : y -> (x <= y) ? x : y
But those don't seem to be generally useful, and they actively
pessimize the cases in PR47049.
This patch limits the transform to:
(x > 0) ? x : 0 -> (x >= 0) ? x : 0
(x < -1) ? x : -1 -> (x <= -1) ? x : -1
These are the cases mentioned in the comments as the motivation
for the canonicalization. They allow the CMOV to use the S flag
from the compare, improving opportunities to use a TEST or the
flags from an arithmetic instruction.
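As an aside on why the remaining canonicalizations are sound, here is a small standalone C++ check (illustrative only, not the X86 code): at the boundary constant both arms of the select pick the same value, so > can be relaxed to >= and < to <=.
```
#include <cassert>
#include <cstdint>

int main() {
  for (int64_t x = -4; x <= 4; ++x) {
    // (x > 0) ? x : 0  ==  (x >= 0) ? x : 0, since both yield 0 at x == 0.
    assert((x > 0 ? x : 0) == (x >= 0 ? x : 0));
    // (x < -1) ? x : -1  ==  (x <= -1) ? x : -1, since both yield -1 at x == -1.
    assert((x < -1 ? x : -1) == (x <= -1 ? x : -1));
  }
  return 0;
}
```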
This reverts commit 9f24640b7e.
We hit some deadlocks on thread exit in some configurations: the TLS exit handler takes a lock.
Temporarily reverting this change while we debug what is going on.
glibc's sysdeps/unix/sysv/linux/x86_64/sigaction.c (libc.a(sigaction.o)) has a CIE
with the augmentation string "zRS". Support 'S' to allow --icf={safe,all}.
This revision aims to provide a new API, `checkTilingLegality`, to
verify that the loop tiling result still satisfies the dependence
constraints of the original loop nest.
Previously, there was no check for the validity of tiling. For instance:
```
func @diagonal_dependence() {
  %A = alloc() : memref<64x64xf32>
  affine.for %i = 0 to 64 {
    affine.for %j = 0 to 64 {
      %0 = affine.load %A[%j, %i] : memref<64x64xf32>
      %1 = affine.load %A[%i, %j - 1] : memref<64x64xf32>
      %2 = addf %0, %1 : f32
      affine.store %2, %A[%i, %j] : memref<64x64xf32>
    }
  }
  return
}
```
You can find more information about this example in Section 3.11
of [1].
In general, there are three dependences here: two flow dependences,
one in direction `(i, j) = (0, 1)` (notation that denotes a vector in
the 2D iteration space) and one in `(i, j) = (1, -1)`; and one
anti-dependence in direction `(-1, 1)`.
Since two of them run along the diagonal in opposite directions, the
default tiling method in `affine`, which tiles the iteration space into
rectangles, violates the legality condition proposed by Irigoin and
Triolet [2]. [2] requires that no two tiles mutually depend on each
other, while in the `affine` tiling case two rectangles along the same
diagonal are indeed mutually dependent, which violates the rule.
This diff puts together a validator that checks whether the rule from
[2] is violated when applying the default tiling method in `affine`.
The canonical way to perform such validation is to examine the effect
of adding the constraint from Irigoin and Triolet to the existing
dependence constraints.
Since we already have the prior knowledge that `affine` tiles in a
hyper-rectangular way, and the resulting tiles will be scheduled in the
same order as their respective loop indices, we can simplify the
solution to just checking whether all dependence components are
non-negative along the tiling dimensions.
We put this algorithm into a new API called `checkTilingLegality` under
`LoopTiling.cpp`. This function iterates over every `load`/`store` pair,
and if there is any dependence between them, it retrieves the dependence
components and checks whether any of them is negative. It returns
`failure` if the legality condition is violated.
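For illustration, here is a simplified standalone C++ sketch of that check; the type and function names are hypothetical stand-ins for the dependence information MLIR already computes, not the actual API.
```
#include <vector>

// Hypothetical stand-in for one dependence component along a loop dimension.
struct DependenceComponent {
  long lb; // lower bound of the dependence distance along this dimension
  long ub; // upper bound of the dependence distance along this dimension
};

// A rectangular tiling of all dimensions is treated as legal only if every
// dependence component is non-negative along every tiled dimension.
static bool isTilingLegal(
    const std::vector<std::vector<DependenceComponent>> &allDependences) {
  for (const auto &components : allDependences)
    for (const auto &comp : components)
      if (comp.lb < 0) // a negative component points "backwards"
        return false;
  return true;
}

int main() {
  // The diagonal_dependence example above has a flow dependence (1, -1);
  // its negative second component makes rectangular tiling illegal.
  std::vector<std::vector<DependenceComponent>> deps = {
      {{0, 0}, {1, 1}},   // flow dependence (0, 1): fine
      {{1, 1}, {-1, -1}}, // flow dependence (1, -1): illegal to tile
  };
  return isTilingLegal(deps) ? 1 : 0;
}
```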
[1] Bondhugula, Uday. Effective Automatic Parallelization and Locality Optimization Using the Polyhedral Model. https://dl.acm.org/doi/book/10.5555/1559029
[2] Irigoin, F. and Triolet, R. Supernode Partitioning. https://dl.acm.org/doi/10.1145/73560.73588
Differential Revision: https://reviews.llvm.org/D84882
In D85499, I attempted to fix this same issue by canonicalizing
andnp for i1 vectors, but since there was some opposition to such
a change, this commit just fixes the bug by using two different
forms depending on which kind of vector type is in use. We can
then always decide to switch the canonical forms later.
Description of the original bug:
We have a DAG combine that tries to fold (vselect cond, 0000..., X) -> (andnp cond, x).
However, it does so by attempting to create an i64 vector whose element
count is obtained by truncating division of the bit width by 64. This is
bad for mask vectors like v8i1, since that division is just zero. Besides,
we don't want i64 vectors anyway. For i1 vectors, switch the pattern
to (andnp (not cond), x), which is the canonical form for `kandn`
on mask registers.
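A tiny standalone C++ illustration (not the DAG-combine code itself) of the broken element-count computation:
```
#include <cstdio>

int main() {
  unsigned v8i1Bits = 8;    // v8i1: 8 x i1 -> total bit width 8
  unsigned v4i32Bits = 128; // v4i32: 4 x i32 -> total bit width 128
  // Number of i64 elements obtained by truncating division by 64:
  std::printf("v8i1  -> %u x i64\n", v8i1Bits / 64);  // 0 -- not a valid vector
  std::printf("v4i32 -> %u x i64\n", v4i32Bits / 64); // 2 -- fine
  return 0;
}
```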
Fixes https://github.com/JuliaLang/julia/issues/36955.
Differential Revision: https://reviews.llvm.org/D85553
Objects that are storage-associated by EQUIVALENCE and
initialized with DATA are initialized by creating a
compiler temporary data object in the same scope,
assigning it an offset, type, and size that cover the
transitive closure of the associated initialized original
symbols, and combining their initializers into one common
initializer for the temporary.
Some problems with offset assignment of EQUIVALENCE'd objects
in COMMON were exposed and corrected, and some more error
cases are checked.
- Remove an obsolete function.
- Fix a small bug (nested implied DO loops).
- Add a test.
- Fix a struct/class warning.
Differential Revision: https://reviews.llvm.org/D85560
As pointed out in D85387, part of the comment for MapDynamicShadow that
was refactored into sanitizer_common in D83247 was incorrect for
non-Linux versions. Update the comment to reflect that.
Implement the Reduction Tree Pass framework as part of the MLIR Reduce tool. This is a parameterizable pass that allows for the implementation of custom reduction passes in the tool.
Implement the FunctionReducer class as an example of a Reducer class parameter for the instantiation of a Reduction Tree Pass.
Create a pass pipeline with a Reduction Tree Pass with the FunctionReducer class specified as a parameter.
Reviewed By: jpienaar
Differential Revision: https://reviews.llvm.org/D83969
This also beefs up the test coverage:
- Make unranked memref testing consistent with ranked memrefs.
- Add testing for the invalid element type cases.
This is not quite NFC: index types are now allowed in unranked memrefs.
Differential Revision: https://reviews.llvm.org/D85541
If OptNoneInstrumentation prints it instead, 'Skipping pass' will be printed even for required passes.
Reviewed By: ychen
Differential Revision: https://reviews.llvm.org/D85493
Add support for `-D` and `-U` options for llvm-libtool-darwin. `-D`
allows for using zero for timestamps and UIDs/GIDs. `-U` allows for
using actual timestamps and UIDs/GIDs.
Reviewed by jhenderson, smeenai
Differential Revision: https://reviews.llvm.org/D84209
Add support for `-filelist` option for llvm-libtool-darwin. `-filelist`
option allows for passing in a file containing a list of filenames.
Reviewed by jhenderson, smeenai
Differential Revision: https://reviews.llvm.org/D84206
Add support for the constant MachO::CPU_SUBTYPE_ARM64_V8. This constant is
needed to match `llvm-libtool-darwin`'s behavior to that of
cctools' libtool when the `-arch_only` flag is passed on the command line.
Reviewed by jhenderson, alexshap, smeenai
Differential Revision: https://reviews.llvm.org/D85041
This simple patch translates the num_threads and if clauses of the parallel
operation, and includes test cases.
A minor change was made to the parsing of the if clause to parse AnyType and
return the parsed type; the test cases were updated accordingly.
Reviewed by: SouraVX
Differential Revision: https://reviews.llvm.org/D84798
The test file is the single longest test among clang's tests and ends up about
doubling the wall time of clang tests on machines with a high number of cores.
The test appears to consist of multiple independent subtests and does not have
to be in one file. Splitting it into smaller parts reduces test time on my
machine from ~80s down to ~45s.
Differential Revision: https://reviews.llvm.org/D85551
Add symlinks for `llvm-libtool-darwin` and
`llvm-install-name-tool`.
Reviewed by jhenderson, smeenai
Differential Revision: https://reviews.llvm.org/D85054
This revision refactors the default definition of the attribute and type `classof` methods to use the TypeID of the concrete class instead of invoking the `kindof` method. The TypeID is already used as part of uniquing, and this allows for removing the need for users to define any of the type casting utilities themselves.
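A rough standalone C++ sketch of the idea (the names are invented for illustration and std::type_index merely stands in for MLIR's TypeID): the concrete class's unique ID is recorded on the stored object, and the default classof just compares IDs instead of calling a user-written kindof.
```
#include <typeindex>
#include <typeinfo>

// Stand-in for MLIR's TypeID, for illustration only.
using FakeTypeID = std::type_index;

struct StorageBase {
  explicit StorageBase(FakeTypeID id) : typeID(id) {}
  FakeTypeID typeID; // recorded once when the object is uniqued
};

template <typename ConcreteT> struct TypeBase {
  // Default classof: match on the concrete class's ID, no kindof needed.
  static bool classof(const StorageBase *storage) {
    return storage->typeID == FakeTypeID(typeid(ConcreteT));
  }
};

struct IntegerTypeSketch : TypeBase<IntegerTypeSketch> {
  static StorageBase makeStorage() {
    return StorageBase(FakeTypeID(typeid(IntegerTypeSketch)));
  }
};

int main() {
  StorageBase storage = IntegerTypeSketch::makeStorage();
  return IntegerTypeSketch::classof(&storage) ? 0 : 1; // expect 0
}
```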
Differential Revision: https://reviews.llvm.org/D85356
Subclass data is useful when a certain amount of memory is allocated, but not all of it is used. In the case of Type, that hasn't been the case for a while, and the subclass data is just taking up a full `unsigned`. Removing this frees up ~8 bytes for almost every type instance.
Differential Revision: https://reviews.llvm.org/D85348
This class allows for defining thread-local objects that have a set non-static lifetime. The internals of the cache use a static thread_local map between the various different non-static objects and the desired value type. When a non-static object destructs, it simply nulls out the entry in the static map. This will leave an entry in the map, but erase any of the data for the associated value. The current use cases for this are in the MLIRContext, meaning that the number of items in the static map is ~1-2, which isn't costly enough to warrant the complexity of pruning. If a use case arises that requires pruning of the map, the functionality can be added.
This is especially useful in the context of MLIR for implementing thread-local caching of context-level objects that would otherwise have very high lock contention. This revision adds a thread-local cache in the MLIRContext for attributes, identifiers, and types to reduce some of the locking burden. This led to a speedup of several hundred milliseconds when compiling a conversion pass on a very large MLIR module (>300K operations).
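A much-simplified standalone C++ sketch of the mechanism (illustrative only, not MLIR's ThreadLocalCache; for brevity it only clears the destructing thread's entry):
```
#include <memory>
#include <unordered_map>

template <typename ValueT> class ThreadLocalCacheSketch {
public:
  ~ThreadLocalCacheSketch() {
    // Null out this instance's entry in the per-thread map; the map slot
    // itself is left in place rather than pruned.
    getThreadMap()[this].reset();
  }

  // Return the value cached for the current thread, creating it on first use.
  ValueT &get() {
    std::unique_ptr<ValueT> &entry = getThreadMap()[this];
    if (!entry)
      entry = std::make_unique<ValueT>();
    return *entry;
  }

private:
  using Map = std::unordered_map<const void *, std::unique_ptr<ValueT>>;
  static Map &getThreadMap() {
    static thread_local Map map; // one map per thread, shared by all caches
    return map;
  }
};
```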
Differential Revision: https://reviews.llvm.org/D82597
This allows for bucketing the different possible storage types, with each bucket having its own allocator/mutex/instance map. This greatly reduces the amount of lock contention when multi-threading is enabled. On some non-trivial .mlir modules (>300K operations), this led to a compile-time decrease for a single conversion pass of around half a second (>25%).
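A simplified standalone C++ sketch of the bucketing idea (the names are invented; this is not the StorageUniquer API): each registered kind of storage gets its own mutex and instance map, so uniquing values of different kinds never contends on the same lock.
```
#include <memory>
#include <mutex>
#include <string>
#include <unordered_map>
#include <vector>

template <typename ValueT> class BucketedUniquerSketch {
public:
  using KindID = unsigned;

  // Register a storage kind up front (single-threaded setup phase).
  KindID registerKind() {
    buckets.push_back(std::make_unique<Bucket>());
    return static_cast<KindID>(buckets.size() - 1);
  }

  // Thread-safe lookup/creation: only the bucket for `kind` is locked.
  ValueT &getOrCreate(KindID kind, const std::string &key) {
    Bucket &bucket = *buckets[kind];
    std::lock_guard<std::mutex> guard(bucket.mutex);
    return bucket.instances[key]; // value-initialized on first use
  }

private:
  struct Bucket {
    std::mutex mutex;
    std::unordered_map<std::string, ValueT> instances;
  };
  std::vector<std::unique_ptr<Bucket>> buckets;
};
```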
Differential Revision: https://reviews.llvm.org/D82596