Currently the behavior with relative paths is pretty broken. It differs
between external shell and internal shell because the path resolution
is done with a different working directory. With the internal shell,
it's resolved relative to the directory from which lit is executed,
whereas with the external shell it's resolved relative to where the
test case is executed. To make matters worse, using the internal shell
the filepath to binaries looked up with `which` is returned relative
to the directory from which lit is executed, but then executed from
the test execution directory. That means that relative paths with the
internal shell give a `[Errno 2] No such file or directory` error
instead of the expected `command not found`.
To address these issues this patch makes lit interpret relative paths
as relative to the directory from which lit was invoked and modifies
`which` to return absolute paths, matching the behavior of its
namesake unix function.
See https://groups.google.com/g/llvm-dev/c/KzMWlOXR98Y/m/QJoqn0U5HAAJ
Reviewed By: yln
Differential Revision: https://reviews.llvm.org/D115486
This patch allows the user to request all resources of a particular
layer (or core-attribute). The syntax of KMP_HW_SUBSET is modified
so the number of units requested is optional or can be replaced with an
'*' character.
e.g., KMP_HW_SUBSET=c:intel_atom@3 will use all the cores after offset 3
e.g., KMP_HW_SUBSET=*c:intel_core will use all the big cores
e.g., KMP_HW_SUBSET=*s,*c,1t will use all the sockets, all cores per
each socket and 1 thread per core.
Differential Revision: https://reviews.llvm.org/D115826
Test a range of acceptable forms of co_max calls, including
combinations of keyword and non-keyword actual arguments of
numeric types. Also test that several invalid forms of
co_max call generate the correct error messages.
Reviewed By: ktras
Differential Revision: https://reviews.llvm.org/D113083
This reverts 3816c53f04 and removes follow-up
fixups.
The original intention was to show error earlier (posix_fallocate time) than
later for ld.lld but it appears to cause some problems which make it not free.
* FreeBSD ZFS: EINVAL, not too bad.
* FreeBSD UFS: according to khng "devastatingly slow on freebsd because UFS on freebsd does not have preallocation support like illumos. It zero-fills."
* NetBSD: maybe EOPNOTSUPP
* Linux tmpfs: unless tmpfs is set up to use huge pages (requires CONFIG_TRANSPARENT_HUGE_PAGECACHE=y), I can consistently demonstrate ~300ms delay for a 1.4GiB output.
* Linux ext4: I don't measure any benefit, either backed by a hard disk or by a file in tmpfs.
* The current code organization of `defined(HAVE_POSIX_FALLOCATE)` costs us a macro dispatch for AIX.
I think we should just remove it. I think if posix_fallocate ever finds demonstrable benefit,
it is likely Linux specific and will not need HAVE_POSIX_FALLOCATE, and possibly opt-in by some specific programs.
In a filesystem with CoW and compression, the ENOSPC benefit may be lost as well.
Reviewed By: khng300
Differential Revision: https://reviews.llvm.org/D115957
Test various acceptable forms of co_min calls, including
combinations of keyword and non-keyword actual arguments of
integer, real, and character types. Also test that several
invalid forms of co_min call generate the correct error messages.
Reviewed By: ktras
Differential Revision: https://reviews.llvm.org/D113077
writeSections is typically a bottleneck.
This was used to track down the following bottlenecks:
* Output section .rela.dyn (9115d75117)
* Output section .debug_str (3aae04c744)
* posix_fallocate is slow for Linux tmpfs: D115957
Reviewed By: ikudrin
Differential Revision: https://reviews.llvm.org/D115984
Test a range of acceptable forms of co_reduce calls, including
combinations of keyword and non-keyword actual arguments of
numeric types. Also test that several invalid forms of
co_reduce call generate the correct error messages.
Reviewed By: kiranchandramohan, ktras, ekieri
Differential Revision: https://reviews.llvm.org/D113086
When P0883R2 was initially implemented in D103769 #pragma clang deprecated didn't exist yet.
We also forgot to cleanup usages in libc++ itself.
This takes care of both.
Differential Revision: https://reviews.llvm.org/D115995
These conversions are better suited to be applied at whole tensor
level. Applying these as canonicalizations end up triggering such
canonicalizations at all levels of the stack which might be
undesirable. For example some of the resulting code patterns wont
bufferize in-place and need additional stack buffers. Best is to be
more deliberate in when these canonicalizations apply.
Differential Revision: https://reviews.llvm.org/D115912
GCC's powerpc32 port predefines `PPC` as a macro in GNU C++ mode in some configurations (Linux,
FreeBSD, and some others. See `builtin_define_std ("PPC"); ` in gcc/config/rs6000).
```
% powerpc-linux-gnu-g++ -E -dM -xc++ /dev/null -o - | grep -w PPC
#define PPC 1
```
Fixes https://bugs.gentoo.org/829599
Reviewed By: thesamesam
Differential Revision: https://reviews.llvm.org/D116017
There is a small chance that the slot may be not queued in TraceSwitchPart.
This can happen if the slot has kEpochLast epoch and another thread
in FindSlotAndLock discovered that it's exhausted and removed it from
the slot queue. kEpochLast can happen in 2 cases: (1) if TraceSwitchPart
was called with the slot locked and epoch already at kEpochLast,
or (2) if we've acquired a new slot in SlotLock in the beginning
of the function and the slot was at kEpochLast - 1, so after increment
in SlotAttachAndLock it become kEpochLast.
If this happens we crash on ctx->slot_queue.Remove(thr->slot).
Skip the requeueing if the slot is not queued.
The slot is exhausted, so it must not be ctx->slot_queue.
The existing stress test triggers this with very small probability.
I am not sure how to make this condition more likely to be triggered,
it evaded lots of testing.
Depends on D116040.
Reviewed By: melver
Differential Revision: https://reviews.llvm.org/D116041
SlotPairLocker calls SlotLock under ctx->multi_slot_mtx.
SlotLock can invoke global reset DoReset if we are out of slots/epochs.
But DoReset locks ctx->multi_slot_mtx as well, which leads to deadlock.
Resolve the deadlock by removing SlotPairLocker/multi_slot_mtx
and only lock one slot for which we will do RestoreStack.
We need to lock that slot because RestoreStack accesses the slot journal.
But it's unclear why we need to lock the current slot.
Initially I did it just to be on the safer side (but at that time
we dit not lock the second slot, so it was easy just to lock the current slot).
Reviewed By: melver
Differential Revision: https://reviews.llvm.org/D116040
Profile merging is not supported when using debug info profile
correlation because the data section won't be in the binary at runtime.
Change the default profile name in this mode to `default_%p.proflite` so
we don't use profile merging.
Reviewed By: kyulee
Differential Revision: https://reviews.llvm.org/D115979
ctlz/cttz get lowered to the set of target opcodes
This change enables the ISel to select SALU or VALU form according to the SDNode divergence.
CTLZ - S_FLBIT_I32_B32 if uniform and V_FFBH_U32_e64 if divergent
CTTZ - S_FF1_I32_B32 if uniform and V_FFBL_B32_e64 if divergent
Also @llvm.amdgcn.sffbh.i32 gets lowered to S_FLBIT_I32 if uniform and V_FFBH_I32_e64 if divergent
NOTE: 64bit versions S_FF1_I32_B64 and S_FLBIT_I32_B64 are not currently supported by the DAG ISel.
ctlz/cttz with i64 input are split into two 32bit instructions. Nevertheless, they already have the patterns
and were equipped with the divergence predicates to make sure they will be selected correctly when enabled.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D116044
This reverts commit cc56c66f27.
Fixed a bad assertion, the target of a UsingShadowDecl must not have
*local* qualifiers, but it can be a typedef whose underlying type is qualified.
The diagnostics concerning mixing std::experimental and std are
somewhat wordy and have some typographical errors. Diagnostics do not
start with a capital letter nor end with a fullstop. Usually we try
and link clauses with a semicolon, rather than start a new sentence.
So that's what this patch does. Along with avoiding repetition about
std::experimental going away.
Differential Revision: https://reviews.llvm.org/D116026
Commit 5fbe21a774 missed committing the correct checking of
out-of-class comparision operator argument types. These are they,
from the originally posted diff.
Reviewed By: mizvekov
Differential Revision: https://reviews.llvm.org/D115894
The availability of SVE should be sufficient to enable scalable
auto-vectorization.
This patch adds a new TTI interface to query the target what style of
vectorization it wants when scalable vectors are available. For other
targets than AArch64, this currently defaults to 'FixedWidthOnly'.
Differential Revision: https://reviews.llvm.org/D115651
This preserves all the results we've processed already rather than
throwing them away in the end.
It has some performance implications on the edge cases, in the worst case we
might issue 1 relations and 2 xrefs requests in extra to deduce `HasMore`
correctly.
Fixes https://github.com/clangd/clangd/issues/204.
Differential Revision: https://reviews.llvm.org/D116043
__transaction is a helper class that allows rolling back code in case an
exception is thrown. The main goal is to reduce the clutter when code
needs to be guarded with `#if _LIBCPP_NO_EXCEPTIONS`.
Differential Revision: https://reviews.llvm.org/D115730
Currently there's no way to find the UsingDecl that a typeloc found its
underlying type through. Compare to DeclRefExpr::getFoundDecl().
Design decisions:
- a sugar type, as there are many contexts this type of use may appear in
- UsingType is a leaf like TypedefType, the underlying type has no TypeLoc
- not unified with UnresolvedUsingType: a single name is appealing,
but being sometimes-sugar is often fiddly.
- not unified with TypedefType: the UsingShadowDecl is not a TypedefNameDecl or
even a TypeDecl, and users think of these differently.
- does not cover other rarer aliases like objc @compatibility_alias,
in order to be have a concrete API that's easy to understand.
- implicitly desugared by the hasDeclaration ASTMatcher, to avoid
breaking existing patterns and following the precedent of ElaboratedType.
Scope:
- This does not cover types associated with template names introduced by
using declarations. A future patch should introduce a sugar TemplateName
variant for this. (CTAD deduced types fall under this)
- There are enough AST matchers to fix the in-tree clang-tidy tests and
probably any other matchers, though more may be useful later.
Caveats:
- This changes a fairly common pattern in the AST people may depend on matching.
Previously, typeLoc(loc(recordType())) matched whether a struct was
referred to by its original scope or introduced via using-decl.
Now, the using-decl case is not matched, and needs a separate matcher.
This is similar to the case of typedefs but nevertheless both adds
complexity and breaks existing code.
Differential Revision: https://reviews.llvm.org/D114251
This mnemonic has been supported by GAS for years and
it was added to the PowerPC ISA as of ISA 3.1. We will
support the mnemonic to be compatible with GAS.
This patch enables SPIR-V binary emission for HIP device code via the
HIPSPV tool chain.
‘--offload’ option, which is envisioned in [1], is added for specifying
offload targets. This option is used to override default device target
(amdgcn-amd-amdhsa) for HIP compilation for emitting device code as
SPIR-V binary. The option is handled in getHIPOffloadTargetTriple().
getOffloadingDeviceToolChain() function (based on the design in the
SYCL repository) is added to select HIPSPVToolChain when HIP offload
target is ‘spirv64’.
The HIPActionBuilder is modified to produce LLVM IR at the backend
phase. HIPSPV tool chain expects to receive HIP device code as LLVM
IR so it can run external LLVM passes over them. HIPSPV TC is also
responsible for emitting the SPIR-V binary.
A Cuda GPU architecture ‘generic’ is added. The name is picked from
the LLVM SPIR-V Backend. In the HIPSPV code path the architecture
name is inserted to the bundle entry ID as target ID. Target ID is
expected to be always present so a component in the target triple
is not mistaken as target ID.
Tests are added for checking the HIPSPV tool chain.
[1]: https://lists.llvm.org/pipermail/cfe-dev/2020-December/067362.html
Patch by: Henry Linjamäki
Reviewed by: Yaxun Liu, Artem Belevich, Alexey Bader
Differential Revision: https://reviews.llvm.org/D110622
Checking ELFType == ELF::ET_CORE first skips string comparison for the
majority of cases.
Suggested by Fangrui Song in D114635 for a similar construct.
Need to check for the number of the unique non-constant values since the
unique values may include several constants.
Differential Revision: https://reviews.llvm.org/D115939
SymbolAndSignals stores SymbolInfo which stores two std::strings. Then
the values are stored in a llvm::DenseMap<llvm::StringRef, double>. When
the sorting is happening, SymbolAndSignals are swapped and thus because
of small string optimization some strings may become invalid. This
results in incorrect ranking.
This was detected when running new std::sort algorithm against llvm
toolchain. This could have been prevented with running llvm::sort and
EXPENSIVE_CHECKS. Unfortunately, no sanitizer yelled.
I don't have commit rights, kutdanila@yandex.ru Danila Kutenin
Reviewed By: bkramer
Differential Revision: https://reviews.llvm.org/D116037
This commit rewrites most existing unittests involving FlatAffineConstraints
to use the parsing utility. This helps to make the tests more understandable.
This relands commit b0e8667b1d, which was
reverted in 6963be1276, with a fix to a unittest
which was incorrectly rewritten before.
Reviewed By: arjunp
Differential Revision: https://reviews.llvm.org/D115920
Instead of summing leading zeros on the input operands, multiply the
max possible values of those inputs and count the leading zeros of
the result. This can give us an extra zero bit (typically in cases
where one of the operands is a known constant).
This allows folding away the remaining 'add' ops in the motivating
bug (modeled in the PhaseOrdering IR test):
https://github.com/llvm/llvm-project/issues/48399Fixes#48399
Differential Revision: https://reviews.llvm.org/D115969
Need to check for the number of the unique non-constant values since the
unique values may include several constants.
Differential Revision: https://reviews.llvm.org/D115939
This patch enables divergence predicates for min/max nodes.
It makes ISD::MIN/MAX selected to S_MIN_I(U)32/S_MAX_I(U)32 or V_MIN_I(U)32_e64/V_MAX_I(U)32_e64
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D115954
The "not" is defined as XOR $src -1.
We need to transform this pattern to either S_NOT_B32 or V_NOT_B32_e32
dependent on the "xor" node divergence.
Reviewed By: rampitec, foad
Differential Revision: https://reviews.llvm.org/D115884