Add i8x16 relaxed_swizzle instructions. These are only
exposed as builtins, and require user opt-in.
Differential Revision: https://reviews.llvm.org/D112022
heap.py has a lot of large hand written expressions and each name in the
expression will be looked up by clang during expression parsing. For
function parameters this will be in Sema::ActOnParamDeclarator(...) in order to
catch redeclarations of parameters. The names are not needed and we have seen
some rare cases where since we don't have symbols we end up in
SymbolContext::FindBestGlobalDataSymbol(...) which may conflict with other global
symbols.
There may be a way to make this lookup smarter to avoid these cases but it is
not clear how well tested this path is and how much work it would be to fix it.
So we will go with this fix while we investigate more.
This is a second try at getting all the cases we care about.
Ref: rdar://78265641
Currently, .BTF and .BTF.ext has default alignment of 1.
For example,
$ cat t.c
int foo() { return 0; }
$ clang -target bpf -O2 -c -g t.c
$ llvm-readelf -S t.o
...
Section Headers:
[Nr] Name Type Address Off Size ES Flg Lk Inf Al
...
[ 7] .BTF PROGBITS 0000000000000000 000167 00008b 00 0 0 1
[ 8] .BTF.ext PROGBITS 0000000000000000 0001f2 000050 00 0 0 1
But to have no misaligned data access, .BTF and .BTF.ext
actually requires alignment of 4. Misalignment is not an issue
for architecture like x64/arm64 as it can handle it well. But
some architectures like mips may incur a trap if .BTF/.BTF.ext
is not properly aligned.
This patch explicitly forced .BTF and .BTF.ext alignment to be 4.
For the above example, we will have
[ 7] .BTF PROGBITS 0000000000000000 000168 00008b 00 0 0 4
[ 8] .BTF.ext PROGBITS 0000000000000000 0001f4 000050 00 0 0 4
Differential Revision: https://reviews.llvm.org/D112106
This canonicalizer replaces reshapes of constant tensors that contain the updated shape (skipping the reshape operation).
Differential Revision: https://reviews.llvm.org/D112038
Emit __clangast in custom section instead of named data segment
to find it while iterating sections.
This could be avoided if all data segements (the wasm sense) were
represented as their own sections (in the llvm sense).
This can be resolved by https://github.com/WebAssembly/tool-conventions/issues/138
And the on-disk hashtable in clangast needs to be aligned by 4 bytes,
so add paddings in name length field in custom section header.
The length of clangast section name can be represented in 1 byte
by leb128, and possible maximum pads are 3 bytes, so the section
name length won't be invalid in theory.
Fixes https://bugs.llvm.org/show_bug.cgi?id=35928
Differential Revision: https://reviews.llvm.org/D74531
Having non-undef constants in a final llvm-reduce output is nicer than
having undefs.
This splits the existing reduce-operands pass into three, one which does
the same as the current pass of reducing to undef, and two more to
reduce to the constant 1 and the constant 0. Do not reduce to undef if
the operand is a ConstantData, and do not reduce 0s to 1s.
Reducing GEP operands very frequently causes invalid IR (since types may
not match up if we index differently into a struct), so don't touch GEPs.
Reviewed By: Meinersbur
Differential Revision: https://reviews.llvm.org/D111765
usubsat X, SMIN --> (X ^ SMIN) & (X s>> BW-1)
This would be a regression with D112085 where we combine to
usubsat more aggressively, so avoid that by matching the
special-case where we are subtracting SMIN (signmask):
https://alive2.llvm.org/ce/z/4_3gBD
Differential Revision: https://reviews.llvm.org/D112095
When building a multiarch MachO binary, previously the intermediate
output file names would contain random characters. On macOS this
filename, since it's used when linking, ended up being used as a
stable-ish identifier for the adhoc codesignature of the binary, leading
to non-reproducible binaries. This change uses the architecture, when
available, to create a stable, but unique, basename for the file.
Differential Revision: https://reviews.llvm.org/D111269
Extract part of D111337 in order to mke it smaller
and easier to review. This patch add some utility
functions to the FIRBuilder.
Add the following utility functions:
- getCharacterLengthType
- createStringLiteral
- locationToFilename
- characterWithDynamicLen
- sequenceWithNonConstantShape
- hasDynamicSize
These bring up the BoxValue implementation together with it.
This patch is part of the upstreaming effort from fir-dev branch.
Reviewed By: AlexisPerry
Differential Revision: https://reviews.llvm.org/D112074
Co-authored-by: Jean Perier <jperier@nvidia.com>
Co-authored-by: Eric Schweitz <eschweitz@nvidia.com>
Representation of the file's last modification time depends on the file
system and isn't guaranteed to be in seconds. Cast to seconds explicitly
and tighten the test case to check the magnitude of the calculated
value, so we can catch passing milliseconds or nanoseconds.
rdar://83915615
Differential Revision: https://reviews.llvm.org/D111205
In macOS 12, dyld switched to using chained fixups. As a result, all symbols
are bound at launch and there are no lazy pointers any more. Since we wish to
import/dlopen() a dylib with missing symbols, we need to use a weak import.
This applies to all macOS 12-aligned OS releases, e.g. iOS 15, etc.
rdar://81295101
Differential Revision: https://reviews.llvm.org/D112034
- Add the missing NVVM predicate builtins on address space checking
- Redefine them as pure functions so that they could be used in
__builtin_assume.
Reviewed By: tra
Differential Revision: https://reviews.llvm.org/D112053
The help string can be more helpful by explaining these are
aliases for 'process connect'
Reviewed By: JDevlieghere
Differential Revision: https://reviews.llvm.org/D111965
As seen in PR51869 the ScalarEvolution::isImpliedCond function might
end up spending lots of time when doing the isKnownPredicate checks.
Calling isKnownPredicate for example result in isKnownViaInduction
being called, which might result in isLoopBackedgeGuardedByCond being
called, and then we might get one or more new calls to isImpliedCond.
Even if the scenario described here isn't an infinite loop, using
some random generated C programs as input indicates that those
isKnownPredicate checks quite often returns true. On the other hand,
the third condition that needs to be fulfilled in order to "prove
implications via truncation", i.e. the isImpliedCondBalancedTypes
check, is rarely fulfilled.
I also made some similar experiments to look at how often we would
get the same result when using isKnownViaNonRecursiveReasoning instead
of isKnownPredicate. So far I haven't seen a single case when codegen
is negatively impacted by using isKnownViaNonRecursiveReasoning. On
the other hand, it seems like we get rid of the compile time explosion
seen in PR51869 that way. Hence this patch.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D112080
This is trivial. It was left out of the original review only because we had multiple copies of the same code in review at the same time, and keeping them in sync was easiest if the structure was kept in sync.
This patch duplicates a bit of logic we apply to comparisons encountered during the IV users walk to conditions which feed exit conditions. Why? simplifyAndExtend has a very limited list of users it walks. In particular, in the examples is stops at the zext and never visits the icmp. (Because we can't fold the zext to an addrec yet in SCEV.) Being willing to visit when we haven't simplified regresses multiple tests (seemingly because of less optimal results when computing trip counts).
Note that this can be trivially extended to multiple exiting blocks. I'm leaving that to a future patch (solely to cut down on the number of versions of the same code in review at once.)
Differential Revision: https://reviews.llvm.org/D111896
Using BPI within loop predication is non-trivial because BPI is only
preserved lossily in loop pass manager (one fix exposed by lossy
preservation is up for review at D111448). However, since loop
predication is only used in downstream pipelines, it is hard to keep BPI
from breaking for incomplete state with upstream changes in BPI.
Also, correctly preserving BPI for all loop passes is a non-trivial
undertaking (D110438 does this lossily), while the benefit of using it
in loop predication isn't clear.
In this patch, we rely on profile metadata to get almost similar benefit as
BPI, without actually using the complete heuristics provided by BPI.
This avoids the compile time explosion we tried to fix with D110438 and
also avoids fragile bugs because BPI can be lossy in loop passes
(D111448).
Reviewed-By: asbirlea, apilipenko
Differential Revision: https://reviews.llvm.org/D111668
`weekday` has a static member function `__weekday_from_days` which is
not part of the mandated public interface of `weeekday` according to the
standard. Since it is only used internally in the constructors of
`weekday`, let's make it private.
Reviewed By: ldionne, Mordante, #libc
Differential Revision: https://reviews.llvm.org/D112072
Mark LWG3573 as complete. It involves a change in wording around when
`basic_string_view`'s constructor for iterator/sentinel can throw. The
current implementation is not marked conditionally `noexcept`, so there
is nothing to do here. Add a test that binds this behavior to verify the
constructor is not marked `noexcept(true)` when `end - begin` throws.
Reviewed By: ldionne, Mordante, #libc
Differential Revision: https://reviews.llvm.org/D111925
The only possible kind of a conversion in initialization of a shared
pointer to an array is a qualification conversion (i.e., adding
cv-qualifiers). This patch adds tests for converting from `A[]` to
`const A[]` to the following functions:
```
template<class Y> explicit shared_ptr(Y* p);
template<class Y> shared_ptr(const shared_ptr<Y>& r);
template<class Y> shared_ptr(shared_ptr<Y>&& r);
template<class Y> shared_ptr& operator=(const shared_ptr<Y>& r);
template<class Y> shared_ptr& operator=(shared_ptr<Y>&& r);
template<class Y> void reset(Y* p);
template<class Y, class D> void reset(Y* p, D d);
template<class Y, class D, class A> void reset(Y* p, D d, A a);
```
Similar tests for converting functions that involve a `weak_ptr` should
be added once LWG issue [3001](https://cplusplus.github.io/LWG/issue3001)
is implemented.
Differential Revision: https://reviews.llvm.org/D112048
To simplify suppressing warnings (for example, for
when multiple check aliases are enabled).
The globbing format reuses the same code as for
globbing when enabling checks, so the semantics
and behavior is identical.
Differential Revision: https://reviews.llvm.org/D111208
The plugin currently uses a macro to check if this is a debug built
before assigning the debug kind variable to the device environment
struct. This is being deprecated because the new device runtime does not
maintain separate debug builds and should always be availible.
Reviewed By: tianshilei1992
Differential Revision: https://reviews.llvm.org/D112083
Running tests for libunwind is a lot simpler than running tests for
libc++, so a simple Lit config file is sufficient. The benefit is that
we disentangle the libunwind test configuration from the libc++ and
libc++abi test configuration. The setup was too complicated, which led
to some bugs (notably we were running against the system libunwind on
Apple platforms).
Differential Revision: https://reviews.llvm.org/D111664
Use FPU_REG macro to define dN registers, removing the wrong value_regs
while at it. This is a piece-wise attempt of reconstructing D112066
with the goal of figuring out which part of the larger change breaks
the buildbot.
Differential Revision: https://reviews.llvm.org/D112066
Summary:
Break out non-functional changes to the print-changed classes that are needed
for reuse with the DotCfg change printer in https://reviews.llvm.org/D87202.
Various changes to the change printers to facilitate reuse with the
upcoming DotCfg change printer. This includes changing several of
the classes and their support classes to being templates. Also,
some template parameter names were simplified to avoid confusion
with planned identifiers in the DotCfg change printer to come. A
virtual function in the class for comparing functions was changed
to a lambda. The virtual function same was replaced with calls to
operator==. The only intentional functional change was to add the exe name
as the first parameter to llvm::sys::ExecuteAndWait
Author: Jamie Schmeiser <schmeise@ca.ibm.com>
Reviewed By: aeubanks (Arthur Eubanks)
Differential Revision: https://reviews.llvm.org/D110737
Following on from an earlier patch that introduced support for -mtune
for AArch64 backends, this patch splits out the tuning features
from the processor features. This gives us the ability to enable
architectural feature set A for a given processor with "-mcpu=A"
and define the set of tuning features B with "-mtune=B".
It's quite difficult to write a test that proves we select the
right features according to the tuning attribute because most
of these relate to scheduling. I have created a test here:
CodeGen/AArch64/misched-fusion-addr-tune.ll
that demonstrates the different scheduling choices based upon
the tuning.
Differential Revision: https://reviews.llvm.org/D111551
The functions are moved above the parseClauses function as they
will be used inside it to parse `hint` clause
Reviewed By: clementval
Differential Revision: https://reviews.llvm.org/D112071
This patch attempts to restrict the following P10 options:
```
-mprefixed
-mpcrel
-mpaired-vector-memops
```
To P10 only. This will prevent the use of these options on P9 and earlier.
The behaviour of this patch looks like the following on pre-P10:
```
$ clang -mcpu=pwr9 -mpaired-vector-memops test.c -o test
error: option '-mpaired-vector-memops' cannot be specified without '-mcpu=pwr10'
$ clang -mcpu=pwr9 -mprefixed test.c -o test
error: option '-mprefixed' cannot be specified without '-mcpu=pwr10'
$ clang -mcpu=pwr9 -mprefixed -mpcrel test.c -o test
error: option '-mpcrel' cannot be specified without '-mcpu=pwr10 -mprefixed'
$ clang -mcpu=pwr9 -mpcrel -mprefixed test.c -o test
error: option '-mpcrel' cannot be specified without '-mcpu=pwr10 -mprefixed'
$ clang -mcpu=pwr9 -mpcrel test.c -o test
error: option '-mpcrel' cannot be specified without '-mcpu=pwr10 -mprefixed'
```
Differential Revision: https://reviews.llvm.org/D109652
This patch ensures that we always tune for a given CPU on AArch64
targets when the user specifies the "-mtune=xyz" flag. In the
AArch64Subtarget if the tune flag is unset we use the CPU value
instead.
I've updated the release notes here:
llvm/docs/ReleaseNotes.rst
and added tests here:
clang/test/Driver/aarch64-mtune.c
Differential Revision: https://reviews.llvm.org/D110258
Mark LWG3420 as complete. Currently, the `cpp17_iterator` concept
checks that the type looks like an iterator first before checking if it
is copyable.
Reviewed By: ldionne, Quuxplusone, #libc
Differential Revision: https://reviews.llvm.org/D111598
This is a piece-wise attempt of reconstructing D112066 with the goal
of figuring out which part of the larger change breaks the buildbot.
Differential Revision: https://reviews.llvm.org/D112066
Inspired by D111968, provide a isNegatedPowerOf2() wrapper instead of obfuscating code with (-Value).isPowerOf2() patterns, which I'm sure are likely avenues for typos.....
Differential Revision: https://reviews.llvm.org/D111998