This relands D85743 with a fix for test
CodeGen/attr-arm-sve-vector-bits-call.c that disables the new pass
manager with '-fno-experimental-new-pass-manager'. Test was failing due
to IR differences with the new pass manager which broke the Fuchsia
builder [1]. Reverted in 2e7041f.
[1] http://lab.llvm.org:8011/builders/fuchsia-x86_64-linux/builds/10375
Original summary:
This patch implements codegen for the 'arm_sve_vector_bits' type
attribute, defined by the Arm C Language Extensions (ACLE) for SVE [1].
The purpose of this attribute is to define vector-length-specific (VLS)
versions of existing vector-length-agnostic (VLA) types.
VLSTs are represented as VectorType in the AST and fixed-length vectors
in the IR everywhere except in function args/return. Implemented in this
patch is codegen support for the following:
* Implicit casting between VLA <-> VLS types.
* Coercion of VLS types in function args/return.
* Mangling of VLS types.
Casting is handled by the CK_BitCast operation, which has been extended
to support the two new vector kinds for fixed-length SVE predicate and
data vectors, where the cast is implemented through memory rather than a
bitcast which is unsupported. Implementing this as a normal bitcast
would require relaxing checks in LLVM to allow bitcasting between
scalable and fixed types. Another option was adding target-specific
intrinsics, although codegen support would need to be added for these
intrinsics. Given this, casting through memory seemed like the best
approach as it's supported today and existing optimisations may remove
unnecessary loads/stores, although there is room for improvement here.
Coercion of VLSTs in function args/return from fixed to scalable is
implemented through the AArch64 ABI in TargetInfo.
The VLA and VLS types are defined by the ACLE to map to the same
machine-level SVE vectors. VLS types are mangled in the same way as:
__SVE_VLS<typename, unsigned>
where the first argument is the underlying variable-length type and the
second argument is the SVE vector length in bits. For example:
#if __ARM_FEATURE_SVE_BITS==512
// Mangled as 9__SVE_VLSIu11__SVInt32_tLj512EE
typedef svint32_t vec __attribute__((arm_sve_vector_bits(512)));
// Mangled as 9__SVE_VLSIu10__SVBool_tLj512EE
typedef svbool_t pred __attribute__((arm_sve_vector_bits(512)));
#endif
The latest ACLE specification (00bet5) does not contain details of this
mangling scheme, it will be specified in the next revision. The
mangling scheme is otherwise defined in the appendices to the Procedure
Call Standard for the Arm Architecture, see [2] for more information.
[1] https://developer.arm.com/documentation/100987/latest
[2] https://github.com/ARM-software/abi-aa/blob/master/aapcs64/aapcs64.rst#appendix-c-mangling
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D85743
In this patch, a pc-rel based long branch thunk is added for the local
call protocol that caller and callee does not use TOC.
Reviewed By: sfertile, nemanjai
Differential Revision: https://reviews.llvm.org/D86706
This patch changes ElementCount so that the Min and Scalable
members are now private and can only be accessed via the get
functions getKnownMinValue() and isScalable(). In addition I've
added some other member functions for more commonly used operations.
Hopefully this makes the class more useful and will reduce the
need for calling getKnownMinValue().
Differential Revision: https://reviews.llvm.org/D86065
The abbrev codes in a new abbrev table should start from 1 (by default),
rather than inherit the value from the code in the previous table.
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D86545
Original D81646 had check for tied regs in foldPatchpoint().
Due to unfortunate miscommunication with review comments and
adressing some comments post commit, it turned into assertion.
We had an offline talk and agreed that with current implementation
this path is possible, so I'm changing it back to check.
Note that this is workaround until ussues described in PR46917 are
resolved.
Remove the code that tried to look for reduction patterns, since the
vectorizer and isel can now produce predicated arithmetic instructios
within the loop body. This has required some reorganisation and fixes
around live-out and predication checks, as well as looking for cases
where an input/output is initialised to zero.
Differential Revision: https://reviews.llvm.org/D86613
Previously in addTypeForNeon, we would set the operations for bfloat vectors
like other generic types. But as bfloat is a storage-only type a number of
operations shouldn't be set. This patch fixes that.
Reviewed By: dmgreen
Differential Revision: https://reviews.llvm.org/D85101
When guessing whether a closing paren is then end of a cast expression also
skip over pointer qualifiers while looking for TT_PointerOrReference.
This prevents some address-of and dereference operators from being parsed
as a binary operator.
Before:
x = (foo *const) * v;
x = (foo *const volatile restrict __attribute__((foo)) _Nonnull _Null_unspecified _Nonnull) & v;
After:
x = (foo *const)*v;
x = (foo *const volatile restrict __attribute__((foo)) _Nonnull _Null_unspecified _Nonnull)&v;
Reviewed By: MyDeveloperDay
Differential Revision: https://reviews.llvm.org/D86716
This changes getDomMemoryDef to check if a Current is a valid
candidate for elimination before checking for reads. Before the change,
we were spending a lot of compile-time in checking for read accesses for
Current that might not even be removable.
This patch flips the logic, so we skip Current if they cannot be
removed before checking all their uses. This is much more efficient in
practice.
It also adds a more aggressive limit for checking partially overlapping
stores. The main problem with overlapping stores is that we do not know
if they will lead to elimination until seeing all of them. This patch
limits adds a new limit for overlapping store candidates, which keeps
the number of modified overlapping stores roughly the same.
This is another substantial compile-time improvement (while also
increasing the number of stores eliminated). Geomean -O3 -0.67%,
ReleaseThinLTO -0.97%.
http://llvm-compile-time-tracker.com/compare.php?from=0a929b6978a068af8ddb02d0d4714a2843dd8ba9&to=2e630629b43f64b60b282e90f0d96082fde2dacc&stat=instructions
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D86487
This patch is mostly about removing the "Category" enum, which was
very useful when the Type enum contained a large number of types, but
now the two are completely identical.
It also removes some other artifacts like unused typedefs and macros.
`{@code xxxxx}` triggers a Doxygen bug. The bug may be matching the
close brace with the open brace of the namespace
declaration (`namespace clang {` or `namespace ento {`).
Differential Revision: https://reviews.llvm.org/D85105
Tests on Solaris/sparcv9 currently show about 250 failures when building
with gcc, most of them like the following:
FAIL: LLVM-Unit :: Support/./SupportTests/TaskQueueTest.UnOrderedFutures (4269 of 67884)
******************** TEST 'LLVM-Unit :: Support/./SupportTests/TaskQueueTest.UnOrderedFutures' FAILED ********************
Note: Google Test filter = TaskQueueTest.UnOrderedFutures
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from TaskQueueTest
[ RUN ] TaskQueueTest.UnOrderedFutures
0 SupportTests 0x0000000100753b20 llvm::sys::PrintStackTrace(llvm::raw_ostream&) + 32
1 SupportTests 0x0000000100752974 llvm::sys::RunSignalHandlers() + 68
2 SupportTests 0x0000000100752b18 SignalHandler(int) + 372
3 libc.so.1 0xffffffff7eedc800 __sighndlr + 12
4 libc.so.1 0xffffffff7eecf23c call_user_handler + 852
5 libc.so.1 0xffffffff7eecf594 sigacthandler + 84
6 SupportTests 0x00000001006f8cb8 std:🧵:_State_impl<std:🧵:_Invoker<std::tuple<llvm::ThreadPool::ThreadPool(llvm::ThreadPoolStrategy)::'lambda'()> > >::_M_run() + 512
7 libstdc++.so.6.0.28 0xfffffffc628117cc execute_native_thread_routine + 16
8 libc.so.1 0xffffffff7eedc6a0 _lwp_start + 0
Since it's effectively impossible to debug such a `SEGV` in a `Release`
build, I tried a `Debug` build instead, only to find that the failures had
gone away.
Further investigation revealed that most of the issue centers around
`llvm/lib/Support/ThreadPool.cpp`. That file is built with `-O3 -fPIC` in
a `Release` build. The failure vanishes if
- compiling without `-fPIC`
- compiling with `-O -fPIC`
- linking with GNU `ld` instead of Solaris `ld`
It has meanwhile been determined that `gcc` doesn't correctly heed some TLS
code sequences. To make things worse, Solaris `ld` doesn't properly
validate its assumptions against the input, generating wrong code.
`gld` like `gcc` is more liberal here and correctly deals with the code it
gets fed from `gcc`.
There's PR target/96607: GCC feeds SPARC/Solaris linker with unrecognized
TLS sequences <https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96607> now.
An attempt to build with `-DLLVM_ENABLE_PIC=Off` initially failed since
neither `libRemarks.so` (D85626 <https://reviews.llvm.org/D85626>) nor
`LLVMPolly.so` (D85627 <https://reviews.llvm.org/D85627>) heed that option.
Even with that fixed, a few codegen failures remain.
Next I tried to build just `ThreadPool.cpp` with `-O -fPIC`. While that
fixed the vast majority of the failures, 16 `LLVM :: CodeGen/X86` failures
remained.
Given that that solution was both incomplete and fragile, I went for
building the whole tree with `-O -fPIC` for `Release` and `RelWithDebInfo`
builds.
As detailed in Bug 47304, 2-stage builds also show large numbers of
failures when building with `-O3` or `-O2`, which are likewise worked
around by building with `-O` until they are sufficiently analyzed and
fixed.
This way, all failures relative to a `Debug` build go away.
Tested on `sparcv9-sun-solaris2.11`.
Differential Revision: https://reviews.llvm.org/D85630
This patch adds support for memcmp in MemoryLocation::getForArgument.
memcmp reads from the first 2 arguments up to the number of bytes of the
third argument.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D86725
The tensor_reshape op was only fusible only if it is a collapsing case. Now we
propagate the op to all the operands so there is a further chance to fuse it
with generic op. The pre-conditions are:
1) The producer is not an indexed_generic op.
2) All the shapes of the operands are the same.
3) All the indexing maps are identity.
4) All the loops are parallel loops.
5) The producer has a single user.
It is possible to fuse the ops if the producer is an indexed_generic op. We
still can compute the original indices. E.g., if the reshape op collapses the d0
and d1, we can use DimOp to get the width of d1, and calculate the index
`d0 * width + d1`. Then replace all the uses with it. However, this pattern is
not implemented in the patch.
Reviewed By: mravishankar
Differential Revision: https://reviews.llvm.org/D86314
strspn, strncmp, strcspn, strcasecmp, strncasecmp, memcmp, memchr,
memrchr, memcpy, memmove, memcpy, mempcpy, strchr, strrchr, bcmp
should all only access memory through their arguments.
I broke out strcoll, strcasecmp, strncasecmp because the result
depends on the locale, which might get accessed through memory.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D86724
We have a few helper functions like the following:
```
std::error_code create*Dumper(...)
```
In fact we do not need or want to use `std::error_code` and the code
can be simpler if we just return `std::unique_ptr<ObjDumper>`.
This patch does this change and refines the signature of `createDumper`
as well.
Differential revision: https://reviews.llvm.org/D86718
This adds testing for the "Format" field printed with `--file-headers`.
llvm-readelf doesn't use them, so only llvm-readobj needs to be tested.
All possible values are defined and tested in `ELFObjectFile<ELFT>::getFileFormatName()`.
Here we test just a few arbitrary ones.
Differential revision: https://reviews.llvm.org/D86350
This adds all missing format values that are defined in
ELFObjectFile<ELFT>::getFileFormatName().
Differential revision: https://reviews.llvm.org/D86625
Some reduction passes may create invalid IR. I am not aware of any use
case where we would like to proceed reducing invalid IR. Various utils
used here, including CloneModule, assume the module to clone is valid
and crash otherwise.
Ideally, no reduction pass would create invalid IR, but some currently
do. ReduceInstructions can be fixed relatively easily (D86210), but
others are harder. For example, ReduceBasicBlocks may remove result in
invalid PHI nodes.
For now, skip the chunks. If we get to the point where all reduction
passes result in valid IR, we may want to turn this into an assertion.
Reviewed By: lebedev.ri
Differential Revision: https://reviews.llvm.org/D86212
If there's no unwinding opcodes, omit writing the xdata/pdata records.
Previously, this generated truncated xdata records, and llvm-readobj
would error out when trying to print them.
If writing of an xdata record is forced via the .seh_handlerdata
directive, skip it if there's no info to make a sensible unwind
info structure out of, and clearly error out if such info appeared
later in the process.
Differential Revision: https://reviews.llvm.org/D86527
This is intended to ease the transition for client with a lot of
dependencies. It'll be removed in the coming weeks.
Differential Revision: https://reviews.llvm.org/D86755
It's not undefined behavior for an unsigned left shift to overflow (i.e. to
shift bits out), but it has been the source of bugs and exploits in certain
codebases in the past. As we do in other parts of UBSan, this patch adds a
dynamic checker which acts beyond UBSan and checks other sources of errors. The
option is enabled as part of -fsanitize=integer.
The flag is named: -fsanitize=unsigned-shift-base
This matches shift-base and shift-exponent flags.
<rdar://problem/46129047>
Differential Revision: https://reviews.llvm.org/D86000
This patch fix the prasing for the gang-arg values for the gang clause. It also adds
some clause validity tests for the loop construct.
Reviewed By: klausler
Differential Revision: https://reviews.llvm.org/D86584
Add functions exposed via the MSAN interface to enable MSAN within
binaries that perform manual stack switching (e.g. through using fibers
or coroutines).
This functionality is analogous to the fiber APIs available for ASAN and TSAN.
Fixesgoogle/sanitizers#1232
Reviewed By: vitalybuka
Differential Revision: https://reviews.llvm.org/D86471
The tile clause in OpenACC 3.0 imposes some restriction. Element in the tile size list are either * or a
constant positive integer expression. If there are n tile sizes in the list, the loop construct must be immediately
followed by n tightly-nested loops.
This patch implement these restrictions and add some tests.
Reviewed By: klausler
Differential Revision: https://reviews.llvm.org/D86655
When collecting `i1` values via `findAllDefs`, ignore Constant's
operands, since Constant's operands might not be `i1`.
Fixes https://bugs.llvm.org/show_bug.cgi?id=46923 which causes ICE
```
llvm-project/llvm/lib/IR/Constants.cpp:1924: static llvm::Constant *llvm::ConstantExpr::getZExt(llvm::Constant *, llvm::Type *, bool): Assertion `C->getType()->getScalarSizeInBits() < Ty->getScalarSizeInBits()&& "SrcTy must be smaller than DestTy for ZExt!"' failed.
```
Differential Revision: https://reviews.llvm.org/D85007
The introduction of find_library for ncurses caused more issues than it solved problems. The current open issue is it makes the static build of LLVM fail. It is better to revert for now, and get back to it later.
Revert "[CMake] Fix an issue where get_system_libname creates an empty regex capture on windows"
This reverts commit 1ed1e16ab8.
Revert "Fix msan build"
This reverts commit 34fe9613dd.
Revert "[CMake] Always mark terminfo as unavailable on Windows"
This reverts commit 76bf26236f.
Revert "[CMake] Fix OCaml build failure because of absolute path in system libs"
This reverts commit 8e4acb82f7.
Revert "[CMake] Don't look for terminfo libs when LLVM_ENABLE_TERMINFO=OFF"
This reverts commit 495f91fd33.
Revert "Use find_library for ncurses"
This reverts commit a52173a3e5.
Differential revision: https://reviews.llvm.org/D86521
Found such a relocation while testing some real world programs.
Reviewed By: #lld-macho, smeenai
Differential Revision: https://reviews.llvm.org/D86642