@kpn pointed out that the global variable initialization functions didn't
have the "strictfp" metadata set correctly, and @rjmccall said that there
was buggy code in SetFPModel and StartFunction, this patch is to solve
those problems. When Sema creates a FunctionDecl, it sets the
FunctionDeclBits.UsesFPIntrin to "true" if the lexical FP settings
(i.e. a combination of command line options and #pragma float_control
settings) correspond to ConstrainedFP mode. That bit is used when CodeGen
starts codegen for a llvm function, and it translates into the
"strictfp" function attribute. See bugs.llvm.org/show_bug.cgi?id=44571
Reviewed By: Aaron Ballman
Differential Revision: https://reviews.llvm.org/D102343
If a target lists both a subreg and a superreg in a callee-saved
register mask, the prolog will spill both aliasing registers. Instead,
don't spill the subreg if a superreg is being spilled. This case is hit by the
PowerPC SPE code, as well as a modified RISC-V backend for CHERI I maintain out
of tree.
Reviewed By: jhibbits
Differential Revision: https://reviews.llvm.org/D73170
Currently TFRT does not support top-level coroutines, so this functionality will allow to have a single blocking await at the top level until TFRT implements the necessary functionality.
Reviewed By: ezhulenev
Differential Revision: https://reviews.llvm.org/D106730
Change the formatting of the debug print outs to elide unnecessary information.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D106661
We call non-inlinable Initialize from all interceptors/syscalls,
but most of the time runtime is already initialized and this just
introduces unnecessary overhead.
Add LazyInitialize that (1) inlinable, (2) does nothing if
.preinit_array is enabled (expected case on Linux).
Depends on D107071.
Reviewed By: melver
Differential Revision: https://reviews.llvm.org/D107072
We are very paranoid with CHECKs in all Java entry points.
These CHECKs were added along with Java support.
At that point it wasn't clear what exactly to expect from JVM part
and if JVM part is correct or not. Thus CHECK paranoia.
These CHECKs never fired in practice, but we pay runtime cost
in every entry point all the time.
Replace CHECKs with DCHECKs.
Depends on D107069.
Reviewed By: melver
Differential Revision: https://reviews.llvm.org/D107071
We used to call Initialize in every Java point.
That was removed in 6563bb53b5 ("tsan: don't use caller/current PC in Java interfaces").
The intention was to add a single Initialize to __tsan_java_init instead.
Do that.
Reviewed By: melver
Differential Revision: https://reviews.llvm.org/D107069
ld64 seems to handle common symbols in bitcode rather
bizarrely. They follow entirely different precedence rules from their
non-bitcode counterparts. I initially tried to emulate ld64 in D106597,
but I'm not sure the extra complexity is worth it, especially given that
common symbols are not, well, very common.
This diff accords common bitcode symbols the same precedence as regular
common symbols, just as we treat all other pairs of bitcode and
non-bitcode symbol types. The tests document ld64's behavior in detail,
just in case we want to revisit this.
Reviewed By: #lld-macho, thakis
Differential Revision: https://reviews.llvm.org/D107027
Add intrusive doubly-linked list container template, IList.
It will be used in the new tsan runtime.
Reviewed By: melver
Differential Revision: https://reviews.llvm.org/D107050
Add a test where atomic-release happens while
another thread spins calling load-acquire.
This can expose some interesting interleavings
of release and acquire.
Reviewed By: melver
Differential Revision: https://reviews.llvm.org/D107055
If we succeed at gathering global variables for a compile
unit, there is no need to fallback to generating a manual index.
Reviewed By: jankratochvil
Differential Revision: https://reviews.llvm.org/D106355
This is somewhat of a repeat of D66658 but for sections in PT_TLS
segments. Although such sections don't need to be aligned such that
address and offset are congruent modulo the page size, they do need
to be congruent modulo the segment alignment, otherwise the
whole PT_TLS will be unaligned. We therefore use the normal calculation
to determine the section's address within the PT_LOAD rather than
bailing out early due to being SHT_NOBITS.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D106987
This is a similar problem to D66658, where we are too aggressive in not
aligning NOBITS sections, and the tests are based on the ones added for
that fix. If a .tbss section is first in a PT_TLS segment (i.e. there is
no .tdata section) then, although it doesn't need to be aligned such
that address and offset are congruent modulo the page size, they do need
to be congruent modulo the segment alignment, otherwise the whole PT_TLS
will be unaligned.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D106986
Summary:
Set the TargetCPUName for AIX to default to pwr7, removing the setting
of it based on the major/minor of the OS version, which previously
set it to pwr4 for AIX 7.1 and earlier. The old code would also set it to
pwr4 when the OS version was not specified and with the change, it will
default it to pwr7 in all cases.
Author: Jamie Schmeiser <schmeise@ca.ibm.com>
Reviewed By:hubert.reinterpretcast (Hubert Tong)
Differential Revision: https://reviews.llvm.org/D107063
This transform was added with D58874, but there were no tests for overflow ops.
We need to change this one way or another because it can crash as shown in:
https://llvm.org/PR51238
Note that if there are no uses of an overflow op's bool overflow result, we
reduce it to a regular math op, so we continue to fold that case either way.
If we have uses of both the math and the overflow bool, then we are likely
not saving anything by creating an independent sub instruction as seen in
the test diffs here.
This patch makes the behavior in SDAG consistent with what we do in
instcombine AFAICT.
Differential Revision: https://reviews.llvm.org/D106983
There's a generic combine for these, but no test coverage.
It's not clear if this is actually a good fold.
The combine was added with D58874, but it has a bug that
can cause crashing ( https://llvm.org/PR51238 ).
Load/Store unit is used to enforce order of loads and stores if they
alias (controlled by --noalias=false option).
Fixes PR50483 - [MCA] In-order pipeline doesn't track memory
load/store dependencies.
Differential Revision: https://reviews.llvm.org/D103955
An incorrect mask type when lowering an SVE gather/scatter was causing
a codegen fault which manifested as the incorrect predicate size being
used for an SVE gather/scatter, (e.g.. p0.b rather than p0.d).
Fixes PR51182.
Differential Revision: https://reviews.llvm.org/D106943
When checking if two prefixes can be merged for a function,
update_llc_test_checks.py removed IR comments before comparing
llc outputs of different RUN lines.
This means, if one RUN line emited lines starting with ';' and another
RUN line emited the same lines except the ones starting with ';', both
RUNs would be merged (if they share a prefix).
However, CHECK-NEXT lines check the comments, otherwise they fail, so
the script should not merge RUNs if they contain different comments.
Differential Revision: https://reviews.llvm.org/D101312
Many concepts emulation libraries, such as the one found in Range v3, tend to
use non-type template parameters for the enable_if type expression, due to
their versatility in template functions and constructors containing variadic
template parameter packs.
Unfortunately the bugprone-forwarding-reference-overload check does not
handle non-type template parameters, as was first noted in this bug report:
https://bugs.llvm.org/show_bug.cgi?id=38081
This patch fixes this long standing issue and allows for the check to be suppressed
with the use of a non-type template parameter containing enable_if or enable_if_t in
the type expression, so long as it has a default literal value.
SCEVToIterCountExpr only expects to be fed affine expressions, but
DbgRewriteSalvageableDVIs is feeding it non-affine induction variables.
Following this up with an obvious fix, will add test coverage too if
this avoids D105207 being reverted.
This patch only modifies `flang` - the bash wrapper script.
`-fopenmp`/`-fopenacc` are required to enable the OpenMP/OpenACC
extension in the frontend and to make sure that the required libraries
are linked when generating the final binary. This patch makes sure that
`-fopnemp`/`-fopenacc` is used for both unparsing and the code
generation (via the host compiler).
Differential Revision: https://reviews.llvm.org/D106871
In the latest Linux kernels synchronous tag faults
include the tag bits in their address.
This change adds logical and allocation tags to the
description of synchronous tag faults.
(asynchronous faults have no address)
Process 1626 stopped
* thread #1, name = 'a.out', stop reason = signal SIGSEGV: sync tag check fault (fault address: 0x900fffff7ff9010 logical tag: 0x9 allocation tag: 0x0)
This extends the existing description and will
show as much as it can on the rare occasion something
fails.
This change supports AArch64 MTE only but other
architectures could be added by extending the
switch at the start of AnnotateSyncTagCheckFault.
The rest of the function is generic code.
Tests have been added for synchronous and asynchronous
MTE faults.
Reviewed By: omjavaid
Differential Revision: https://reviews.llvm.org/D105178
While v_cmp will AND inactive lanes with 0, that is not the case for logical
operations.
This fixes a Vulkan CTS test that would hang otherwise.
Differential Revision: https://reviews.llvm.org/D105709
This is redundant with the callback variant and untested. Also remove
the callback-less methods for adding a dynamically legal op, as they
are no longer useful.
Differential Revision: https://reviews.llvm.org/D106786
Use absolute path to link z3 to allow builds both on windows and linux
since the library name is platform dependent for Z3 (libz3 on Windows
and z3 on Linux) and MSVC does not recognized -L and -l options.
Fix CMAKE_CROSSCOMPILING that does not work correctly since it uses
Z3_BUILD_VERSION instead of Z3_BUILD_NUMBER
Fix building with the static version of z3 library (supersedes D80227).
- Build the Z3 version detection code as C++, since the static
library brings in libstdc++ symbols
- Detect threading support and link against threading, in the
(likely) case Z3 was built with threads
Exposed compilation error from building a program that is used to detect
z3 version in the warning message, to simplify troubleshooting.
Reviewed By: JDevlieghere
Differential Revision: https://reviews.llvm.org/D106131
When the trip count of the inner loop is a constant, the InstCombine
pass now causes the transformation e.g. imcp ult i32 %inc, tripcount ->
icmp ult %j, tripcount-step (where %j is the inner loop induction
variable and %inc is add %j, step), which is now accounted for when
identifying the trip count of the loop. This is also an acceptable use
of %j (provided the step is 1) so is ignored as long as the compare
that it's used in is also the condition of the inner branch.
Differential Revision: https://reviews.llvm.org/D105802
This patch aims to improve the performance of BUILD_VECTORs which are
identified as containing a dominant element. Given that most
floating-point constants themselves require a load from the constant
pool, it was possible for the optimization to actually increase the
number of individual loads on small vectors. The exception is the zero
constant -- +0.0 -- which can be materialized efficiently.
While this optimization could do with a proper cost model to weigh the
benfits of a single vector load vs. the manipulation of individual
elements -- even for integer vectors which often require several
instructions to materialize -- without a concrete RVV implementation to
work with any heuristic is likely to be both more obtuse and inaccurate.
Until then, this patch fixes at least one known obvious deficiency.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D106963