The old expansion open-coded a 64-bit addition in a strange way, by
adding the high parts *without* carry-in from the low part, and then
adding the carry back in later on. Fixing this saves a couple of
instructions and makes the code much easier to understand.
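For reference, a rough sketch of the two expansions in plain C++ (illustrative only, with made-up helper names; not the backend code itself):
```
#include <cstdint>

// Old expansion: add the high halves without a carry-in, then add the carry
// from the low half back in as a separate step.
uint64_t add64Old(uint32_t alo, uint32_t ahi, uint32_t blo, uint32_t bhi) {
  uint32_t lo = alo + blo;
  uint32_t carry = lo < alo;       // carry out of the low addition
  uint32_t hi = ahi + bhi;         // high add *without* carry-in
  hi += carry;                     // carry added back in later
  return ((uint64_t)hi << 32) | lo;
}

// Fixed expansion: feed the carry straight into the high addition.
uint64_t add64New(uint32_t alo, uint32_t ahi, uint32_t blo, uint32_t bhi) {
  uint32_t lo = alo + blo;
  uint32_t carry = lo < alo;
  uint32_t hi = ahi + bhi + carry; // add-with-carry in one step
  return ((uint64_t)hi << 32) | lo;
}
```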
Differential Revision: https://reviews.llvm.org/D113679
Change the error message to use ignorelist, and change some variable and
function names in related code and tests.
Reviewed By: thakis
Differential Revision: https://reviews.llvm.org/D113189
Reuses the C++ for OpenCL constructor address space test so that it
supports optional generic address spaces in version 2021.
Differential Revision: https://reviews.llvm.org/D110184
`collectElementTypesForWidening` collects the types of load, store and
reduction Phis in a loop. These types are later checked using
`isElementTypeLegalForScalableVector` to prevent vectorisation of
loops with instruction types that are unsupported.
This patch removes i1 from the list of types supported for scalable
vectors. This fixes an assert ("Cannot yet scalarize uniform stores") in
`setCostBasedWideningDecision` when we have a loop containing a uniform
i1 store and a scalable VF, which we cannot create a scatter for.
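For illustration, a hypothetical loop of roughly this shape (not taken from the test suite) is the kind of case that hit the assert once a scalable VF was picked:
```
// The stored address does not depend on the induction variable, so this is a
// uniform store of an i1 (bool) value; no scatter can be created for it with
// a scalable VF.
void f(bool *flag, int *a, int n) {
  for (int i = 0; i < n; ++i) {
    a[i] += 1;
    *flag = true; // uniform i1 store
  }
}
```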
Reviewed By: david-arm
Differential Revision: https://reviews.llvm.org/D113680
We need to fix the cost estimation for split loads: since we already look at
the subregs, there is no need to permute them; we just need to estimate the
subregister insert, if it is smaller than the real register. Also, using
split loads, it might already be profitable to vectorize smaller trees
with gathering of the loads.
Differential Revision: https://reviews.llvm.org/D107188
Currently we only constant fold target shuffles if any of the sources has one use, or it would remove a variable shuffle mask - the aim being to avoid constant pool bloat.
This patch proposes we should constant fold by default and only limit this if optsize is enabled - I've added a basic test for this in vector-mul.ll (the pmuludq case is by far the most common), I can add other specific test cases if people need them.
This should permit further constant folding, break some instruction dependencies and help reduce shuffle port pressure.
Differential Revision: https://reviews.llvm.org/D113748
This change switches tsan to the new runtime, which features:
- 2x smaller shadow memory (2x of app memory)
- faster fully vectorized race detection
- small fixed-size vector clocks (512b)
- fast vectorized vector clock operations
- unlimited number of alive threads/goroutines
Depends on D112602.
Reviewed By: melver
Differential Revision: https://reviews.llvm.org/D112603
When LLDB receives a SIGINT while running the embedded Python REPL, it currently
just crashes in `ScriptInterpreterPythonImpl::Interrupt` with an error such as
the one below:
```
Fatal Python error: PyThreadState_Get: the function must be called with the GIL
held, but the GIL is released (the current Python thread state is NULL)
```
The faulty code that causes this error is this part of `ScriptInterpreterPythonImpl::Interrupt`:
```
PyThreadState *state = PyThreadState_GET();
if (!state)
  state = GetThreadState();
if (state) {
  long tid = state->thread_id;
  PyThreadState_Swap(state);
  int num_threads = PyThreadState_SetAsyncExc(tid, PyExc_KeyboardInterrupt);
```
The obvious fix I tried is to just acquire the GIL before running this code. That
fixes the crash, but the `KeyboardInterrupt` we want to raise immediately is
actually just queued and would only be raised once the next line of input has
been parsed (which, e.g., won't interrupt Python code that is currently waiting
on a timer or on IO, from what I can see). Also, none of the functions we call
here are marked as safe to be called from a signal handler, so we might still
end up crashing here with some bad timing.
Python 3.2 introduced `PyErr_SetInterrupt` to solve this; the function takes
care of all the details and avoids doing anything that isn't safe to do inside a
signal handler. The only thing we need to do is to manually set up our own fake
SIGINT handler that behaves the same way as the standalone Python REPL signal
handler (which raises a KeyboardInterrupt).
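In other words, the new approach boils down to something like the following standalone sketch (a minimal illustration of `PyErr_SetInterrupt`, not the actual LLDB code):
```
#include <Python.h>
#include <csignal>

// Fake SIGINT handler: PyErr_SetInterrupt is documented as safe to call from
// a C signal handler (available since Python 3.2) and makes the interpreter
// raise KeyboardInterrupt as soon as it can.
extern "C" void HandleSigInt(int /*signo*/) { PyErr_SetInterrupt(); }

int main() {
  Py_Initialize();
  std::signal(SIGINT, HandleSigInt);
  // Pressing Ctrl-C while this sleeps raises KeyboardInterrupt in the
  // interpreter instead of crashing with a GIL-related fatal error.
  PyRun_SimpleString("import time\n"
                     "try:\n"
                     "    time.sleep(60)\n"
                     "except KeyboardInterrupt:\n"
                     "    print('interrupted')\n");
  Py_Finalize();
  return 0;
}
```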
From what I understand, the old code used to work with Python 2, so I kept it
around until we officially drop support for Python 2.
There is a small gap here with Python 3.0->3.1 where we might still be crashing,
but those versions reached their EOL more than a decade ago, so I think we
don't need to worry about them.
Reviewed By: JDevlieghere
Differential Revision: https://reviews.llvm.org/D104886
and (pcmpgt X, -1), Y --> pandn (vsrai X, BitWidth-1), Y
This avoids the -1 constant vector in favor of an arithmetic shift
instruction if it exists (the ISA is still not complete after all
these years...).
We catch this pattern late in combining by matching PCMPGT, so it
should not interfere with more general folds.
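For illustration, the two forms expressed with SSE2 intrinsics (a hand-written sketch of the equivalent C++, not what the compiler actually emits):
```
#include <emmintrin.h>

// Before: needs an all-ones (-1) constant vector for the compare.
__m128i keepNonNegativeLanesBefore(__m128i X, __m128i Y) {
  __m128i ge0 = _mm_cmpgt_epi32(X, _mm_set1_epi32(-1)); // all-ones where X > -1
  return _mm_and_si128(ge0, Y);
}

// After: derive the mask from an arithmetic shift and use pandn instead,
// avoiding the all-ones constant.
__m128i keepNonNegativeLanesAfter(__m128i X, __m128i Y) {
  __m128i lt0 = _mm_srai_epi32(X, 31); // all-ones where X < 0, zero otherwise
  return _mm_andnot_si128(lt0, Y);     // (~lt0) & Y
}
```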
Differential Revision: https://reviews.llvm.org/D113603
At this time the 2 flavors of conv are a little too different to allow significant code sharing, and others will likely come up,
so we go the easy route first by duplicating and adapting.
Reviewed By: gysit
Differential Revision: https://reviews.llvm.org/D113758
Start the background thread only after fork, but not after clone.
For fork we did this always and it's known to work (or user code has adapted).
But if we do this for the new clone interceptor, some code (sandbox2) fails.
So model what we used to do for years and don't start the background thread after clone.
Reviewed By: melver
Differential Revision: https://reviews.llvm.org/D113744
The compiler does not recognize HACKY_CALL as a call
(we intentionally hide it from the compiler so that it can
compile non-leaf functions as leaf functions).
To compensate for that, the hacky call thunk saves and restores
all caller-saved registers. However, it saves only
general-purpose registers and does not save XMM registers.
This is a latent bug that was masked up until a recent "NFC" commit
d736002e90 ("tsan: move memory access functions to a separate file"),
which allowed more inlining and exposed this 10-year-old bug.
Save and restore all caller-saved XMM registers as well.
Currently the bug manifests as, e.g., the frexp interceptor messing up the
return value; the added test fails with:
i=8177 y=0.000000 exp=4
Reviewed By: melver
Differential Revision: https://reviews.llvm.org/D113742
Add support for demangling Rust v0 symbols to llvm-nm by reusing
nonMicrosoftDemangle which supports both Itanium and Rust mangling.
Reviewed By: dblaikie, jhenderson
Differential Revision: https://reviews.llvm.org/D111937
We no longer need a reference to RangedConstraintManager; we call the
top-level `State->assume` functions instead.
Differential Revision: https://reviews.llvm.org/D113261
D103314 introduced symbol simplification when a new constant constraint is
added. Currently, we simplify existing equivalence classes by iterating over
all existing members of them and trying to simplify each member symbol with
simplifySVal.
At the end of such a simplification round we may end up introducing a
new constant constraint. Example:
```
if (a + b + c != d)
  return;
if (c + b != 0)
  return;
// Simplification starts here.
if (b != 0)
  return;
```
The `c == 0` constraint is the result of the first simplification iteration:
substituting `b == 0` into `c + b == 0` gives `c == 0`. However, we could do
another round of simplification, substituting `b == 0` and `c == 0` into
`a + b + c == d`, to reach the conclusion that `a == d`. Generally, we could
keep doing new iterations until we reach a fixpoint.
We can reach a fixpoint by recursively calling `State->assume` on the
newly simplified symbol. By calling `State->assume` we re-ignite the
whole assume machinery (along with, e.g., adjustment handling).
Why should we do this? By reaching a fixpoint in simplification we are capable
of discovering infeasible states at the moment of the introduction of the
**first** constant constraint.
Let's modify the previous example just a bit, and consider what happens without
the fixpoint iteration.
```
if (a + b + c != d)
  return;
if (c + b != 0)
  return;
// Adding a new constraint.
if (a == d)
  return;
// This brings in a contradiction.
if (b != 0)
  return;
clang_analyzer_warnIfReached(); // This produces a warning.
// The path is already infeasible...
if (c == 0) // ...but we realize that only when we evaluate `c == 0`.
  return;
```
What happens currently, without the fixpoint iteration? As the inline comments
suggest, without the fixpoint iteration we are doomed to realize that we are on
an infeasible path only after we are already walking on it. With fixpoint
iteration we can detect that before stepping onto it: the
`clang_analyzer_warnIfReached` does not warn in the above example b/c during the
evaluation of `b == 0` we realize the contradiction. The engine and the checkers
rely on the invariant that either `assume(Cond)` or `assume(!Cond)` is feasible.
This is in fact assured by the so-called expensive checks
(LLVM_ENABLE_EXPENSIVE_CHECKS). The StdLibraryFunctionsChecker is notably one of
the checkers that has a very similar assertion.
Before this patch, we simply added the simplified symbol to the equivalence
class. In this patch, after we have added the simplified symbol, we remove the
old (more complex) symbol from the members of the equivalence class
(`ClassMembers`). Removing the old symbol is beneficial because during the next
iteration of the simplification we don't have to consider the old symbol again.
Contrary to how we handle `ClassMembers`, we don't remove the old Sym->Class
relation from the `ClassMap`. This is important for two reasons: (1) the
constraints of the old symbol can still be found via the equivalence class it
used to be a member of; (2) we can spare one removal and thus one additional
tree in the forest of `ClassMap`.
Performance and complexity: Let us assume that in a State we have N non-trivial
equivalence classes and that all constraints and disequality info is related to
non-trivial classes. In the worst case, we can simplify only one symbol of one
class in each iteration. The number of symbols in one class cannot grow b/c we
replace the old symbol with the simplified one. Also, the number of the
equivalence classes can only decrease, b/c the algorithm optionally does a merge
operation. We need N iterations in this case to reach the fixpoint. Thus, the
number of steps needed in the worst case is proportional to `N*N`. Empirical
results (attached) show that there is some hardly noticeable run-time and peak
memory discrepancy compared to the baseline. In my opinion, these differences
could be the result of measurement error.
This worst case scenario can be extended to those cases where trivial classes
in the constraints and in the disequality map are transformed into a State with
only non-trivial classes, b/c the algorithm does merge operations. A merge
operation on two trivial classes results in one non-trivial class.
Differential Revision: https://reviews.llvm.org/D106823
Convert fir.int<kind> to their llvm equivalent type
This patch is part of the upstreaming effort from fir-dev branch.
Reviewed By: clementval, awarzynski
Differential Revision: https://reviews.llvm.org/D113660
Co-authored-by: Eric Schweitz <eschweitz@nvidia.com>
Convert fir.heap type to its llvm equivalent type (llvm.ptr)
This patch is part of the upstreaming effort from fir-dev branch.
Reviewed By: clementval
Differential Revision: https://reviews.llvm.org/D113670
Co-authored-by: Eric Schweitz <eschweitz@nvidia.com>
Co-authored-by: Jean Perier <jperier@nvidia.com>
This patch extends the existing out-of-bounds store tests with a case
with a bigger object and multiple inbounds stores, followed by an OOB
store. The OOB store is not used to remove the inbounds stores in this
case at the moment.
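Sketched in C++ for readability (a hypothetical rendering of the described shape, not the actual test):
```
int sink;
void test() {
  int obj[8]; // bigger object
  obj[0] = 1; // multiple in-bounds stores...
  obj[3] = 2;
  obj[7] = 3;
  obj[9] = 4; // ...followed by an out-of-bounds store, which is currently not
              // used to remove the in-bounds stores above
  sink = obj[0];
}
```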
- Jaro–Winkler and Sørensen–Dice should use en-dashes, not regular
dashes. In reStructuredText this is typed as `--`.
- Letters at the beginning of a sentence should be capitalized.
The source index should not be compared to zero after applying the
shift with the modulo; it must be compared to the lower bound.
Otherwise, the extent is not added when it should be, and the computed
source index may be less than the lower bound, causing invalid results.
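A simplified sketch of the fixed computation (hypothetical names, not the runtime's actual code): for a dimension with lower bound `lb` and extent `n`, the wrapped index can land below `lb`, and comparing it against 0 instead of `lb` misses that case.
```
long shiftedSourceIndex(long index, long shift, long lb, long n) {
  long wrapped = lb + (index - lb + shift) % n; // may be < lb for negative shifts
  if (wrapped < lb) // the bug: this used to compare against 0
    wrapped += n;   // add the extent to land back in [lb, lb + n)
  return wrapped;
}
```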
Differential Revision: https://reviews.llvm.org/D113659
Some compiler-rt tests are inherently incompatible with VE because:
* No consistent denormal support on VE. We skip denormal fp inputs in builtin tests.
* `madvise` unsupported on VE.
* Instruction alignment requirements.
Reviewed By: phosek
Differential Revision: https://reviews.llvm.org/D113093
TableGen has started warning about unused template parameters in the isel patterns. Remove those.
Reviewed By: kaz7
Differential Revision: https://reviews.llvm.org/D113675