The current code assumes that equality comparison can be
performed with a word comparison instruction. While this is
true when the entire 64-bit result is used, it does not work
in general: the low-order and high-order words can produce
different results, and a user of only one of them will get
the wrong answer.
This patch adds an and of the result words so that
each word has the result of the comparison of the
entire doubleword that contains it.
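For illustration, a scalar sketch of the problem and of the added AND (the helper names are made up for this example and are not the actual lowering):
```
#include <cstdint>

// Per-word equality mask, as a word-compare instruction would produce it.
uint32_t wordEqMask(uint32_t a, uint32_t b) { return a == b ? ~0u : 0u; }

// Doubleword equality built from word compares plus the AND this patch adds:
// without the AND, a consumer reading only the low or only the high result
// word could see "equal" even though the other half of the doubleword differs.
uint64_t doublewordEqMask(uint64_t a, uint64_t b) {
  uint32_t lo = wordEqMask((uint32_t)a, (uint32_t)b);
  uint32_t hi = wordEqMask((uint32_t)(a >> 32), (uint32_t)(b >> 32));
  uint32_t both = lo & hi;               // the added AND of the result words
  return ((uint64_t)both << 32) | both;  // either word alone is now correct
}
```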
Differential revision: https://reviews.llvm.org/D115678
Commit 150681f increases the
cost of producing MMA types (vector pair and quad).
However, it also increases the cost returned by getUserCost(),
which is used in unrolling. As a result, loops that already
contain these types (from the user code) cannot be unrolled,
even with the user's unroll pragma. This was an unintended
side effect. Revert that portion of the commit to allow
unrolling such loops.
Differential revision: https://reviews.llvm.org/D115424
Run performance tests in denormal and normal ranges separately and show more detailed results.
Reviewed By: sivachandra
Differential Revision: https://reviews.llvm.org/D116112
The new tsan runtime has a 2x more compact shadow.
Adjust the shadow ranges accordingly.
Depends on D112603.
Reviewed By: vitalybuka, melver
Differential Revision: https://reviews.llvm.org/D113751
If there are multiple processes, it's hard to understand
which output comes from which process.
VReport prepends the pid to the output. Use it.
Depends on D113982.
Reviewed By: melver
Differential Revision: https://reviews.llvm.org/D113983
Update `now` after long operations so that we don't use
a stale value in subsequent computations.
Depends on D113981.
Reviewed By: melver
Differential Revision: https://reviews.llvm.org/D113982
Older Go cmd/link used SHT_PROGBITS for .init_array.
Work around the lack of https://golang.org/cl/373734 for a while.
It does not generate .fini_array or .preinit_array.
In regular LTO, analyze IR and discard unreachable functions when finding virtual call targets.
Differential Revision: https://reviews.llvm.org/D116056
When linking a 1.2G output (nearly no debug info, 2846621 dynamic relocations) using `--threads=8`, I measured
```
9.131462 Total ExecuteLinker
1.449913 Total Write output file
1.445784 Total Write sections
0.657152 Write sections {"detail":".rela.dyn"}
```
This change decreases the .rela.dyn time to 0.25s, leading to a 4% speed-up in the total time.
* parallelSort is slow because of the expensive r_sym/r_offset computation. Cache the values (see the sketch below).
* The iteration is slow. Move the r_sym/r_addend computation ahead of time and parallelize it.
With the change, the new encodeDynamicReloc is cheap (0.05s), so there is no need to parallelize it.
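A rough sketch of the caching pattern (the types and helpers below are illustrative, not lld's actual code): compute the expensive per-relocation fields once, in a parallelizable loop, so the sort comparator only touches cached integers.
```
#include <algorithm>
#include <cstdint>
#include <tuple>
#include <vector>

// Stand-in for a dynamic relocation. In lld, computing r_sym/r_offset walks
// symbol and section state and is too expensive to redo inside a comparator;
// here the computations are trivial placeholders so the sketch is standalone.
struct DynamicReloc {
  uint32_t sym = 0;
  uint64_t offset = 0;
  int64_t addend = 0;
  uint32_t computeSymIndex() const { return sym; }
  uint64_t computeOffset() const { return offset; }
  int64_t computeAddend() const { return addend; }
};

// A relocation together with its precomputed encoding fields.
struct CachedReloc {
  const DynamicReloc *rel = nullptr;
  uint32_t r_sym = 0;
  uint64_t r_offset = 0;
  int64_t r_addend = 0;
};

std::vector<CachedReloc> precomputeAndSort(const std::vector<DynamicReloc> &relocs) {
  std::vector<CachedReloc> cached(relocs.size());
  // Each element is independent, so this loop can run in parallel (e.g. with a
  // parallel-for utility); the expensive computation happens exactly once.
  for (size_t i = 0; i < relocs.size(); ++i)
    cached[i] = {&relocs[i], relocs[i].computeSymIndex(),
                 relocs[i].computeOffset(), relocs[i].computeAddend()};
  // The comparator only reads cached integers, so the sort itself is cheap.
  std::sort(cached.begin(), cached.end(),
            [](const CachedReloc &a, const CachedReloc &b) {
              return std::tie(a.r_sym, a.r_offset) <
                     std::tie(b.r_sym, b.r_offset);
            });
  return cached;  // encoding can then stream the cached fields directly
}
```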
Reviewed By: ikudrin
Differential Revision: https://reviews.llvm.org/D115993
We can't just memoize _supportsVerify in place in format.py, as it
was previously executed in each of the individual processes.
Instead, use hasCompileFlag() and add a feature flag for it,
which can be used both by tests (that already have such a flag,
locally for one set of tests) and by the testing framework itself.
By using hasCompileFlag(), this also implicitly fixes two other issues:
First, _supportsVerify previously called subprocess.call() directly, which can
interpret command line quoting differently from lit.TestRunner.
(In particular, TestRunner handles arguments quoted with single quotes,
while launching Windows processes with subprocess.call() only supports
double quotes.) Since all commands now go through TestRunner, this allows
using shlex.quote(), which uses single quotes, everywhere, and should make
41d7909368 redundant.
Secondly, the old _supportsVerify method didn't include %{flags} or
%{compile_flags}.
Differential Revision: https://reviews.llvm.org/D116010
This change allows us to infer access attributes (readnone, readonly) on arguments passed to vararg functions. Since there isn't a formal parameter corresponding to the argument, such arguments will never be considered part of the speculative SCC, but they can still benefit from attributes on the call site or on the callee function.
The main motivation here is just to simplify the code and remove some special casing. Previously, an indirect vararg call could return more precise results than a direct vararg call, which is just weird.
Differential Revision: https://reviews.llvm.org/D115964
This fixes a bug where we would infer readnone/readonly for an argument of a function which passed the value to another function that could capture it. With the value captured in memory, the function could reload the value from memory after the call and write to it; inferring the argument as readnone or readonly is therefore unsound.
@jdoerfert apparently noticed this about two years ago, and tests were checked in with 76467c4, but the issue appears never to have been fixed.
Since it seems like this issue should break everything, let me explain why the affected case is actually fairly narrow. The main inference loop over the argument SCCs only analyzes nocapture arguments. As such, we can only hit this when constructing the partial SCCs. Due to that restriction, we can only hit this when we have either a) a function declaration with a manually annotated argument, or b) an immediately self-recursive call.
It's also worth highlighting that there are cases where we can validly infer readonly/readnone on a captured argument. The easiest example is a function which simply returns its argument without ever accessing it.
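A tiny C++ illustration of the unsound case (the functions are made up for this example):
```
int *stash;

// Captures its argument by storing it to a global.
void capture(int *p) { stash = p; }

// `p` is never written through directly here, so in isolation it might look
// readonly; but after the call the same memory is modified via the captured
// copy, so inferring readnone/readonly for `p` would be wrong.
void f(int *p) {
  capture(p);
  *stash = 42;  // stores to the memory `p` points to, just not through `p`
}
```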
Differential Revision: https://reviews.llvm.org/D115961
1) Remove verbose information (function linkage types, alignment, TBAA).
2) Remove unused elements, or replace irrelevant elements with null (as placeholders), in the virtual table; remove unused definitions of the deleted elements accordingly.
Differential Revision: https://reviews.llvm.org/D116071
Converts concat_vectors((trunc (lshr)), (trunc (lshr))) to UZP2
when the shift amount is half the width of the vector element.
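A scalar sketch of why this holds on a little-endian target (the values and loops are only for illustration): shifting each 32-bit lane right by 16 and truncating selects the odd-indexed 16-bit halves, which is exactly what UZP2 picks from the inputs reinterpreted as 16-bit vectors.
```
#include <cassert>
#include <cstdint>
#include <cstring>

int main() {
  uint32_t a[4] = {0x11112222, 0x33334444, 0x55556666, 0x77778888};
  uint32_t b[4] = {0x9999aaaa, 0xbbbbcccc, 0xddddeeee, 0xffff0000};

  // concat_vectors((trunc (lshr a, 16)), (trunc (lshr b, 16)))
  uint16_t concat[8];
  for (int i = 0; i < 4; ++i) concat[i] = (uint16_t)(a[i] >> 16);
  for (int i = 0; i < 4; ++i) concat[4 + i] = (uint16_t)(b[i] >> 16);

  // UZP2 on the same data viewed as 16-bit lanes: keep the odd-indexed lanes.
  uint16_t a16[8], b16[8], uzp2[8];
  std::memcpy(a16, a, sizeof a);
  std::memcpy(b16, b, sizeof b);
  for (int i = 0; i < 4; ++i) uzp2[i] = a16[2 * i + 1];
  for (int i = 0; i < 4; ++i) uzp2[4 + i] = b16[2 * i + 1];

  assert(std::memcmp(concat, uzp2, sizeof concat) == 0);  // identical (LE)
  return 0;
}
```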
Differential Revision: https://reviews.llvm.org/D116021
Creating threads after a multi-threaded fork is semi-supported:
we don't give particular guarantees, but we try not to fail
on simple cases, and we have the die_after_fork=0 flag that enables
not dying on creation of threads after a multi-threaded fork.
This flag is used in the wild:
23c052e3e3/SConstruct (L3599)
The fork_multithreaded.cpp test started hanging in debug mode
after the recent "tsan: fix deadlock during race reporting" commit,
which added a proactive ThreadRegistryLock check in SlotLock.
But the test broke earlier, after the "tsan: remove quadratic behavior in pthread_join"
commit, which made tracking of alive threads based on pthread_t stricter
(CHECK-fail on 2 threads with the same pthread_t, or on joining a non-existent thread).
When we start a thread after a multi-threaded fork, the new pthread_t
can actually match one of the existing values (for threads that don't exist anymore).
Thread creation started CHECK-failing on this, but the test simply
ignored this CHECK failure in the child thread and "passed".
But after "tsan: fix deadlock during race reporting" the test started hanging,
because CHECK failures recursively lock the thread registry.
Fix this by purging all alive threads from the thread registry on fork.
Also, the thread registry mutex somehow lost its internal deadlock detector id
and was excluded from deadlock detection. If it had the id, the CHECK
wouldn't have hung: the deadlock would have produced a nested CHECK failure instead.
But then again the test would have silently ignored this error as well,
and the bugs wouldn't have been noticed.
Add the deadlock detector id to the thread registry mutex.
Also extend the test to check more cases and detect more bugs.
Reviewed By: melver
Differential Revision: https://reviews.llvm.org/D116091
To allow transition from the TS-specified
std::experimental::coroutine_traits to the C++20-specified
std::coroutine_traits, we look up in both places and provide helpful
diagnostics. This refactors the code to avoid separate code paths for
the std::experimental lookups.
Reviewed By: ChuanqiXu
Differential Revision: https://reviews.llvm.org/D116029
Add tests for various customization point object properties as defined by
the Standard. Test various CPOs from `<ranges>`, `<iterator>`,
`<concepts>`, etc.
The test is mostly taken from https://reviews.llvm.org/D107036 and split
out into this patch.
Differential Revision: https://reviews.llvm.org/D115588
This fixes a typo/bug in the check for pointer reuse when testing
DI location preservation in the Debugify original mode (i.e. when
checking -g generated debug info).
Differential Revision: https://reviews.llvm.org/D115621
Add an overload that accepts and returns an Address, as we
generally just want to replace the pointer with a laundered one,
while retaining the remaining information.
* There is no reason to forbid that case.
* Also, the user would get a very unfriendly error like `expected result type with offset = -9223372036854775808 instead of 1`.
Differential Revision: https://reviews.llvm.org/D114678
Move shift-by-constant handling into its only user (VSHIFT intrinsics lowering).
This is some prep work for getTargetVShiftNode to no longer take a scalar shift amount: we're introducing temporary ISD::EXTRACT_VECTOR_ELT nodes via SelectionDAG::getSplatValue to accommodate it, which can cause various issues, including unnecessary scalarization and xmm->gpr->xmm transfers, and causes problems for 32-bit codegen if we fail to remove an (illegal) i64 scalar extracted from a (legal) vXi64 vector.