Use of gethostent provokes caching of some resources inside of libc.
They are freed in __libc_thread_freeres very late in thread lifetime,
after our ThreadFinish. __libc_thread_freeres calls free which
previously crashed in malloc hooks.
Fix it by setting ignore_interceptors for finished threads,
which in turn prevents malloc hooks.
Reviewed By: melver
Differential Revision: https://reviews.llvm.org/D113989
Precisely specifying the unused parts of the bitfield is critical for
performance. If we don't specify them, compiler will generate code to load
the old value and shuffle it to extract the unused bits to apply to the new
value. If we specify the unused part and store 0 in there, all that
unnecessary code goes away (store of the 0 const is combined with other
constant parts).
I don't see a good way to ensure we cover all of u64 bits with fields.
So at least introduce named kUnusedBits consts and check that bits
sum up to 64.
Depends on D113978.
Reviewed By: melver
Differential Revision: https://reviews.llvm.org/D113979
In the old runtime we used to use different number of trace parts
for C++ and Go to reduce trace memory consumption for Go.
But now it's easier and better to use smaller parts because
we already use minimal possible number of parts for C++ (3).
Reviewed By: melver
Differential Revision: https://reviews.llvm.org/D113978
man pthread_equal:
The pthread_equal() function is necessary because thread IDs
should be considered opaque: there is no portable way for
applications to directly compare two pthread_t values.
Depends on D113916.
Reviewed By: vitalybuka
Differential Revision: https://reviews.llvm.org/D113919
pthread_setname_np does linear search over all thread descriptors
to map pthread_t to the thread descriptor. This has O(N^2) complexity
and becomes much worse in the new tsan runtime that keeps all ever
existed threads in the thread registry.
Replace linear search with direct access if pthread_setname_np
is called for the current thread (a very common case).
Reviewed By: vitalybuka
Differential Revision: https://reviews.llvm.org/D113916
We used to mmap C++ shadow stack as part of the trace region
before ed7f3f5bc9 ("tsan: move shadow stack into ThreadState"),
which moved the shadow stack into TLS. This started causing
timeouts and OOMs on some of our internal tests that repeatedly
create and destroy thousands of threads.
Allocate C++ shadow stack with mmap and small pages again.
This prevents the observed timeouts and OOMs.
But we now need to be more careful with interceptors that
run after thread finalization because FuncEntry/Exit and
TraceAddEvent all need the shadow stack.
Reviewed By: vitalybuka
Differential Revision: https://reviews.llvm.org/D113786
This change switches tsan to the new runtime which features:
- 2x smaller shadow memory (2x of app memory)
- faster fully vectorized race detection
- small fixed-size vector clocks (512b)
- fast vectorized vector clock operations
- unlimited number of alive threads/goroutimes
Depends on D112602.
Reviewed By: melver
Differential Revision: https://reviews.llvm.org/D112603
Start the background thread only after fork, but not after clone.
For fork we did this always and it's known to work (or user code has adopted).
But if we do this for the new clone interceptor some code (sandbox2) fails.
So model we used to do for years and don't start the background thread after clone.
Reviewed By: melver
Differential Revision: https://reviews.llvm.org/D113744
The compiler does not recognize HACKY_CALL as a call
(we intentionally hide it from the compiler so that it can
compile non-leaf functions as leaf functions).
To compensate for that hacky call thunk saves and restores
all caller-saved registers. However, it saves only
general-purposes registers and does not save XMM registers.
This is a latent bug that was masked up until a recent "NFC" commit
d736002e90 ("tsan: move memory access functions to a separate file"),
which allowed more inlining and exposed the 10-year bug.
Save and restore caller-saved XMM registers (all) as well.
Currently the bug manifests as e.g. frexp interceptor messes the
return value and the added test fails with:
i=8177 y=0.000000 exp=4
Reviewed By: melver
Differential Revision: https://reviews.llvm.org/D113742
Clone does not exist on Mac.
There are chances it will break on other OSes.
Enable it incrementally starting with Linux only,
other OSes can enable it later as needed.
Reviewed By: melver, thakis
Differential Revision: https://reviews.llvm.org/D113693
gtest uses clone for death tests and it needs the same
handling as fork to prevent deadlock (take runtime mutexes
before and release them after).
Reviewed By: melver
Differential Revision: https://reviews.llvm.org/D113677
stats_size argument is unnecessary in GetMemoryProfile and in the callback.
It just clutters code. The callback knowns how many stats to expect.
Depends on D112789.
Reviewed By: melver
Differential Revision: https://reviews.llvm.org/D112790
tsan_rtl.cpp is huge and does lots of things.
Move everything related to memory access and tracing
to a separate tsan_rtl_access.cpp file.
No functional changes, only code movement.
Reviewed By: vitalybuka, melver
Differential Revision: https://reviews.llvm.org/D112625
Gtest's EXPECT calls whole lot of libc functions
(mem*, malloc) even when EXPECT does not fail.
This does not play well with tsan runtime unit tests
b/c e.g. we call some EXPECTs with runtime mutexes locked.
Reviewed By: melver
Differential Revision: https://reviews.llvm.org/D112601
Don't leak caller_pc var from the macro
(it's not supposed to be used by interceptors).
Use UNUSED instead of (void) cast.
Depends on D112540.
Reviewed By: melver
Differential Revision: https://reviews.llvm.org/D112541
If the real function is not intercepted,
we are going to crash one way or another.
The question is just in the failure mode:
error message vs NULL deref. But the message
costs us a check in every interceptor and
they are not observed to be failing in real life
for a long time, also other sanitizers don't
have this check as well (also crash on
NULL deref if that happens).
Remove the check from non-debug mode.
Reviewed By: melver
Differential Revision: https://reviews.llvm.org/D112540
All tsan interceptors check for initialization and/or initialize things
as necessary lazily, so we can pretend everything is initialized in the
COMMON_INTERCEPTOR_NOTHING_IS_INITIALIZED check to avoid double-checking
for initialization (this is only necessary for sanitizers that don't
handle initialization on common grounds).
Reviewed By: melver
Differential Revision: https://reviews.llvm.org/D112446
MutexSet is too large to be allocated on stack.
But we need local MutexSet objects in few places
and use various hacks to allocate them.
Add DynamicMutexSet helper that simplifies allocation
of such objects.
Reviewed By: melver
Differential Revision: https://reviews.llvm.org/D112449
Building Go programs with the current runtime fails with:
loadelf: race_linux_amd64: malformed elf file:
_ZZN6__tsan15RestoreAddrImpl5ApplyINS_11MappingGo48EEEmmE6ranges: invalid symbol binding 10
Go linker does not understand ELF in all its generality.
Don't use static const data in inline methods.
Reviewed By: melver
Differential Revision: https://reviews.llvm.org/D112434
Instead of creating real threads for trace tests
create a new ThreadState in the main thread.
This makes the tests more unit-testy and will also
help with future trace tests that will need
more than 1 thread. Creating more than 1 real thread and
dispatching test actions across multiple threads in the
required deterministic order is painful.
This is resubmit of reverted D110546 with 2 changes:
1. The previous version patched ImitateTlsWrite to not
expect ThreadState to be allocated in TLS (the CHECK
failed for the fake test threads).
This added an ugly hack into production code and was still
logically wrong because we imitated write to the main
thread TLS/stack when we started the fake test thread
(which has nothing to do with the main thread TLS/stack).
This version uses ThreadType::Fiber instead of ThreadType::Regular
for the fake threads. This naturally makes ThreadStart skip
obtaining stack/tls and imitating writes to them.
2. This version still skips the tests on Darwin and PowerPC
to be on the safer side. Build bots reported failures for PowerPC
for the previous version.
Reviewed By: melver
Differential Revision: https://reviews.llvm.org/D111156
1. Include <cet.h> in sanitizer_common/sanitizer_asm.h, if it exists, to
mark Intel CET support when Intel CET is enabled.
2. Define _CET_ENDBR as empty if it isn't defined.
3. Add _CET_ENDBR to function entries in assembly codes so that ENDBR
instruction will be generated when Intel CET is enabled.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D111185
This reverts commit fdf4c03522.
Breaks macOS bots, e.g. https://crbug.com/1257863.
Still figuring out if this is actually supported on macOS. Other places
that include <cet.h> only do so on Linux.
1. Include <cet.h> in sanitizer_common/sanitizer_asm.h to mark Intel CET
support when Intel CET is enabled.
2. Add _CET_ENDBR to function entries in assembly codes so that ENDBR
instruction will be generated when Intel CET is enabled.
Reviewed By: dvyukov
Differential Revision: https://reviews.llvm.org/D111185
Print meaningful stack frames for stack/tls races
(instead of PC 1/2 that don't symbolize).
Imitate stack/tls writes after we create and initialize
the new thread, otherwise the races are not detected.
This is re-submit of the following reverted commits,
but without tests as they failed on a number of OSes/arches:
"tsan: fix and test detection of TLS races"
"tsan: fix tls_race3 test on darwin"
"tsan: print a meaningful frame for stack races"
Differential Revision: https://reviews.llvm.org/D111147
Whenever we call cur_thread_init, we call cur_thread on the next line.
So make cur_thread_init return the current thread directly.
Makes code a bit shorter, does not affect codegen.
Reviewed By: vitalybuka, melver
Differential Revision: https://reviews.llvm.org/D110384
Commit 354ded67b3 ("tsan: align ThreadState to cache line")
did an incomplete thing. It marked ThreadState as cache line
aligned, but the thread local ThreadState instance is declared
as an aligned char array with hard-coded 64-byte alignment.
On PowerPC cache line size is 128 bytes, so the hard-coded
64-byte alignment is not enough.
Use cache line alignment consistently.
Differential Revision: https://reviews.llvm.org/D110629
The trace tests crashed on darwin because of some thread
initialization issues (thread initialization is somewhat
different on darwin).
Instead of starting real threads, create a new ThreadState
in the main thread. This makes the tests more unit-testy
and hopefully won't crash on darwin (there is almost no
platform-specific code involved now).
This will also help with future trace tests that will need
more than 1 thread. Creating more than 1 real thread and
dispatching test actions across multiple threads in the
required deterministic order is painful.
Depends on D110539.
Reviewed By: melver
Differential Revision: https://reviews.llvm.org/D110546
Currently detection of races with TLS/stack initialization
is broken because we imitate the write before thread initialization,
so it's modelled with a wrong thread/epoch.
Fix that and add a test.
Reviewed By: melver
Differential Revision: https://reviews.llvm.org/D110538