On Darwin, we want to limit the parallelism during test execution for
sanitizer tests that use shadow memory. The reason is explained by this
existing comment:
> Only run up to 3 processes that require shadow memory simultaneously
> on 64-bit Darwin. Using more scales badly and hogs the system due to
> inefficient handling of large mmap'd regions (terabytes) by the
> kernel.
Previously we detected 3 cases:
* on-device: limit to 1 process
* 64-bit: macOS & simulators, limit to 3 processes
* others (32-bit): no limitation
We checked for the 64-bit case like this: `if arch in ['x86_64',
'x86_64h']`, which misses macOS running on Apple Silicon. Additionally, we don't
care about 32-bit anymore, so I've simplified this to 2 cases: on-device
and everything else.
Differential Revision: https://reviews.llvm.org/D122751
Currently, we only print how the threads involved in a data race were created by their parent threads.
Add a runtime flag 'print_full_thread_history' to print thread creation stacks for the threads involved in the data race and their ancestors up to the main thread.
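For illustration, a minimal sketch (hypothetical program, not the actual test) where the extra history matters: the racing threads are created by intermediate threads, so their immediate creation stacks alone don't show that the chain starts in main.

```cpp
#include <pthread.h>

int racy;  // accessed by both grandchild threads without synchronization

void *grandchild(void *arg) {
  racy = 1;  // the race is reported here
  return nullptr;
}

void *child(void *arg) {
  pthread_t t;
  pthread_create(&t, nullptr, grandchild, nullptr);
  pthread_join(t, nullptr);
  return nullptr;
}

int main() {
  pthread_t t1, t2;
  pthread_create(&t1, nullptr, child, nullptr);
  pthread_create(&t2, nullptr, child, nullptr);
  pthread_join(t1, nullptr);
  pthread_join(t2, nullptr);
}
```

With TSAN_OPTIONS=print_full_thread_history=1 the report should then include creation stacks for the grandchild threads, their parents, and so on up to the main thread.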
Reviewed By: dvyukov
Differential Revision: https://reviews.llvm.org/D122131
For errno spoiling reports we only print the stack
where the signal handler is invoked. And the top
frame is the signal handler function, which is supposed
to give the info for debugging.
But in some cases the top frame can be some common thunk,
which does not give much info. E.g. for Go/cgo it's always
runtime.cgoSigtramp.
Print the signal number.
This is what we can easily gather and it may give at least
some hints regarding the issue.
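To make the bug class concrete, here is a minimal sketch (hypothetical, not from the tree) of a handler that spoils errno:

```cpp
#include <errno.h>
#include <signal.h>
#include <unistd.h>

// BUG: the handler may overwrite errno and does not restore it, so code
// interrupted between a failed syscall and its errno check sees a bogus value.
void handler(int sig) {
  char c;
  read(-1, &c, 1);  // fails with EBADF and clobbers errno
}

int main() {
  struct sigaction sa = {};
  sa.sa_handler = handler;
  sigaction(SIGPROF, &sa, nullptr);
  raise(SIGPROF);
  return 0;
}
```

A correct handler saves errno on entry and restores it before returning.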
Reviewed By: melver, vitalybuka
Differential Revision: https://reviews.llvm.org/D121979
The false positive fixed by commit f831d6fc80
("tsan: fix false positive during fd close") still happens episodically
on the added more stressful test which does just open/close.
I don't have a coherent explanation as to what exactly happens,
but the fix eliminates the false positive on this test as well.
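A sketch of the shape of such a test (hypothetical, not the actual stress.cpp contents): several threads open and close the same file in a tight loop, so fd numbers are recycled rapidly across threads.

```cpp
#include <fcntl.h>
#include <pthread.h>
#include <unistd.h>

void *worker(void *arg) {
  for (int i = 0; i < 100000; i++) {
    int fd = open("/dev/null", O_RDONLY);  // fds get reused across threads
    if (fd >= 0)
      close(fd);
  }
  return nullptr;
}

int main() {
  pthread_t threads[4];
  for (pthread_t &t : threads) pthread_create(&t, nullptr, worker, nullptr);
  for (pthread_t &t : threads) pthread_join(t, nullptr);
}
```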
The issue may be related to lost writes during asynchronous MADV_DONTNEED.
I've debugged a similar unexplainable false positive related to freed and
reused memory, and at the time the only possible explanation I found was that
an asynchronous MADV_DONTNEED may lead to lost writes. That's why commit
302ec7b9bc ("tsan: add memory_limit_mb flag") added StopTheWorld around
the memory flush, but unfortunately the commit does not capture these findings.
Reviewed By: melver
Differential Revision: https://reviews.llvm.org/D121363
FdClose is subject to the same atomicity problem as MemoryRangeFreed
(memory state is not "monotonic" wrt race detection).
So we need to lock the thread slot in FdClose the same way we do
in MemoryRangeFreed.
This fixes the modified stress.cpp test.
Reviewed By: vitalybuka, melver
Differential Revision: https://reviews.llvm.org/D121143
The stack trace addresses may be odd (normally addresses should be even), but
this seems a good compromise when the instruction length (2, 4, or 6 bytes)
cannot be detected easily.
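A sketch of the compromise being described, assuming the conventional sanitizer helper shape (the name and exact offset here are illustrative):

```cpp
using uptr = unsigned long;  // sanitizer-style address type

// Map a return address back into the preceding call instruction for
// symbolization. Subtracting 1 always lands inside the call, but on s390x,
// where instructions are 2, 4, or 6 bytes long, the result may be odd.
uptr GetPreviousInstructionPc(uptr pc) {
  return pc - 1;
}
```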
Reviewed By: uweigand
Differential Revision: https://reviews.llvm.org/D120432
Turns out the test was working by accident: we need to ensure
TSan instrumentation is not called from the fork() hook, otherwise the
tool will deadlock. Previously it worked because alloc_free_blocks() got
inlined into __tsan_test_only_on_fork(), but it cannot always be the
case.
Adding __attribute__((disable_sanitizer_instrumentation)) will prevent
TSan from instrumenting alloc_free_blocks().
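A sketch of the resulting shape (alloc_free_blocks and __tsan_test_only_on_fork are named above; the body shown is illustrative):

```cpp
#include <stdlib.h>

// Must not be compiler-instrumented: it runs from the fork() hook, and
// calling into the TSan runtime from there deadlocks the tool.
__attribute__((disable_sanitizer_instrumentation))
static void alloc_free_blocks() {
  for (int i = 0; i < 10; i++) {
    void *p = malloc(64);
    free(p);
  }
}

// The fork() hook; previously this worked only because alloc_free_blocks()
// happened to get inlined into it.
extern "C" void __tsan_test_only_on_fork() {
  alloc_free_blocks();
}
```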
Reviewed By: dvyukov
Differential Revision: https://reviews.llvm.org/D120050
Creating threads after a multi-threaded fork is semi-supported:
we don't give particular guarantees, but we try not to fail
on simple cases, and we have the die_after_fork=0 flag that makes
tsan not die on creation of threads after a multi-threaded fork.
This flag is used in the wild:
23c052e3e3/SConstruct (L3599)
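A minimal sketch of the pattern the flag permits (hypothetical program; run with TSAN_OPTIONS=die_after_fork=0):

```cpp
#include <pthread.h>
#include <sys/wait.h>
#include <unistd.h>

void *noop(void *arg) { return nullptr; }

int main() {
  pthread_t t;
  pthread_create(&t, nullptr, noop, nullptr);  // process becomes multi-threaded
  pid_t pid = fork();                          // fork while the other thread exists
  if (pid == 0) {
    // Creating a thread in the child of a multi-threaded fork: with
    // die_after_fork=1 (the default) tsan would die here.
    pthread_t u;
    pthread_create(&u, nullptr, noop, nullptr);
    pthread_join(u, nullptr);
    _exit(0);
  }
  waitpid(pid, nullptr, 0);
  pthread_join(t, nullptr);
}
```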
fork_multithreaded.cpp test started hanging in debug mode
after the recent "tsan: fix deadlock during race reporting" commit,
which added a proactive ThreadRegistryLock check in SlotLock.
But the test had broken earlier, after the "tsan: remove quadratic behavior in pthread_join"
commit, which made tracking of alive threads based on pthread_t stricter
(CHECK-failing on 2 threads with the same pthread_t, or on joining a non-existent thread).
When we start a thread after a multi-threaded fork, the new pthread_t
can actually match one of the existing values (for threads that don't exist anymore).
Thread creation started CHECK-failing on this, but the test simply
ignored this CHECK failure in the child thread and "passed".
But after "tsan: fix deadlock during race reporting" the test started hanging dead,
because CHECK failures recursively lock the thread registry.
Fix this by purging all alive threads from the thread registry on fork.
Also, the thread registry mutex somehow lost its internal deadlock detector id
and was excluded from deadlock detection. If it had the id, the CHECK
wouldn't have hung, because the nested CHECK failure would have been diagnosed
as a deadlock. But then again the test would have silently ignored this error
as well and the bugs wouldn't have been noticed.
Add the deadlock detector id to the thread registry mutex.
Also extend the test to check more cases and detect more bugs.
Reviewed By: melver
Differential Revision: https://reviews.llvm.org/D116091
SlotPairLocker calls SlotLock under ctx->multi_slot_mtx.
SlotLock can invoke the global reset DoReset if we are out of slots/epochs.
But DoReset locks ctx->multi_slot_mtx as well, which leads to deadlock.
Resolve the deadlock by removing SlotPairLocker/multi_slot_mtx
and only lock one slot for which we will do RestoreStack.
We need to lock that slot because RestoreStack accesses the slot journal.
But it's unclear why we need to lock the current slot.
Initially I did it just to be on the safer side (but at that time
we did not lock the second slot, so it was easy to just lock the current slot).
Reviewed By: melver
Differential Revision: https://reviews.llvm.org/D116040
Previously we would crash in the TSan runtime if the user program passed
a pointer to `malloc_size()` that doesn't point into app memory.
In these cases, `malloc_size()` should return 0.
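A minimal sketch of the expected behavior (Darwin API; hypothetical example, not the actual test):

```cpp
#include <assert.h>
#include <malloc/malloc.h>  // Darwin's malloc_size()
#include <stdlib.h>

int main() {
  void *heap = malloc(16);
  assert(malloc_size(heap) >= 16);  // heap pointer: real size reported

  // A pointer that doesn't point into app memory: must yield 0
  // rather than crashing inside the TSan runtime.
  void *bogus = (void *)0x4;
  assert(malloc_size(bogus) == 0);

  free(heap);
  return 0;
}
```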
For ASan, we fixed a similar issue here:
https://reviews.llvm.org/D15008
Radar-Id: rdar://problem/86213149
Differential Revision: https://reviews.llvm.org/D115947
This test would hang when the system ran out of resources and we failed to
create all 300 threads.
Differential Revision: https://reviews.llvm.org/D115845
This change switches tsan to the new runtime which features:
- 2x smaller shadow memory (2x of app memory)
- faster fully vectorized race detection
- small fixed-size vector clocks (512b)
- fast vectorized vector clock operations
- unlimited number of alive threads/goroutines
Depends on D112602.
Reviewed By: melver
Differential Revision: https://reviews.llvm.org/D112603
Currently the test calls dlclose in the thread
concurrently with the main thread calling a function
from the dynamic library. This is not good.
Wait for the main thread to call the function
before calling dlclose.
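A sketch of the fixed ordering (hypothetical library/symbol names):

```cpp
#include <dlfcn.h>
#include <pthread.h>
#include <semaphore.h>

static sem_t called;

void *closer(void *h) {
  sem_wait(&called);  // wait until main has called into the library
  dlclose(h);
  return nullptr;
}

int main() {
  sem_init(&called, 0, 0);
  void *h = dlopen("./lib.so", RTLD_NOW);      // hypothetical library
  void (*fn)() = (void (*)())dlsym(h, "foo");  // hypothetical symbol
  pthread_t t;
  pthread_create(&t, nullptr, closer, h);
  fn();               // call the function first...
  sem_post(&called);  // ...then allow the thread to dlclose
  pthread_join(t, nullptr);
}
```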
Depends on D115612.
Reviewed By: melver
Differential Revision: https://reviews.llvm.org/D115613
The test contains a race and checks that it's detected.
But the race may not be detected, since we are doing aggressive flushes,
and if the state flush happens between the racing accesses, tsan won't
detect the race. So return 1 to make the test deterministic
regardless of the race.
Reviewed By: melver
Differential Revision: https://reviews.llvm.org/D115612
This change switches tsan to the new runtime which features:
- 2x smaller shadow memory (2x of app memory)
- faster fully vectorized race detection
- small fixed-size vector clocks (512b)
- fast vectorized vector clock operations
- unlimited number of alive threads/goroutines
Depends on D112602.
Reviewed By: melver
Differential Revision: https://reviews.llvm.org/D112603
We call UnmapShadow before the actual munmap; at that point we don't yet
know if the provided address/size are sane. We can't call UnmapShadow
after the actual munmap because at that point the memory range can
already be reused for something else, so we can't rely on the munmap
return value to understand if the values are sane.
While calling munmap with insane values (non-canonical address, negative
size, etc.) is an error, the kernel won't crash. We must also try not to
crash, as the failure mode is very confusing (a paging fault inside the
runtime on some derived shadow address).
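A simplified sketch of the resulting interceptor shape inside the runtime (not compilable standalone; the exact sanity condition here is illustrative, not the actual code):

```cpp
// Shadow must be unmapped before the real munmap: afterwards the range may
// already be reused, and the return value comes too late to help.
// So the arguments are sanity-checked up front, and UnmapShadow is simply
// skipped for insane address/size values.
TSAN_INTERCEPTOR(int, munmap, void *addr, long sz) {
  SCOPED_TSAN_INTERCEPTOR(munmap, addr, sz);
  if (sz > 0 && IsAppMem((uptr)addr) && IsAppMem((uptr)addr + sz - 1))
    UnmapShadow(thr, (uptr)addr, sz);
  return REAL(munmap)(addr, sz);
}
```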
Such invalid arguments are observed on Chromium tests:
https://bugs.chromium.org/p/chromium/issues/detail?id=1275581
Reviewed By: melver
Differential Revision: https://reviews.llvm.org/D114944
The added test demonstrates loading a dynamic library with static TLS.
Such static TLS is a hack that allows a dynamic library to have faster TLS,
but it can be loaded only if all threads happened to allocate some excess
of static TLS space for whatever reason. If that's not the case, loading fails with:
dlopen: cannot load any more object with static TLS
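For reference, a sketch of how such a library can be built (hypothetical file; the initial-exec TLS model is what forces static TLS):

```cpp
// lib.cpp -- build with: clang++ -shared -fPIC lib.cpp -o lib.so
// The initial-exec model places tls_var in static TLS, so dlopen() must
// carve its space out of every existing thread's spare static TLS reserve.
static __thread int tls_var __attribute__((tls_model("initial-exec")));

extern "C" int *get_tls_var() { return &tls_var; }
```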
We used to produce a false positive because dlopen will write into the TLS
of all existing threads to initialize/zero the TLS region for the loaded library.
And this appears to race with the initialization of TLS in the thread,
since we model a write to the whole static TLS region (we don't know what part
of it is currently unused):
WARNING: ThreadSanitizer: data race (pid=2317365)
  Write of size 1 at 0x7f1fa9bfcdd7 by main thread:
    #0 memset
    #1 init_one_static_tls
    #2 __pthread_init_static_tls
    [[ this is where main calls dlopen ]]
    #3 main
  Previous write of size 8 at 0x7f1fa9bfcdd0 by thread T1:
    #0 __tsan_tls_initialization
Fix this by ignoring accesses during dlopen.
Reviewed By: melver
Differential Revision: https://reviews.llvm.org/D114953
Sometimes stacks for at_exit callbacks don't include any of the user functions/files.
For example, a race with a global std container destructor will only contain
the container type name and our at_exit_wrapper function. No signs what global variable
this is.
Remember and include in reports the function that installed the at_exit callback.
This should give clues as to what variable is being destroyed.
Depends on D114606.
Reviewed By: vitalybuka, melver
Differential Revision: https://reviews.llvm.org/D114607
Add a test for a common C++ bug when a global object is destroyed
while background threads still use it.
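A minimal sketch of the bug class (hypothetical, not the actual test):

```cpp
#include <map>
#include <thread>

std::map<int, int> g;  // global; destroyed via at_exit machinery after main returns

int main() {
  std::thread background([] {
    for (;;) g[0]++;  // background thread keeps using the global
  });
  background.detach();
  return 0;  // ~map() now races with the detached thread
}
```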
Depends on D114604.
Reviewed By: vitalybuka, melver
Differential Revision: https://reviews.llvm.org/D114605
This change switches tsan to the new runtime which features:
- 2x smaller shadow memory (2x of app memory)
- faster fully vectorized race detection
- small fixed-size vector clocks (512b)
- fast vectorized vector clock operations
- unlimited number of alive threads/goroutines
Depends on D112602.
Reviewed By: melver
Differential Revision: https://reviews.llvm.org/D112603
We currently use a wrong value for the heap block
(it only works for C++, but not for Java).
Use the correct value (we already computed it before, just forgot to use it).
Depends on D114593.
Reviewed By: melver
Differential Revision: https://reviews.llvm.org/D114595
Add a basic test that checks races between vector/non-vector
read/write accesses of different sizes/offsets in different orders.
This gives coverage of __tsan_read/write16 callbacks.
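For illustration, a sketch of the kind of access involved (hypothetical, x86 SSE intrinsics):

```cpp
#include <emmintrin.h>

alignas(16) static float data[4];

// A 16-byte vector store; instrumented as __tsan_write16
// (or __tsan_unaligned_write16 for a potentially unaligned pointer).
void vec_write() { _mm_store_ps(data, _mm_set1_ps(1.0f)); }

// A 16-byte vector load; instrumented as __tsan_read16.
float vec_read() { return _mm_cvtss_f32(_mm_load_ps(data)); }
```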
Depends on D114591.
Reviewed By: melver
Differential Revision: https://reviews.llvm.org/D114592
Vector SSE accesses make the compiler emit __tsan_[unaligned_]read/write16 callbacks.
Make it possible to test these.
Reviewed By: melver
Differential Revision: https://reviews.llvm.org/D114591
The test tries to provoke the internal allocator being locked during fork
and then forces the child process to use the internal allocator.
This test sometimes deadlocks with the new tsan runtime.
Depends on D114514.
Reviewed By: melver
Differential Revision: https://reviews.llvm.org/D114515
Test size larger than clear_shadow_mmap_threshold,
which is handled differently.
Depends on D114348.
Reviewed By: vitalybuka
Differential Revision: https://reviews.llvm.org/D114366
This change switches tsan to the new runtime which features:
- 2x smaller shadow memory (2x of app memory)
- faster fully vectorized race detection
- small fixed-size vector clocks (512b)
- fast vectorized vector clock operations
- unlimited number of alive threads/goroutines
Differential Revision: https://reviews.llvm.org/D112603
This change switches tsan to the new runtime which features:
- 2x smaller shadow memory (2x of app memory)
- faster fully vectorized race detection
- small fixed-size vector clocks (512b)
- fast vectorized vector clock operations
- unlimited number of alive threads/goroutines
Depends on D112602.
Reviewed By: melver
Differential Revision: https://reviews.llvm.org/D112603
All runtime callbacks must be non-instrumented with the new tsan runtime
(it's now more picky with respect to recursion into the runtime).
Disable instrumentation in Darwin tests as we do in all other tests now.
Differential Revision: https://reviews.llvm.org/D114348
Add a fork test that models what happens on Mac
where fork calls malloc/free inside of our atfork
callbacks.
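A minimal sketch of what that models (hypothetical, not the actual test):

```cpp
#include <pthread.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

// atfork callbacks that allocate, mimicking Mac's fork() behavior where
// malloc/free end up being called from inside our atfork hooks.
static void atfork_cb() { free(malloc(16)); }

int main() {
  pthread_atfork(atfork_cb, atfork_cb, atfork_cb);
  pid_t pid = fork();
  if (pid == 0)
    _exit(0);
  waitpid(pid, nullptr, 0);
  return 0;
}
```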
Reviewed By: vitalybuka, yln
Differential Revision: https://reviews.llvm.org/D114250
The new test started failing on bots with:
CHECK failed: tsan_rtl.cpp:327 "((addr + size)) <= ((TraceMemEnd()))"
(0xf06200e03010, 0xf06200000000) (tid=4073872)
https://lab.llvm.org/buildbot#builders/179/builds/1761
This is a latent bug in the aarch64 virtual address space layout:
there is not enough address space to fit traces for all threads.
But since the trace space is going away with the new tsan runtime
(D112603), disable the test.
Reviewed By: melver
Differential Revision: https://reviews.llvm.org/D113990