Commit Graph

697 Commits

Author SHA1 Message Date
Julian Lettner a5228bcaad [Darwin] Limit parallelism for sanitizer tests that use shadow memory on AS
On Darwin, we want to limit the parallelism during test execution for
sanitizer tests that use shadow memory.  The reason is explained by this
existing comment:

> Only run up to 3 processes that require shadow memory simultaneously
> on 64-bit Darwin. Using more scales badly and hogs the system due to
> inefficient handling of large mmap'd regions (terabytes) by the
> kernel.

Previously we detected 3 cases:
* on-device: limit to 1 process
* 64-bit: macOS & simulators, limit to 3 processes
* others (32-bit): no limitation

We checked for the 64-bit case like this: `if arch in ['x86_64',
'x86_64h']` which misses macOS running on AS. Additionally, we don't
care about 32-bit anymore, so I've simplified this to 2 cases: on-device
and everything else.

Differential Revision: https://reviews.llvm.org/D122751
2022-03-31 14:43:28 -07:00
Dmitry Vyukov 1d4d2cceda [TSan] Add a runtime flag to print full thread creation stacks up to the main thread
Currently, we only print how threads involved in data race are created from their parent threads.
Add a runtime flag 'print_full_thread_history' to print thread creation stacks for the threads involved in the data race and their ancestors up to the main thread.

Reviewed By: dvyukov

Differential Revision: https://reviews.llvm.org/D122131
2022-03-24 17:30:27 +01:00
Dmitry Vyukov 9e66e5872c tsan: print signal num in errno spoiling reports
For errno spoiling reports we only print the stack
where the signal handler is invoked. And the top
frame is the signal handler function, which is supposed
to give the info for debugging.
But in same cases the top frame can be some common thunk,
which does not give much info. E.g. for Go/cgo it's always
runtime.cgoSigtramp.

Print the signal number.
This is what we can easily gather and it may give at least
some hints regarding the issue.

Reviewed By: melver, vitalybuka

Differential Revision: https://reviews.llvm.org/D121979
2022-03-18 16:12:11 +01:00
Dmitry Vyukov 66298e1c54 tsan: fix another false positive related to open/close
The false positive fixed by commit f831d6fc80
("tsan: fix false positive during fd close") still happens episodically
on the added more stressful test which does just open/close.

I don't have a coherent explanation as to what exactly happens
but the fix fixes the false positive on this test as well.
The issue may be related to lost writes during asynchronous MADV_DONTNEED.
I've debugged similar unexplainable false positive related to freed and
reused memory and at the time the only possible explanation I found is that
an asynchronous MADV_DONTNEED may lead to lost writes. That's why commit
302ec7b9bc ("tsan: add memory_limit_mb flag") added StopTheWorld around
the memory flush, but unfortunately the commit does not capture these findings.

Reviewed By: melver

Differential Revision: https://reviews.llvm.org/D121363
2022-03-10 17:02:51 +01:00
Dmitry Vyukov f831d6fc80 tsan: fix false positive during fd close
FdClose is a subjet to the same atomicity problem as MemoryRangeFreed
(memory state is not "monotoic" wrt race detection).
So we need to lock the thread slot in FdClose the same way we do
in MemoryRangeFreed.
This fixes the modified stress.cpp test.

Reviewed By: vitalybuka, melver

Differential Revision: https://reviews.llvm.org/D121143
2022-03-08 10:40:56 +01:00
Fangrui Song 632ea6929d [sanitizer][sancov] Use pc-1 for s390x
The stack trace addresses may be odd (normally addresses should be even), but
seems a good compromise when the instruction length (2,4,6) cannot be detected
easily.

Reviewed By: uweigand

Differential Revision: https://reviews.llvm.org/D120432
2022-02-23 13:35:22 -08:00
Alexander Potapenko be77afe43d tsan: Add a missing disable_sanitizer_instrumentation attribute
Turns out the test was working by accident: we need to ensure
TSan instrumentation is not called from the fork() hook, otherwise the
tool will deadlock. Previously it worked because alloc_free_blocks() got
inlined into __tsan_test_only_on_fork(), but it cannot always be the
case.

Adding __attribute__((disable_sanitizer_instrumentation)) will prevent
TSan from instrumenting alloc_free_blocks().

Reviewed By: dvyukov

Differential Revision: https://reviews.llvm.org/D120050
2022-02-17 15:34:41 +01:00
Julian Lettner 1f4a0531b3 [TSan] Mark test unsupported on Darwin 2022-01-23 22:01:48 -08:00
Julian Lettner 4614b93f53 [TSan][Darwin] Mark test UNSUPPORTED for iOS simulator 2022-01-11 15:01:24 -08:00
Julian Lettner f4ab0f6e09 [TSan] Avoid deadlock in test for compiler-rt debug build
rdar://86776155
2022-01-10 11:40:54 -08:00
Julian Lettner 63ddf0baf3 [TSan] Don't instrument code that is executed from __tsan_on_report()
See also: https://reviews.llvm.org/D111157
2021-12-21 17:02:51 -08:00
Dmitry Vyukov d95baa98f3 tsan: fix failures after multi-threaded fork
Creating threads after a multi-threaded fork is semi-supported,
we don't give particular guarantees, but we try to not fail
on simple cases and we have die_after_fork=0 flag that enables
not dying on creation of threads after a multi-threaded fork.
This flag is used in the wild:
23c052e3e3/SConstruct (L3599)

fork_multithreaded.cpp test started hanging in debug mode
after the recent "tsan: fix deadlock during race reporting" commit,
which added proactive ThreadRegistryLock check in SlotLock.

But the test broke earlier after "tsan: remove quadratic behavior in pthread_join"
commit which made tracking of alive threads based on pthread_t stricter
(CHECK-fail on 2 threads with the same pthread_t, or joining a non-existent thread).
When we start a thread after a multi-threaded fork, the new pthread_t
can actually match one of existing values (for threads that don't exist anymore).
Thread creation started CHECK-failing on this, but the test simply
ignored this CHECK failure in the child thread and "passed".
But after "tsan: fix deadlock during race reporting" the test started hanging dead,
because CHECK failures recursively lock thread registry.

Fix this purging all alive threads from thread registry on fork.

Also the thread registry mutex somehow lost the internal deadlock detector id
and was excluded from deadlock detection. If it would have the id, the CHECK
wouldn't hang because of the nested CHECK failure due to the deadlock.
But then again the test would have silently ignore this error as well
and the bugs wouldn't have been noticed.
Add the deadlock detector id to the thread registry mutex.

Also extend the test to check more cases and detect more bugs.

Reviewed By: melver

Differential Revision: https://reviews.llvm.org/D116091
2021-12-21 16:54:00 +01:00
Vitaly Buka 8f85d5205d [tsan] Disable test from D115759 on Darwin 2021-12-20 19:41:09 -08:00
Dmitry Vyukov 2eb3e20461 tsan: fix deadlock during race reporting
SlotPairLocker calls SlotLock under ctx->multi_slot_mtx.
SlotLock can invoke global reset DoReset if we are out of slots/epochs.
But DoReset locks ctx->multi_slot_mtx as well, which leads to deadlock.

Resolve the deadlock by removing SlotPairLocker/multi_slot_mtx
and only lock one slot for which we will do RestoreStack.
We need to lock that slot because RestoreStack accesses the slot journal.
But it's unclear why we need to lock the current slot.
Initially I did it just to be on the safer side (but at that time
we dit not lock the second slot, so it was easy just to lock the current slot).

Reviewed By: melver

Differential Revision: https://reviews.llvm.org/D116040
2021-12-20 18:52:48 +01:00
Julian Lettner 4399f3b6b0 [TSan][Darwin] Make malloc_size interceptor more robust
Previously we would crash in the TSan runtime if the user program passes
a pointer to `malloc_size()` that doesn't point into app memory.

In these cases, `malloc_size()` should return 0.

For ASan, we fixed a similar issue here:
https://reviews.llvm.org/D15008

Radar-Id: rdar://problem/86213149

Differential Revision: https://reviews.llvm.org/D115947
2021-12-17 15:38:08 -08:00
Matt Kulukundis 406b538dea Add a flag to force tsan's background thread
Reviewed By: dvyukov, vitalybuka

Differential Revision: https://reviews.llvm.org/D115759
2021-12-16 11:47:33 -08:00
Julian Lettner 3a1eb1cf2a [TSan] Make test fail more predictably
This test would hang when the system ran out of resources and we fail to
create all 300 threads.

Differential Revision: https://reviews.llvm.org/D115845
2021-12-16 08:33:32 -08:00
Dmitry Vyukov b332134921 tsan: new runtime (v3)
This change switches tsan to the new runtime which features:
 - 2x smaller shadow memory (2x of app memory)
 - faster fully vectorized race detection
 - small fixed-size vector clocks (512b)
 - fast vectorized vector clock operations
 - unlimited number of alive threads/goroutimes

Depends on D112602.

Reviewed By: melver

Differential Revision: https://reviews.llvm.org/D112603
2021-12-13 12:48:34 +01:00
Dmitry Vyukov b088833375 tsan: deflake dlopen_static_tls.cpp
Currently the test calls dlclose in the thread
concurrently with the main thread calling a function
from the dynamic library. This is not good.
Wait for the main thread to call the function
before calling dlclose.

Depends on D115612.

Reviewed By: melver

Differential Revision: https://reviews.llvm.org/D115613
2021-12-13 12:01:40 +01:00
Dmitry Vyukov 7de546e9e8 tsan: deflake flush_memory.cpp
The test contains a race and checks that it's detected.
But the race may not be detected since we are doing aggressive flushes
and if the state flush happens between racing accesses, tsan won't
detect the race). So return 1 to make the test deterministic
regardless of the race.

Reviewed By: melver

Differential Revision: https://reviews.llvm.org/D115612
2021-12-13 12:01:30 +01:00
Jonas Devlieghere 396113c19f Revert "tsan: new runtime (v3)"
This reverts commit 5a33e41281 becuase it
breaks LLDB.

https://green.lab.llvm.org/green/view/LLDB/job/lldb-cmake/39208/
2021-12-09 09:18:10 -08:00
Dmitry Vyukov 5a33e41281 tsan: new runtime (v3)
This change switches tsan to the new runtime which features:
 - 2x smaller shadow memory (2x of app memory)
 - faster fully vectorized race detection
 - small fixed-size vector clocks (512b)
 - fast vectorized vector clock operations
 - unlimited number of alive threads/goroutimes

Depends on D112602.

Reviewed By: melver

Differential Revision: https://reviews.llvm.org/D112603
2021-12-09 09:09:52 +01:00
Dmitry Vyukov 954582cdfc tsan: disable dlopen_static_tls.cpp test on powerpc64
Reviewed By: melver

Differential Revision: https://reviews.llvm.org/D115142
2021-12-06 13:13:43 +01:00
Dmitry Vyukov fd26417a74 tsan: disable dlopen_static_tls.cpp test on aarch64
Fails on bots: https://lab.llvm.org/buildbot#builders/184/builds/1580

Differential Revision: https://reviews.llvm.org/D115095
2021-12-04 13:01:47 +01:00
Dmitry Vyukov 4a5086dce3 tsan: disable munmap_invalid.cpp test on darwin
It failed on bots:
https://green.lab.llvm.org/green//job/clang-stage1-RA/25954/consoleFull#-1417328700a1ca8a51-895e-46c6-af87-ce24fa4cd561
and it  doesn't provide the test output.

Reviewed By: melver

Differential Revision: https://reviews.llvm.org/D114972
2021-12-03 09:03:45 +01:00
Dmitry Vyukov 1b576585eb tsan: tolerate munmap with invalid arguments
We call UnmapShadow before the actual munmap, at that point we don't yet
know if the provided address/size are sane. We can't call UnmapShadow
after the actual munmap becuase at that point the memory range can
already be reused for something else, so we can't rely on the munmap
return value to understand is the values are sane.
While calling munmap with insane values (non-canonical address, negative
size, etc) is an error, the kernel won't crash. We must also try to not
crash as the failure mode is very confusing (paging fault inside of the
runtime on some derived shadow address).

Such invalid arguments are observed on Chromium tests:
https://bugs.chromium.org/p/chromium/issues/detail?id=1275581

Reviewed By: melver

Differential Revision: https://reviews.llvm.org/D114944
2021-12-02 17:50:51 +01:00
Dmitry Vyukov 97b4e63117 tsan: fix false positives in dynamic libs with static tls
The added test demonstrates  loading a dynamic library with static TLS.
Such static TLS is a hack that allows a dynamic library to have faster TLS,
but it can be loaded only iff all threads happened to allocate some excess
of static TLS space for whatever reason. If it's not the case loading fails with:

dlopen: cannot load any more object with static TLS

We used to produce a false positive because dlopen will write into TLS
of all existing threads to initialize/zero TLS region for the loaded library.
And this appears to be racing with initialization of TLS in the thread
since we model a write into the whole static TLS region (we don't what part
of it is currently unused):

WARNING: ThreadSanitizer: data race (pid=2317365)
  Write of size 1 at 0x7f1fa9bfcdd7 by main thread:
    0 memset
    1 init_one_static_tls
    2 __pthread_init_static_tls
    [[ this is where main calls dlopen ]]
    3 main
  Previous write of size 8 at 0x7f1fa9bfcdd0 by thread T1:
    0 __tsan_tls_initialization

Fix this by ignoring accesses during dlopen.

Reviewed By: melver

Differential Revision: https://reviews.llvm.org/D114953
2021-12-02 17:47:05 +01:00
Julian Lettner 863b117411 [TSan][Darwin] Prevent inlining of functions in tests
Prevent inlining of functions so we can FileCheck the generated stack
traces.
2021-12-01 17:00:52 -08:00
Julian Lettner 6703fe25b7 [TSan][Darwin] Mark test unsupported 2021-12-01 15:50:10 -08:00
Dmitry Vyukov 09859113ed Revert "tsan: new runtime (v3)"
This reverts commit 66d4ce7e26.

Chromium tests started failing:
https://bugs.chromium.org/p/chromium/issues/detail?id=1275581
2021-12-01 18:00:46 +01:00
Benjamin Kramer 0e099a64be [tsan] Relax atexit5.cpp a bit more so it's not as dependent on the standard library implementation 2021-11-26 14:02:34 +01:00
Dmitry Vyukov a1dc97e472 tsan: remember and print function that installed at_exit callbacks
Sometimes stacks for at_exit callbacks don't include any of the user functions/files.
For example, a race with a global std container destructor will only contain
the container type name and our at_exit_wrapper function. No signs what global variable
this is.
Remember and include in reports the function that installed the at_exit callback.
This should give glues as to what variable is being destroyed.

Depends on D114606.

Reviewed By: vitalybuka, melver

Differential Revision: https://reviews.llvm.org/D114607
2021-11-26 08:00:55 +01:00
Dmitry Vyukov 3f87788de1 tsan: add a test for on_exit
Depends on D114605.

Reviewed By: vitalybuka, melver

Differential Revision: https://reviews.llvm.org/D114606
2021-11-26 08:00:43 +01:00
Dmitry Vyukov 9ea3bd5a1c tsan: add test for __cxa_atexit
Add a test for a common C++ bug when a global object is destroyed
while background threads still use it.

Depends on D114604.

Reviewed By: vitalybuka, melver

Differential Revision: https://reviews.llvm.org/D114605
2021-11-26 08:00:29 +01:00
Dmitry Vyukov c2f0de06c9 tsan: check stack in atexit4.cpp test
Reviewed By: vitalybuka, melver

Differential Revision: https://reviews.llvm.org/D114604
2021-11-26 08:00:19 +01:00
Dmitry Vyukov 66d4ce7e26 tsan: new runtime (v3)
This change switches tsan to the new runtime which features:
 - 2x smaller shadow memory (2x of app memory)
 - faster fully vectorized race detection
 - small fixed-size vector clocks (512b)
 - fast vectorized vector clock operations
 - unlimited number of alive threads/goroutimes

Depends on D112602.

Reviewed By: melver

Differential Revision: https://reviews.llvm.org/D112603
2021-11-25 18:32:04 +01:00
Dmitry Vyukov b584741d06 tsan: fix Java heap block begin in reports
We currently use a wrong value for heap block
(only works for C++, but not for Java).
Use the correct value (we already computed it before, just forgot to use).

Depends on D114593.

Reviewed By: melver

Differential Revision: https://reviews.llvm.org/D114595
2021-11-25 17:07:53 +01:00
Dmitry Vyukov debac0ef37 tsan: add a benchmark for vector memory accesses
Depends on D114592.

Reviewed By: melver

Differential Revision: https://reviews.llvm.org/D114593
2021-11-25 17:07:46 +01:00
Dmitry Vyukov 5cac2b956b tsan: add a test for vector memory accesses
Add a basic test that checks races between vector/non-vector
read/write accesses of different sizes/offsets in different orders.
This gives coverage of __tsan_read/write16 callbacks.

Depends on D114591.

Reviewed By: melver

Differential Revision: https://reviews.llvm.org/D114592
2021-11-25 17:07:18 +01:00
Dmitry Vyukov d841086ae6 tsan: enable -msse4 when compiling tests
Vector SSE accesses make compiler emit __tsan_[unaligned_]read/write16 callbacks.
Make it possible to test these.

Reviewed By: melver

Differential Revision: https://reviews.llvm.org/D114591
2021-11-25 17:07:02 +01:00
Dmitry Vyukov a68b52e0a3 tsan: add another fork deadlock test
The test tries to provoke internal allocator to be locked during fork
and then force the child process to use the internal allocator.
This test sometimes deadlocks with the new tsan runtime.

Depends on D114514.

Reviewed By: melver

Differential Revision: https://reviews.llvm.org/D114515
2021-11-24 13:25:53 +01:00
Dmitry Vyukov 764b35d89f tsan: extend mmap test
Test size larger than clear_shadow_mmap_threshold,
which is handled differently.

Depends on D114348.

Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D114366
2021-11-24 10:57:21 +01:00
Weverything 1150f02c77 Revert "tsan: new runtime (v3)"
This reverts commit ebd47b0fb7.
This was causing unexpected behavior in programs.
2021-11-23 18:32:32 -08:00
Dmitry Vyukov d75ed9864a tsan: disable signal_sync2.cpp test on powerpc64
Fails 1 out of 10 runs on powerpc bots:
https://lab.llvm.org/buildbot/#/builders/121/builds/13391

Reviewed By: nemanjai

Differential Revision: https://reviews.llvm.org/D114426
2021-11-23 17:58:26 +01:00
Dmitry Vyukov ebd47b0fb7 tsan: new runtime (v3)
This change switches tsan to the new runtime which features:
 - 2x smaller shadow memory (2x of app memory)
 - faster fully vectorized race detection
 - small fixed-size vector clocks (512b)
 - fast vectorized vector clock operations
 - unlimited number of alive threads/goroutimes

Differential Revision: https://reviews.llvm.org/D112603
2021-11-23 11:44:59 +01:00
Dmitry Vyukov 5f18ae3988 Revert "tsan: new runtime (v3)"
Summary:
This reverts commit 1784fe0532.

Broke some bots:
https://lab.llvm.org/buildbot#builders/57/builds/12365
http://green.lab.llvm.org/green/job/clang-stage1-RA/25658/

Reviewers: vitalybuka, melver

Subscribers:
2021-11-22 19:08:48 +01:00
Dmitry Vyukov 1784fe0532 tsan: new runtime (v3)
This change switches tsan to the new runtime which features:
 - 2x smaller shadow memory (2x of app memory)
 - faster fully vectorized race detection
 - small fixed-size vector clocks (512b)
 - fast vectorized vector clock operations
 - unlimited number of alive threads/goroutimes

Depends on D112602.

Reviewed By: melver

Differential Revision: https://reviews.llvm.org/D112603
2021-11-22 15:55:39 +01:00
Dmitry Vyukov e69d50d9ff tsan: disable instrumentation in runtime callbacks in tests
All runtime callbacks must be non-instrumented with the new tsan runtime
(it's now more picky with respect to recursion into runtime).
Disable instrumentation in Darwin tests as we do in all other tests now.

Differential Revision: https://reviews.llvm.org/D114348
2021-11-22 15:48:29 +01:00
Dmitry Vyukov 6a3958247a tsan: add another fork test
Add a fork test that models what happens on Mac
where fork calls malloc/free inside of our atfork
callbacks.

Reviewed By: vitalybuka, yln

Differential Revision: https://reviews.llvm.org/D114250
2021-11-22 08:36:51 +01:00
Dmitry Vyukov d0c138ec8a tsan: disable bench_threads.cpp on aarch64
The new test started failing on bots with:

CHECK failed: tsan_rtl.cpp:327 "((addr + size)) <= ((TraceMemEnd()))"
   (0xf06200e03010, 0xf06200000000) (tid=4073872)

https://lab.llvm.org/buildbot#builders/179/builds/1761

This is a latent bug in aarch64 virtual address space layout,
there is not enough address space to fit traces for all threads.
But since the trace space is going away with the new tsan runtime
(D112603), disable the test.

Reviewed By: melver

Differential Revision: https://reviews.llvm.org/D113990
2021-11-16 16:53:04 +01:00