Add ThreadClock:: global_acquire_ which is the last time another thread
has done a global acquire of this thread's clock.
It helps to avoid problem described in:
https://github.com/golang/go/issues/39186
See test/tsan/java_finalizer2.cpp for a regression test.
Note the failuire is _extremely_ hard to hit, so if you are trying
to reproduce it, you may want to run something like:
$ go get golang.org/x/tools/cmd/stress
$ stress -p=64 ./a.out
The crux of the problem is roughly as follows.
A number of O(1) optimizations in the clocks algorithm assume proper
transitive cumulative propagation of clock values. The AcquireGlobal
operation may produce an inconsistent non-linearazable view of
thread clocks. Namely, it may acquire a later value from a thread
with a higher ID, but fail to acquire an earlier value from a thread
with a lower ID. If a thread that executed AcquireGlobal then releases
to a sync clock, it will spoil the sync clock with the inconsistent
values. If another thread later releases to the sync clock, the optimized
algorithm may break.
The exact sequence of events that leads to the failure.
- thread 1 executes AcquireGlobal
- thread 1 acquires value 1 for thread 2
- thread 2 increments clock to 2
- thread 2 releases to sync object 1
- thread 3 at time 1
- thread 3 acquires from sync object 1
- thread 1 acquires value 1 for thread 3
- thread 1 releases to sync object 2
- sync object 2 clock has 1 for thread 2 and 1 for thread 3
- thread 3 releases to sync object 2
- thread 3 sees value 1 in the clock for itself
and decides that it has already released to the clock
and did not acquire anything from other threads after that
(the last_acquire_ check in release operation)
- thread 3 does not update the value for thread 2 in the clock from 1 to 2
- thread 4 acquires from sync object 2
- thread 4 detects a false race with thread 2
as it should have been synchronized with thread 2 up to time 2,
but because of the broken clock it is now synchronized only up to time 1
The global_acquire_ value helps to prevent this scenario.
Namely, thread 3 will not trust any own clock values up to global_acquire_
for the purposes of the last_acquire_ optimization.
Reviewed-in: https://reviews.llvm.org/D80474
Reported-by: nvanbenschoten (Nathan VanBenschoten)
Summary:
Due to sandbox restrictions in the recent versions of the simulator runtime the
atos program is no longer able to access the task port of a parent process
without additional help.
This patch fixes this by registering a task port for the parent process
before spawning atos and also tells atos to look for this by setting
a special environment variable.
This patch is based on an Apple internal fix (rdar://problem/43693565) that
unfortunately contained a bug (rdar://problem/58789439) because it used
setenv() to set the special environment variable. This is not safe because in
certain circumstances this can trigger a call to realloc() which can fail
during symbolization leading to deadlock. A test case is included that captures
this problem.
The approach used to set the necessary environment variable is as
follows:
1. Calling `putenv()` early during process init (but late enough that
malloc/realloc works) to set a dummy value for the environment variable.
2. Just before `atos` is spawned the storage for the environment
variable is modified to contain the correct PID.
A flaw with this approach is that if the application messes with the
atos environment variable (i.e. unsets it or changes it) between the
time its set and the time we need it then symbolization will fail. We
will ignore this issue for now but a `DCHECK()` is included in the patch
that documents this assumption but doesn't check it at runtime to avoid
calling `getenv()`.
The issue reported in rdar://problem/58789439 manifested as a deadlock
during symbolization in the following situation:
1. Before TSan detects an issue something outside of the runtime calls
setenv() that sets a new environment variable that wasn't previously
set. This triggers a call to malloc() to allocate a new environment
array. This uses TSan's normal user-facing allocator. LibC stores this
pointer for future use later.
2. TSan detects an issue and tries to launch the symbolizer. When we are in the
symbolizer we switch to a different (internal allocator) and then we call
setenv() to set a new environment variable. When this happen setenv() sees
that it needs to make the environment array larger and calls realloc() on the
existing enviroment array because it remembers that it previously allocated
memory for it. Calling realloc() fails here because it is being called on a
pointer its never seen before.
The included test case closely reproduces the originally reported
problem but it doesn't replicate the `((kBlockMagic)) ==
((((u64*)addr)[0])` assertion failure exactly. This is due to the way
TSan's normal allocator allocates the environment array the first time
it is allocated. In the test program addr[0] accesses an inaccessible
page and raises SIGBUS. If TSan's SIGBUS signal handler is active, the
signal is caught and symbolication is attempted again which results in
deadlock.
In the originally reported problem the pointer is successfully derefenced but
then the assert fails due to the provided pointer not coming from the active
allocator. When the assert fails TSan tries to symbolicate the stacktrace while
already being in the middle of symbolication which results in deadlock.
rdar://problem/58789439
Reviewers: kubamracek, yln
Subscribers: jfb, #sanitizers, llvm-commits
Tags: #sanitizers
Differential Revision: https://reviews.llvm.org/D78179
Summary:
These tests pass with clang, but fail if gcc was used.
gcc build creates similar but not the same stacks.
Reviewers: vitalybuka
Reviewed By: vitalybuka
Subscribers: dvyukov, llvm-commits, #sanitizers
Tags: #sanitizers
Differential Revision: https://reviews.llvm.org/D78114
For targets where char is unsigned (like PowerPC), something like
char c = fgetc(...) will never produce a char that will compare
equal to EOF so this loop does not terminate.
Change the type to int (which appears to be the POSIX return type
for fgetc).
This allows the test case to terminate normally on PPC.
Buildbots say:
[126/127] Running lint check for sanitizer sources...
FAILED: projects/compiler-rt/lib/CMakeFiles/SanitizerLintCheck
cd /home/buildbots/ppc64be-clang-multistage-test/clang-ppc64be-multistage/stage1/projects/compiler-rt/lib && env LLVM_CHECKOUT=/home/buildbots/ppc64be-clang-multistage-test/clang-ppc64be-multistage/llvm/llvm SILENT=1 TMPDIR= PYTHON_EXECUTABLE=/usr/bin/python COMPILER_RT=/home/buildbots/ppc64be-clang-multistage-test/clang-ppc64be-multistage/llvm/compiler-rt /home/buildbots/ppc64be-clang-multistage-test/clang-ppc64be-multistage/llvm/compiler-rt/lib/sanitizer_common/scripts/check_lint.sh
/home/buildbots/ppc64be-clang-multistage-test/clang-ppc64be-multistage/llvm/compiler-rt/test/tsan/fiber_cleanup.cpp:71: Could not find a newline character at the end of the file. [whitespace/ending_newline] [5]
ninja: build stopped: subcommand failed.
Somehow this check is not part of 'ninja check-tsan'.
When creating and destroying fibers in tsan a thread state is created and destroyed. Currently, a memory mapping is leaked with each fiber (in __tsan_destroy_fiber). This causes applications with many short running fibers to crash or hang because of linux vm.max_map_count.
The root of this is that ThreadState holds a pointer to ThreadSignalContext for handling signals. The initialization and destruction of it is tied to platform specific events in tsan_interceptors_posix and missed when destroying a fiber (specifically, SigCtx is used to lazily create the ThreadSignalContext in tsan_interceptors_posix). This patch cleans up the memory by makinh the ThreadState create and destroy the ThreadSignalContext.
The relevant code causing the leak with fibers is the fiber destruction:
void FiberDestroy(ThreadState *thr, uptr pc, ThreadState *fiber) {
FiberSwitchImpl(thr, fiber);
ThreadFinish(fiber);
FiberSwitchImpl(fiber, thr);
internal_free(fiber);
}
Author: Florian
Reviewed-in: https://reviews.llvm.org/D76073
Temporarily revert "tsan: fix leak of ThreadSignalContext for fibers"
because it breaks the LLDB bot on GreenDragon.
This reverts commit 93f7743851.
This reverts commit d8a0f76de7.
When creating and destroying fibers in tsan a thread state
is created and destroyed. Currently, a memory mapping is
leaked with each fiber (in __tsan_destroy_fiber).
This causes applications with many short running fibers
to crash or hang because of linux vm.max_map_count.
The root of this is that ThreadState holds a pointer to
ThreadSignalContext for handling signals. The initialization
and destruction of it is tied to platform specific events
in tsan_interceptors_posix and missed when destroying a fiber
(specifically, SigCtx is used to lazily create the
ThreadSignalContext in tsan_interceptors_posix). This patch
cleans up the memory by inverting the control from the
platform specific code calling the generic ThreadFinish to
ThreadFinish calling a platform specific clean-up routine
after finishing a thread.
The relevant code causing the leak with fibers is the fiber destruction:
void FiberDestroy(ThreadState *thr, uptr pc, ThreadState *fiber) {
FiberSwitchImpl(thr, fiber);
ThreadFinish(fiber);
FiberSwitchImpl(fiber, thr);
internal_free(fiber);
}
I would appreciate feedback if this way of fixing the leak is ok.
Also, I think it would be worthwhile to more closely look at the
lifecycle of ThreadState (i.e. it uses no constructor/destructor,
thus requiring manual callbacks for cleanup) and how OS-Threads/user
level fibers are differentiated in the codebase. I would be happy to
contribute more if someone could point me at the right place to
discuss this issue.
Reviewed-in: https://reviews.llvm.org/D76073
Author: Florian (Florian)
Generally we ignore interceptors coming from called_from_lib-suppressed libraries.
However, we must not ignore critical interceptors like e.g. pthread_create,
otherwise runtime will lost track of threads.
pthread_detach is one of these interceptors we should not ignore as it affects
thread states and behavior of pthread_join which we don't ignore as well.
Currently we can produce very obscure false positives. For more context see:
https://groups.google.com/forum/#!topic/thread-sanitizer/ecH2P0QUqPs
The added test captures this pattern.
While we are here rename ThreadTid to ThreadConsumeTid to make it clear that
it's not just a "getter", it resets user_id to 0. This lead to confusion recently.
Reviewed in https://reviews.llvm.org/D74828
Summary:
This substitution expands to the appropriate minimum deployment target
flag where thread local storage (TLS) was first introduced on Darwin
platforms. For all other platforms the substitution expands to an empty
string.
E.g. for macOS the substitution expands to `-mmacosx-version-min=10.12`
This patch adds support for the substitution (and future substitutions)
by doing a minor refactor and then uses the substitution in the relevant
TSan tests.
rdar://problem/59568956
Reviewers: yln, kubamracek, dvyukov, vitalybuka
Subscribers: #sanitizers, llvm-commits
Tags: #sanitizers, #llvm
Differential Revision: https://reviews.llvm.org/D74802
This patch defines `config.apple_platform_min_deployment_target_flag`
in the ASan, LibFuzzer, TSan, and UBSan lit test configs.
rdar://problem/59463146
Summary:
A number of testcases in TSAN are designed to deal with intermittent problems
not exist in all executions of the tested program. A script called deflake.bash
runs the executable up to 10 times to deal with the intermittent nature of the tests.
The purpose of this patch is to parameterize the hard-coded threshold above via
--cmake_variables=-DTSAN_TEST_DEFLAKE_THRESHOLD=SomeIntegerValue
When this cmake var is not set, the default value of 10 will be used.
Reviewer: dvyukov (Dmitry Vyukov), eugenis (Evgenii Stepanov), rnk (Reid Kleckner), hubert.reinterpretcast (Hubert Tong), vitalybuka (Vitaly Buka)
Reviewed By: vitalybuka (Vitaly Buka)
Subscribers: mgorny (Michal Gorny), jfb (JF Bastien), steven.zhang (qshanz), llvm-commits (Mailing List llvm-commits), Sanitizers
Tag: LLVM, Sanitizers
Differential Revision: https://reviews.llvm.org/D73707
EXCLUDE_FROM_ALL means something else for add_lit_testsuite as it does
for something like add_executable. Distinguish between the two by
renaming the variable and making it an argument to add_lit_testsuite.
Differential revision: https://reviews.llvm.org/D74168
Summary:
The previous code hard-coded platform names but compiler-rt's CMake
build system actually already knows which Apple platforms TSan supports.
This change uses this information to enumerate the different Apple
platforms.
This change relies on the `get_capitalized_apple_platform()` function
added in a previous commit.
rdar://problem/58798733
Reviewers: kubamracek, yln
Subscribers: mgorny, #sanitizers, llvm-commits
Tags: #sanitizers, #llvm
Differential Revision: https://reviews.llvm.org/D73238
This flaky test that I added really gives our CI a lot of headaches.
Although I was never able to reproduce this locally, it sporadically
hangs/fails on our bots. I decided to silently pass the test whenever
we are unable to setup the proper test condition after 10 retries. This
is of course suboptimal and a last recourse. Please let me know if you
know how to test this better.
rdar://57844626
Use a new %run wrapper for ASAN/MSAN/TSAN tests that calls paxctl
in order to disable ASLR on the test executables. This makes it
possible to test sanitizers on systems where ASLR is enabled by default.
Differential Revision: https://reviews.llvm.org/D70958
The main problem here is that `-*-version_min=` was not being passed to
the compiler when building test cases. This can cause problems when
testing on devices running older OSs because Clang would previously
assume the minimum deployment target is the the latest OS in the SDK
which could be much newer than what the device is running.
Previously the generated value looked like this:
`-arch arm64 -isysroot
<path_to_xcode>/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS12.1.sdk`
With this change it now looks like:
`-arch arm64 -stdlib=libc++ -miphoneos-version-min=8.0 -isysroot
<path_to_xcode>/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS12.1.sdk`
This mirrors the setting of config.target_cflags on macOS.
This change is made for ASan, LibFuzzer, TSan, and UBSan.
To implement this a new `get_test_cflags_for_apple_platform()` function
has been added that when given an Apple platform name and architecture
returns a string containing the C compiler flags to use when building
tests. This also calls a new helper function `is_valid_apple_platform()`
that validates Apple platform names.
This is the third attempt at landing the patch.
The first attempt (r359305) had to be reverted (r359327) due to a buildbot
failure. The problem was that calling `get_test_cflags_for_apple_platform()`
can trigger a CMake error if the provided architecture is not supported by the
current CMake configuration. Previously, this could be triggered by passing
`-DCOMPILER_RT_ENABLE_IOS=OFF` to CMake. The root cause is that we were
generating test configurations for a list of architectures without checking if
the relevant Sanitizer actually supported that architecture. We now intersect
the list of architectures for an Apple platform with
`<SANITIZER>_SUPPORTED_ARCH` (where `<SANITIZER>` is a Sanitizer name) to
iterate through the correct list of architectures.
The second attempt (r363633) had to be reverted (r363779) due to a build
failure. The failed build was using a modified Apple toolchain where the iOS
simulator SDK was missing. This exposed a bug in the existing UBSan test
generation code where it was assumed that `COMPILER_RT_ENABLE_IOS` implied that
the toolchain supported both iOS and the iOS simulator. This is not true. This
has been fixed by using the list `SANITIZER_COMMON_SUPPORTED_OS` for the list
of supported Apple platforms for UBSan. For consistency with the other
Sanitizers we also now intersect the list of architectures with
UBSAN_SUPPORTED_ARCH.
rdar://problem/50124489
Differential Revision: https://reviews.llvm.org/D61242
llvm-svn: 373405
Adding annotation function variants __tsan_write_range_pc and
__tsan_read_range_pc to annotate ranged access to memory while providing a
program counter for the access.
Differential Revision: https://reviews.llvm.org/D66885
llvm-svn: 372730
Declare the family of AnnotateIgnore[Read,Write][Begin,End] TSan
annotations in compiler-rt/test/tsan/test.h so that we don't have to
declare them separately in every test that needs them. Replace usages.
Leave usages that explicitly test the annotation mechanism:
thread_end_with_ignore.cpp
thread_end_with_ignore3.cpp
llvm-svn: 371446
I verified that the test is red without the interceptors.
rdar://40334350
Reviewed By: kubamracek, vitalybuka
Differential Revision: https://reviews.llvm.org/D66616
llvm-svn: 371439
Summary:
It appears that since https://reviews.llvm.org/D54889, BackgroundThread()
crashes immediately because cur_thread()-> will return a null pointer
which is then dereferenced. I'm not sure why I only see this issue on
FreeBSD and not Linux since it should also be unintialized on other platforms.
Reviewers: yuri, dvyukov, dim, emaste
Subscribers: kubamracek, krytarowski, #sanitizers, llvm-commits
Tags: #sanitizers, #llvm
Differential Revision: https://reviews.llvm.org/D65705
llvm-svn: 368103
See https://reviews.llvm.org/D58620 for discussion, and for the commands
I ran. In addition I also ran
for f in $(svn diff | diffstat | grep .cc | cut -f 2 -d ' '); do rg $f . ; done
and manually updated (many) references to renamed files found by that.
llvm-svn: 367463
These lit configuration files are really Python source code. Using the
.py file extension helps editors and tools use the correct language
mode. LLVM and Clang already use this convention for lit configuration,
this change simply applies it to all of compiler-rt.
Reviewers: vitalybuka, dberris
Differential Revision: https://reviews.llvm.org/D63658
llvm-svn: 364591
This caused Chromium's clang package to stop building, see comment on
https://reviews.llvm.org/D61242 for details.
> Summary:
> The main problem here is that `-*-version_min=` was not being passed to
> the compiler when building test cases. This can cause problems when
> testing on devices running older OSs because Clang would previously
> assume the minimum deployment target is the the latest OS in the SDK
> which could be much newer than what the device is running.
>
> Previously the generated value looked like this:
>
> `-arch arm64 -isysroot
> <path_to_xcode>/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS12.1.sdk`
>
> With this change it now looks like:
>
> `-arch arm64 -stdlib=libc++ -miphoneos-version-min=8.0 -isysroot
> <path_to_xcode>/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS12.1.sdk`
>
> This mirrors the setting of `config.target_cflags` on macOS.
>
> This change is made for ASan, LibFuzzer, TSan, and UBSan.
>
> To implement this a new `get_test_cflags_for_apple_platform()` function
> has been added that when given an Apple platform name and architecture
> returns a string containing the C compiler flags to use when building
> tests. This also calls a new helper function `is_valid_apple_platform()`
> that validates Apple platform names.
>
> This is the second attempt at landing the patch. The first attempt (r359305)
> had to be reverted (r359327) due to a buildbot failure. The problem was
> that calling `get_test_cflags_for_apple_platform()` can trigger a CMake
> error if the provided architecture is not supported by the current
> CMake configuration. Previously, this could be triggered by passing
> `-DCOMPILER_RT_ENABLE_IOS=OFF` to CMake. The root cause is that we were
> generating test configurations for a list of architectures without
> checking if the relevant Sanitizer actually supported that architecture.
> We now intersect the list of architectures for an Apple platform
> with `<SANITIZER>_SUPPORTED_ARCH` (where `<SANITIZER>` is a Sanitizer
> name) to iterate through the correct list of architectures.
>
> rdar://problem/50124489
>
> Reviewers: kubamracek, yln, vsk, juliehockett, phosek
>
> Subscribers: mgorny, javed.absar, kristof.beyls, #sanitizers, llvm-commits
>
> Tags: #llvm, #sanitizers
>
> Differential Revision: https://reviews.llvm.org/D61242
llvm-svn: 363779
Summary:
The main problem here is that `-*-version_min=` was not being passed to
the compiler when building test cases. This can cause problems when
testing on devices running older OSs because Clang would previously
assume the minimum deployment target is the the latest OS in the SDK
which could be much newer than what the device is running.
Previously the generated value looked like this:
`-arch arm64 -isysroot
<path_to_xcode>/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS12.1.sdk`
With this change it now looks like:
`-arch arm64 -stdlib=libc++ -miphoneos-version-min=8.0 -isysroot
<path_to_xcode>/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS12.1.sdk`
This mirrors the setting of `config.target_cflags` on macOS.
This change is made for ASan, LibFuzzer, TSan, and UBSan.
To implement this a new `get_test_cflags_for_apple_platform()` function
has been added that when given an Apple platform name and architecture
returns a string containing the C compiler flags to use when building
tests. This also calls a new helper function `is_valid_apple_platform()`
that validates Apple platform names.
This is the second attempt at landing the patch. The first attempt (r359305)
had to be reverted (r359327) due to a buildbot failure. The problem was
that calling `get_test_cflags_for_apple_platform()` can trigger a CMake
error if the provided architecture is not supported by the current
CMake configuration. Previously, this could be triggered by passing
`-DCOMPILER_RT_ENABLE_IOS=OFF` to CMake. The root cause is that we were
generating test configurations for a list of architectures without
checking if the relevant Sanitizer actually supported that architecture.
We now intersect the list of architectures for an Apple platform
with `<SANITIZER>_SUPPORTED_ARCH` (where `<SANITIZER>` is a Sanitizer
name) to iterate through the correct list of architectures.
rdar://problem/50124489
Reviewers: kubamracek, yln, vsk, juliehockett, phosek
Subscribers: mgorny, javed.absar, kristof.beyls, #sanitizers, llvm-commits
Tags: #llvm, #sanitizers
Differential Revision: https://reviews.llvm.org/D61242
llvm-svn: 363633
Re-enable test that was disabled because it deadlocks when running on
the bot, but was never enabled again. Can't reproduce deadlock locally
so trying to investigate by re-enabling test.
llvm-svn: 360388
Summary: no_sanitize_thread is not enough as it still puts some tsan instrumentation
Reviewers: eugenis
Subscribers: kubamracek, #sanitizers, llvm-commits
Tags: #sanitizers, #llvm
Differential Revision: https://reviews.llvm.org/D61393
llvm-svn: 359731
This reverts commit 1bcdbd68616dc7f8debe126caafef7a7242a0e6b.
It's been reported that some bots are failing with this change with CMake
error like:
```
CMake Error at /b/s/w/ir/k/llvm-project/compiler-rt/cmake/config-ix.cmake:177 (message):
Unsupported architecture: arm64
Call Stack (most recent call first):
/b/s/w/ir/k/llvm-project/compiler-rt/cmake/config-ix.cmake:216 (get_target_flags_for_arch)
/b/s/w/ir/k/llvm-project/compiler-rt/test/tsan/CMakeLists.txt:78 (get_test_cflags_for_apple_platform)
```
I'm reverting the patch now to unbreak builds. I will investigate properly when time permits.
rdar://problem/50124489
llvm-svn: 359327
platforms.
The main problem here is that `-*-version_min=` was not being passed to
the compiler when building test cases. This can cause problems when
testing on devices running older OSs because Clang would previously
assume the minimum deployment target is the the latest OS in the SDK
which could be much newer than what the device is running.
Previously the generated value looked like this:
`-arch arm64 -isysroot
<path_to_xcode>/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS12.1.sdk`
With this change it now looks like:
`-arch arm64 -stdlib=libc++ -miphoneos-version-min=8.0 -isysroot
<path_to_xcode>/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS12.1.sdk`
This mirrors the setting of `config.target_cflags` on macOS.
This change is made for ASan, LibFuzzer, TSan, and UBSan.
To implement this a new `get_test_cflags_for_apple_platform()` function
has been added that when given an Apple platform name and architecture
returns a string containing the C compiler flags to use when building
tests. This also calls a new helper function `is_valid_apple_platform()`
that validates Apple platform names.
rdar://problem/50124489
Differential Revision: https://reviews.llvm.org/D58578
llvm-svn: 359305
Summary:
Apparently, it makes a difference on where a block lives depending on if
it's passed "inline" versus assigned and then passed via a variable.
Both tests in this commit now give a signal, if `Block_copy` is used in
`dispatch_sync`.
Since these tests use different mechanisms (Objective-C retain versus
C++ copy constructor) as proxies to observe if the block was copied, we
should keep both of them.
Commit, that first avoided the unnecessary copy:
faef7d034a
Subscribers: kubamracek, #sanitizers, llvm-commits
Tags: #sanitizers, #llvm
Differential Revision: https://reviews.llvm.org/D60639
llvm-svn: 358469