Add ThreadClock:: global_acquire_ which is the last time another thread
has done a global acquire of this thread's clock.
It helps to avoid problem described in:
https://github.com/golang/go/issues/39186
See test/tsan/java_finalizer2.cpp for a regression test.
Note the failuire is _extremely_ hard to hit, so if you are trying
to reproduce it, you may want to run something like:
$ go get golang.org/x/tools/cmd/stress
$ stress -p=64 ./a.out
The crux of the problem is roughly as follows.
A number of O(1) optimizations in the clocks algorithm assume proper
transitive cumulative propagation of clock values. The AcquireGlobal
operation may produce an inconsistent non-linearazable view of
thread clocks. Namely, it may acquire a later value from a thread
with a higher ID, but fail to acquire an earlier value from a thread
with a lower ID. If a thread that executed AcquireGlobal then releases
to a sync clock, it will spoil the sync clock with the inconsistent
values. If another thread later releases to the sync clock, the optimized
algorithm may break.
The exact sequence of events that leads to the failure.
- thread 1 executes AcquireGlobal
- thread 1 acquires value 1 for thread 2
- thread 2 increments clock to 2
- thread 2 releases to sync object 1
- thread 3 at time 1
- thread 3 acquires from sync object 1
- thread 1 acquires value 1 for thread 3
- thread 1 releases to sync object 2
- sync object 2 clock has 1 for thread 2 and 1 for thread 3
- thread 3 releases to sync object 2
- thread 3 sees value 1 in the clock for itself
and decides that it has already released to the clock
and did not acquire anything from other threads after that
(the last_acquire_ check in release operation)
- thread 3 does not update the value for thread 2 in the clock from 1 to 2
- thread 4 acquires from sync object 2
- thread 4 detects a false race with thread 2
as it should have been synchronized with thread 2 up to time 2,
but because of the broken clock it is now synchronized only up to time 1
The global_acquire_ value helps to prevent this scenario.
Namely, thread 3 will not trust any own clock values up to global_acquire_
for the purposes of the last_acquire_ optimization.
Reviewed-in: https://reviews.llvm.org/D80474
Reported-by: nvanbenschoten (Nathan VanBenschoten)
Some testcases are unexpectedly passing with NPM.
This is because the target functions are inlined in NPM.
I think we should add noinline attribute to keep these test points.
Reviewed By: vitalybuka
Differential Revision: https://reviews.llvm.org/D79648
A few testcases are still using deprecated options.
warning: argument '-fsanitize-coverage=[func|bb|edge]' is deprecated,
use '-fsanitize-coverage=[func|bb|edge],[trace-pc-guard|trace-pc]'
instead [-Wdeprecated]
Reviewed By: vitalybuka
Differential Revision: https://reviews.llvm.org/D79741
Per target runtime dir may change the suffix of shared libs.
We can not assume we are always building with per_target_runtime_dir on.
Reviewed By: cryptoad
Differential Revision: https://reviews.llvm.org/D80243
Summary:
The previous code tries to strip out parentheses and anything in between
them. I'm guessing the idea here was to try to drop any listed arguments
for the function being symbolized. Unfortunately this approach is broken
in several ways.
* Templated functions may contain parentheses. The existing approach
messes up these names.
* In C++ argument types are part of a function's signature for the
purposes of overloading so removing them could be confusing.
Fix this simply by not trying to adjust the function name that comes
from `atos`.
A test case is included.
Without the change the test case produced output like:
```
WRITE of size 4 at 0x6060000001a0 thread T0
#0 0x10b96614d in IntWrapper<void >::operator=> const&) asan-symbolize-templated-cxx.cpp:10
#1 0x10b960b0e in void writeToA<IntWrapper<void > >>) asan-symbolize-templated-cxx.cpp:30
#2 0x10b96bf27 in decltype>)>> >)) std::__1::__invoke<void >), IntWrapper<void > >>), IntWrapper<void >&&) type_traits:4425
#3 0x10b96bdc1 in void std::__1::__invoke_void_return_wrapper<void>::__call<void >), IntWrapper<void > >>), IntWrapper<void >&&) __functional_base:348
#4 0x10b96bd71 in std::__1::__function::__alloc_func<void >), std::__1::allocator<void >)>, void >)>::operator>&&) functional:1533
#5 0x10b9684e2 in std::__1::__function::__func<void >), std::__1::allocator<void >)>, void >)>::operator>&&) functional:1707
#6 0x10b96cd7b in std::__1::__function::__value_func<void >)>::operator>&&) const functional:1860
#7 0x10b96cc17 in std::__1::function<void >)>::operator>) const functional:2419
#8 0x10b960ca6 in Foo<void >), IntWrapper<void > >::doCall>) asan-symbolize-templated-cxx.cpp:44
#9 0x10b96088b in main asan-symbolize-templated-cxx.cpp:54
#10 0x7fff6ffdfcc8 in start (in libdyld.dylib) + 0
```
Note how the symbol names for the frames are messed up (e.g. #8, #1).
With the patch the output looks like:
```
WRITE of size 4 at 0x6060000001a0 thread T0
#0 0x10005214d in IntWrapper<void (int)>::operator=(IntWrapper<void (int)> const&) asan-symbolize-templated-cxx.cpp:10
#1 0x10004cb0e in void writeToA<IntWrapper<void (int)> >(IntWrapper<void (int)>) asan-symbolize-templated-cxx.cpp:30
#2 0x100057f27 in decltype(std::__1::forward<void (*&)(IntWrapper<void (int)>)>(fp)(std::__1::forward<IntWrapper<void (int)> >(fp0))) std::__1::__invoke<void (*&)(IntWrapper<void (int)>), IntWrapper<void (int)> >(void (*&)(IntWrapper<void (int)>), IntWrapper<void (int)>&&) type_traits:4425
#3 0x100057dc1 in void std::__1::__invoke_void_return_wrapper<void>::__call<void (*&)(IntWrapper<void (int)>), IntWrapper<void (int)> >(void (*&)(IntWrapper<void (int)>), IntWrapper<void (int)>&&) __functional_base:348
#4 0x100057d71 in std::__1::__function::__alloc_func<void (*)(IntWrapper<void (int)>), std::__1::allocator<void (*)(IntWrapper<void (int)>)>, void (IntWrapper<void (int)>)>::operator()(IntWrapper<void (int)>&&) functional:1533
#5 0x1000544e2 in std::__1::__function::__func<void (*)(IntWrapper<void (int)>), std::__1::allocator<void (*)(IntWrapper<void (int)>)>, void (IntWrapper<void (int)>)>::operator()(IntWrapper<void (int)>&&) functional:1707
#6 0x100058d7b in std::__1::__function::__value_func<void (IntWrapper<void (int)>)>::operator()(IntWrapper<void (int)>&&) const functional:1860
#7 0x100058c17 in std::__1::function<void (IntWrapper<void (int)>)>::operator()(IntWrapper<void (int)>) const functional:2419
#8 0x10004cca6 in Foo<void (IntWrapper<void (int)>), IntWrapper<void (int)> >::doCall(IntWrapper<void (int)>) asan-symbolize-templated-cxx.cpp:44
#9 0x10004c88b in main asan-symbolize-templated-cxx.cpp:54
#10 0x7fff6ffdfcc8 in start (in libdyld.dylib) + 0
```
rdar://problem/58887175
Reviewers: kubamracek, yln
Subscribers: #sanitizers, llvm-commits
Tags: #sanitizers
Differential Revision: https://reviews.llvm.org/D79597
Fixes PR45673
The commit 9180c14fe4 (D76206) resolved only a part of the problem
of concurrent .gcda file creation. It ensured that only one process
creates the file but did not ensure that the process locks the
file first. If not, the process which created the file may clobber
the contents written by a process which locked the file first.
This is the cause of PR45673.
This commit prevents the clobbering by revising the assumption
that a process which creates the file locks the file first.
Regardless of file creation, a process which locked the file first
uses fwrite (new_file==1) and other processes use mmap (new_file==0).
I also tried to keep the creation/first-lock process same by using
mkstemp/link/unlink but the code gets long. This commit is more
simple.
Note: You may be confused with other changes which try to resolve
concurrent file access. My understanding is (may not be correct):
D76206: Resolve race of .gcda file creation (but not lock)
This one: Resolve race of .gcda file creation and lock
D54599: Same as D76206 but abandoned?
D70910: Resolve race of multi-threaded counter flushing
D74953: Resolve counter sharing between parent/children processes
D78477: Revision of D74953
Differential Revision: https://reviews.llvm.org/D79556
Summary:
Fix hwasan allocator not respecting the requested alignment when it is
higher than a page, but still within primary (i.e. [2048, 65536]).
Reviewers: pcc, hctim, cryptoad
Subscribers: #sanitizers, llvm-commits
Tags: #sanitizers
Differential Revision: https://reviews.llvm.org/D79656
https://reviews.llvm.org/D63616 added `-fsanitize-coverage-whitelist`
and `-fsanitize-coverage-blacklist` for clang.
However, it was done only for legacy pass manager.
This patch enable it for new pass manager as well.
Reviewed By: vitalybuka
Differential Revision: https://reviews.llvm.org/D79653
Summary:
This is necessary to handle calls to free() after __hwasan_thread_exit,
which is possible in glibc.
Also, add a null check to GetCurrentThread, otherwise the logic in
GetThreadByBufferAddress turns it into a non-null value. This means that
all of the checks for GetCurrentThread() != nullptr do not have any
effect at all right now!
Reviewers: pcc, hctim
Subscribers: #sanitizers, llvm-commits
Tags: #sanitizers
Differential Revision: https://reviews.llvm.org/D79608
Summary: The new pass manager symbolizes the location as ~Simple instead of Simple::~Simple.
Reviewers: rnk, leonardchan, vitalybuka
Subscribers: #sanitizers
Tags: #sanitizers
Differential Revision: https://reviews.llvm.org/D79594
Summary:
When forking in several threads, the counters were written out in using the same global static variables (see GCDAProfiling.c): that leads to crashes.
So when there is a fork, the counters are resetted in the child process and they will be dumped at exit using the interprocess file locking.
When there is an exec, the counters are written out and in case of failures they're resetted.
Reviewers: jfb, vsk, marco-c, serge-sans-paille
Reviewed By: marco-c, serge-sans-paille
Subscribers: llvm-commits, serge-sans-paille, dmajor, cfe-commits, hiraditya, dexonsmith, #sanitizers, marco-c, sylvestre.ledru
Tags: #sanitizers, #clang, #llvm
Differential Revision: https://reviews.llvm.org/D78477
It looks like some bots are failing with os log not giving any
output. This might be due to the system under test being heavy
load so the 2 minute window might not be large enough. This
patch makes the window larger in the hope that this test will
be more reliable.
rdar://problem/62141527
This is the first patch in a series to add support for the AVR target.
This patch includes changes to make compiler-rt more target independent
by not relying on the width of an int or long.
Differential Revision: https://reviews.llvm.org/D78662
* Changing source lines seems to cause us to hit rdar://problem/62132428.
* Even if I workaround the above issue sometimes the source line in the dylib reported by atos is off by one.
It's simpler to just disable the test for now.
rdar://problem/61793759
We can use `simctl spawn --standalone` to enable running tests without
the need for an already-booted simulator instance. This also side-steps
the problem of not having a good place to shutdown the instance after
we are finished with testing.
rdar://58118442
Reviewed By: delcypher
Differential Revision: https://reviews.llvm.org/D78409
Summary:
Due to sandbox restrictions in the recent versions of the simulator runtime the
atos program is no longer able to access the task port of a parent process
without additional help.
This patch fixes this by registering a task port for the parent process
before spawning atos and also tells atos to look for this by setting
a special environment variable.
This patch is based on an Apple internal fix (rdar://problem/43693565) that
unfortunately contained a bug (rdar://problem/58789439) because it used
setenv() to set the special environment variable. This is not safe because in
certain circumstances this can trigger a call to realloc() which can fail
during symbolization leading to deadlock. A test case is included that captures
this problem.
The approach used to set the necessary environment variable is as
follows:
1. Calling `putenv()` early during process init (but late enough that
malloc/realloc works) to set a dummy value for the environment variable.
2. Just before `atos` is spawned the storage for the environment
variable is modified to contain the correct PID.
A flaw with this approach is that if the application messes with the
atos environment variable (i.e. unsets it or changes it) between the
time its set and the time we need it then symbolization will fail. We
will ignore this issue for now but a `DCHECK()` is included in the patch
that documents this assumption but doesn't check it at runtime to avoid
calling `getenv()`.
The issue reported in rdar://problem/58789439 manifested as a deadlock
during symbolization in the following situation:
1. Before TSan detects an issue something outside of the runtime calls
setenv() that sets a new environment variable that wasn't previously
set. This triggers a call to malloc() to allocate a new environment
array. This uses TSan's normal user-facing allocator. LibC stores this
pointer for future use later.
2. TSan detects an issue and tries to launch the symbolizer. When we are in the
symbolizer we switch to a different (internal allocator) and then we call
setenv() to set a new environment variable. When this happen setenv() sees
that it needs to make the environment array larger and calls realloc() on the
existing enviroment array because it remembers that it previously allocated
memory for it. Calling realloc() fails here because it is being called on a
pointer its never seen before.
The included test case closely reproduces the originally reported
problem but it doesn't replicate the `((kBlockMagic)) ==
((((u64*)addr)[0])` assertion failure exactly. This is due to the way
TSan's normal allocator allocates the environment array the first time
it is allocated. In the test program addr[0] accesses an inaccessible
page and raises SIGBUS. If TSan's SIGBUS signal handler is active, the
signal is caught and symbolication is attempted again which results in
deadlock.
In the originally reported problem the pointer is successfully derefenced but
then the assert fails due to the provided pointer not coming from the active
allocator. When the assert fails TSan tries to symbolicate the stacktrace while
already being in the middle of symbolication which results in deadlock.
rdar://problem/58789439
Reviewers: kubamracek, yln
Subscribers: jfb, #sanitizers, llvm-commits
Tags: #sanitizers
Differential Revision: https://reviews.llvm.org/D78179
Summary:
These tests pass with clang, but fail if gcc was used.
gcc build creates similar but not the same stacks.
Reviewers: vitalybuka
Reviewed By: vitalybuka
Subscribers: dvyukov, llvm-commits, #sanitizers
Tags: #sanitizers
Differential Revision: https://reviews.llvm.org/D78114
Summary:
Previously `AtosSymbolizer` would set the PID to examine in the
constructor which is called early on during sanitizer init. This can
lead to incorrect behaviour in the case of a fork() because if the
symbolizer is launched in the child it will be told examine the parent
process rather than the child.
To fix this the PID is determined just before the symbolizer is
launched.
A test case is included that triggers the buggy behaviour that existed
prior to this patch. The test observes the PID that `atos` was called
on. It also examines the symbolized stacktrace. Prior to this patch
`atos` failed to symbolize the stacktrace giving output that looked
like...
```
#0 0x100fc3bb5 in __sanitizer_print_stack_trace asan_stack.cpp:86
#1 0x10490dd36 in PrintStack+0x56 (/path/to/print-stack-trace-in-code-loaded-after-fork.cpp.tmp_shared_lib.dylib:x86_64+0xd36)
#2 0x100f6f986 in main+0x4a6 (/path/to/print-stack-trace-in-code-loaded-after-fork.cpp.tmp_loader:x86_64+0x100001986)
#3 0x7fff714f1cc8 in start+0x0 (/usr/lib/system/libdyld.dylib:x86_64+0x1acc8)
```
After this patch stackframes `#1` and `#2` are fully symbolized.
This patch is also a pre-requisite refactor for rdar://problem/58789439.
Reviewers: kubamracek, yln
Subscribers: #sanitizers, llvm-commits
Tags: #sanitizers
Differential Revision: https://reviews.llvm.org/D77623
Summary:
In preparation for writing a test for a bug fix we need to be able to
see the command used to launch the symbolizer process. This feature
will likely be useful for debugging how the Sanitizers use the
symbolizer in general.
This patch causes the command line used to launch the process to be
shown at verbosity level 3 and higher.
A small test case is included.
Reviewers: kubamracek, yln, vitalybuka, eugenis, kcc
Subscribers: #sanitizers, llvm-commits
Tags: #sanitizers
Differential Revision: https://reviews.llvm.org/D77622
For targets where char is unsigned (like PowerPC), something like
char c = fgetc(...) will never produce a char that will compare
equal to EOF so this loop does not terminate.
Change the type to int (which appears to be the POSIX return type
for fgetc).
This allows the test case to terminate normally on PPC.
Buildbots say:
[126/127] Running lint check for sanitizer sources...
FAILED: projects/compiler-rt/lib/CMakeFiles/SanitizerLintCheck
cd /home/buildbots/ppc64be-clang-multistage-test/clang-ppc64be-multistage/stage1/projects/compiler-rt/lib && env LLVM_CHECKOUT=/home/buildbots/ppc64be-clang-multistage-test/clang-ppc64be-multistage/llvm/llvm SILENT=1 TMPDIR= PYTHON_EXECUTABLE=/usr/bin/python COMPILER_RT=/home/buildbots/ppc64be-clang-multistage-test/clang-ppc64be-multistage/llvm/compiler-rt /home/buildbots/ppc64be-clang-multistage-test/clang-ppc64be-multistage/llvm/compiler-rt/lib/sanitizer_common/scripts/check_lint.sh
/home/buildbots/ppc64be-clang-multistage-test/clang-ppc64be-multistage/llvm/compiler-rt/test/tsan/fiber_cleanup.cpp:71: Could not find a newline character at the end of the file. [whitespace/ending_newline] [5]
ninja: build stopped: subcommand failed.
Somehow this check is not part of 'ninja check-tsan'.
When creating and destroying fibers in tsan a thread state is created and destroyed. Currently, a memory mapping is leaked with each fiber (in __tsan_destroy_fiber). This causes applications with many short running fibers to crash or hang because of linux vm.max_map_count.
The root of this is that ThreadState holds a pointer to ThreadSignalContext for handling signals. The initialization and destruction of it is tied to platform specific events in tsan_interceptors_posix and missed when destroying a fiber (specifically, SigCtx is used to lazily create the ThreadSignalContext in tsan_interceptors_posix). This patch cleans up the memory by makinh the ThreadState create and destroy the ThreadSignalContext.
The relevant code causing the leak with fibers is the fiber destruction:
void FiberDestroy(ThreadState *thr, uptr pc, ThreadState *fiber) {
FiberSwitchImpl(thr, fiber);
ThreadFinish(fiber);
FiberSwitchImpl(fiber, thr);
internal_free(fiber);
}
Author: Florian
Reviewed-in: https://reviews.llvm.org/D76073
Summary:
This commit adds two command-line options to clang.
These options let the user decide which functions will receive SanitizerCoverage instrumentation.
This is most useful in the libFuzzer use case, where it enables targeted coverage-guided fuzzing.
Patch by Yannis Juglaret of DGA-MI, Rennes, France
libFuzzer tests its target against an evolving corpus, and relies on SanitizerCoverage instrumentation to collect the code coverage information that drives corpus evolution. Currently, libFuzzer collects such information for all functions of the target under test, and adds to the corpus every mutated sample that finds a new code coverage path in any function of the target. We propose instead to let the user specify which functions' code coverage information is relevant for building the upcoming fuzzing campaign's corpus. To this end, we add two new command line options for clang, enabling targeted coverage-guided fuzzing with libFuzzer. We see targeted coverage guided fuzzing as a simple way to leverage libFuzzer for big targets with thousands of functions or multiple dependencies. We publish this patch as work from DGA-MI of Rennes, France, with proper authorization from the hierarchy.
Targeted coverage-guided fuzzing can accelerate bug finding for two reasons. First, the compiler will avoid costly instrumentation for non-relevant functions, accelerating fuzzer execution for each call to any of these functions. Second, the built fuzzer will produce and use a more accurate corpus, because it will not keep the samples that find new coverage paths in non-relevant functions.
The two new command line options are `-fsanitize-coverage-whitelist` and `-fsanitize-coverage-blacklist`. They accept files in the same format as the existing `-fsanitize-blacklist` option <https://clang.llvm.org/docs/SanitizerSpecialCaseList.html#format>. The new options influence SanitizerCoverage so that it will only instrument a subset of the functions in the target. We explain these options in detail in `clang/docs/SanitizerCoverage.rst`.
Consider now the woff2 fuzzing example from the libFuzzer tutorial <https://github.com/google/fuzzer-test-suite/blob/master/tutorial/libFuzzerTutorial.md>. We are aware that we cannot conclude much from this example because mutating compressed data is generally a bad idea, but let us use it anyway as an illustration for its simplicity. Let us use an empty blacklist together with one of the three following whitelists:
```
# (a)
src:*
fun:*
# (b)
src:SRC/*
fun:*
# (c)
src:SRC/src/woff2_dec.cc
fun:*
```
Running the built fuzzers shows how many instrumentation points the compiler adds, the fuzzer will output //XXX PCs//. Whitelist (a) is the instrument-everything whitelist, it produces 11912 instrumentation points. Whitelist (b) focuses coverage to instrument woff2 source code only, ignoring the dependency code for brotli (de)compression; it produces 3984 instrumented instrumentation points. Whitelist (c) focuses coverage to only instrument functions in the main file that deals with WOFF2 to TTF conversion, resulting in 1056 instrumentation points.
For experimentation purposes, we ran each fuzzer approximately 100 times, single process, with the initial corpus provided in the tutorial. We let the fuzzer run until it either found the heap buffer overflow or went out of memory. On this simple example, whitelists (b) and (c) found the heap buffer overflow more reliably and 5x faster than whitelist (a). The average execution times when finding the heap buffer overflow were as follows: (a) 904 s, (b) 156 s, and (c) 176 s.
We explain these results by the fact that WOFF2 to TTF conversion calls the brotli decompression algorithm's functions, which are mostly irrelevant for finding bugs in WOFF2 font reconstruction but nevertheless instrumented and used by whitelist (a) to guide fuzzing. This results in longer execution time for these functions and a partially irrelevant corpus. Contrary to whitelist (a), whitelists (b) and (c) will execute brotli-related functions without instrumentation overhead, and ignore new code paths found in them. This results in faster bug finding for WOFF2 font reconstruction.
The results for whitelist (b) are similar to the ones for whitelist (c). Indeed, WOFF2 to TTF conversion calls functions that are mostly located in SRC/src/woff2_dec.cc. The 2892 extra instrumentation points allowed by whitelist (b) do not tamper with bug finding, even though they are mostly irrelevant, simply because most of these functions do not get called. We get a slightly faster average time for bug finding with whitelist (b), which might indicate that some of the extra instrumentation points are actually relevant, or might just be random noise.
Reviewers: kcc, morehouse, vitalybuka
Reviewed By: morehouse, vitalybuka
Subscribers: pratyai, vitalybuka, eternalsakura, xwlin222, dende, srhines, kubamracek, #sanitizers, lebedev.ri, hiraditya, cfe-commits, llvm-commits
Tags: #clang, #sanitizers, #llvm
Differential Revision: https://reviews.llvm.org/D63616
Looks like this test fails on Darwin x86_64 as well:
http://green.lab.llvm.org/green/job/clang-stage1-RA/8593/
Command Output (stderr):
--
fatal error: error in backend: Global variable '__sancov_gen_' has an invalid section specifier '__DATA,__sancov_bool_flag': mach-o section specifier requires a section whose length is between 1 and 16 characters.