Current interface assumes that Go calls ProcWire/ProcUnwire
to establish the association between thread and proc.
With the wisdom of hindsight, this interface does not work
very well. I had to sprinkle Go scheduler with wire/unwire
calls, and any mistake leads to hard to debug crashes.
This is not something one wants to maintian.
Fortunately, there is a simpler solution. We can ask Go
runtime as to what is the current Processor, and that
question is very easy to answer on Go side.
Switch to such interface.
llvm-svn: 267703
This is reincarnation of http://reviews.llvm.org/D17648 with the bug fix pointed out by Adhemerval (zatrazz).
Currently ThreadState holds both logical state (required for race-detection algorithm, user-visible)
and physical state (various caches, most notably malloc cache). Move physical state in a new
Process entity. Besides just being the right thing from abstraction point of view, this solves several
problems:
Cache everything on P level in Go. Currently we cache on a mix of goroutine and OS thread levels.
This unnecessary increases memory consumption.
Properly handle free operations in Go. Frees are issue by GC which don't have goroutine context.
As the result we could not do anything more than just clearing shadow. For example, we leaked
sync objects and heap block descriptors.
This will allow to get rid of libc malloc in Go (now we have Processor context for internal allocator cache).
This in turn will allow to get rid of dependency on libc entirely.
Potentially we can make Processor per-CPU in C++ mode instead of per-thread, which will
reduce resource consumption.
The distinction between Thread and Processor is currently used only by Go, C++ creates Processor per OS thread,
which is equivalent to the current scheme.
llvm-svn: 267678
On OS X 10.11+, we have "automatic interceptors", so we don't need to use DYLD_INSERT_LIBRARIES when launching instrumented programs. However, non-instrumented programs that load TSan late (e.g. via dlopen) are currently broken, as TSan will still try to initialize, but the program will crash/hang at random places (because the interceptors don't work). This patch adds an explicit check that interceptors are working, and if not, it aborts and prints out an error message suggesting to explicitly use DYLD_INSERT_LIBRARIES.
TSan unit tests run with a statically linked runtime, where interceptors don't work. To avoid aborting the process in this case, the patch replaces `DisableReexec()` with a weak `ReexecDisabled()` function which is defined to return true in unit tests.
Differential Revision: http://reviews.llvm.org/D18212
llvm-svn: 263695
Currently ThreadState holds both logical state (required for race-detection algorithm, user-visible)
and physical state (various caches, most notably malloc cache). Move physical state in a new
Process entity. Besides just being the right thing from abstraction point of view, this solves several
problems:
1. Cache everything on P level in Go. Currently we cache on a mix of goroutine and OS thread levels.
This unnecessary increases memory consumption.
2. Properly handle free operations in Go. Frees are issue by GC which don't have goroutine context.
As the result we could not do anything more than just clearing shadow. For example, we leaked
sync objects and heap block descriptors.
3. This will allow to get rid of libc malloc in Go (now we have Processor context for internal allocator cache).
This in turn will allow to get rid of dependency on libc entirely.
4. Potentially we can make Processor per-CPU in C++ mode instead of per-thread, which will
reduce resource consumption.
The distinction between Thread and Processor is currently used only by Go, C++ creates Processor per OS thread,
which is equivalent to the current scheme.
llvm-svn: 262037
With COMPILER_RT_INCLUDE_TESTS turned ON and in a cross compiling
environment, the unit tests fail to link. This patch does the following changes
>Rename COMPILER_RT_TEST_CFLAGS to COMPILER_RT_UNITTEST_CFLAGS to reflect the
way it's used.
>Add COMPILER_RT_TEST_COMPILER_CFLAGS to COMPILER_RT_UNITTEST_CFLAGS so
that cross-compiler would be able to build/compile the unit tests
>Add COMPILER_RT_UNITTEST_LINKFLAGS to COMPILER_RT_UNITTEST_CFLAGS so
that cross-compiler would be able to link the unit tests (if needed)
Differential Revision: http://reviews.llvm.org/D16165
llvm-svn: 257783
This broke the build. For example, from
http://lab.llvm.org:8011/builders/clang-cmake-aarch64-full/builds/1191/steps/cmake%20stage%201/logs/stdio:
-- Compiler-RT supported architectures: aarch64
CMake Error at projects/compiler-rt/cmake/Modules/AddCompilerRT.cmake:170 (string):
string sub-command REPLACE requires at least four arguments.
Call Stack (most recent call first):
projects/compiler-rt/lib/CMakeLists.txt:4 (include)
llvm-svn: 257694
environment, the unit tests fail to link. This patch does the following changes
>Rename COMPILER_RT_TEST_CFLAGS to COMPILER_RT_UNITTEST_CFLAGS to reflect the
way it's used.
>Add COMPILER_RT_TEST_COMPILER_CFLAGS to COMPILER_RT_UNITTEST_CFLAGS so that
cross-compiler would be able to build/compile the unit tests
>Add COMPILER_RT_UNITTEST_LINKFLAGS to COMPILER_RT_UNITTEST_CFLAGS so that
cross-compiler would be able to link the unit tests (if needed)
Differential Revision:http://reviews.llvm.org/D15082
llvm-svn: 257686
On OS X, interceptors don't work in unit tests, so calloc() calls the system allocator. We need to use user_calloc() instead.
Differential Revision: http://reviews.llvm.org/D14918
llvm-svn: 253979
We need to call the intercepted version of pthread_detach. Secondly, PTHREAD_CREATE_JOINABLE and PTHREAD_CREATE_DETACHED are not 0 and 1 on OS X, so we need to properly pass these constants and not just a bool.
Differential Revision: http://reviews.llvm.org/D14837
llvm-svn: 253775
The tsan_test_util_posix.cc implementation of mutexes call pthread APIs directly, which on OS X don't end up calling the intercepted versions and we miss the synchronization. This patch changes the unit tests to directly call the intercepted versions. This fixes several test failures on OS X.
Differential Revision: http://reviews.llvm.org/D14835
llvm-svn: 253774
On OS X, this unit test (ThreadSpecificDtors) fails, because the new and delete operators actually call the overridden operators, which end up using TLVs and crash. Since C++'s new and delete is not important in this test, let's just replace them with a local variable. This fixes the test on OS X.
Differential Revision: http://reviews.llvm.org/D14826
llvm-svn: 253583
The TSan unit test build currently fails if we're also building the iOS parts of compiler-rt, because `TSAN_SUPPORTED_ARCH` contains ARM64. For unit tests, we need to filter this only to host architecture(s).
Differential Revision: http://reviews.llvm.org/D14604
llvm-svn: 252873
Summary:
Merge "exitcode" flag from ASan, LSan, TSan and "exit_code" from MSan
into one entity. Additionally, make sure sanitizer_common now uses the
value of common_flags()->exitcode when dying on error, so that this
flag will automatically work for other sanitizers (UBSan and DFSan) as
well.
User-visible changes:
* "exit_code" MSan runtime flag is now deprecated. If explicitly
specified, this flag will take precedence over "exitcode".
The users are encouraged to migrate to the new version.
* __asan_set_error_exit_code() and __msan_set_exit_code() functions
are removed. With few exceptions, we don't support changing runtime
flags during program execution - we can't make them thread-safe.
The users should use __sanitizer_set_death_callback()
that would call _exit() with proper exit code instead.
* Plugin tools (LSan and UBSan) now inherit the exit code of the parent
tool. In particular, this means that ASan would now crash the program
with exit code "1" instead of "23" if it detects leaks.
Reviewers: kcc, eugenis
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D12120
llvm-svn: 245734
This patch enabled TSAN for aarch64 with 39-bit VMA layout. As defined by
tsan_platform.h the layout used is:
0000 4000 00 - 0200 0000 00: main binary
2000 0000 00 - 4000 0000 00: shadow memory
4000 0000 00 - 5000 0000 00: metainfo
5000 0000 00 - 6000 0000 00: -
6000 0000 00 - 6200 0000 00: traces
6200 0000 00 - 7d00 0000 00: -
7d00 0000 00 - 7e00 0000 00: heap
7e00 0000 00 - 7fff ffff ff: modules and main thread stack
Which gives it about 8GB for main binary, 4GB for heap and 8GB for
modules and main thread stack.
Most of tests are passing, with the exception of:
* ignore_lib0, ignore_lib1, ignore_lib3 due a kernel limitation for
no support to make mmap page non-executable.
* longjmp tests due missing specialized assembly routines.
These tests are xfail for now.
The only tsan issue still showing is:
rtl/TsanRtlTest/Posix.ThreadLocalAccesses
Which still required further investigation. The test is disable for
aarch64 for now.
llvm-svn: 244055
Summary:
Use CMake's cmake_parse_arguments() instead.
It's called in a slightly different way, but supports all our use cases.
It's in CMake 2.8.8, which is our minimum supported version.
CMake 3.0 doc (roughly the same. No direct link to 2.8.8 doc):
http://www.cmake.org/cmake/help/v3.0/module/CMakeParseArguments.html?highlight=cmake_parse_arguments
Since I was already changing these calls, I changed ARCH and LIB into
ARCHS and LIBS to make it more clear that they're lists of arguments.
Reviewers: eugenis, samsonov, beanz
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D10529
llvm-svn: 240120
TSAN_SHADOW_COUNT is defined to 4 in all environments.
Other values of TSAN_SHADOW_COUNT were never tested and
were broken by recent changes to shadow mapping.
Remove it as there is no reason to fix nor maintain it.
llvm-svn: 226466
Summary:
This change removes `__tsan::StackTrace` class. There are
now three alternatives:
# Lightweight `__sanitizer::StackTrace`, which doesn't own a buffer
of PCs. It is used in functions that need stack traces in read-only
mode, and helps to prevent unnecessary allocations/copies (e.g.
for StackTraces fetched from StackDepot).
# `__sanitizer::BufferedStackTrace`, which stores buffer of PCs in
a constant array. It is used in TraceHeader (non-Go version)
# `__tsan::VarSizeStackTrace`, which owns buffer of PCs, dynamically
allocated via TSan internal allocator.
Test Plan: compiler-rt test suite
Reviewers: dvyukov, kcc
Reviewed By: kcc
Subscribers: llvm-commits, kcc
Differential Revision: http://reviews.llvm.org/D6004
llvm-svn: 221194
Vector clocks is the most actively allocated object in tsan runtime.
Current internal allocator is not scalable enough to handle allocation
of clocks in scalable way (too small caches). This changes transforms
clocks to 2-level array with 512-byte blocks. Since all blocks are of
the same size, it's possible to cache them more efficiently in per-thread caches.
llvm-svn: 214912
The bug happens in the following case:
Mutex is located at heap block beginning,
when we call MutexDestroy, s->next is set to 0,
so free can't find the MBlock related to the block.
llvm-svn: 212531
Introduce new public header <sanitizer/allocator_interface.h> and a set
of functions __sanitizer_get_ownership(), __sanitizer_malloc_hook() etc.
that will eventually replace their tool-specific equivalents
(__asan_get_ownership(), __msan_get_ownership() etc.). Tool-specific
functions are now deprecated and implemented as stubs redirecting
to __sanitizer_ versions (which are implemented differently in each tool).
Replace all uses of __xsan_ versions with __sanitizer_ versions in unit
and lit tests.
llvm-svn: 212469
The former used to crash with a null deref if it was given a not owned pointer,
while the latter returned 0. Now they both return 0. This is still not the best possible
behavior: it is better to print an error report with a stack trace, pointing
to the error in user code, as we do in ASan.
llvm-svn: 212112
The optimization is two-fold:
First, the algorithm now uses SSE instructions to
handle all 4 shadow slots at once. This makes processing
faster.
Second, if shadow contains the same access, we do not
store the event into trace. This increases effective
trace size, that is, tsan can remember up to 10x more
previous memory accesses.
Perofrmance impact:
Before:
[ OK ] DISABLED_BENCH.Mop8Read (2461 ms)
[ OK ] DISABLED_BENCH.Mop8Write (1836 ms)
After:
[ OK ] DISABLED_BENCH.Mop8Read (1204 ms)
[ OK ] DISABLED_BENCH.Mop8Write (976 ms)
But this measures only fast-path.
On large real applications the speedup is ~20%.
Trace size impact:
On app1:
Memory accesses : 1163265870
Including same : 791312905 (68%)
on app2:
Memory accesses : 166875345
Including same : 150449689 (90%)
90% of filtered events means that trace size is effectively 10x larger.
llvm-svn: 209897
The new storage (MetaMap) is based on direct shadow (instead of a hashmap + per-block lists).
This solves a number of problems:
- eliminates quadratic behaviour in SyncTab::GetAndLock (https://code.google.com/p/thread-sanitizer/issues/detail?id=26)
- eliminates contention in SyncTab
- eliminates contention in internal allocator during allocation of sync objects
- removes a bunch of ad-hoc code in java interface
- reduces java shadow from 2x to 1/2x
- allows to memorize heap block meta info for Java and Go
- allows to cleanup sync object meta info for Go
- which in turn enabled deadlock detector for Go
llvm-svn: 209810
Make vector clock operations O(1) for several important classes of use cases.
See comments for details.
Below are stats from a large server app, 77% of all clock operations are handled as O(1).
Clock acquire : 25983645
empty clock : 6288080
fast from release-store : 14917504
contains my tid : 4515743
repeated (fast) : 2141428
full (slow) : 2636633
acquired something : 1426863
Clock release : 2544216
resize : 6241
fast1 : 197693
fast2 : 1016293
fast3 : 2007
full (slow) : 1797488
was acquired : 709227
clear tail : 1
last overflow : 0
Clock release store : 3446946
resize : 200516
fast : 469265
slow : 2977681
clear tail : 0
Clock acquire-release : 820028
llvm-svn: 204656
Make behavior introduced in r202820 conditional (under legacy_pthread_cond flag).
The new issue that we've hit with the satellite pthread_cond_t struct is
that pthread_condattr_getpshared does not work (satellite data is not shared between processes).
The idea is that most processes do not use pthread 2.2.5.
The rare ones that use (2.2.5 is dated by 2002) must specify legacy_pthread_cond=1
on their own risk.
llvm-svn: 204032