llvm-project/compiler-rt/lib
Kostya Kortchinsky 4410e2c43f [scudo] Improve the scalability of the shared TSD model
Summary:
The shared TSD model in its current form doesn't scale. Here is an example of
rpc2-benchmark (with default parameters, which is threading heavy) on a 72-core
machine (defaulting to a `CompactSizeClassMap` and no Quarantine):
- with tcmalloc: 337K reqs/sec, peak RSS of 338MB;
- with scudo (exclusive): 321K reqs/sec, peak RSS of 637MB;
- with scudo (shared): 241K reqs/sec, peak RSS of 324MB.

This isn't great, since the exclusive model uses a lot of memory, while the
shared model doesn't even come close to being competitive.

This is mostly because we consistently scan the TSD pool starting at index 0
for an available TSD, which can result in a lot of failed lock attempts and
touches memory that need not be touched.

This CL attempts to make things better in most situations:
- first, use a thread local variable on Linux (instead of pthread APIs) to store
  the current TSD in the shared model;
- move the locking boolean out of the TSD: this allows the compiler to use a
  register and potentially optimize out a branch instead of reading it from the
  TSD every time (we also save a tiny bit of memory per TSD);
- 64-bit atomic operations on 32-bit ARM platforms happen to be expensive: so
  store the `Precedence` in a `uptr` instead of a `u64`. We lose some
  nanoseconds of precision and we'll wrap around at some point, but the benefit
  is worth it;
- change a `CHECK` to a `DCHECK`: this should never happen, but if something is
  ever terribly wrong, we'll crash on a near null AV if the TSD happens to be
  null;
- based on an idea by dvyukov@, we are implementing a bounded random scan for
  an available TSD. This requires computing the coprimes for the number of TSDs,
  and attempting to lock up to 4 TSDs in a random order before falling back to
  the current one. This is obviously slightly more expensive when we have just
  2 TSDs (barely noticeable) but is otherwise beneficial. The `Precedence` still
  basically corresponds to the moment of the first contention on a TSD. To seed
  our random choice, we use the precedence of the current TSD since it is very
  likely to be non-zero (since we are in the slow path after a failed `tryLock`).
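
The bounded random scan from the last bullet can be sketched as follows. This
is a minimal illustration, not the code from this commit: `Slot`,
`computeCoprimes`, `probeOrder` and `tryLockRandomSlot` are hypothetical names,
a plain `std::atomic<bool>` stands in for the real TSD lock, and reusing one
seed for both the start index and the step is an assumption made for brevity.
The point it shows is that stepping through the pool by a value coprime with
the pool size visits every index once before repeating, so a handful of probes
never hits the same slot twice.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <optional>
#include <vector>

// Hypothetical stand-in for a TSD: only the try-lock part is modeled.
struct Slot {
  std::atomic<bool> Locked{false};
  bool tryLock() { return !Locked.exchange(true, std::memory_order_acquire); }
  void unlock() { Locked.store(false, std::memory_order_release); }
};

// All steps in [1, N] that share no common factor with N.
std::vector<std::size_t> computeCoprimes(std::size_t N) {
  std::vector<std::size_t> Coprimes;
  for (std::size_t I = 1; I <= N; ++I)
    if (std::gcd(I, N) == 1)
      Coprimes.push_back(I);
  return Coprimes;
}

// Indices visited when starting at Start and repeatedly adding Step modulo N.
std::vector<std::size_t> probeOrder(std::size_t N, std::size_t Start,
                                    std::size_t Step, std::size_t Attempts) {
  assert(N > 0);
  std::vector<std::size_t> Order;
  std::size_t Index = Start % N;
  for (std::size_t I = 0; I < Attempts; ++I) {
    Order.push_back(Index);
    Index = (Index + Step) % N;
  }
  return Order;
}

constexpr std::size_t MaxAttempts = 4; // the bound quoted in the text above

// Tries to lock up to MaxAttempts distinct slots in a seed-dependent order.
// Returns the index of the slot that was locked, or nullopt if every probe
// failed (the caller would then fall back to its current slot).
std::optional<std::size_t>
tryLockRandomSlot(std::vector<Slot> &Pool,
                  const std::vector<std::size_t> &Coprimes,
                  std::uintptr_t Seed) {
  assert(!Pool.empty() && !Coprimes.empty());
  const std::size_t N = Pool.size();
  const std::size_t Step = Coprimes[Seed % Coprimes.size()];
  for (std::size_t Index : probeOrder(N, Seed, Step, std::min(MaxAttempts, N)))
    if (Pool[Index].tryLock())
      return Index;
  return std::nullopt;
}
```

In this sketch the caller would pass the precedence of its current TSD as
`Seed` and, on a `nullopt` result, block on that current TSD, mirroring the
fallback described above.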

With those modifications, the benchmark yields:
- with scudo (shared): 330K reqs/sec, peak RSS of 327MB.

So the shared model for this specific situation not only becomes competitive but
outperforms the exclusive model. I experimented with some values greater than 4
for the number of TSDs to attempt to lock and it yielded a decrease in QPS. Just
sticking with the current TSD is also a tad slower. Numbers on platforms with
fewer cores (e.g., Android) remain similar.

Reviewers: alekseyshl, dvyukov, javed.absar

Reviewed By: alekseyshl, dvyukov

Subscribers: srhines, kristof.beyls, delcypher, llvm-commits, #sanitizers

Differential Revision: https://reviews.llvm.org/D47289

llvm-svn: 334410
2018-06-11 14:50:31 +00:00
..
BlocksRuntime [compiler-rt] Test commit: remove some trailing white spaces. 2017-08-25 19:36:30 +00:00
asan [asan, myriad] Use local pool for new/delete when ASan run-time is not up 2018-06-08 21:49:38 +00:00
builtins Revert "[cmake] [ARM] Check if VFP is supported before including any VFP builtins" 2018-05-24 21:36:27 +00:00
cfi [sanitizer] Build failures fixes post D45457 2018-04-16 16:58:34 +00:00
dfsan Add weak definitions of trace-cmp hooks to dfsan 2018-06-01 21:59:25 +00:00
esan [sanitizer] Replace InternalScopedBuffer with InternalMmapVector 2018-05-07 05:56:36 +00:00
fuzzer [libFuzzer] When printing NEW_FUNC, use 1-base indexing. 2018-06-07 21:15:24 +00:00
hwasan [HWASan] Report proper error on allocator failures instead of CHECK(0)-ing 2018-06-07 23:33:33 +00:00
interception [sanitizer] Trivial portion of the port to Myriad RTEMS 2018-05-18 00:43:54 +00:00
lsan [Sanitizers] Check alignment != 0 for aligned_alloc and posix_memalign 2018-06-08 20:40:35 +00:00
msan [MSan] Report proper error on allocator failures instead of CHECK(0)-ing 2018-06-08 23:31:42 +00:00
profile [CMake] Build shared version of runtimes for Fuchsia 2018-05-09 21:24:06 +00:00
safestack [safestack] Lazy initialization of interceptors 2018-05-26 01:18:32 +00:00
sanitizer_common [ASAN] Fix crash on i?86-linux (32-bit) against glibc 2.27 and later 2018-06-10 11:17:47 +00:00
scudo [scudo] Improve the scalability of the shared TSD model 2018-06-11 14:50:31 +00:00
stats [sanitizer] Replace InternalScopedBuffer with InternalMmapVector 2018-05-07 05:56:36 +00:00
tsan Introduce CheckASLR() in sanitizers 2018-06-05 07:29:23 +00:00
ubsan [sanitizer] Trivial portion of the port to Myriad RTEMS 2018-05-18 00:43:54 +00:00
ubsan_minimal [CMake] Build shared version of runtimes for Fuchsia 2018-05-09 21:24:06 +00:00
xray [Xray] logging forgotten header 2018-06-08 08:42:37 +00:00
CMakeLists.txt [cmake] Add a separate CMake var to control profile runtime 2017-10-02 05:03:55 +00:00