Commit Graph

56 Commits

Author SHA1 Message Date
Kostya Kortchinsky b59abb2590 [scudo] Scudo thread specific data refactor, part 3
Summary:
Previous parts: D38139, D38183.

In this part of the refactor, we abstract the Linux vs Android TSD dissociation
in favor of a Exclusive vs Shared one, allowing for easier platform introduction
and configuration.

Most of this change consist of shuffling the files around to reflect the new
organization.

We introduce `scudo_platform.h` where platform specific definition lie. This
involves the TSD model and the platform specific allocator parameters. In an
upcoming CL, those will be configurable via defines, but we currently stick
with conservative defaults.

Reviewers: alekseyshl, dvyukov

Reviewed By: alekseyshl, dvyukov

Subscribers: srhines, llvm-commits, mgorny

Differential Revision: https://reviews.llvm.org/D38244

llvm-svn: 314224
2017-09-26 17:20:02 +00:00
Kostya Kortchinsky 22396c2f47 [scudo] Scudo thread specific data refactor, part 2
Summary:
Following D38139, we now consolidate the TSD definition, merging the shared
TSD definition with the exclusive TSD definition. We introduce a boolean set
at initializaton denoting the need for the TSD to be unlocked or not. This
adds some unused members to the exclusive TSD, but increases consistency and
reduces the definitions fragmentation.

We remove the fallback mechanism from `scudo_allocator.cpp` and add a fallback
TSD in the non-shared version. Since the shared version doesn't require one,
this makes overall more sense.

There are a couple of additional cosmetic changes: removing the header guards
from the remaining `.inc` files, added error string to a `CHECK`.

Question to reviewers: I thought about friending `getTSDAndLock` in `ScudoTSD`
so that the `FallbackTSD` could `Mutex.Lock()` directly instead of `lock()`
which involved zeroing out the `Precedence`, which is unused otherwise. Is it
worth doing?

Reviewers: alekseyshl, dvyukov, kcc

Reviewed By: dvyukov

Subscribers: srhines, llvm-commits

Differential Revision: https://reviews.llvm.org/D38183

llvm-svn: 314110
2017-09-25 15:12:08 +00:00
Kostya Kortchinsky 392480952c [scudo] Scudo thread specific data refactor, part 1
Summary:
We are going through an overhaul of Scudo's TSD, to allow for new platforms
to be integrated more easily, and make the code more sound.

This first part is mostly renaming, preferring some shorter names, correcting
some comments. I removed `getPrng` and `getAllocatorCache` to directly access
the members, there was not really any benefit to them (and it was suggested by
Dmitry in D37590).

The only functional change is in `scudo_tls_android.cpp`: we enforce bounds to
the `NumberOfTSDs` and most of the logic in `getTSDAndLockSlow` is skipped if we
only have 1 TSD.

Reviewers: alekseyshl, dvyukov, kcc

Reviewed By: dvyukov

Subscribers: llvm-commits, srhines

Differential Revision: https://reviews.llvm.org/D38139

llvm-svn: 313987
2017-09-22 15:35:37 +00:00
Kostya Kortchinsky 26e689f0c5 [scudo] Fix bad request handling when allocator has not been initialized
Summary:
In a few functions (`scudoMemalign` and the like), we would call
`ScudoAllocator::FailureHandler::OnBadRequest` if the parameters didn't check
out. The issue is that if the allocator had not been initialized (eg: if this
is the first heap related function called), we would use variables like
`allocator_may_return_null` and `exitcode` that still had their default value
(as opposed to the one set by the user or the initialization path).

To solve this, we introduce `handleBadRequest` that will call `initThreadMaybe`,
allowing the options to be correctly initialized.

Unfortunately, the tests were passing because `exitcode` was still 0, so the
results looked like success. Change those tests to do what they were supposed
to.

Reviewers: alekseyshl

Reviewed By: alekseyshl

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D37853

llvm-svn: 313294
2017-09-14 20:34:32 +00:00
Kostya Kortchinsky 040c211bc4 [scudo] Fix improper TSD init after TLS destructors are called
Summary:
Some of glibc's own thread local data is destroyed after a user's thread local
destructors are called, via __libc_thread_freeres. This might involve calling
free, as is the case for strerror_thread_freeres.
If there is no prior heap operation in the thread, this free would end up
initializing some thread specific data that would never be destroyed properly
(as user's pthread destructors have already been called), while still being
deallocated when the TLS goes away. As a result, a program could SEGV, usually
in __sanitizer::AllocatorGlobalStats::Unregister, where one of the doubly linked
list links would refer to a now unmapped memory area.

To prevent this from happening, we will not do a full initialization from the
deallocation path. This means that the fallback cache & quarantine will be used
if no other heap operation has been called, and we effectively prevent the TSD
being initialized and never destroyed. The TSD will be fully initialized for all
other paths.

In the event of a thread doing only frees and nothing else, a TSD would never
be initialized for that thread, but this situation is unlikely and we can live
with that.

Reviewers: alekseyshl

Reviewed By: alekseyshl

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D37697

llvm-svn: 312939
2017-09-11 19:59:40 +00:00
Kostya Kortchinsky 6bc7b26d18 [scudo] getauxval alternative for Android
Summary:
`getauxval` was introduced with API level 18. In order to get things to work
at lower API levels (for the toolchain itself which is built at 14 for 32-bit),
we introduce an alternative implementation reading directly from
`/proc/self/auxv`.

Reviewers: alekseyshl

Reviewed By: alekseyshl

Subscribers: srhines, llvm-commits

Differential Revision: https://reviews.llvm.org/D37488

llvm-svn: 312653
2017-09-06 17:51:32 +00:00
Kostya Kortchinsky 476f21d87e [sanitizer] Re-introduce kUseSeparateSizeClassForBatch for the 32-bit Primary
Summary:
Currently `TransferBatch` are located within the same memory regions as
"regular" chunks. This is not ideal for security: they make for an interesting
target to overwrite, and are not protected by the frontend (namely, Scudo).

To solve this, we re-introduce `kUseSeparateSizeClassForBatch` for the 32-bit
Primary allowing for `TransferBatch` to end up in their own memory region.
Currently only Scudo would use this new feature, the default behavior remains
unchanged. The separate `kBatchClassID` was used for a brief period of time
previously but removed when the 64-bit ended up using the "free array".

Reviewers: alekseyshl, kcc, eugenis

Reviewed By: alekseyshl

Subscribers: llvm-commits, kubamracek

Differential Revision: https://reviews.llvm.org/D37082

llvm-svn: 311891
2017-08-28 15:20:02 +00:00
Kostya Kortchinsky 43917720a7 [scudo] Application & platform compatibility changes
Summary:
This patch changes a few (small) things around for compatibility purposes for
the current Android & Fuchsia work:
- `realloc`'ing some memory that was not allocated with `malloc`, `calloc` or
  `realloc`, while UB according to http://pubs.opengroup.org/onlinepubs/009695399/functions/realloc.html
  is more common that one would think. We now only check this if
  `DeallocationTypeMismatch` is set; change the "mismatch" error
  messages to be more homogeneous;
- some sketchily written but widely used libraries expect a call to `realloc`
  to copy the usable size of the old chunk to the new one instead of the
  requested size. We have to begrundingly abide by this de-facto standard.
  This doesn't seem to impact security either way, unless someone comes up with
  something we didn't think about;
- the CRC32 intrinsics for 64-bit take a 64-bit first argument. This is
  misleading as the upper 32 bits end up being ignored. This was also raising
  `-Wconversion` errors. Change things to take a `u32` as first argument.
  This also means we were (and are) only using 32 bits of the Cookie - not a
  big thing, but worth mentioning.
- Includes-wise: prefer `stddef.h` to `cstddef`, move `scudo_flags.h` where it
  is actually needed.
- Add tests for the memalign-realloc case, and the realloc-usable-size one.

(Edited typos)

Reviewers: alekseyshl

Reviewed By: alekseyshl

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D36754

llvm-svn: 311018
2017-08-16 16:40:48 +00:00
Kostya Kortchinsky e1dde07640 [sanitizers] Add a blocking boolean to GetRandom prototype
Summary:
On platforms with `getrandom`, the system call defaults to blocking. This
becomes an issue in the very early stage of the boot for Scudo, when the RNG
source is not set-up yet: the syscall will block and we'll stall.

Introduce a parameter to specify that the function should not block, defaulting
to blocking as the underlying syscall does.

Update Scudo to use the non-blocking version.

Reviewers: alekseyshl

Reviewed By: alekseyshl

Subscribers: llvm-commits, kubamracek

Differential Revision: https://reviews.llvm.org/D36399

llvm-svn: 310839
2017-08-14 14:53:47 +00:00
Kostya Kortchinsky 65fdf677f2 [scudo] Check for pvalloc overflow
Summary:
Previously we were rounding up the size passed to `pvalloc` to the next
multiple of page size no matter what. There is an overflow possibility that
wasn't accounted for. So now, return null in the event of an overflow. The man
page doesn't seem to indicate the errno to set in this particular situation,
but the glibc unit tests go for ENOMEM (https://code.woboq.org/userspace/glibc/malloc/tst-pvalloc.c.html#54)
so we'll do the same.
Update the aligned allocation funtions tests to check for properly aligned
returned pointers, and the `pvalloc` corner cases.

@alekseyshl: do you want me to do the same in the other Sanitizers?

Reviewers: alekseyshl

Reviewed By: alekseyshl

Subscribers: kubamracek, alekseyshl, llvm-commits

Differential Revision: https://reviews.llvm.org/D35818

llvm-svn: 309033
2017-07-25 21:18:02 +00:00
Kostya Kortchinsky 2d94405a32 [scudo] Quarantine overhaul
Summary:
First, some context.

The main feedback we get about the quarantine is that it's too memory hungry.
A single MB of quarantine will have an impact of 3 to 4MB of PSS/RSS, and
things quickly get out of hand in terms of memory usage, and the quarantine
ends up disabled.

The main objective of the quarantine is to protect from use-after-free
exploitation by making it harder for an attacker to reallocate a controlled
chunk in place of the targeted freed chunk. This is achieved by not making it
available to the backend right away for reuse, but holding it a little while.

Historically, what has usually been the target of such attacks was objects,
where vtable pointers or other function pointers could constitute a valuable
targeti to replace. Those are usually on the smaller side. There is barely any
advantage in putting the quarantine several megabytes of RGB data or the like.

Now for the patch.

This patch introduces a new way the Quarantine behaves in Scudo. First of all,
the size of the Quarantine will be defined in KB instead of MB, then we
introduce a new option: the size up to which (lower than or equal to) a chunk
will be quarantined. This way, we only quarantine smaller chunks, and the size
of the quarantine remains manageable. It also prevents someone from triggering
a recycle by allocating something huge. We default to 512 bytes on 32-bit and
2048 bytes on 64-bit platforms.

In details, the patches includes the following:
- introduce `QuarantineSizeKb`, but honor `QuarantineSizeMb` if set to fall
  back to the old behavior (meaning no threshold in that case);
  `QuarantineSizeMb` is described as deprecated in the options descriptios;
  documentation update will follow;
- introduce `QuarantineChunksUpToSize`, the new threshold value;
- update the `quarantine.cpp` test, and other tests using `QuarantineSizeMb`;
- remove `AllocatorOptions::copyTo`, it wasn't used;
- slightly change the logic around `quarantineOrDeallocateChunk` to accomodate
  for the new logic; rename a couple of variables there as well;

Rewriting the tests, I found a somewhat annoying bug where non-default aligned
chunks would account for more than needed when placed in the quarantine due to
`<< MinAlignment` instead of `<< MinAlignmentLog`. This is fixed and tested for
now.

Reviewers: alekseyshl, kcc

Reviewed By: alekseyshl

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D35694

llvm-svn: 308884
2017-07-24 15:29:38 +00:00
Alex Shlyapnikov 42bea018af [Sanitizers] ASan/MSan/LSan allocators set errno on failure.
Summary:
ASan/MSan/LSan allocators set errno on allocation failures according to
malloc/calloc/etc. expected behavior.

MSan allocator was refactored a bit to make its structure more similar
with other allocators.

Also switch Scudo allocator to the internal errno definitions.

TSan allocator changes will follow.

Reviewers: eugenis

Subscribers: llvm-commits, kubamracek

Differential Revision: https://reviews.llvm.org/D35275

llvm-svn: 308344
2017-07-18 19:11:04 +00:00
Alex Shlyapnikov df18cbba55 [Sanitizers] Scudo allocator set errno on failure.
Summary:
Set proper errno code on alloction failure and change pvalloc and
posix_memalign implementation to satisfy their man-specified
requirements.

Reviewers: cryptoad

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D35429

llvm-svn: 308053
2017-07-14 21:17:16 +00:00
Kostya Kortchinsky b44364dd15 [scudo] Do not grab a cache for secondary allocation & per related changes
Summary:
Secondary backed allocations do not require a cache. While it's not necessary
an issue when each thread has its cache, it becomes one with a shared pool of
caches (Android), as a Secondary backed allocation or deallocation holds a
cache that could be useful to another thread doing a Primary backed allocation.

We introduce an additional PRNG and its mutex (to avoid contention with the
Fallback one for Primary allocations) that will provide the `Salt` needed for
Secondary backed allocations.

I changed some of the code in a way that feels more readable to me (eg: using
some values directly rather than going  through ternary assigned variables,
using directly `true`/`false` rather than `FromPrimary`). I will let reviewers
decide if it actually is.

An additional change is to mark `CheckForCallocOverflow` as `UNLIKELY`.

Reviewers: alekseyshl

Reviewed By: alekseyshl

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D35358

llvm-svn: 307958
2017-07-13 21:01:19 +00:00
Kostya Kortchinsky 00582563be [scudo] PRNG makeover
Summary:
This follows the addition of `GetRandom` with D34412. We remove our
`/dev/urandom` code and use the new function. Additionally, change the PRNG for
a slightly faster version. One of the issues with the old code is that we have
64 full bits of randomness per "next", using only 8 of those for the Salt and
discarding the rest. So we add a cached u64 in the PRNG that can serve up to
8 u8 before having to call the "next" function again.

During some integration work, I also realized that some very early processes
(like `init`) do not benefit from `/dev/urandom` yet. So if there is no
`getrandom` syscall as well, we have to fallback to some sort of initialization
of the PRNG.

Now a few words on why XoRoShiRo and not something else. I have played a while
with various PRNGs on 32 & 64 bit platforms. Some results are below. LCG 32 & 64
are usually faster but produce respectively 15 & 31 bits of entropy, meaning
that to get a full 64-bit, you would need to call them several times. The simple
XorShift is fast, produces 32 bits but is mediocre with regard to PRNG test
suites, PCG is slower overall, and XoRoShiRo is faster than XorShift128+ and
produces full 64 bits.

%%%
root@tulip-chiphd:/data # ./randtest.arm
[+] starting xs32...
[?] xs32 duration: 22431833053ns
[+] starting lcg32...
[?] lcg32 duration: 14941402090ns
[+] starting pcg32...
[?] pcg32 duration: 44941973771ns
[+] starting xs128p...
[?] xs128p duration: 48889786981ns
[+] starting lcg64...
[?] lcg64 duration: 33831042391ns
[+] starting xos128p...
[?] xos128p duration: 44850878605ns

root@tulip-chiphd:/data # ./randtest.aarch64
[+] starting xs32...
[?] xs32 duration: 22425151678ns
[+] starting lcg32...
[?] lcg32 duration: 14954255257ns
[+] starting pcg32...
[?] pcg32 duration: 37346265726ns
[+] starting xs128p...
[?] xs128p duration: 22523807219ns
[+] starting lcg64...
[?] lcg64 duration: 26141304679ns
[+] starting xos128p...
[?] xos128p duration: 14937033215ns
%%%

Reviewers: alekseyshl

Reviewed By: alekseyshl

Subscribers: aemerson, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D35221

llvm-svn: 307798
2017-07-12 15:29:08 +00:00
Alex Shlyapnikov 346988bf02 Merge
llvm-svn: 306748
2017-06-29 21:54:38 +00:00
Kostya Kortchinsky 0ce4999002 [scudo] Change aligned alloc functions to be more compliant & perf changes
Summary:
We were not following the `man` documented behaviors for invalid arguments to
`memalign` and associated functions. Using `CHECK` for those was a bit extreme,
so we relax the behavior to return null pointers as expected when this happens.
Adapt the associated test.

I am using this change also to change a few more minor performance improvements:
- mark as `UNLIKELY` a bunch of unlikely conditions;
- the current `CHECK` in `__sanitizer::RoundUpTo` is redundant for us in *all*
  calls. So I am introducing our own version without said `CHECK`.
- change our combined allocator `GetActuallyAllocatedSize`. We already know if
  the pointer is from the Primary or Secondary, so the `PointerIsMine` check is
  redundant as well, and costly for the 32-bit Primary. So we get the size by
  directly using the available Primary functions.

Finally, change a `int` to `uptr` to avoid a warning/error when compiling on
Android.

Reviewers: alekseyshl

Reviewed By: alekseyshl

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D34782

llvm-svn: 306698
2017-06-29 16:45:20 +00:00
Alex Shlyapnikov 4b450685d3 [Sanitizers] Operator new() interceptors always die on allocation error
Summary:
Operator new interceptors behavior is now controlled by their nothrow
property as well as by allocator_may_return_null flag value:

- allocator_may_return_null=* + new()        - die on allocation error
- allocator_may_return_null=0 + new(nothrow) - die on allocation error
- allocator_may_return_null=1 + new(nothrow) - return null

Ideally new() should throw std::bad_alloc exception, but that is not
trivial to achieve, hence TODO.

Reviewers: eugenis

Subscribers: kubamracek, llvm-commits

Differential Revision: https://reviews.llvm.org/D34731

llvm-svn: 306604
2017-06-28 21:58:57 +00:00
Alex Shlyapnikov ccab11b0e8 [Sanitizers] Move cached allocator_may_return_null flag to sanitizer_allocator
Summary:
Move cached allocator_may_return_null flag to sanitizer_allocator.cc and
provide API to consolidate and unify the behavior of all specific allocators.

Make all sanitizers using CombinedAllocator to follow
AllocatorReturnNullOrDieOnOOM() rules to behave the same way when OOM
happens.

When OOM happens, turn allocator_out_of_memory flag on regardless of
allocator_may_return_null flag value (it used to not to be set when
allocator_may_return_null == true).

release_to_os_interval_ms and rss_limit_exceeded will likely be moved to
sanitizer_allocator.cc too (later).

Reviewers: eugenis

Subscribers: srhines, kubamracek, llvm-commits

Differential Revision: https://reviews.llvm.org/D34310

llvm-svn: 305858
2017-06-20 21:23:02 +00:00
Kostya Kortchinsky db18e4d993 [scudo] Check the return values of the pthread_* functions
Summary:
Currently we are not enforcing the success of `pthread_once`, and
`pthread_setspecific`. Errors could lead to harder to debug issues later in
the thread's life. This adds checks for a 0 return value for both.
If `pthread_setspecific` fails in the teardown path, opt for an immediate
teardown as opposed to a fatal failure.

Reviewers: alekseyshl, kcc

Reviewed By: alekseyshl

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D33555

llvm-svn: 303998
2017-05-26 15:39:22 +00:00
Kostya Kortchinsky 432b8dd8ad [scudo] lower quarantine default sizes
Summary:
After discussing the current defaults with a couple of parties, the consensus
is that they are too high. 1Mb of quarantine has about a 4Mb impact on PSS, so
memory usage goes up quickly.
This is obviously configurable, but the default value should be more
"approachable", so both the global size and the thread local size are 1/4 of
what they used to be.

Reviewers: alekseyshl, kcc

Reviewed By: alekseyshl

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D33321

llvm-svn: 303380
2017-05-18 20:47:35 +00:00
Kostya Kortchinsky dc646a0889 [sanitizer] Change SizeClassAllocator32 to accept just one template
Summary:
With rL279771, SizeClassAllocator64 was changed to accept only one template
instead of 5, for the following reasons: "First, this will make the mangled
names shorter. Second, this will make adding more parameters simpler". This
patch mirrors that work for SizeClassAllocator32.

This is in preparation for introducing the randomization of chunks in the
32-bit SizeClassAllocator in a later patch.

Reviewers: kcc, alekseyshl, dvyukov

Reviewed By: alekseyshl

Subscribers: llvm-commits, kubamracek

Differential Revision: https://reviews.llvm.org/D33141

llvm-svn: 303071
2017-05-15 14:47:19 +00:00
Kostya Kortchinsky 01a66fc928 [scudo] Use our own combined allocator
Summary:
The reasoning behind this change is twofold:
- the current combined allocator (sanitizer_allocator_combined.h) implements
  features that are not relevant for Scudo, making some code redundant, and
  some restrictions not pertinent (alignments for example). This forced us to
  do some weird things between the frontend and our secondary to make things
  work;
- we have enough information to be able to know if a chunk will be serviced by
  the Primary or Secondary, allowing us to avoid extraneous calls to functions
  such as `PointerIsMine` or `CanAllocate`.

As a result, the new scudo-specific combined allocator is very straightforward,
and allows us to remove some now unnecessary code both in the frontend and the
secondary. Unused functions have been left in as unimplemented for now.

It turns out to also be a sizeable performance gain (3% faster in some Android
memory_replay benchmarks, doing some more on other platforms).

Reviewers: alekseyshl, kcc, dvyukov

Reviewed By: alekseyshl

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D33007

llvm-svn: 302830
2017-05-11 21:40:45 +00:00
Kostya Kortchinsky b0e96eb28e [scudo] CRC32 optimizations
Summary:
This change optimizes several aspects of the checksum used for chunk headers.

First, there is no point in checking the weak symbol `computeHardwareCRC32`
everytime, it will either be there or not when we start, so check it once
during initialization and set the checksum type accordingly.

Then, the loading of `HashAlgorithm` for SSE versions (and ARM equivalent) was
not optimized out, while not necessary. So I reshuffled that part of the code,
which duplicates a tiny bit of code, but ends up in a much cleaner assembly
(and faster as we avoid an extraneous load and some calls).

The following code is the checksum at the end of `scudoMalloc` for x86_64 with
full SSE 4.2, before:
```
mov     rax, 0FFFFFFFFFFFFFFh
shl     r10, 38h
mov     edi, dword ptr cs:_ZN7__scudoL6CookieE ; __scudo::Cookie
and     r14, rax
lea     rsi, [r13-10h]
movzx   eax, cs:_ZN7__scudoL13HashAlgorithmE ; __scudo::HashAlgorithm
or      r14, r10
mov     rbx, r14
xor     bx, bx
call    _ZN7__scudo20computeHardwareCRC32Ejm ; __scudo::computeHardwareCRC32(uint,ulong)
mov     rsi, rbx
mov     edi, eax
call    _ZN7__scudo20computeHardwareCRC32Ejm ; __scudo::computeHardwareCRC32(uint,ulong)
mov     r14w, ax
mov     rax, r13
mov     [r13-10h], r14
```
After:
```
mov     rax, cs:_ZN7__scudoL6CookieE ; __scudo::Cookie
lea     rcx, [rbx-10h]
mov     rdx, 0FFFFFFFFFFFFFFh
and     r14, rdx
shl     r9, 38h
or      r14, r9
crc32   eax, rcx
mov     rdx, r14
xor     dx, dx
mov     eax, eax
crc32   eax, rdx
mov     r14w, ax
mov     rax, rbx
mov     [rbx-10h], r14
```

Reviewers: dvyukov, alekseyshl, kcc

Reviewed By: alekseyshl

Subscribers: aemerson, rengolin, llvm-commits

Differential Revision: https://reviews.llvm.org/D32971

llvm-svn: 302538
2017-05-09 15:12:38 +00:00
Kostya Kortchinsky ee0695766c [scudo] Add Android support
Summary:
This change adds Android support to the allocator (but doesn't yet enable it in
the cmake config), and should be the last fragment of the rewritten change
D31947.

Android has more memory constraints than other platforms, so the idea of a
unique context per thread would not have worked. The alternative chosen is to
allocate a set of contexts based on the number of cores on the machine, and
share those contexts within the threads. Contexts can be dynamically reassigned
to threads to prevent contention, based on a scheme suggested by @dvyuokv in
the initial review.

Additionally, given that Android doesn't support ELF TLS (only emutls for now),
we use the TSan TLS slot to make things faster: Scudo is mutually exclusive
with other sanitizers so this shouldn't cause any problem.

An additional change made here, is replacing `thread_local` by `THREADLOCAL`
and using the initial-exec thread model in the non-Android version to prevent
extraneous weak definition and checks on the relevant variables.

Reviewers: kcc, dvyukov, alekseyshl

Reviewed By: alekseyshl

Subscribers: srhines, mgorny, llvm-commits

Differential Revision: https://reviews.llvm.org/D32649

llvm-svn: 302300
2017-05-05 21:38:22 +00:00
Kostya Kortchinsky 36b3434161 [scudo] Move thread local variables into their own files
Summary:
This change introduces scudo_tls.h & scudo_tls_linux.cpp, where we move the
thread local variables used by the allocator, namely the cache, quarantine
cache & prng. `ScudoThreadContext` will hold those. This patch doesn't
introduce any new platform support yet, this will be the object of a later
patch. This also changes the PRNG so that the structure can be POD.

Reviewers: kcc, dvyukov, alekseyshl

Reviewed By: dvyukov, alekseyshl

Subscribers: llvm-commits, mgorny

Differential Revision: https://reviews.llvm.org/D32440

llvm-svn: 301584
2017-04-27 20:21:16 +00:00
Kostya Kortchinsky f1a54fdfd6 [scudo] Bypass Quarantine if its size is set to 0
Summary:
In the current state of things, the deallocation path puts a chunk in the
Quarantine whether it's enabled or not (size of 0). When the Quarantine is
disabled, this results in the header being loaded (and checked) twice, and
stored (and checksummed) once, in `deallocate` and `Recycle`.

This change introduces a `quarantineOrDeallocateChunk` function that has a
fast path to deallocation if the Quarantine is disabled. Even though this is
not the preferred configuration security-wise, this change saves a sizeable
amount of processing for that particular situation (which could be adopted by
low memory devices). Additionally this simplifies a bit `deallocate` and
`reallocate`.

Reviewers: dvyukov, kcc, alekseyshl

Reviewed By: dvyukov

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D32310

llvm-svn: 301015
2017-04-21 18:10:53 +00:00
Kostya Kortchinsky fff8e0620b [scudo] Remove GetActuallyAllocatedSize calls from the fast path
Summary:
GetActuallyAllocatedSize is actually expensive. In order to avoid calling this
function in the malloc/free fast path, we change the Scudo chunk header to
store the size of the chunk, if from the Primary, or the amount of unused
bytes if from the Secondary. This way, we only have to call the culprit
function for Secondary backed allocations (and still in realloc).

The performance gain on a singly threaded pure malloc/free benchmark exercising
the Primary allocator is above 5%.

Reviewers: alekseyshl, kcc, dvyukov

Reviewed By: dvyukov

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D32299

llvm-svn: 300861
2017-04-20 18:07:17 +00:00
Kostya Kortchinsky 006805d146 [scudo] Minor changes and refactoring
Summary:
This is part of D31947 that is being split into several smaller changes.

This one deals with all the minor changes, more specifically:
- Rename some variables and functions to make their purpose clearer;
- Reorder some code;
- Mark the hot termination incurring checks as `UNLIKELY`; if they happen, the
  program will die anyway;
- Add a `getScudoChunk` method;
- Add an `eraseHeader` method to ScudoChunk that will clear a header with 0s;
- Add a parameter to `allocate` to know if the allocated chunk should be filled
  with zeros. This allows `calloc` to not have to call
  `GetActuallyAllocatedSize`; more changes to get rid of this function on the
  hot paths will follow;
- reallocate was missing a check to verify that the pointer is properly
  aligned on `MinAlignment`;
- The `Stats` in the secondary have to be protected by a mutex as the `Add`
  and `Sub` methods are actually not atomic;
- The software CRC32 function was moved to the header to allow for inlining.

Reviewers: dvyukov, alekseyshl, kcc

Reviewed By: dvyukov

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D32242

llvm-svn: 300846
2017-04-20 15:11:00 +00:00
Kostya Kortchinsky 8d6257b4bf [scudo] 32-bit quarantine sizes adjustments and bug fixes
Summary:
The local and global quarantine sizes were not offering a distinction for
32-bit and 64-bit platforms. This is addressed with lower values for 32-bit.

When writing additional tests for the quarantine, it was discovered that when
calling some of the allocator interface function prior to any allocation
operation having occured, the test would crash due to the allocator not being
initialized. This was addressed by making sure the allocator is initialized
for those scenarios.

Relevant tests were added in interface.cpp and quarantine.cpp.

Last change being the removal of the extraneous link dependencies for the
tests thanks to rL293220, anf the addition of the gc-sections linker flag.

Reviewers: kcc, alekseyshl

Reviewed By: alekseyshl

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D29341

llvm-svn: 294037
2017-02-03 20:49:42 +00:00
Kostya Kortchinsky a00b9229c3 [scudo] Replacing std::atomic with Sanitizer's atomics
Summary:
In an effort to getting rid of dependencies to external libraries, we are
replacing atomic PackedHeader use of std::atomic with Sanitizer's
atomic_uint64_t, which allows us to avoid -latomic.

Reviewers: kcc, phosek, alekseyshl

Reviewed By: alekseyshl

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D28864

llvm-svn: 292630
2017-01-20 18:32:18 +00:00
Kostya Kortchinsky b39dff4551 [scudo] Refactor of CRC32 and ARM runtime CRC32 detection
Summary:
ARM & AArch64 runtime detection for hardware support of CRC32 has been added
via check of the AT_HWVAL auxiliary vector.

Following Michal's suggestions in D28417, the CRC32 code has been further
changed and looks better now. When compiled with full relro (which is strongly
suggested to benefit from additional hardening), the weak symbol for
computeHardwareCRC32 is read-only and the assembly generated is fairly clean
and straight forward. As suggested, an additional optimization is to skip
the runtime check if SSE 4.2 has been enabled globally, as opposed to only
for scudo_crc32.cpp.

scudo_crc32.h has no purpose anymore and was removed.

Reviewers: alekseyshl, kcc, rengolin, mgorny, phosek

Reviewed By: rengolin, mgorny

Subscribers: aemerson, rengolin, llvm-commits

Differential Revision: https://reviews.llvm.org/D28574

llvm-svn: 292409
2017-01-18 17:11:17 +00:00
Petr Hosek 97ab7fe6b7 [scudo] Eliminate the runtime dependency on libc++abi
Making this variable non-static avoids the need for locking to ensure
that the initialization is thread-safe which in turns eliminates the
runtime dependency on libc++abi library (for __cxa_guard_acquire and
__cxa_guard_release) which makes it possible to link scudo against
pure C programs.

Differential Revision: https://reviews.llvm.org/D28757

llvm-svn: 292253
2017-01-17 20:41:04 +00:00
Mike Aizatsky 9700acba11 Revert "[sancov] moving sancov rt to sancov/ directory"
This reverts commit https://reviews.llvm.org/rL291734
Reason: mac breakage
http://lab.llvm.org:8080/green//job/clang-stage1-configure-RA_build/28798/consoleFull#1657087648e9a0fee5-ebcc-4238-a641-c5aa112c323e

llvm-svn: 291736
2017-01-12 01:37:35 +00:00
Mike Aizatsky 875572f358 [sancov] moving sancov rt to sancov/ directory
Subscribers: kubabrecka, mgorny

Differential Revision: https://reviews.llvm.org/D28541

llvm-svn: 291734
2017-01-12 01:19:34 +00:00
Kostya Kortchinsky c4d6c938e3 [scudo] Separate hardware CRC32 routines
Summary:
As raised in D28304, enabling SSE 4.2 for the whole Scudo tree leads to the
emission of SSE 4.2 instructions everywhere, while the runtime checks only
applied to the CRC32 computing function.

This patch separates the CRC32 function taking advantage of the hardware into
its own file, and only enabled -msse4.2 for that file, if detected to be
supported by the compiler.

Another consequence of removing SSE4.2 globally is realizing that memcpy were
not being optimized, which turned out to be due to the -fno-builtin in
SANITIZER_COMMON_CFLAGS. So we now explicitely enable builtins for Scudo.

The resulting assembly looks good, with some CALLs are introduced instead of
the CRC32 code being inlined.

Reviewers: kcc, mgorny, alekseyshl

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D28417

llvm-svn: 291570
2017-01-10 16:39:36 +00:00
Michal Gorny c74123bdff [cmake] Disable appending -msse4.2 flag implicitly
Disable the code appending -msse4.2 flag implicitly when the compiler
supports it. The compiler support for this flags do not indicate that
the underlying CPU will support SSE4.2, and passing it may result in
SSE4.2 code being emitted *implicitly*.

If the target platform supports SSE4.2 appropriately, the relevant bits
should be already enabled via -march= or equivalent. In this case
passing -msse4.2 is redundant.

If a runtime detection is desired (which seems to be a case with SCUDO),
then (as gcc manpage points out) the specific SSE4.2 needs to be
isolated into a separate file, the -msse4.2 flag can be forced only
for that file and the function defined in that file can only be called
when the CPU is determined to support SSE4.2.

This fixes SIGILL on SCUDO when it is compiled using gcc-5.4.

Differential Revision: https://reviews.llvm.org/D28304

llvm-svn: 291217
2017-01-06 08:21:02 +00:00
Kostya Kortchinsky 47be0edfa3 [scudo] Use DefaultSizeClassMap for 32-bit
Summary:
With the recent changes to the Secondary, we use less bits for UnusedBytes,
which allows us in return to increase the bits used for Offset. That means
that we can use a Primary SizeClassMap allowing for a larger maximum size.

Reviewers: kcc, alekseyshl

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D27816

llvm-svn: 289838
2016-12-15 18:06:55 +00:00
Kostya Kortchinsky c74da7ce58 Corrected D27428: Do not use the alignment-rounded-up size with secondary
Summary:
I atually had an integer overflow on 32-bit with D27428 that didn't reproduce
locally, as the test servers would manage allocate addresses in the 0xffffxxxx
range, which led to some issues when rounding addresses.

At this point, I feel that Scudo could benefit from having its own combined
allocator, as we don't get any benefit from the current one, but have to work
around some hurdles (alignment checks, rounding up that is no longer needed,
extraneous code).

Reviewers: kcc, alekseyshl

Subscribers: llvm-commits, kubabrecka

Differential Revision: https://reviews.llvm.org/D27681

llvm-svn: 289572
2016-12-13 19:31:54 +00:00
Kostya Kortchinsky bfeb7990b2 Reverting rL289088 while investigating some test issue on the build servers
Subscribers: kubabrecka

Differential Revision: https://reviews.llvm.org/D27605

llvm-svn: 289180
2016-12-09 03:28:12 +00:00
Kostya Kortchinsky 2defe4d9a1 [sanitizer] Do not use the alignment-rounded-up size when using the secondary
Summary:
The combined allocator rounds up the requested size with regard to the
alignment, which makes sense when being serviced by the primary as it comes
with alignment guarantees, but not with the secondary. For the rare case of
large alignments, it wastes memory, and entices unnecessarily large fields for
the Scudo header. With this patch, we pass the non-alignement-rounded-up size
to the secondary, and adapt the Scudo code for this change.

Reviewers: alekseyshl, kcc

Subscribers: llvm-commits, kubabrecka

Differential Revision: https://reviews.llvm.org/D27428

llvm-svn: 289088
2016-12-08 19:05:46 +00:00
Stephan Bergmann 612d8c66f4 Don't include system header inside namespace
...causes build failure at least with GCC 6.2.1, as smmintrin.h indirectly
includes cstdlib, which then runs into problems.

llvm-svn: 288486
2016-12-02 08:03:57 +00:00
Kostya Kortchinsky 1148dc5274 [scudo] 32-bit and hardware agnostic support
Summary:
This update introduces i386 support for the Scudo Hardened Allocator, and
offers software alternatives for functions that used to require hardware
specific instruction sets. This should make porting to new architectures
easier.

Among the changes:
- The chunk header has been changed to accomodate the size limitations
  encountered on 32-bit architectures. We now fit everything in 64-bit. This
  was achieved by storing the amount of unused bytes in an allocation rather
  than the size itself, as one can be deduced from the other with the help
  of the GetActuallyAllocatedSize function. As it turns out, this header can
  be used for both 64 and 32 bit, and as such we dropped the requirement for
  the 128-bit compare and exchange instruction support (cmpxchg16b).
- Add 32-bit support for the checksum and the PRNG functions: if the SSE 4.2
  instruction set is supported, use the 32-bit CRC32 instruction, and in the
  XorShift128, use a 32-bit based state instead of 64-bit.
- Add software support for CRC32: if SSE 4.2 is not supported, fallback on a
  software implementation.
- Modify tests that were not 32-bit compliant, and expand them to cover more
  allocation and alignment sizes. The random shuffle test has been deactivated
  for linux-i386 & linux-i686 as the 32-bit sanitizer allocator doesn't
  currently randomize chunks.

Reviewers: alekseyshl, kcc

Subscribers: filcab, llvm-commits, tberghammer, danalbert, srhines, mgorny, modocache

Differential Revision: https://reviews.llvm.org/D26358

llvm-svn: 288255
2016-11-30 17:32:20 +00:00
Evgeniy Stepanov d3305afc75 Return memory to OS right after free (not in the async thread).
Summary:
In order to avoid starting a separate thread to return unused memory to
the system (the thread interferes with process startup on Android,
Zygota waits for all threads to exit before fork, but this thread never
exits), try to return it right after free.

Reviewers: eugenis

Subscribers: cryptoad, filcab, danalbert, kubabrecka, llvm-commits

Patch by Aleksey Shlyapnikov.

Differential Revision: https://reviews.llvm.org/D27003

llvm-svn: 288091
2016-11-29 00:22:50 +00:00
Kostya Kortchinsky 71dcc33c58 [scudo] Lay the foundation for 32-bit support
Summary:
In order to support 32-bit platforms, we have to make some adjustments in
multiple locations, one of them being the Scudo chunk header. For it to fit on
64 bits (as a reminder, on x64 it's 128 bits), I had to crunch the space taken
by some of the fields. In order to keep the offset field small, the secondary
allocator was changed to accomodate aligned allocations for larger alignments,
hence making the offset constant for chunks serviced by it.

The resulting header candidate has been added, and further modifications to
allow 32-bit support will follow.

Another notable change is the addition of MaybeStartBackgroudThread() to allow
release of the memory to the OS.

Reviewers: kcc

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D25688

llvm-svn: 285209
2016-10-26 16:16:58 +00:00
Kostya Kortchinsky ada2761407 [scudo] Fix an edge case in the secondary allocator
Summary:
s/CHECK_LT/CHECK_LE/ in the secondary allocator, as under certain circumstances
Ptr + Size can be equal to MapEnd. This edge case was not found by the current
tests, so those were extended to be able to catch that.

Reviewers: kcc

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D25101

llvm-svn: 282913
2016-09-30 19:57:21 +00:00
Vitaly Buka 0ec5a2830d Don't use internal symbolizer if we are in process of reporting Out-of-Memory.
Reviewed by eugenis offline, as reviews.llvm.org is down.

llvm-svn: 282805
2016-09-29 23:00:54 +00:00
Kostya Kortchinsky 1da3ea561a [scudo] Fix a bug in the new Secondary Allocator
Summary:
GetActuallyAllocatedSize() was not accounting for the last page of the mapping
being a guard page, and was returning the wrong number of actually allocated
bytes, which in turn would mess up with the realloc logic. Current tests didn't
find this as the size exercised was only serviced by the Primary.

Correct the issue by subtracting PageSize, and update the realloc test to
exercise paths in both the Primary and the Secondary.

Reviewers: kcc

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D24787

llvm-svn: 282030
2016-09-20 22:17:59 +00:00
Kostya Kortchinsky 3beafffcca [scudo] Modify Scudo to use its own Secondary Allocator
Summary:
The Sanitizer Secondary Allocator was not entirely ideal was Scudo for several
reasons: decent amount of unneeded code, redundant checks already performed by
the front end, unneeded data structures, difficulty to properly protect the
secondary chunks header.

Given that the second allocator is pretty straight forward, Scudo will use its
own, trimming all the unneeded code off of the Sanitizer one. A significant
difference in terms of security is that now each secondary chunk is preceded
and followed by a guard page, thus mitigating overflows into and from the
chunk.

A test was added as well to illustrate the overflow & underflow situations
into the guard pages.

Reviewers: kcc

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D24737

llvm-svn: 281938
2016-09-19 21:11:55 +00:00
Chris Bieneman 21395f9839 [CMake] Connect Compiler-RT targets to LLVM Runtimes directory
This patch builds on LLVM r279776.

In this patch I've done some cleanup and abstracted three common steps runtime components have in their CMakeLists files, and added a fourth.

The three steps I abstract are:

(1) Add a top-level target (i.e asan, msan, ...)
(2) Set the target properties for sorting files in IDE generators
(3) Make the compiler-rt target depend on the top-level target

The new step is to check if a command named "runtime_register_component" is defined, and to call it with the component name.

The runtime_register_component command is defined in llvm/runtimes/CMakeLists.txt, and presently just adds the component to a list of sub-components, which later gets used to generate target mappings.

With this patch a new workflow for runtimes builds is supported. The new workflow when building runtimes from the LLVM runtimes directory is:

> cmake [...]
> ninja runtimes-configure
> ninja asan

The "runtimes-configure" target builds all the dependencies for configuring the runtimes projects, and runs CMake on the runtimes projects. Running the runtimes CMake generates a list of targets to bind into the top-level CMake so subsequent build invocations will have access to some of Compiler-RT's targets through the top-level build.

Note: This patch does exclude some top-level targets from compiler-rt libraries because they either don't install files (sanitizer_common), or don't have a cooresponding `check` target (stats).
llvm-svn: 279863
2016-08-26 20:52:22 +00:00