Summary:
Operator new interceptors behavior is now controlled by their nothrow
property as well as by allocator_may_return_null flag value:
- allocator_may_return_null=* + new() - die on allocation error
- allocator_may_return_null=0 + new(nothrow) - die on allocation error
- allocator_may_return_null=1 + new(nothrow) - return null
Ideally new() should throw std::bad_alloc exception, but that is not
trivial to achieve, hence TODO.
Reviewers: eugenis
Subscribers: kubamracek, llvm-commits
Differential Revision: https://reviews.llvm.org/D34731
llvm-svn: 306604
Summary:
Move cached allocator_may_return_null flag to sanitizer_allocator.cc and
provide API to consolidate and unify the behavior of all specific allocators.
Make all sanitizers using CombinedAllocator to follow
AllocatorReturnNullOrDieOnOOM() rules to behave the same way when OOM
happens.
When OOM happens, turn allocator_out_of_memory flag on regardless of
allocator_may_return_null flag value (it used to not to be set when
allocator_may_return_null == true).
release_to_os_interval_ms and rss_limit_exceeded will likely be moved to
sanitizer_allocator.cc too (later).
Reviewers: eugenis
Subscribers: srhines, kubamracek, llvm-commits
Differential Revision: https://reviews.llvm.org/D34310
llvm-svn: 305858
Summary:
Currently we are not enforcing the success of `pthread_once`, and
`pthread_setspecific`. Errors could lead to harder to debug issues later in
the thread's life. This adds checks for a 0 return value for both.
If `pthread_setspecific` fails in the teardown path, opt for an immediate
teardown as opposed to a fatal failure.
Reviewers: alekseyshl, kcc
Reviewed By: alekseyshl
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D33555
llvm-svn: 303998
Summary:
After discussing the current defaults with a couple of parties, the consensus
is that they are too high. 1Mb of quarantine has about a 4Mb impact on PSS, so
memory usage goes up quickly.
This is obviously configurable, but the default value should be more
"approachable", so both the global size and the thread local size are 1/4 of
what they used to be.
Reviewers: alekseyshl, kcc
Reviewed By: alekseyshl
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D33321
llvm-svn: 303380
Summary:
With rL279771, SizeClassAllocator64 was changed to accept only one template
instead of 5, for the following reasons: "First, this will make the mangled
names shorter. Second, this will make adding more parameters simpler". This
patch mirrors that work for SizeClassAllocator32.
This is in preparation for introducing the randomization of chunks in the
32-bit SizeClassAllocator in a later patch.
Reviewers: kcc, alekseyshl, dvyukov
Reviewed By: alekseyshl
Subscribers: llvm-commits, kubamracek
Differential Revision: https://reviews.llvm.org/D33141
llvm-svn: 303071
Summary:
The reasoning behind this change is twofold:
- the current combined allocator (sanitizer_allocator_combined.h) implements
features that are not relevant for Scudo, making some code redundant, and
some restrictions not pertinent (alignments for example). This forced us to
do some weird things between the frontend and our secondary to make things
work;
- we have enough information to be able to know if a chunk will be serviced by
the Primary or Secondary, allowing us to avoid extraneous calls to functions
such as `PointerIsMine` or `CanAllocate`.
As a result, the new scudo-specific combined allocator is very straightforward,
and allows us to remove some now unnecessary code both in the frontend and the
secondary. Unused functions have been left in as unimplemented for now.
It turns out to also be a sizeable performance gain (3% faster in some Android
memory_replay benchmarks, doing some more on other platforms).
Reviewers: alekseyshl, kcc, dvyukov
Reviewed By: alekseyshl
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D33007
llvm-svn: 302830
Summary:
This change optimizes several aspects of the checksum used for chunk headers.
First, there is no point in checking the weak symbol `computeHardwareCRC32`
everytime, it will either be there or not when we start, so check it once
during initialization and set the checksum type accordingly.
Then, the loading of `HashAlgorithm` for SSE versions (and ARM equivalent) was
not optimized out, while not necessary. So I reshuffled that part of the code,
which duplicates a tiny bit of code, but ends up in a much cleaner assembly
(and faster as we avoid an extraneous load and some calls).
The following code is the checksum at the end of `scudoMalloc` for x86_64 with
full SSE 4.2, before:
```
mov rax, 0FFFFFFFFFFFFFFh
shl r10, 38h
mov edi, dword ptr cs:_ZN7__scudoL6CookieE ; __scudo::Cookie
and r14, rax
lea rsi, [r13-10h]
movzx eax, cs:_ZN7__scudoL13HashAlgorithmE ; __scudo::HashAlgorithm
or r14, r10
mov rbx, r14
xor bx, bx
call _ZN7__scudo20computeHardwareCRC32Ejm ; __scudo::computeHardwareCRC32(uint,ulong)
mov rsi, rbx
mov edi, eax
call _ZN7__scudo20computeHardwareCRC32Ejm ; __scudo::computeHardwareCRC32(uint,ulong)
mov r14w, ax
mov rax, r13
mov [r13-10h], r14
```
After:
```
mov rax, cs:_ZN7__scudoL6CookieE ; __scudo::Cookie
lea rcx, [rbx-10h]
mov rdx, 0FFFFFFFFFFFFFFh
and r14, rdx
shl r9, 38h
or r14, r9
crc32 eax, rcx
mov rdx, r14
xor dx, dx
mov eax, eax
crc32 eax, rdx
mov r14w, ax
mov rax, rbx
mov [rbx-10h], r14
```
Reviewers: dvyukov, alekseyshl, kcc
Reviewed By: alekseyshl
Subscribers: aemerson, rengolin, llvm-commits
Differential Revision: https://reviews.llvm.org/D32971
llvm-svn: 302538
Summary:
This change adds Android support to the allocator (but doesn't yet enable it in
the cmake config), and should be the last fragment of the rewritten change
D31947.
Android has more memory constraints than other platforms, so the idea of a
unique context per thread would not have worked. The alternative chosen is to
allocate a set of contexts based on the number of cores on the machine, and
share those contexts within the threads. Contexts can be dynamically reassigned
to threads to prevent contention, based on a scheme suggested by @dvyuokv in
the initial review.
Additionally, given that Android doesn't support ELF TLS (only emutls for now),
we use the TSan TLS slot to make things faster: Scudo is mutually exclusive
with other sanitizers so this shouldn't cause any problem.
An additional change made here, is replacing `thread_local` by `THREADLOCAL`
and using the initial-exec thread model in the non-Android version to prevent
extraneous weak definition and checks on the relevant variables.
Reviewers: kcc, dvyukov, alekseyshl
Reviewed By: alekseyshl
Subscribers: srhines, mgorny, llvm-commits
Differential Revision: https://reviews.llvm.org/D32649
llvm-svn: 302300
Summary:
This change introduces scudo_tls.h & scudo_tls_linux.cpp, where we move the
thread local variables used by the allocator, namely the cache, quarantine
cache & prng. `ScudoThreadContext` will hold those. This patch doesn't
introduce any new platform support yet, this will be the object of a later
patch. This also changes the PRNG so that the structure can be POD.
Reviewers: kcc, dvyukov, alekseyshl
Reviewed By: dvyukov, alekseyshl
Subscribers: llvm-commits, mgorny
Differential Revision: https://reviews.llvm.org/D32440
llvm-svn: 301584
Summary:
In the current state of things, the deallocation path puts a chunk in the
Quarantine whether it's enabled or not (size of 0). When the Quarantine is
disabled, this results in the header being loaded (and checked) twice, and
stored (and checksummed) once, in `deallocate` and `Recycle`.
This change introduces a `quarantineOrDeallocateChunk` function that has a
fast path to deallocation if the Quarantine is disabled. Even though this is
not the preferred configuration security-wise, this change saves a sizeable
amount of processing for that particular situation (which could be adopted by
low memory devices). Additionally this simplifies a bit `deallocate` and
`reallocate`.
Reviewers: dvyukov, kcc, alekseyshl
Reviewed By: dvyukov
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D32310
llvm-svn: 301015
Summary:
GetActuallyAllocatedSize is actually expensive. In order to avoid calling this
function in the malloc/free fast path, we change the Scudo chunk header to
store the size of the chunk, if from the Primary, or the amount of unused
bytes if from the Secondary. This way, we only have to call the culprit
function for Secondary backed allocations (and still in realloc).
The performance gain on a singly threaded pure malloc/free benchmark exercising
the Primary allocator is above 5%.
Reviewers: alekseyshl, kcc, dvyukov
Reviewed By: dvyukov
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D32299
llvm-svn: 300861
Summary:
This is part of D31947 that is being split into several smaller changes.
This one deals with all the minor changes, more specifically:
- Rename some variables and functions to make their purpose clearer;
- Reorder some code;
- Mark the hot termination incurring checks as `UNLIKELY`; if they happen, the
program will die anyway;
- Add a `getScudoChunk` method;
- Add an `eraseHeader` method to ScudoChunk that will clear a header with 0s;
- Add a parameter to `allocate` to know if the allocated chunk should be filled
with zeros. This allows `calloc` to not have to call
`GetActuallyAllocatedSize`; more changes to get rid of this function on the
hot paths will follow;
- reallocate was missing a check to verify that the pointer is properly
aligned on `MinAlignment`;
- The `Stats` in the secondary have to be protected by a mutex as the `Add`
and `Sub` methods are actually not atomic;
- The software CRC32 function was moved to the header to allow for inlining.
Reviewers: dvyukov, alekseyshl, kcc
Reviewed By: dvyukov
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D32242
llvm-svn: 300846
Summary:
The local and global quarantine sizes were not offering a distinction for
32-bit and 64-bit platforms. This is addressed with lower values for 32-bit.
When writing additional tests for the quarantine, it was discovered that when
calling some of the allocator interface function prior to any allocation
operation having occured, the test would crash due to the allocator not being
initialized. This was addressed by making sure the allocator is initialized
for those scenarios.
Relevant tests were added in interface.cpp and quarantine.cpp.
Last change being the removal of the extraneous link dependencies for the
tests thanks to rL293220, anf the addition of the gc-sections linker flag.
Reviewers: kcc, alekseyshl
Reviewed By: alekseyshl
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D29341
llvm-svn: 294037
Summary:
In an effort to getting rid of dependencies to external libraries, we are
replacing atomic PackedHeader use of std::atomic with Sanitizer's
atomic_uint64_t, which allows us to avoid -latomic.
Reviewers: kcc, phosek, alekseyshl
Reviewed By: alekseyshl
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D28864
llvm-svn: 292630
Summary:
ARM & AArch64 runtime detection for hardware support of CRC32 has been added
via check of the AT_HWVAL auxiliary vector.
Following Michal's suggestions in D28417, the CRC32 code has been further
changed and looks better now. When compiled with full relro (which is strongly
suggested to benefit from additional hardening), the weak symbol for
computeHardwareCRC32 is read-only and the assembly generated is fairly clean
and straight forward. As suggested, an additional optimization is to skip
the runtime check if SSE 4.2 has been enabled globally, as opposed to only
for scudo_crc32.cpp.
scudo_crc32.h has no purpose anymore and was removed.
Reviewers: alekseyshl, kcc, rengolin, mgorny, phosek
Reviewed By: rengolin, mgorny
Subscribers: aemerson, rengolin, llvm-commits
Differential Revision: https://reviews.llvm.org/D28574
llvm-svn: 292409
Making this variable non-static avoids the need for locking to ensure
that the initialization is thread-safe which in turns eliminates the
runtime dependency on libc++abi library (for __cxa_guard_acquire and
__cxa_guard_release) which makes it possible to link scudo against
pure C programs.
Differential Revision: https://reviews.llvm.org/D28757
llvm-svn: 292253
Summary:
As raised in D28304, enabling SSE 4.2 for the whole Scudo tree leads to the
emission of SSE 4.2 instructions everywhere, while the runtime checks only
applied to the CRC32 computing function.
This patch separates the CRC32 function taking advantage of the hardware into
its own file, and only enabled -msse4.2 for that file, if detected to be
supported by the compiler.
Another consequence of removing SSE4.2 globally is realizing that memcpy were
not being optimized, which turned out to be due to the -fno-builtin in
SANITIZER_COMMON_CFLAGS. So we now explicitely enable builtins for Scudo.
The resulting assembly looks good, with some CALLs are introduced instead of
the CRC32 code being inlined.
Reviewers: kcc, mgorny, alekseyshl
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D28417
llvm-svn: 291570
Disable the code appending -msse4.2 flag implicitly when the compiler
supports it. The compiler support for this flags do not indicate that
the underlying CPU will support SSE4.2, and passing it may result in
SSE4.2 code being emitted *implicitly*.
If the target platform supports SSE4.2 appropriately, the relevant bits
should be already enabled via -march= or equivalent. In this case
passing -msse4.2 is redundant.
If a runtime detection is desired (which seems to be a case with SCUDO),
then (as gcc manpage points out) the specific SSE4.2 needs to be
isolated into a separate file, the -msse4.2 flag can be forced only
for that file and the function defined in that file can only be called
when the CPU is determined to support SSE4.2.
This fixes SIGILL on SCUDO when it is compiled using gcc-5.4.
Differential Revision: https://reviews.llvm.org/D28304
llvm-svn: 291217
Summary:
With the recent changes to the Secondary, we use less bits for UnusedBytes,
which allows us in return to increase the bits used for Offset. That means
that we can use a Primary SizeClassMap allowing for a larger maximum size.
Reviewers: kcc, alekseyshl
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D27816
llvm-svn: 289838
Summary:
I atually had an integer overflow on 32-bit with D27428 that didn't reproduce
locally, as the test servers would manage allocate addresses in the 0xffffxxxx
range, which led to some issues when rounding addresses.
At this point, I feel that Scudo could benefit from having its own combined
allocator, as we don't get any benefit from the current one, but have to work
around some hurdles (alignment checks, rounding up that is no longer needed,
extraneous code).
Reviewers: kcc, alekseyshl
Subscribers: llvm-commits, kubabrecka
Differential Revision: https://reviews.llvm.org/D27681
llvm-svn: 289572
Summary:
The combined allocator rounds up the requested size with regard to the
alignment, which makes sense when being serviced by the primary as it comes
with alignment guarantees, but not with the secondary. For the rare case of
large alignments, it wastes memory, and entices unnecessarily large fields for
the Scudo header. With this patch, we pass the non-alignement-rounded-up size
to the secondary, and adapt the Scudo code for this change.
Reviewers: alekseyshl, kcc
Subscribers: llvm-commits, kubabrecka
Differential Revision: https://reviews.llvm.org/D27428
llvm-svn: 289088
Summary:
This update introduces i386 support for the Scudo Hardened Allocator, and
offers software alternatives for functions that used to require hardware
specific instruction sets. This should make porting to new architectures
easier.
Among the changes:
- The chunk header has been changed to accomodate the size limitations
encountered on 32-bit architectures. We now fit everything in 64-bit. This
was achieved by storing the amount of unused bytes in an allocation rather
than the size itself, as one can be deduced from the other with the help
of the GetActuallyAllocatedSize function. As it turns out, this header can
be used for both 64 and 32 bit, and as such we dropped the requirement for
the 128-bit compare and exchange instruction support (cmpxchg16b).
- Add 32-bit support for the checksum and the PRNG functions: if the SSE 4.2
instruction set is supported, use the 32-bit CRC32 instruction, and in the
XorShift128, use a 32-bit based state instead of 64-bit.
- Add software support for CRC32: if SSE 4.2 is not supported, fallback on a
software implementation.
- Modify tests that were not 32-bit compliant, and expand them to cover more
allocation and alignment sizes. The random shuffle test has been deactivated
for linux-i386 & linux-i686 as the 32-bit sanitizer allocator doesn't
currently randomize chunks.
Reviewers: alekseyshl, kcc
Subscribers: filcab, llvm-commits, tberghammer, danalbert, srhines, mgorny, modocache
Differential Revision: https://reviews.llvm.org/D26358
llvm-svn: 288255
Summary:
In order to avoid starting a separate thread to return unused memory to
the system (the thread interferes with process startup on Android,
Zygota waits for all threads to exit before fork, but this thread never
exits), try to return it right after free.
Reviewers: eugenis
Subscribers: cryptoad, filcab, danalbert, kubabrecka, llvm-commits
Patch by Aleksey Shlyapnikov.
Differential Revision: https://reviews.llvm.org/D27003
llvm-svn: 288091
Summary:
In order to support 32-bit platforms, we have to make some adjustments in
multiple locations, one of them being the Scudo chunk header. For it to fit on
64 bits (as a reminder, on x64 it's 128 bits), I had to crunch the space taken
by some of the fields. In order to keep the offset field small, the secondary
allocator was changed to accomodate aligned allocations for larger alignments,
hence making the offset constant for chunks serviced by it.
The resulting header candidate has been added, and further modifications to
allow 32-bit support will follow.
Another notable change is the addition of MaybeStartBackgroudThread() to allow
release of the memory to the OS.
Reviewers: kcc
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D25688
llvm-svn: 285209
Summary:
s/CHECK_LT/CHECK_LE/ in the secondary allocator, as under certain circumstances
Ptr + Size can be equal to MapEnd. This edge case was not found by the current
tests, so those were extended to be able to catch that.
Reviewers: kcc
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D25101
llvm-svn: 282913
Summary:
GetActuallyAllocatedSize() was not accounting for the last page of the mapping
being a guard page, and was returning the wrong number of actually allocated
bytes, which in turn would mess up with the realloc logic. Current tests didn't
find this as the size exercised was only serviced by the Primary.
Correct the issue by subtracting PageSize, and update the realloc test to
exercise paths in both the Primary and the Secondary.
Reviewers: kcc
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D24787
llvm-svn: 282030
Summary:
The Sanitizer Secondary Allocator was not entirely ideal was Scudo for several
reasons: decent amount of unneeded code, redundant checks already performed by
the front end, unneeded data structures, difficulty to properly protect the
secondary chunks header.
Given that the second allocator is pretty straight forward, Scudo will use its
own, trimming all the unneeded code off of the Sanitizer one. A significant
difference in terms of security is that now each secondary chunk is preceded
and followed by a guard page, thus mitigating overflows into and from the
chunk.
A test was added as well to illustrate the overflow & underflow situations
into the guard pages.
Reviewers: kcc
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D24737
llvm-svn: 281938
This patch builds on LLVM r279776.
In this patch I've done some cleanup and abstracted three common steps runtime components have in their CMakeLists files, and added a fourth.
The three steps I abstract are:
(1) Add a top-level target (i.e asan, msan, ...)
(2) Set the target properties for sorting files in IDE generators
(3) Make the compiler-rt target depend on the top-level target
The new step is to check if a command named "runtime_register_component" is defined, and to call it with the component name.
The runtime_register_component command is defined in llvm/runtimes/CMakeLists.txt, and presently just adds the component to a list of sub-components, which later gets used to generate target mappings.
With this patch a new workflow for runtimes builds is supported. The new workflow when building runtimes from the LLVM runtimes directory is:
> cmake [...]
> ninja runtimes-configure
> ninja asan
The "runtimes-configure" target builds all the dependencies for configuring the runtimes projects, and runs CMake on the runtimes projects. Running the runtimes CMake generates a list of targets to bind into the top-level CMake so subsequent build invocations will have access to some of Compiler-RT's targets through the top-level build.
Note: This patch does exclude some top-level targets from compiler-rt libraries because they either don't install files (sanitizer_common), or don't have a cooresponding `check` target (stats).
llvm-svn: 279863
Summary:
Currently, the Scudo Hardened Allocator only gets its flags via the SCUDO_OPTIONS environment variable.
With this patch, we offer the opportunity for programs to define their own options via __scudo_default_options() which behaves like __asan_default_options() (weak symbol).
A relevant test has been added as well, and the documentation updated accordingly.
I also used this patch as an opportunity to rename a few variables to comply with the LLVM naming scheme, and replaced a use of Report with dieWithMessage for consistency (and to avoid a callback).
Reviewers: llvm-commits, kcc
Differential Revision: https://reviews.llvm.org/D23018
llvm-svn: 277536
Summary:
This patch is a refactoring of the way cmake 'targets' are grouped.
It won't affect non-UI cmake-generators.
Clang/LLVM are using a structured way to group targets which ease
navigation through Visual Studio UI. The Compiler-RT projects
differ from the way Clang/LLVM are grouping targets.
This patch doesn't contain behavior changes.
Reviewers: kubabrecka, rnk
Subscribers: wang0109, llvm-commits, kubabrecka, chrisha
Differential Revision: http://reviews.llvm.org/D21952
llvm-svn: 275111
Summary:
This is an initial implementation of a Hardened Allocator based on Sanitizer Common's CombinedAllocator.
It aims at mitigating heap based vulnerabilities by adding several features to the base allocator, while staying relatively fast.
The following were implemented:
- additional consistency checks on the allocation function parameters and on the heap chunks;
- use of checksum protected chunk header, to detect corruption;
- randomness to the allocator base;
- delayed freelist (quarantine), to mitigate use after free and overall determinism.
Additional mitigations are in the works.
Reviewers: eugenis, aizatsky, pcc, krasin, vitalybuka, glider, dvyukov, kcc
Subscribers: kubabrecka, filcab, llvm-commits
Differential Revision: http://reviews.llvm.org/D20084
llvm-svn: 271968