llvm-project/compiler-rt/lib/scudo/scudo_allocator.cpp

//===-- scudo_allocator.cpp -------------------------------------*- C++ -*-===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//
///
/// Scudo Hardened Allocator implementation.
/// It uses the sanitizer_common allocator as a base and aims at mitigating
/// heap corruption vulnerabilities. It provides a checksum-guarded chunk
/// header, a delayed free list, and additional sanity checks.
///
//===----------------------------------------------------------------------===//
#include "scudo_allocator.h"
#include "scudo_crc32.h"
#include "scudo_errors.h"
#include "scudo_flags.h"
#include "scudo_interface_internal.h"
#include "scudo_tsd.h"
#include "scudo_utils.h"
#include "sanitizer_common/sanitizer_allocator_checks.h"
#include "sanitizer_common/sanitizer_allocator_interface.h"
#include "sanitizer_common/sanitizer_quarantine.h"
#ifdef GWP_ASAN_HOOKS
# include "gwp_asan/guarded_pool_allocator.h"
# include "gwp_asan/optional/backtrace.h"
# include "gwp_asan/optional/options_parser.h"
#endif // GWP_ASAN_HOOKS
#include <errno.h>
#include <string.h>
namespace __scudo {
// Global static cookie, initialized at start-up.
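// It seeds the checksum of every chunk header (see Chunk::computeChecksum).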
static u32 Cookie;
// We default to software CRC32 if the alternatives are not supported, either
// at compilation or at runtime.
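// init() switches this to CRC32Hardware once, when both the hardware CRC32
// weak symbol and the CPU feature are detected, sparing the fast path from
// probing for support on every checksum.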
static atomic_uint8_t HashAlgorithm = { CRC32Software };
INLINE u32 computeCRC32(u32 Crc, uptr Value, uptr *Array, uptr ArraySize) {
  // If the hardware CRC32 feature is defined here, it was enabled everywhere,
  // as opposed to only for scudo_crc32.cpp. This means that other
  // hardware-specific instructions were likely emitted elsewhere as well, so
  // there is no reason not to use it here.
#if defined(__SSE4_2__) || defined(__ARM_FEATURE_CRC32)
Crc = CRC32_INTRINSIC(Crc, Value);
for (uptr i = 0; i < ArraySize; i++)
Crc = CRC32_INTRINSIC(Crc, Array[i]);
return Crc;
#else
if (atomic_load_relaxed(&HashAlgorithm) == CRC32Hardware) {
Crc = computeHardwareCRC32(Crc, Value);
for (uptr i = 0; i < ArraySize; i++)
Crc = computeHardwareCRC32(Crc, Array[i]);
return Crc;
}
Crc = computeSoftwareCRC32(Crc, Value);
for (uptr i = 0; i < ArraySize; i++)
Crc = computeSoftwareCRC32(Crc, Array[i]);
return Crc;
#endif // defined(__SSE4_2__) || defined(__ARM_FEATURE_CRC32)
}
static BackendT &getBackend();
namespace Chunk {
static INLINE AtomicPackedHeader *getAtomicHeader(void *Ptr) {
return reinterpret_cast<AtomicPackedHeader *>(reinterpret_cast<uptr>(Ptr) -
getHeaderSize());
}
static INLINE
const AtomicPackedHeader *getConstAtomicHeader(const void *Ptr) {
return reinterpret_cast<const AtomicPackedHeader *>(
reinterpret_cast<uptr>(Ptr) - getHeaderSize());
}
static INLINE bool isAligned(const void *Ptr) {
return IsAligned(reinterpret_cast<uptr>(Ptr), MinAlignment);
}
  // We can't use the offset member of the chunk itself, as we would double
  // fetch it without any guarantee that it hadn't been tampered with in
  // between. To prevent this, we work with a local copy of the header.
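  // Note that the Offset field is expressed in multiples of MinAlignment,
  // hence the shift by MinAlignmentLog.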
static INLINE void *getBackendPtr(const void *Ptr, UnpackedHeader *Header) {
return reinterpret_cast<void *>(reinterpret_cast<uptr>(Ptr) -
getHeaderSize() - (Header->Offset << MinAlignmentLog));
}
  // Returns the usable size for a chunk, that is, the number of bytes from
  // the beginning of the user data to the end of the backend allocated chunk.
static INLINE uptr getUsableSize(const void *Ptr, UnpackedHeader *Header) {
const uptr ClassId = Header->ClassId;
if (ClassId)
return PrimaryT::ClassIdToSize(ClassId) - getHeaderSize() -
(Header->Offset << MinAlignmentLog);
return SecondaryT::GetActuallyAllocatedSize(
getBackendPtr(Ptr, Header)) - getHeaderSize();
}
// Returns the size the user requested when allocating the chunk.
static INLINE uptr getSize(const void *Ptr, UnpackedHeader *Header) {
const uptr SizeOrUnusedBytes = Header->SizeOrUnusedBytes;
if (Header->ClassId)
return SizeOrUnusedBytes;
return SecondaryT::GetActuallyAllocatedSize(
getBackendPtr(Ptr, Header)) - getHeaderSize() - SizeOrUnusedBytes;
}
// Compute the checksum of the chunk pointer and its header.
static INLINE u16 computeChecksum(const void *Ptr, UnpackedHeader *Header) {
UnpackedHeader ZeroChecksumHeader = *Header;
ZeroChecksumHeader.Checksum = 0;
uptr HeaderHolder[sizeof(UnpackedHeader) / sizeof(uptr)];
memcpy(&HeaderHolder, &ZeroChecksumHeader, sizeof(HeaderHolder));
const u32 Crc = computeCRC32(Cookie, reinterpret_cast<uptr>(Ptr),
HeaderHolder, ARRAY_SIZE(HeaderHolder));
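    // The Checksum field only holds 16 bits, so the CRC32 is truncated.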
return static_cast<u16>(Crc);
}
  // Checks the validity of a chunk by verifying its checksum. Unlike
  // loadHeader, it doesn't terminate the process on an invalid chunk.
static INLINE bool isValid(const void *Ptr) {
PackedHeader NewPackedHeader =
atomic_load_relaxed(getConstAtomicHeader(Ptr));
UnpackedHeader NewUnpackedHeader =
bit_cast<UnpackedHeader>(NewPackedHeader);
return (NewUnpackedHeader.Checksum ==
computeChecksum(Ptr, &NewUnpackedHeader));
}
  // Ensure that ChunkAvailable is 0: should a fully zeroed out header ever
  // pass the checksum verification, its state would then read as available.
COMPILER_CHECK(ChunkAvailable == 0);
// Loads and unpacks the header, verifying the checksum in the process.
static INLINE
void loadHeader(const void *Ptr, UnpackedHeader *NewUnpackedHeader) {
PackedHeader NewPackedHeader =
atomic_load_relaxed(getConstAtomicHeader(Ptr));
*NewUnpackedHeader = bit_cast<UnpackedHeader>(NewPackedHeader);
if (UNLIKELY(NewUnpackedHeader->Checksum !=
computeChecksum(Ptr, NewUnpackedHeader)))
dieWithMessage("corrupted chunk header at address %p\n", Ptr);
}
// Packs and stores the header, computing the checksum in the process.
static INLINE void storeHeader(void *Ptr, UnpackedHeader *NewUnpackedHeader) {
NewUnpackedHeader->Checksum = computeChecksum(Ptr, NewUnpackedHeader);
PackedHeader NewPackedHeader = bit_cast<PackedHeader>(*NewUnpackedHeader);
atomic_store_relaxed(getAtomicHeader(Ptr), NewPackedHeader);
}
  // Packs and stores the header, computing the checksum in the process. We
  // compare the current header with the expected one provided by the caller
  // to ensure that we are not being raced by a corruption occurring in
  // another thread.
static INLINE void compareExchangeHeader(void *Ptr,
UnpackedHeader *NewUnpackedHeader,
UnpackedHeader *OldUnpackedHeader) {
NewUnpackedHeader->Checksum = computeChecksum(Ptr, NewUnpackedHeader);
PackedHeader NewPackedHeader = bit_cast<PackedHeader>(*NewUnpackedHeader);
PackedHeader OldPackedHeader = bit_cast<PackedHeader>(*OldUnpackedHeader);
if (UNLIKELY(!atomic_compare_exchange_strong(
getAtomicHeader(Ptr), &OldPackedHeader, NewPackedHeader,
memory_order_relaxed)))
dieWithMessage("race on chunk header at address %p\n", Ptr);
}
} // namespace Chunk
struct QuarantineCallback {
explicit QuarantineCallback(AllocatorCacheT *Cache)
: Cache_(Cache) {}
// Chunk recycling function, returns a quarantined chunk to the backend,
// first making sure it hasn't been tampered with.
void Recycle(void *Ptr) {
UnpackedHeader Header;
Chunk::loadHeader(Ptr, &Header);
if (UNLIKELY(Header.State != ChunkQuarantine))
dieWithMessage("invalid chunk state when recycling address %p\n", Ptr);
UnpackedHeader NewHeader = Header;
NewHeader.State = ChunkAvailable;
Chunk::compareExchangeHeader(Ptr, &NewHeader, &Header);
void *BackendPtr = Chunk::getBackendPtr(Ptr, &Header);
if (Header.ClassId)
getBackend().deallocatePrimary(Cache_, BackendPtr, Header.ClassId);
else
getBackend().deallocateSecondary(BackendPtr);
}
// Internal quarantine allocation and deallocation functions. We first check
// that the batches are indeed serviced by the Primary.
// TODO(kostyak): figure out the best way to protect the batches.
void *Allocate(uptr Size) {
const uptr BatchClassId = SizeClassMap::ClassID(sizeof(QuarantineBatch));
return getBackend().allocatePrimary(Cache_, BatchClassId);
}
void Deallocate(void *Ptr) {
const uptr BatchClassId = SizeClassMap::ClassID(sizeof(QuarantineBatch));
getBackend().deallocatePrimary(Cache_, Ptr, BatchClassId);
}
AllocatorCacheT *Cache_;
COMPILER_CHECK(sizeof(QuarantineBatch) < SizeClassMap::kMaxSize);
};
typedef Quarantine<QuarantineCallback, void> QuarantineT;
typedef QuarantineT::Cache QuarantineCacheT;
COMPILER_CHECK(sizeof(QuarantineCacheT) <=
sizeof(ScudoTSD::QuarantineCachePlaceHolder));
QuarantineCacheT *getQuarantineCache(ScudoTSD *TSD) {
return reinterpret_cast<QuarantineCacheT *>(TSD->QuarantineCachePlaceHolder);
}
#ifdef GWP_ASAN_HOOKS
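// GWP-ASan guarded pool: a small fraction of allocations is sampled into it
// (see shouldSample() in allocate() below).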
static gwp_asan::GuardedPoolAllocator GuardedAlloc;
#endif // GWP_ASAN_HOOKS
struct Allocator {
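  // Hard cap on a single allocation size: 2^31 bytes (2GB) on 32-bit
  // platforms, 2^40 bytes (1TB) on 64-bit ones.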
static const uptr MaxAllowedMallocSize =
FIRST_32_SECOND_64(2UL << 30, 1ULL << 40);
BackendT Backend;
QuarantineT Quarantine;
u32 QuarantineChunksUpToSize;
bool DeallocationTypeMismatch;
bool ZeroContents;
bool DeleteSizeMismatch;
bool CheckRssLimit;
uptr HardRssLimitMb;
uptr SoftRssLimitMb;
atomic_uint8_t RssLimitExceeded;
atomic_uint64_t RssLastCheckedAtNS;
explicit Allocator(LinkerInitialized)
: Quarantine(LINKER_INITIALIZED) {}
NOINLINE void performSanityChecks();
void init() {
SanitizerToolName = "Scudo";
PrimaryAllocatorName = "ScudoPrimary";
SecondaryAllocatorName = "ScudoSecondary";
initFlags();
performSanityChecks();
    // Check if hardware CRC32 is supported in the binary and by the platform;
    // if so, opt for the hardware version of the checksum.
if (&computeHardwareCRC32 && hasHardwareCRC32())
atomic_store_relaxed(&HashAlgorithm, CRC32Hardware);
SetAllocatorMayReturnNull(common_flags()->allocator_may_return_null);
Backend.init(common_flags()->allocator_release_to_os_interval_ms);
HardRssLimitMb = common_flags()->hard_rss_limit_mb;
SoftRssLimitMb = common_flags()->soft_rss_limit_mb;
Quarantine.Init(
static_cast<uptr>(getFlags()->QuarantineSizeKb) << 10,
static_cast<uptr>(getFlags()->ThreadLocalQuarantineSizeKb) << 10);
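    // With a zero-sized quarantine, the threshold is zeroed as well so that
    // every chunk bypasses it in quarantineOrDeallocateChunk().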
QuarantineChunksUpToSize = (Quarantine.GetCacheSize() == 0) ? 0 :
getFlags()->QuarantineChunksUpToSize;
DeallocationTypeMismatch = getFlags()->DeallocationTypeMismatch;
DeleteSizeMismatch = getFlags()->DeleteSizeMismatch;
ZeroContents = getFlags()->ZeroContents;
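    // Seed the checksum Cookie with random bytes from the OS if available,
    // falling back to a weaker time- and address-based value otherwise.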
if (UNLIKELY(!GetRandom(reinterpret_cast<void *>(&Cookie), sizeof(Cookie),
/*blocking=*/false))) {
Cookie = static_cast<u32>((NanoTime() >> 12) ^
(reinterpret_cast<uptr>(this) >> 4));
}
CheckRssLimit = HardRssLimitMb || SoftRssLimitMb;
if (CheckRssLimit)
atomic_store_relaxed(&RssLastCheckedAtNS, MonotonicNanoTime());
}
  // Helper function that checks for a valid Scudo chunk. nullptr isn't one.
bool isValidPointer(const void *Ptr) {
initThreadMaybe();
if (UNLIKELY(!Ptr))
return false;
if (!Chunk::isAligned(Ptr))
return false;
return Chunk::isValid(Ptr);
}
NOINLINE bool isRssLimitExceeded();
// Allocates a chunk.
void *allocate(uptr Size, uptr Alignment, AllocType Type,
bool ForceZeroContents = false) {
initThreadMaybe();
#ifdef GWP_ASAN_HOOKS
if (UNLIKELY(GuardedAlloc.shouldSample())) {
if (void *Ptr = GuardedAlloc.allocate(Size))
return Ptr;
}
#endif // GWP_ASAN_HOOKS
if (UNLIKELY(Alignment > MaxAlignment)) {
if (AllocatorMayReturnNull())
return nullptr;
reportAllocationAlignmentTooBig(Alignment, MaxAlignment);
}
if (UNLIKELY(Alignment < MinAlignment))
Alignment = MinAlignment;
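    // The backend allocation has to hold the header and the rounded-up user
    // size; with a larger-than-minimal alignment, extra room is needed to be
    // able to shift the user pointer up to an Alignment boundary.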
const uptr NeededSize = RoundUpTo(Size ? Size : 1, MinAlignment) +
Chunk::getHeaderSize();
const uptr AlignedSize = (Alignment > MinAlignment) ?
NeededSize + (Alignment - Chunk::getHeaderSize()) : NeededSize;
if (UNLIKELY(Size >= MaxAllowedMallocSize) ||
UNLIKELY(AlignedSize >= MaxAllowedMallocSize)) {
if (AllocatorMayReturnNull())
return nullptr;
reportAllocationSizeTooBig(Size, AlignedSize, MaxAllowedMallocSize);
}
if (CheckRssLimit && UNLIKELY(isRssLimitExceeded())) {
if (AllocatorMayReturnNull())
return nullptr;
reportRssLimitExceeded();
}
    // Primary and Secondary backed allocations are treated differently: we
    // deal with the alignment requirements of Primary serviced allocations
    // here, while the Secondary takes care of its own alignment needs.
void *BackendPtr;
uptr BackendSize;
u8 ClassId;
if (PrimaryT::CanAllocate(AlignedSize, MinAlignment)) {
BackendSize = AlignedSize;
ClassId = SizeClassMap::ClassID(BackendSize);
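    // Grab a TSD, which may have to be locked (in the shared TSD model), and
    // allocate from its local cache; unlock it afterwards if required.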
bool UnlockRequired;
ScudoTSD *TSD = getTSDAndLock(&UnlockRequired);
BackendPtr = Backend.allocatePrimary(&TSD->Cache, ClassId);
if (UnlockRequired)
TSD->unlock();
} else {
BackendSize = NeededSize;
ClassId = 0;
BackendPtr = Backend.allocateSecondary(BackendSize, Alignment);
}
if (UNLIKELY(!BackendPtr)) {
SetAllocatorOutOfMemory();
if (AllocatorMayReturnNull())
return nullptr;
reportOutOfMemory(Size);
}
// If requested, we will zero out the entire contents of the returned chunk.
if ((ForceZeroContents || ZeroContents) && ClassId)
memset(BackendPtr, 0, PrimaryT::ClassIdToSize(ClassId));
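    // From the backend pointer, a chunk is laid out as: optional alignment
    // padding, then the packed header, then the user data.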
UnpackedHeader Header = {};
uptr UserPtr = reinterpret_cast<uptr>(BackendPtr) + Chunk::getHeaderSize();
if (UNLIKELY(!IsAligned(UserPtr, Alignment))) {
// Since the Secondary takes care of alignment, a non-aligned pointer
// means it is from the Primary. It is also the only case where the offset
// field of the header would be non-zero.
DCHECK(ClassId);
const uptr AlignedUserPtr = RoundUpTo(UserPtr, Alignment);
Header.Offset = (AlignedUserPtr - UserPtr) >> MinAlignmentLog;
UserPtr = AlignedUserPtr;
}
DCHECK_LE(UserPtr + Size, reinterpret_cast<uptr>(BackendPtr) + BackendSize);
Header.State = ChunkAllocated;
Header.AllocType = Type;
if (ClassId) {
Header.ClassId = ClassId;
Header.SizeOrUnusedBytes = Size;
} else {
      // The Secondary fits the allocations to a page, so the number of unused
      // bytes is the difference between the end of the user allocation and
      // the next page boundary.
const uptr PageSize = GetPageSizeCached();
const uptr TrailingBytes = (UserPtr + Size) & (PageSize - 1);
if (TrailingBytes)
Header.SizeOrUnusedBytes = PageSize - TrailingBytes;
}
void *Ptr = reinterpret_cast<void *>(UserPtr);
Chunk::storeHeader(Ptr, &Header);
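    // __sanitizer_malloc_hook is a weak symbol: only invoke it if hooks are
    // compiled in and a definition is actually present.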
if (SCUDO_CAN_USE_HOOKS && &__sanitizer_malloc_hook)
__sanitizer_malloc_hook(Ptr, Size);
return Ptr;
}
// Place a chunk in the quarantine or directly deallocate it in the event of
// a zero-sized quarantine, or if the size of the chunk is greater than the
// quarantine chunk size threshold.
void quarantineOrDeallocateChunk(void *Ptr, UnpackedHeader *Header,
uptr Size) {
const bool BypassQuarantine = !Size || (Size > QuarantineChunksUpToSize);
if (BypassQuarantine) {
UnpackedHeader NewHeader = *Header;
NewHeader.State = ChunkAvailable;
Chunk::compareExchangeHeader(Ptr, &NewHeader, Header);
void *BackendPtr = Chunk::getBackendPtr(Ptr, Header);
if (Header->ClassId) {
bool UnlockRequired;
ScudoTSD *TSD = getTSDAndLock(&UnlockRequired);
getBackend().deallocatePrimary(&TSD->Cache, BackendPtr,
Header->ClassId);
if (UnlockRequired)
TSD->unlock();
} else {
getBackend().deallocateSecondary(BackendPtr);
}
} else {
// If a small amount of memory was allocated with a larger alignment, we
// want to take that into account. Otherwise the Quarantine would be filled
// with tiny chunks, taking up a lot of VA memory. This is an approximation
// of the usable size that allows us to avoid calling
// GetActuallyAllocatedSize.
const uptr EstimatedSize = Size + (Header->Offset << MinAlignmentLog);
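// Mark the chunk as quarantined. compareExchangeHeader atomically installs
// the new checksummed header, and aborts if the header changed under us
// (e.g., a racing double-free).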
UnpackedHeader NewHeader = *Header;
NewHeader.State = ChunkQuarantine;
Chunk::compareExchangeHeader(Ptr, &NewHeader, Header);
bool UnlockRequired;
ScudoTSD *TSD = getTSDAndLock(&UnlockRequired);
Quarantine.Put(getQuarantineCache(TSD), QuarantineCallback(&TSD->Cache),
Ptr, EstimatedSize);
if (UnlockRequired)
TSD->unlock();
}
}
// Deallocates a Chunk, which means either adding it to the quarantine or
// directly returning it to the backend if criteria are met.
void deallocate(void *Ptr, uptr DeleteSize, uptr DeleteAlignment,
AllocType Type) {
// For a deallocation, we only ensure minimal initialization, meaning the
// thread-local data is left uninitialized for now (when using ELF TLS); the
// fallback cache is used instead. This works around the case where the only
// heap operation performed in a thread is a free past the TLS destructors,
// which would otherwise leave initialized thread-specific data that is
// never properly destroyed. Any other heap operation will do a full init.
initThreadMaybe(/*MinimalInit=*/true);
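// Pass the pointer to the (weak) sanitizer free hook if one is registered.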
if (SCUDO_CAN_USE_HOOKS && &__sanitizer_free_hook)
__sanitizer_free_hook(Ptr);
if (UNLIKELY(!Ptr))
return;
#ifdef GWP_ASAN_HOOKS
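// GWP-ASan guarded allocations do not carry a Scudo header; hand them back
// to GWP-ASan directly.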
if (UNLIKELY(GuardedAlloc.pointerIsMine(Ptr))) {
GuardedAlloc.deallocate(Ptr);
return;
}
#endif // GWP_ASAN_HOOKS
if (UNLIKELY(!Chunk::isAligned(Ptr)))
dieWithMessage("misaligned pointer when deallocating address %p\n", Ptr);
UnpackedHeader Header;
Chunk::loadHeader(Ptr, &Header);
if (UNLIKELY(Header.State != ChunkAllocated))
dieWithMessage("invalid chunk state when deallocating address %p\n", Ptr);
if (DeallocationTypeMismatch) {
// The deallocation type has to match the allocation one.
if (Header.AllocType != Type) {
// With the exception of memalign'd chunks, which can still be free'd.
if (Header.AllocType != FromMemalign || Type != FromMalloc)
dieWithMessage("allocation type mismatch when deallocating address "
"%p\n", Ptr);
}
}
const uptr Size = Chunk::getSize(Ptr, &Header);
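// If sized-delete checking is enabled, verify the size passed by the
// caller (e.g., sizeof(*p) for a sized operator delete) against the one
// held in the header; a DeleteSize of 0 means the size is unknown.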
if (DeleteSizeMismatch) {
if (DeleteSize && DeleteSize != Size)
dieWithMessage("invalid sized delete when deallocating address %p\n",
Ptr);
}
(void)DeleteAlignment; // TODO(kostyak): verify that the alignment matches.
quarantineOrDeallocateChunk(Ptr, &Header, Size);
}
// Reallocates a chunk. We can save on a new allocation if the new requested
// size still fits in the chunk.
void *reallocate(void *OldPtr, uptr NewSize) {
initThreadMaybe();
#ifdef GWP_ASAN_HOOKS
if (UNLIKELY(GuardedAlloc.pointerIsMine(OldPtr))) {
size_t OldSize = GuardedAlloc.getSize(OldPtr);
void *NewPtr = allocate(NewSize, MinAlignment, FromMalloc);
if (NewPtr)
memcpy(NewPtr, OldPtr, (NewSize < OldSize) ? NewSize : OldSize);
GuardedAlloc.deallocate(OldPtr);
return NewPtr;
}
#endif // GWP_ASAN_HOOKS
if (UNLIKELY(!Chunk::isAligned(OldPtr)))
dieWithMessage("misaligned address when reallocating address %p\n",
OldPtr);
UnpackedHeader OldHeader;
Chunk::loadHeader(OldPtr, &OldHeader);
if (UNLIKELY(OldHeader.State != ChunkAllocated))
dieWithMessage("invalid chunk state when reallocating address %p\n",
OldPtr);
if (DeallocationTypeMismatch) {
if (UNLIKELY(OldHeader.AllocType != FromMalloc))
dieWithMessage("allocation type mismatch when reallocating address "
"%p\n", OldPtr);
}
const uptr UsableSize = Chunk::getUsableSize(OldPtr, &OldHeader);
// The new size still fits in the current chunk, and the size difference
// is reasonable.
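// The bound on the difference (half of the largest primary class size)
// avoids serving a now much smaller request out of an oversized chunk.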
if (NewSize <= UsableSize &&
(UsableSize - NewSize) < (SizeClassMap::kMaxSize / 2)) {
UnpackedHeader NewHeader = OldHeader;
NewHeader.SizeOrUnusedBytes =
OldHeader.ClassId ? NewSize : UsableSize - NewSize;
Chunk::compareExchangeHeader(OldPtr, &NewHeader, &OldHeader);
return OldPtr;
}
// Otherwise, we have to allocate a new chunk and copy the contents of the
// old one.
void *NewPtr = allocate(NewSize, MinAlignment, FromMalloc);
if (NewPtr) {
const uptr OldSize = OldHeader.ClassId ? OldHeader.SizeOrUnusedBytes :
UsableSize - OldHeader.SizeOrUnusedBytes;
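// Copy Min(NewSize, UsableSize) bytes: some widely used (if sketchy) code
// expects realloc to preserve the usable size of the old chunk, not just
// its originally requested size.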
memcpy(NewPtr, OldPtr, Min(NewSize, UsableSize));
quarantineOrDeallocateChunk(OldPtr, &OldHeader, OldSize);
}
return NewPtr;
}
// Helper function that returns the actual usable size of a chunk.
uptr getUsableSize(const void *Ptr) {
initThreadMaybe();
if (UNLIKELY(!Ptr))
return 0;
#ifdef GWP_ASAN_HOOKS
if (UNLIKELY(GuardedAlloc.pointerIsMine(Ptr)))
return GuardedAlloc.getSize(Ptr);
#endif // GWP_ASAN_HOOKS
UnpackedHeader Header;
Chunk::loadHeader(Ptr, &Header);
// Getting the usable size of a chunk only makes sense if it's allocated.
if (UNLIKELY(Header.State != ChunkAllocated))
dieWithMessage("invalid chunk state when sizing address %p\n", Ptr);
return Chunk::getUsableSize(Ptr, &Header);
}
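// Zero-initialized array allocation: the NMemB * Size product is checked
// for overflow first, and the final 'true' argument requests zeroed
// contents from allocate().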
void *calloc(uptr NMemB, uptr Size) {
initThreadMaybe();
if (UNLIKELY(CheckForCallocOverflow(NMemB, Size))) {
if (AllocatorMayReturnNull())
return nullptr;
reportCallocOverflow(NMemB, Size);
}
return allocate(NMemB * Size, MinAlignment, FromMalloc, true);
}
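// Called when a thread's TSD is torn down: drain its quarantine cache and
// release its backend allocator cache.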
void commitBack(ScudoTSD *TSD) {
Quarantine.Drain(getQuarantineCache(TSD), QuarantineCallback(&TSD->Cache));
Backend.destroyCache(&TSD->Cache);
}
uptr getStats(AllocatorStat StatType) {
initThreadMaybe();
uptr stats[AllocatorStatCount];
Backend.getStats(stats);
return stats[StatType];
}
bool canReturnNull() {
initThreadMaybe();
return AllocatorMayReturnNull();
}
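// The RSS limits can be updated at runtime (see __scudo_set_rss_limit); a
// limit of 0 disables the corresponding check.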
void setRssLimit(uptr LimitMb, bool HardLimit) {
if (HardLimit)
HardRssLimitMb = LimitMb;
else
SoftRssLimitMb = LimitMb;
CheckRssLimit = HardRssLimitMb || SoftRssLimitMb;
}
void printStats() {
initThreadMaybe();
Backend.printStats();
}
};
NOINLINE void Allocator::performSanityChecks() {
// Verify that the header offset field can hold the maximum offset. In the
// case of the Secondary allocator, it takes care of alignment and the
// offset will always be 0. In the case of the Primary, the worst case
// scenario happens in the last size class, when the backend allocation
// would already be aligned on the requested alignment, which would happen
// to be the maximum alignment that would fit in that size class. As a
// result, the maximum offset will be at most the maximum alignment for the
// last size class minus the header size, in multiples of MinAlignment.
UnpackedHeader Header = {};
const uptr MaxPrimaryAlignment =
1 << MostSignificantSetBitIndex(SizeClassMap::kMaxSize - MinAlignment);
const uptr MaxOffset =
(MaxPrimaryAlignment - Chunk::getHeaderSize()) >> MinAlignmentLog;
Header.Offset = MaxOffset;
if (Header.Offset != MaxOffset)
dieWithMessage("maximum possible offset doesn't fit in header\n");
// Verify that we can fit the maximum size or amount of unused bytes in the
// header. Given that the Secondary fits the allocation to a page, the worst
// case scenario happens in the Primary. It will depend on the second to
// last and last class sizes, as well as the dynamic base for the Primary.
// The following is an over-approximation that works for our needs.
const uptr MaxSizeOrUnusedBytes = SizeClassMap::kMaxSize - 1;
Header.SizeOrUnusedBytes = MaxSizeOrUnusedBytes;
if (Header.SizeOrUnusedBytes != MaxSizeOrUnusedBytes)
dieWithMessage("maximum possible unused bytes doesn't fit in header\n");
const uptr LargestClassId = SizeClassMap::kLargestClassID;
Header.ClassId = LargestClassId;
if (Header.ClassId != LargestClassId)
dieWithMessage("largest class ID doesn't fit in header\n");
}
// Opportunistic RSS limit check. At most once every 250ms, a single thread
// refreshes the RSS limit status; all other calls return the current one.
NOINLINE bool Allocator::isRssLimitExceeded() {
u64 LastCheck = atomic_load_relaxed(&RssLastCheckedAtNS);
const u64 CurrentCheck = MonotonicNanoTime();
if (LIKELY(CurrentCheck < LastCheck + (250ULL * 1000000ULL)))
return atomic_load_relaxed(&RssLimitExceeded);
if (!atomic_compare_exchange_weak(&RssLastCheckedAtNS, &LastCheck,
CurrentCheck, memory_order_relaxed))
return atomic_load_relaxed(&RssLimitExceeded);
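// This thread won the compare-exchange and refreshes the (comparatively
// expensive) RSS reading below.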
// TODO(kostyak): We currently use sanitizer_common's GetRSS which reads the
// RSS from /proc/self/statm by default. We might want to
// call getrusage directly, even if it's less accurate.
const uptr CurrentRssMb = GetRSS() >> 20;
if (HardRssLimitMb && UNLIKELY(HardRssLimitMb < CurrentRssMb))
dieWithMessage("hard RSS limit exhausted (%zdMb vs %zdMb)\n",
HardRssLimitMb, CurrentRssMb);
if (SoftRssLimitMb) {
if (atomic_load_relaxed(&RssLimitExceeded)) {
if (CurrentRssMb <= SoftRssLimitMb)
atomic_store_relaxed(&RssLimitExceeded, false);
} else {
if (CurrentRssMb > SoftRssLimitMb) {
atomic_store_relaxed(&RssLimitExceeded, true);
Printf("Scudo INFO: soft RSS limit exhausted (%zdMb vs %zdMb)\n",
SoftRssLimitMb, CurrentRssMb);
}
}
}
return atomic_load_relaxed(&RssLimitExceeded);
}
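// The global allocator instance is linker-initialized so that it is usable
// before C++ static constructors have run.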
static Allocator Instance(LINKER_INITIALIZED);
static BackendT &getBackend() {
return Instance.Backend;
}
void initScudo() {
Instance.init();
#ifdef GWP_ASAN_HOOKS
gwp_asan::options::initOptions();
gwp_asan::options::Options &Opts = gwp_asan::options::getOptions();
Opts.Backtrace = gwp_asan::options::getBacktraceFunction();
Opts.PrintBacktrace = gwp_asan::options::getPrintBacktraceFunction();
GuardedAlloc.init(Opts);
#endif // GWP_ASAN_HOOKS
}
void ScudoTSD::init() {
getBackend().initCache(&Cache);
memset(QuarantineCachePlaceHolder, 0, sizeof(QuarantineCachePlaceHolder));
}
void ScudoTSD::commitBack() {
Instance.commitBack(this);
}
void *scudoAllocate(uptr Size, uptr Alignment, AllocType Type) {
if (Alignment && UNLIKELY(!IsPowerOfTwo(Alignment))) {
errno = EINVAL;
if (Instance.canReturnNull())
return nullptr;
reportAllocationAlignmentNotPowerOfTwo(Alignment);
}
return SetErrnoOnNull(Instance.allocate(Size, Alignment, Type));
}
void scudoDeallocate(void *Ptr, uptr Size, uptr Alignment, AllocType Type) {
Instance.deallocate(Ptr, Size, Alignment, Type);
}
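// Standard realloc semantics: a null pointer acts like malloc(Size), and a
// zero Size frees the pointer and returns nullptr.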
void *scudoRealloc(void *Ptr, uptr Size) {
if (!Ptr)
return SetErrnoOnNull(Instance.allocate(Size, MinAlignment, FromMalloc));
if (Size == 0) {
Instance.deallocate(Ptr, 0, 0, FromMalloc);
return nullptr;
}
return SetErrnoOnNull(Instance.reallocate(Ptr, Size));
}
void *scudoCalloc(uptr NMemB, uptr Size) {
return SetErrnoOnNull(Instance.calloc(NMemB, Size));
}
void *scudoValloc(uptr Size) {
return SetErrnoOnNull(
Instance.allocate(Size, GetPageSizeCached(), FromMemalign));
}
void *scudoPvalloc(uptr Size) {
const uptr PageSize = GetPageSizeCached();
if (UNLIKELY(CheckForPvallocOverflow(Size, PageSize))) {
errno = ENOMEM;
if (Instance.canReturnNull())
return nullptr;
reportPvallocOverflow(Size);
}
// pvalloc(0) should allocate one page.
Size = Size ? RoundUpTo(Size, PageSize) : PageSize;
return SetErrnoOnNull(Instance.allocate(Size, PageSize, FromMemalign));
}
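// Unlike the other entry points, posix_memalign reports failure through its
// return value (EINVAL or ENOMEM) rather than by setting errno.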
int scudoPosixMemalign(void **MemPtr, uptr Alignment, uptr Size) {
if (UNLIKELY(!CheckPosixMemalignAlignment(Alignment))) {
if (!Instance.canReturnNull())
reportInvalidPosixMemalignAlignment(Alignment);
return EINVAL;
}
void *Ptr = Instance.allocate(Size, Alignment, FromMemalign);
if (UNLIKELY(!Ptr))
return ENOMEM;
*MemPtr = Ptr;
return 0;
}
void *scudoAlignedAlloc(uptr Alignment, uptr Size) {
if (UNLIKELY(!CheckAlignedAllocAlignmentAndSize(Alignment, Size))) {
errno = EINVAL;
if (Instance.canReturnNull())
return nullptr;
reportInvalidAlignedAllocAlignment(Size, Alignment);
}
return SetErrnoOnNull(Instance.allocate(Size, Alignment, FromMalloc));
}
uptr scudoMallocUsableSize(void *Ptr) {
return Instance.getUsableSize(Ptr);
}
} // namespace __scudo
using namespace __scudo;
// MallocExtension helper functions
uptr __sanitizer_get_current_allocated_bytes() {
return Instance.getStats(AllocatorStatAllocated);
}
uptr __sanitizer_get_heap_size() {
return Instance.getStats(AllocatorStatMapped);
}
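// The next two statistics are not tracked by Scudo; return placeholders.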
uptr __sanitizer_get_free_bytes() {
return 1;
}
uptr __sanitizer_get_unmapped_bytes() {
return 1;
}
uptr __sanitizer_get_estimated_allocated_size(uptr Size) {
return Size;
}
int __sanitizer_get_ownership(const void *Ptr) {
return Instance.isValidPointer(Ptr);
}
uptr __sanitizer_get_allocated_size(const void *Ptr) {
return Instance.getUsableSize(Ptr);
}
#if !SANITIZER_SUPPORTS_WEAK_HOOKS
SANITIZER_INTERFACE_WEAK_DEF(void, __sanitizer_malloc_hook,
void *Ptr, uptr Size) {
(void)Ptr;
(void)Size;
}
SANITIZER_INTERFACE_WEAK_DEF(void, __sanitizer_free_hook, void *Ptr) {
(void)Ptr;
}
#endif
// Interface functions
void __scudo_set_rss_limit(uptr LimitMb, s32 HardLimit) {
if (!SCUDO_CAN_USE_PUBLIC_INTERFACE)
return;
Instance.setRssLimit(LimitMb, !!HardLimit);
}
void __scudo_print_stats() {
Instance.printStats();
}