llvm-project/compiler-rt/lib/scudo/scudo_allocator.cpp

757 lines
28 KiB
C++
Raw Normal View History

//===-- scudo_allocator.cpp -------------------------------------*- C++ -*-===//
//
// The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//===----------------------------------------------------------------------===//
///
/// Scudo Hardened Allocator implementation.
/// It uses the sanitizer_common allocator as a base and aims at mitigating
/// heap corruption vulnerabilities. It provides a checksum-guarded chunk
/// header, a delayed free list, and additional sanity checks.
///
//===----------------------------------------------------------------------===//
#include "scudo_allocator.h"
[scudo] CRC32 optimizations Summary: This change optimizes several aspects of the checksum used for chunk headers. First, there is no point in checking the weak symbol `computeHardwareCRC32` everytime, it will either be there or not when we start, so check it once during initialization and set the checksum type accordingly. Then, the loading of `HashAlgorithm` for SSE versions (and ARM equivalent) was not optimized out, while not necessary. So I reshuffled that part of the code, which duplicates a tiny bit of code, but ends up in a much cleaner assembly (and faster as we avoid an extraneous load and some calls). The following code is the checksum at the end of `scudoMalloc` for x86_64 with full SSE 4.2, before: ``` mov rax, 0FFFFFFFFFFFFFFh shl r10, 38h mov edi, dword ptr cs:_ZN7__scudoL6CookieE ; __scudo::Cookie and r14, rax lea rsi, [r13-10h] movzx eax, cs:_ZN7__scudoL13HashAlgorithmE ; __scudo::HashAlgorithm or r14, r10 mov rbx, r14 xor bx, bx call _ZN7__scudo20computeHardwareCRC32Ejm ; __scudo::computeHardwareCRC32(uint,ulong) mov rsi, rbx mov edi, eax call _ZN7__scudo20computeHardwareCRC32Ejm ; __scudo::computeHardwareCRC32(uint,ulong) mov r14w, ax mov rax, r13 mov [r13-10h], r14 ``` After: ``` mov rax, cs:_ZN7__scudoL6CookieE ; __scudo::Cookie lea rcx, [rbx-10h] mov rdx, 0FFFFFFFFFFFFFFh and r14, rdx shl r9, 38h or r14, r9 crc32 eax, rcx mov rdx, r14 xor dx, dx mov eax, eax crc32 eax, rdx mov r14w, ax mov rax, rbx mov [rbx-10h], r14 ``` Reviewers: dvyukov, alekseyshl, kcc Reviewed By: alekseyshl Subscribers: aemerson, rengolin, llvm-commits Differential Revision: https://reviews.llvm.org/D32971 llvm-svn: 302538
2017-05-09 23:12:38 +08:00
#include "scudo_crc32.h"
[scudo] Application & platform compatibility changes Summary: This patch changes a few (small) things around for compatibility purposes for the current Android & Fuchsia work: - `realloc`'ing some memory that was not allocated with `malloc`, `calloc` or `realloc`, while UB according to http://pubs.opengroup.org/onlinepubs/009695399/functions/realloc.html is more common that one would think. We now only check this if `DeallocationTypeMismatch` is set; change the "mismatch" error messages to be more homogeneous; - some sketchily written but widely used libraries expect a call to `realloc` to copy the usable size of the old chunk to the new one instead of the requested size. We have to begrundingly abide by this de-facto standard. This doesn't seem to impact security either way, unless someone comes up with something we didn't think about; - the CRC32 intrinsics for 64-bit take a 64-bit first argument. This is misleading as the upper 32 bits end up being ignored. This was also raising `-Wconversion` errors. Change things to take a `u32` as first argument. This also means we were (and are) only using 32 bits of the Cookie - not a big thing, but worth mentioning. - Includes-wise: prefer `stddef.h` to `cstddef`, move `scudo_flags.h` where it is actually needed. - Add tests for the memalign-realloc case, and the realloc-usable-size one. (Edited typos) Reviewers: alekseyshl Reviewed By: alekseyshl Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D36754 llvm-svn: 311018
2017-08-17 00:40:48 +08:00
#include "scudo_flags.h"
#include "scudo_tls.h"
#include "scudo_utils.h"
#include "sanitizer_common/sanitizer_allocator_checks.h"
#include "sanitizer_common/sanitizer_allocator_interface.h"
#include "sanitizer_common/sanitizer_errno.h"
#include "sanitizer_common/sanitizer_quarantine.h"
#include <string.h>
namespace __scudo {
// Global static cookie, initialized at start-up.
[scudo] 32-bit and hardware agnostic support Summary: This update introduces i386 support for the Scudo Hardened Allocator, and offers software alternatives for functions that used to require hardware specific instruction sets. This should make porting to new architectures easier. Among the changes: - The chunk header has been changed to accomodate the size limitations encountered on 32-bit architectures. We now fit everything in 64-bit. This was achieved by storing the amount of unused bytes in an allocation rather than the size itself, as one can be deduced from the other with the help of the GetActuallyAllocatedSize function. As it turns out, this header can be used for both 64 and 32 bit, and as such we dropped the requirement for the 128-bit compare and exchange instruction support (cmpxchg16b). - Add 32-bit support for the checksum and the PRNG functions: if the SSE 4.2 instruction set is supported, use the 32-bit CRC32 instruction, and in the XorShift128, use a 32-bit based state instead of 64-bit. - Add software support for CRC32: if SSE 4.2 is not supported, fallback on a software implementation. - Modify tests that were not 32-bit compliant, and expand them to cover more allocation and alignment sizes. The random shuffle test has been deactivated for linux-i386 & linux-i686 as the 32-bit sanitizer allocator doesn't currently randomize chunks. Reviewers: alekseyshl, kcc Subscribers: filcab, llvm-commits, tberghammer, danalbert, srhines, mgorny, modocache Differential Revision: https://reviews.llvm.org/D26358 llvm-svn: 288255
2016-12-01 01:32:20 +08:00
static uptr Cookie;
// We default to software CRC32 if the alternatives are not supported, either
// at compilation or at runtime.
static atomic_uint8_t HashAlgorithm = { CRC32Software };
[scudo] Application & platform compatibility changes Summary: This patch changes a few (small) things around for compatibility purposes for the current Android & Fuchsia work: - `realloc`'ing some memory that was not allocated with `malloc`, `calloc` or `realloc`, while UB according to http://pubs.opengroup.org/onlinepubs/009695399/functions/realloc.html is more common that one would think. We now only check this if `DeallocationTypeMismatch` is set; change the "mismatch" error messages to be more homogeneous; - some sketchily written but widely used libraries expect a call to `realloc` to copy the usable size of the old chunk to the new one instead of the requested size. We have to begrundingly abide by this de-facto standard. This doesn't seem to impact security either way, unless someone comes up with something we didn't think about; - the CRC32 intrinsics for 64-bit take a 64-bit first argument. This is misleading as the upper 32 bits end up being ignored. This was also raising `-Wconversion` errors. Change things to take a `u32` as first argument. This also means we were (and are) only using 32 bits of the Cookie - not a big thing, but worth mentioning. - Includes-wise: prefer `stddef.h` to `cstddef`, move `scudo_flags.h` where it is actually needed. - Add tests for the memalign-realloc case, and the realloc-usable-size one. (Edited typos) Reviewers: alekseyshl Reviewed By: alekseyshl Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D36754 llvm-svn: 311018
2017-08-17 00:40:48 +08:00
INLINE u32 computeCRC32(u32 Crc, uptr Value, uptr *Array, uptr ArraySize) {
[scudo] CRC32 optimizations Summary: This change optimizes several aspects of the checksum used for chunk headers. First, there is no point in checking the weak symbol `computeHardwareCRC32` everytime, it will either be there or not when we start, so check it once during initialization and set the checksum type accordingly. Then, the loading of `HashAlgorithm` for SSE versions (and ARM equivalent) was not optimized out, while not necessary. So I reshuffled that part of the code, which duplicates a tiny bit of code, but ends up in a much cleaner assembly (and faster as we avoid an extraneous load and some calls). The following code is the checksum at the end of `scudoMalloc` for x86_64 with full SSE 4.2, before: ``` mov rax, 0FFFFFFFFFFFFFFh shl r10, 38h mov edi, dword ptr cs:_ZN7__scudoL6CookieE ; __scudo::Cookie and r14, rax lea rsi, [r13-10h] movzx eax, cs:_ZN7__scudoL13HashAlgorithmE ; __scudo::HashAlgorithm or r14, r10 mov rbx, r14 xor bx, bx call _ZN7__scudo20computeHardwareCRC32Ejm ; __scudo::computeHardwareCRC32(uint,ulong) mov rsi, rbx mov edi, eax call _ZN7__scudo20computeHardwareCRC32Ejm ; __scudo::computeHardwareCRC32(uint,ulong) mov r14w, ax mov rax, r13 mov [r13-10h], r14 ``` After: ``` mov rax, cs:_ZN7__scudoL6CookieE ; __scudo::Cookie lea rcx, [rbx-10h] mov rdx, 0FFFFFFFFFFFFFFh and r14, rdx shl r9, 38h or r14, r9 crc32 eax, rcx mov rdx, r14 xor dx, dx mov eax, eax crc32 eax, rdx mov r14w, ax mov rax, rbx mov [rbx-10h], r14 ``` Reviewers: dvyukov, alekseyshl, kcc Reviewed By: alekseyshl Subscribers: aemerson, rengolin, llvm-commits Differential Revision: https://reviews.llvm.org/D32971 llvm-svn: 302538
2017-05-09 23:12:38 +08:00
// If the hardware CRC32 feature is defined here, it was enabled everywhere,
// as opposed to only for scudo_crc32.cpp. This means that other hardware
// specific instructions were likely emitted at other places, and as a
// result there is no reason to not use it here.
#if defined(__SSE4_2__) || defined(__ARM_FEATURE_CRC32)
[scudo] CRC32 optimizations Summary: This change optimizes several aspects of the checksum used for chunk headers. First, there is no point in checking the weak symbol `computeHardwareCRC32` everytime, it will either be there or not when we start, so check it once during initialization and set the checksum type accordingly. Then, the loading of `HashAlgorithm` for SSE versions (and ARM equivalent) was not optimized out, while not necessary. So I reshuffled that part of the code, which duplicates a tiny bit of code, but ends up in a much cleaner assembly (and faster as we avoid an extraneous load and some calls). The following code is the checksum at the end of `scudoMalloc` for x86_64 with full SSE 4.2, before: ``` mov rax, 0FFFFFFFFFFFFFFh shl r10, 38h mov edi, dword ptr cs:_ZN7__scudoL6CookieE ; __scudo::Cookie and r14, rax lea rsi, [r13-10h] movzx eax, cs:_ZN7__scudoL13HashAlgorithmE ; __scudo::HashAlgorithm or r14, r10 mov rbx, r14 xor bx, bx call _ZN7__scudo20computeHardwareCRC32Ejm ; __scudo::computeHardwareCRC32(uint,ulong) mov rsi, rbx mov edi, eax call _ZN7__scudo20computeHardwareCRC32Ejm ; __scudo::computeHardwareCRC32(uint,ulong) mov r14w, ax mov rax, r13 mov [r13-10h], r14 ``` After: ``` mov rax, cs:_ZN7__scudoL6CookieE ; __scudo::Cookie lea rcx, [rbx-10h] mov rdx, 0FFFFFFFFFFFFFFh and r14, rdx shl r9, 38h or r14, r9 crc32 eax, rcx mov rdx, r14 xor dx, dx mov eax, eax crc32 eax, rdx mov r14w, ax mov rax, rbx mov [rbx-10h], r14 ``` Reviewers: dvyukov, alekseyshl, kcc Reviewed By: alekseyshl Subscribers: aemerson, rengolin, llvm-commits Differential Revision: https://reviews.llvm.org/D32971 llvm-svn: 302538
2017-05-09 23:12:38 +08:00
Crc = CRC32_INTRINSIC(Crc, Value);
for (uptr i = 0; i < ArraySize; i++)
Crc = CRC32_INTRINSIC(Crc, Array[i]);
return Crc;
#else
[scudo] CRC32 optimizations Summary: This change optimizes several aspects of the checksum used for chunk headers. First, there is no point in checking the weak symbol `computeHardwareCRC32` everytime, it will either be there or not when we start, so check it once during initialization and set the checksum type accordingly. Then, the loading of `HashAlgorithm` for SSE versions (and ARM equivalent) was not optimized out, while not necessary. So I reshuffled that part of the code, which duplicates a tiny bit of code, but ends up in a much cleaner assembly (and faster as we avoid an extraneous load and some calls). The following code is the checksum at the end of `scudoMalloc` for x86_64 with full SSE 4.2, before: ``` mov rax, 0FFFFFFFFFFFFFFh shl r10, 38h mov edi, dword ptr cs:_ZN7__scudoL6CookieE ; __scudo::Cookie and r14, rax lea rsi, [r13-10h] movzx eax, cs:_ZN7__scudoL13HashAlgorithmE ; __scudo::HashAlgorithm or r14, r10 mov rbx, r14 xor bx, bx call _ZN7__scudo20computeHardwareCRC32Ejm ; __scudo::computeHardwareCRC32(uint,ulong) mov rsi, rbx mov edi, eax call _ZN7__scudo20computeHardwareCRC32Ejm ; __scudo::computeHardwareCRC32(uint,ulong) mov r14w, ax mov rax, r13 mov [r13-10h], r14 ``` After: ``` mov rax, cs:_ZN7__scudoL6CookieE ; __scudo::Cookie lea rcx, [rbx-10h] mov rdx, 0FFFFFFFFFFFFFFh and r14, rdx shl r9, 38h or r14, r9 crc32 eax, rcx mov rdx, r14 xor dx, dx mov eax, eax crc32 eax, rdx mov r14w, ax mov rax, rbx mov [rbx-10h], r14 ``` Reviewers: dvyukov, alekseyshl, kcc Reviewed By: alekseyshl Subscribers: aemerson, rengolin, llvm-commits Differential Revision: https://reviews.llvm.org/D32971 llvm-svn: 302538
2017-05-09 23:12:38 +08:00
if (atomic_load_relaxed(&HashAlgorithm) == CRC32Hardware) {
Crc = computeHardwareCRC32(Crc, Value);
for (uptr i = 0; i < ArraySize; i++)
Crc = computeHardwareCRC32(Crc, Array[i]);
return Crc;
}
Crc = computeSoftwareCRC32(Crc, Value);
for (uptr i = 0; i < ArraySize; i++)
Crc = computeSoftwareCRC32(Crc, Array[i]);
return Crc;
#endif // defined(__SSE4_2__) || defined(__ARM_FEATURE_CRC32)
[scudo] 32-bit and hardware agnostic support Summary: This update introduces i386 support for the Scudo Hardened Allocator, and offers software alternatives for functions that used to require hardware specific instruction sets. This should make porting to new architectures easier. Among the changes: - The chunk header has been changed to accomodate the size limitations encountered on 32-bit architectures. We now fit everything in 64-bit. This was achieved by storing the amount of unused bytes in an allocation rather than the size itself, as one can be deduced from the other with the help of the GetActuallyAllocatedSize function. As it turns out, this header can be used for both 64 and 32 bit, and as such we dropped the requirement for the 128-bit compare and exchange instruction support (cmpxchg16b). - Add 32-bit support for the checksum and the PRNG functions: if the SSE 4.2 instruction set is supported, use the 32-bit CRC32 instruction, and in the XorShift128, use a 32-bit based state instead of 64-bit. - Add software support for CRC32: if SSE 4.2 is not supported, fallback on a software implementation. - Modify tests that were not 32-bit compliant, and expand them to cover more allocation and alignment sizes. The random shuffle test has been deactivated for linux-i386 & linux-i686 as the 32-bit sanitizer allocator doesn't currently randomize chunks. Reviewers: alekseyshl, kcc Subscribers: filcab, llvm-commits, tberghammer, danalbert, srhines, mgorny, modocache Differential Revision: https://reviews.llvm.org/D26358 llvm-svn: 288255
2016-12-01 01:32:20 +08:00
}
static ScudoBackendAllocator &getBackendAllocator();
struct ScudoChunk : UnpackedHeader {
// We can't use the offset member of the chunk itself, as we would double
// fetch it without any warranty that it wouldn't have been tampered. To
// prevent this, we work with a local copy of the header.
void *getAllocBeg(UnpackedHeader *Header) {
return reinterpret_cast<void *>(
reinterpret_cast<uptr>(this) - (Header->Offset << MinAlignmentLog));
}
[scudo] 32-bit and hardware agnostic support Summary: This update introduces i386 support for the Scudo Hardened Allocator, and offers software alternatives for functions that used to require hardware specific instruction sets. This should make porting to new architectures easier. Among the changes: - The chunk header has been changed to accomodate the size limitations encountered on 32-bit architectures. We now fit everything in 64-bit. This was achieved by storing the amount of unused bytes in an allocation rather than the size itself, as one can be deduced from the other with the help of the GetActuallyAllocatedSize function. As it turns out, this header can be used for both 64 and 32 bit, and as such we dropped the requirement for the 128-bit compare and exchange instruction support (cmpxchg16b). - Add 32-bit support for the checksum and the PRNG functions: if the SSE 4.2 instruction set is supported, use the 32-bit CRC32 instruction, and in the XorShift128, use a 32-bit based state instead of 64-bit. - Add software support for CRC32: if SSE 4.2 is not supported, fallback on a software implementation. - Modify tests that were not 32-bit compliant, and expand them to cover more allocation and alignment sizes. The random shuffle test has been deactivated for linux-i386 & linux-i686 as the 32-bit sanitizer allocator doesn't currently randomize chunks. Reviewers: alekseyshl, kcc Subscribers: filcab, llvm-commits, tberghammer, danalbert, srhines, mgorny, modocache Differential Revision: https://reviews.llvm.org/D26358 llvm-svn: 288255
2016-12-01 01:32:20 +08:00
// Returns the usable size for a chunk, meaning the amount of bytes from the
// beginning of the user data to the end of the backend allocated chunk.
uptr getUsableSize(UnpackedHeader *Header) {
uptr Size =
getBackendAllocator().getActuallyAllocatedSize(getAllocBeg(Header),
Header->FromPrimary);
[scudo] 32-bit and hardware agnostic support Summary: This update introduces i386 support for the Scudo Hardened Allocator, and offers software alternatives for functions that used to require hardware specific instruction sets. This should make porting to new architectures easier. Among the changes: - The chunk header has been changed to accomodate the size limitations encountered on 32-bit architectures. We now fit everything in 64-bit. This was achieved by storing the amount of unused bytes in an allocation rather than the size itself, as one can be deduced from the other with the help of the GetActuallyAllocatedSize function. As it turns out, this header can be used for both 64 and 32 bit, and as such we dropped the requirement for the 128-bit compare and exchange instruction support (cmpxchg16b). - Add 32-bit support for the checksum and the PRNG functions: if the SSE 4.2 instruction set is supported, use the 32-bit CRC32 instruction, and in the XorShift128, use a 32-bit based state instead of 64-bit. - Add software support for CRC32: if SSE 4.2 is not supported, fallback on a software implementation. - Modify tests that were not 32-bit compliant, and expand them to cover more allocation and alignment sizes. The random shuffle test has been deactivated for linux-i386 & linux-i686 as the 32-bit sanitizer allocator doesn't currently randomize chunks. Reviewers: alekseyshl, kcc Subscribers: filcab, llvm-commits, tberghammer, danalbert, srhines, mgorny, modocache Differential Revision: https://reviews.llvm.org/D26358 llvm-svn: 288255
2016-12-01 01:32:20 +08:00
if (Size == 0)
return 0;
[scudo] 32-bit and hardware agnostic support Summary: This update introduces i386 support for the Scudo Hardened Allocator, and offers software alternatives for functions that used to require hardware specific instruction sets. This should make porting to new architectures easier. Among the changes: - The chunk header has been changed to accomodate the size limitations encountered on 32-bit architectures. We now fit everything in 64-bit. This was achieved by storing the amount of unused bytes in an allocation rather than the size itself, as one can be deduced from the other with the help of the GetActuallyAllocatedSize function. As it turns out, this header can be used for both 64 and 32 bit, and as such we dropped the requirement for the 128-bit compare and exchange instruction support (cmpxchg16b). - Add 32-bit support for the checksum and the PRNG functions: if the SSE 4.2 instruction set is supported, use the 32-bit CRC32 instruction, and in the XorShift128, use a 32-bit based state instead of 64-bit. - Add software support for CRC32: if SSE 4.2 is not supported, fallback on a software implementation. - Modify tests that were not 32-bit compliant, and expand them to cover more allocation and alignment sizes. The random shuffle test has been deactivated for linux-i386 & linux-i686 as the 32-bit sanitizer allocator doesn't currently randomize chunks. Reviewers: alekseyshl, kcc Subscribers: filcab, llvm-commits, tberghammer, danalbert, srhines, mgorny, modocache Differential Revision: https://reviews.llvm.org/D26358 llvm-svn: 288255
2016-12-01 01:32:20 +08:00
return Size - AlignedChunkHeaderSize - (Header->Offset << MinAlignmentLog);
}
// Compute the checksum of the Chunk pointer and its ChunkHeader.
u16 computeChecksum(UnpackedHeader *Header) const {
[scudo] 32-bit and hardware agnostic support Summary: This update introduces i386 support for the Scudo Hardened Allocator, and offers software alternatives for functions that used to require hardware specific instruction sets. This should make porting to new architectures easier. Among the changes: - The chunk header has been changed to accomodate the size limitations encountered on 32-bit architectures. We now fit everything in 64-bit. This was achieved by storing the amount of unused bytes in an allocation rather than the size itself, as one can be deduced from the other with the help of the GetActuallyAllocatedSize function. As it turns out, this header can be used for both 64 and 32 bit, and as such we dropped the requirement for the 128-bit compare and exchange instruction support (cmpxchg16b). - Add 32-bit support for the checksum and the PRNG functions: if the SSE 4.2 instruction set is supported, use the 32-bit CRC32 instruction, and in the XorShift128, use a 32-bit based state instead of 64-bit. - Add software support for CRC32: if SSE 4.2 is not supported, fallback on a software implementation. - Modify tests that were not 32-bit compliant, and expand them to cover more allocation and alignment sizes. The random shuffle test has been deactivated for linux-i386 & linux-i686 as the 32-bit sanitizer allocator doesn't currently randomize chunks. Reviewers: alekseyshl, kcc Subscribers: filcab, llvm-commits, tberghammer, danalbert, srhines, mgorny, modocache Differential Revision: https://reviews.llvm.org/D26358 llvm-svn: 288255
2016-12-01 01:32:20 +08:00
UnpackedHeader ZeroChecksumHeader = *Header;
ZeroChecksumHeader.Checksum = 0;
uptr HeaderHolder[sizeof(UnpackedHeader) / sizeof(uptr)];
memcpy(&HeaderHolder, &ZeroChecksumHeader, sizeof(HeaderHolder));
[scudo] Application & platform compatibility changes Summary: This patch changes a few (small) things around for compatibility purposes for the current Android & Fuchsia work: - `realloc`'ing some memory that was not allocated with `malloc`, `calloc` or `realloc`, while UB according to http://pubs.opengroup.org/onlinepubs/009695399/functions/realloc.html is more common that one would think. We now only check this if `DeallocationTypeMismatch` is set; change the "mismatch" error messages to be more homogeneous; - some sketchily written but widely used libraries expect a call to `realloc` to copy the usable size of the old chunk to the new one instead of the requested size. We have to begrundingly abide by this de-facto standard. This doesn't seem to impact security either way, unless someone comes up with something we didn't think about; - the CRC32 intrinsics for 64-bit take a 64-bit first argument. This is misleading as the upper 32 bits end up being ignored. This was also raising `-Wconversion` errors. Change things to take a `u32` as first argument. This also means we were (and are) only using 32 bits of the Cookie - not a big thing, but worth mentioning. - Includes-wise: prefer `stddef.h` to `cstddef`, move `scudo_flags.h` where it is actually needed. - Add tests for the memalign-realloc case, and the realloc-usable-size one. (Edited typos) Reviewers: alekseyshl Reviewed By: alekseyshl Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D36754 llvm-svn: 311018
2017-08-17 00:40:48 +08:00
u32 Crc = computeCRC32(static_cast<u32>(Cookie),
reinterpret_cast<uptr>(this), HeaderHolder,
[scudo] CRC32 optimizations Summary: This change optimizes several aspects of the checksum used for chunk headers. First, there is no point in checking the weak symbol `computeHardwareCRC32` everytime, it will either be there or not when we start, so check it once during initialization and set the checksum type accordingly. Then, the loading of `HashAlgorithm` for SSE versions (and ARM equivalent) was not optimized out, while not necessary. So I reshuffled that part of the code, which duplicates a tiny bit of code, but ends up in a much cleaner assembly (and faster as we avoid an extraneous load and some calls). The following code is the checksum at the end of `scudoMalloc` for x86_64 with full SSE 4.2, before: ``` mov rax, 0FFFFFFFFFFFFFFh shl r10, 38h mov edi, dword ptr cs:_ZN7__scudoL6CookieE ; __scudo::Cookie and r14, rax lea rsi, [r13-10h] movzx eax, cs:_ZN7__scudoL13HashAlgorithmE ; __scudo::HashAlgorithm or r14, r10 mov rbx, r14 xor bx, bx call _ZN7__scudo20computeHardwareCRC32Ejm ; __scudo::computeHardwareCRC32(uint,ulong) mov rsi, rbx mov edi, eax call _ZN7__scudo20computeHardwareCRC32Ejm ; __scudo::computeHardwareCRC32(uint,ulong) mov r14w, ax mov rax, r13 mov [r13-10h], r14 ``` After: ``` mov rax, cs:_ZN7__scudoL6CookieE ; __scudo::Cookie lea rcx, [rbx-10h] mov rdx, 0FFFFFFFFFFFFFFh and r14, rdx shl r9, 38h or r14, r9 crc32 eax, rcx mov rdx, r14 xor dx, dx mov eax, eax crc32 eax, rdx mov r14w, ax mov rax, rbx mov [rbx-10h], r14 ``` Reviewers: dvyukov, alekseyshl, kcc Reviewed By: alekseyshl Subscribers: aemerson, rengolin, llvm-commits Differential Revision: https://reviews.llvm.org/D32971 llvm-svn: 302538
2017-05-09 23:12:38 +08:00
ARRAY_SIZE(HeaderHolder));
return static_cast<u16>(Crc);
[scudo] 32-bit and hardware agnostic support Summary: This update introduces i386 support for the Scudo Hardened Allocator, and offers software alternatives for functions that used to require hardware specific instruction sets. This should make porting to new architectures easier. Among the changes: - The chunk header has been changed to accomodate the size limitations encountered on 32-bit architectures. We now fit everything in 64-bit. This was achieved by storing the amount of unused bytes in an allocation rather than the size itself, as one can be deduced from the other with the help of the GetActuallyAllocatedSize function. As it turns out, this header can be used for both 64 and 32 bit, and as such we dropped the requirement for the 128-bit compare and exchange instruction support (cmpxchg16b). - Add 32-bit support for the checksum and the PRNG functions: if the SSE 4.2 instruction set is supported, use the 32-bit CRC32 instruction, and in the XorShift128, use a 32-bit based state instead of 64-bit. - Add software support for CRC32: if SSE 4.2 is not supported, fallback on a software implementation. - Modify tests that were not 32-bit compliant, and expand them to cover more allocation and alignment sizes. The random shuffle test has been deactivated for linux-i386 & linux-i686 as the 32-bit sanitizer allocator doesn't currently randomize chunks. Reviewers: alekseyshl, kcc Subscribers: filcab, llvm-commits, tberghammer, danalbert, srhines, mgorny, modocache Differential Revision: https://reviews.llvm.org/D26358 llvm-svn: 288255
2016-12-01 01:32:20 +08:00
}
// Checks the validity of a chunk by verifying its checksum. It doesn't
// incur termination in the event of an invalid chunk.
[scudo] 32-bit and hardware agnostic support Summary: This update introduces i386 support for the Scudo Hardened Allocator, and offers software alternatives for functions that used to require hardware specific instruction sets. This should make porting to new architectures easier. Among the changes: - The chunk header has been changed to accomodate the size limitations encountered on 32-bit architectures. We now fit everything in 64-bit. This was achieved by storing the amount of unused bytes in an allocation rather than the size itself, as one can be deduced from the other with the help of the GetActuallyAllocatedSize function. As it turns out, this header can be used for both 64 and 32 bit, and as such we dropped the requirement for the 128-bit compare and exchange instruction support (cmpxchg16b). - Add 32-bit support for the checksum and the PRNG functions: if the SSE 4.2 instruction set is supported, use the 32-bit CRC32 instruction, and in the XorShift128, use a 32-bit based state instead of 64-bit. - Add software support for CRC32: if SSE 4.2 is not supported, fallback on a software implementation. - Modify tests that were not 32-bit compliant, and expand them to cover more allocation and alignment sizes. The random shuffle test has been deactivated for linux-i386 & linux-i686 as the 32-bit sanitizer allocator doesn't currently randomize chunks. Reviewers: alekseyshl, kcc Subscribers: filcab, llvm-commits, tberghammer, danalbert, srhines, mgorny, modocache Differential Revision: https://reviews.llvm.org/D26358 llvm-svn: 288255
2016-12-01 01:32:20 +08:00
bool isValid() {
UnpackedHeader NewUnpackedHeader;
const AtomicPackedHeader *AtomicHeader =
reinterpret_cast<const AtomicPackedHeader *>(this);
PackedHeader NewPackedHeader = atomic_load_relaxed(AtomicHeader);
[scudo] 32-bit and hardware agnostic support Summary: This update introduces i386 support for the Scudo Hardened Allocator, and offers software alternatives for functions that used to require hardware specific instruction sets. This should make porting to new architectures easier. Among the changes: - The chunk header has been changed to accomodate the size limitations encountered on 32-bit architectures. We now fit everything in 64-bit. This was achieved by storing the amount of unused bytes in an allocation rather than the size itself, as one can be deduced from the other with the help of the GetActuallyAllocatedSize function. As it turns out, this header can be used for both 64 and 32 bit, and as such we dropped the requirement for the 128-bit compare and exchange instruction support (cmpxchg16b). - Add 32-bit support for the checksum and the PRNG functions: if the SSE 4.2 instruction set is supported, use the 32-bit CRC32 instruction, and in the XorShift128, use a 32-bit based state instead of 64-bit. - Add software support for CRC32: if SSE 4.2 is not supported, fallback on a software implementation. - Modify tests that were not 32-bit compliant, and expand them to cover more allocation and alignment sizes. The random shuffle test has been deactivated for linux-i386 & linux-i686 as the 32-bit sanitizer allocator doesn't currently randomize chunks. Reviewers: alekseyshl, kcc Subscribers: filcab, llvm-commits, tberghammer, danalbert, srhines, mgorny, modocache Differential Revision: https://reviews.llvm.org/D26358 llvm-svn: 288255
2016-12-01 01:32:20 +08:00
NewUnpackedHeader = bit_cast<UnpackedHeader>(NewPackedHeader);
return (NewUnpackedHeader.Checksum == computeChecksum(&NewUnpackedHeader));
}
// Nulls out a chunk header. When returning the chunk to the backend, there
// is no need to store a valid ChunkAvailable header, as this would be
// computationally expensive. Zeroing out serves the same purpose by making
// the header invalid. In the extremely rare event where 0 would be a valid
// checksum for the chunk, the state of the chunk is ChunkAvailable anyway.
COMPILER_CHECK(ChunkAvailable == 0);
void eraseHeader() {
PackedHeader NullPackedHeader = 0;
AtomicPackedHeader *AtomicHeader =
reinterpret_cast<AtomicPackedHeader *>(this);
atomic_store_relaxed(AtomicHeader, NullPackedHeader);
}
// Loads and unpacks the header, verifying the checksum in the process.
void loadHeader(UnpackedHeader *NewUnpackedHeader) const {
const AtomicPackedHeader *AtomicHeader =
reinterpret_cast<const AtomicPackedHeader *>(this);
PackedHeader NewPackedHeader = atomic_load_relaxed(AtomicHeader);
*NewUnpackedHeader = bit_cast<UnpackedHeader>(NewPackedHeader);
if (UNLIKELY(NewUnpackedHeader->Checksum !=
computeChecksum(NewUnpackedHeader))) {
dieWithMessage("ERROR: corrupted chunk header at address %p\n", this);
}
}
// Packs and stores the header, computing the checksum in the process.
void storeHeader(UnpackedHeader *NewUnpackedHeader) {
NewUnpackedHeader->Checksum = computeChecksum(NewUnpackedHeader);
PackedHeader NewPackedHeader = bit_cast<PackedHeader>(*NewUnpackedHeader);
AtomicPackedHeader *AtomicHeader =
reinterpret_cast<AtomicPackedHeader *>(this);
atomic_store_relaxed(AtomicHeader, NewPackedHeader);
}
// Packs and stores the header, computing the checksum in the process. We
// compare the current header with the expected provided one to ensure that
// we are not being raced by a corruption occurring in another thread.
void compareExchangeHeader(UnpackedHeader *NewUnpackedHeader,
UnpackedHeader *OldUnpackedHeader) {
NewUnpackedHeader->Checksum = computeChecksum(NewUnpackedHeader);
PackedHeader NewPackedHeader = bit_cast<PackedHeader>(*NewUnpackedHeader);
PackedHeader OldPackedHeader = bit_cast<PackedHeader>(*OldUnpackedHeader);
AtomicPackedHeader *AtomicHeader =
reinterpret_cast<AtomicPackedHeader *>(this);
if (UNLIKELY(!atomic_compare_exchange_strong(AtomicHeader,
&OldPackedHeader,
NewPackedHeader,
memory_order_relaxed))) {
dieWithMessage("ERROR: race on chunk header at address %p\n", this);
}
}
};
ScudoChunk *getScudoChunk(uptr UserBeg) {
return reinterpret_cast<ScudoChunk *>(UserBeg - AlignedChunkHeaderSize);
}
struct AllocatorOptions {
[scudo] Quarantine overhaul Summary: First, some context. The main feedback we get about the quarantine is that it's too memory hungry. A single MB of quarantine will have an impact of 3 to 4MB of PSS/RSS, and things quickly get out of hand in terms of memory usage, and the quarantine ends up disabled. The main objective of the quarantine is to protect from use-after-free exploitation by making it harder for an attacker to reallocate a controlled chunk in place of the targeted freed chunk. This is achieved by not making it available to the backend right away for reuse, but holding it a little while. Historically, what has usually been the target of such attacks was objects, where vtable pointers or other function pointers could constitute a valuable targeti to replace. Those are usually on the smaller side. There is barely any advantage in putting the quarantine several megabytes of RGB data or the like. Now for the patch. This patch introduces a new way the Quarantine behaves in Scudo. First of all, the size of the Quarantine will be defined in KB instead of MB, then we introduce a new option: the size up to which (lower than or equal to) a chunk will be quarantined. This way, we only quarantine smaller chunks, and the size of the quarantine remains manageable. It also prevents someone from triggering a recycle by allocating something huge. We default to 512 bytes on 32-bit and 2048 bytes on 64-bit platforms. In details, the patches includes the following: - introduce `QuarantineSizeKb`, but honor `QuarantineSizeMb` if set to fall back to the old behavior (meaning no threshold in that case); `QuarantineSizeMb` is described as deprecated in the options descriptios; documentation update will follow; - introduce `QuarantineChunksUpToSize`, the new threshold value; - update the `quarantine.cpp` test, and other tests using `QuarantineSizeMb`; - remove `AllocatorOptions::copyTo`, it wasn't used; - slightly change the logic around `quarantineOrDeallocateChunk` to accomodate for the new logic; rename a couple of variables there as well; Rewriting the tests, I found a somewhat annoying bug where non-default aligned chunks would account for more than needed when placed in the quarantine due to `<< MinAlignment` instead of `<< MinAlignmentLog`. This is fixed and tested for now. Reviewers: alekseyshl, kcc Reviewed By: alekseyshl Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D35694 llvm-svn: 308884
2017-07-24 23:29:38 +08:00
u32 QuarantineSizeKb;
u32 ThreadLocalQuarantineSizeKb;
[scudo] Quarantine overhaul Summary: First, some context. The main feedback we get about the quarantine is that it's too memory hungry. A single MB of quarantine will have an impact of 3 to 4MB of PSS/RSS, and things quickly get out of hand in terms of memory usage, and the quarantine ends up disabled. The main objective of the quarantine is to protect from use-after-free exploitation by making it harder for an attacker to reallocate a controlled chunk in place of the targeted freed chunk. This is achieved by not making it available to the backend right away for reuse, but holding it a little while. Historically, what has usually been the target of such attacks was objects, where vtable pointers or other function pointers could constitute a valuable targeti to replace. Those are usually on the smaller side. There is barely any advantage in putting the quarantine several megabytes of RGB data or the like. Now for the patch. This patch introduces a new way the Quarantine behaves in Scudo. First of all, the size of the Quarantine will be defined in KB instead of MB, then we introduce a new option: the size up to which (lower than or equal to) a chunk will be quarantined. This way, we only quarantine smaller chunks, and the size of the quarantine remains manageable. It also prevents someone from triggering a recycle by allocating something huge. We default to 512 bytes on 32-bit and 2048 bytes on 64-bit platforms. In details, the patches includes the following: - introduce `QuarantineSizeKb`, but honor `QuarantineSizeMb` if set to fall back to the old behavior (meaning no threshold in that case); `QuarantineSizeMb` is described as deprecated in the options descriptios; documentation update will follow; - introduce `QuarantineChunksUpToSize`, the new threshold value; - update the `quarantine.cpp` test, and other tests using `QuarantineSizeMb`; - remove `AllocatorOptions::copyTo`, it wasn't used; - slightly change the logic around `quarantineOrDeallocateChunk` to accomodate for the new logic; rename a couple of variables there as well; Rewriting the tests, I found a somewhat annoying bug where non-default aligned chunks would account for more than needed when placed in the quarantine due to `<< MinAlignment` instead of `<< MinAlignmentLog`. This is fixed and tested for now. Reviewers: alekseyshl, kcc Reviewed By: alekseyshl Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D35694 llvm-svn: 308884
2017-07-24 23:29:38 +08:00
u32 QuarantineChunksUpToSize;
bool MayReturnNull;
s32 ReleaseToOSIntervalMs;
bool DeallocationTypeMismatch;
bool DeleteSizeMismatch;
bool ZeroContents;
void setFrom(const Flags *f, const CommonFlags *cf);
};
void AllocatorOptions::setFrom(const Flags *f, const CommonFlags *cf) {
MayReturnNull = cf->allocator_may_return_null;
ReleaseToOSIntervalMs = cf->allocator_release_to_os_interval_ms;
[scudo] Quarantine overhaul Summary: First, some context. The main feedback we get about the quarantine is that it's too memory hungry. A single MB of quarantine will have an impact of 3 to 4MB of PSS/RSS, and things quickly get out of hand in terms of memory usage, and the quarantine ends up disabled. The main objective of the quarantine is to protect from use-after-free exploitation by making it harder for an attacker to reallocate a controlled chunk in place of the targeted freed chunk. This is achieved by not making it available to the backend right away for reuse, but holding it a little while. Historically, what has usually been the target of such attacks was objects, where vtable pointers or other function pointers could constitute a valuable targeti to replace. Those are usually on the smaller side. There is barely any advantage in putting the quarantine several megabytes of RGB data or the like. Now for the patch. This patch introduces a new way the Quarantine behaves in Scudo. First of all, the size of the Quarantine will be defined in KB instead of MB, then we introduce a new option: the size up to which (lower than or equal to) a chunk will be quarantined. This way, we only quarantine smaller chunks, and the size of the quarantine remains manageable. It also prevents someone from triggering a recycle by allocating something huge. We default to 512 bytes on 32-bit and 2048 bytes on 64-bit platforms. In details, the patches includes the following: - introduce `QuarantineSizeKb`, but honor `QuarantineSizeMb` if set to fall back to the old behavior (meaning no threshold in that case); `QuarantineSizeMb` is described as deprecated in the options descriptios; documentation update will follow; - introduce `QuarantineChunksUpToSize`, the new threshold value; - update the `quarantine.cpp` test, and other tests using `QuarantineSizeMb`; - remove `AllocatorOptions::copyTo`, it wasn't used; - slightly change the logic around `quarantineOrDeallocateChunk` to accomodate for the new logic; rename a couple of variables there as well; Rewriting the tests, I found a somewhat annoying bug where non-default aligned chunks would account for more than needed when placed in the quarantine due to `<< MinAlignment` instead of `<< MinAlignmentLog`. This is fixed and tested for now. Reviewers: alekseyshl, kcc Reviewed By: alekseyshl Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D35694 llvm-svn: 308884
2017-07-24 23:29:38 +08:00
QuarantineSizeKb = f->QuarantineSizeKb;
ThreadLocalQuarantineSizeKb = f->ThreadLocalQuarantineSizeKb;
[scudo] Quarantine overhaul Summary: First, some context. The main feedback we get about the quarantine is that it's too memory hungry. A single MB of quarantine will have an impact of 3 to 4MB of PSS/RSS, and things quickly get out of hand in terms of memory usage, and the quarantine ends up disabled. The main objective of the quarantine is to protect from use-after-free exploitation by making it harder for an attacker to reallocate a controlled chunk in place of the targeted freed chunk. This is achieved by not making it available to the backend right away for reuse, but holding it a little while. Historically, what has usually been the target of such attacks was objects, where vtable pointers or other function pointers could constitute a valuable targeti to replace. Those are usually on the smaller side. There is barely any advantage in putting the quarantine several megabytes of RGB data or the like. Now for the patch. This patch introduces a new way the Quarantine behaves in Scudo. First of all, the size of the Quarantine will be defined in KB instead of MB, then we introduce a new option: the size up to which (lower than or equal to) a chunk will be quarantined. This way, we only quarantine smaller chunks, and the size of the quarantine remains manageable. It also prevents someone from triggering a recycle by allocating something huge. We default to 512 bytes on 32-bit and 2048 bytes on 64-bit platforms. In details, the patches includes the following: - introduce `QuarantineSizeKb`, but honor `QuarantineSizeMb` if set to fall back to the old behavior (meaning no threshold in that case); `QuarantineSizeMb` is described as deprecated in the options descriptios; documentation update will follow; - introduce `QuarantineChunksUpToSize`, the new threshold value; - update the `quarantine.cpp` test, and other tests using `QuarantineSizeMb`; - remove `AllocatorOptions::copyTo`, it wasn't used; - slightly change the logic around `quarantineOrDeallocateChunk` to accomodate for the new logic; rename a couple of variables there as well; Rewriting the tests, I found a somewhat annoying bug where non-default aligned chunks would account for more than needed when placed in the quarantine due to `<< MinAlignment` instead of `<< MinAlignmentLog`. This is fixed and tested for now. Reviewers: alekseyshl, kcc Reviewed By: alekseyshl Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D35694 llvm-svn: 308884
2017-07-24 23:29:38 +08:00
QuarantineChunksUpToSize = f->QuarantineChunksUpToSize;
DeallocationTypeMismatch = f->DeallocationTypeMismatch;
DeleteSizeMismatch = f->DeleteSizeMismatch;
ZeroContents = f->ZeroContents;
}
static void initScudoInternal(const AllocatorOptions &Options);
static bool ScudoInitIsRunning = false;
void initScudo() {
SanitizerToolName = "Scudo";
CHECK(!ScudoInitIsRunning && "Scudo init calls itself!");
ScudoInitIsRunning = true;
[scudo] CRC32 optimizations Summary: This change optimizes several aspects of the checksum used for chunk headers. First, there is no point in checking the weak symbol `computeHardwareCRC32` everytime, it will either be there or not when we start, so check it once during initialization and set the checksum type accordingly. Then, the loading of `HashAlgorithm` for SSE versions (and ARM equivalent) was not optimized out, while not necessary. So I reshuffled that part of the code, which duplicates a tiny bit of code, but ends up in a much cleaner assembly (and faster as we avoid an extraneous load and some calls). The following code is the checksum at the end of `scudoMalloc` for x86_64 with full SSE 4.2, before: ``` mov rax, 0FFFFFFFFFFFFFFh shl r10, 38h mov edi, dword ptr cs:_ZN7__scudoL6CookieE ; __scudo::Cookie and r14, rax lea rsi, [r13-10h] movzx eax, cs:_ZN7__scudoL13HashAlgorithmE ; __scudo::HashAlgorithm or r14, r10 mov rbx, r14 xor bx, bx call _ZN7__scudo20computeHardwareCRC32Ejm ; __scudo::computeHardwareCRC32(uint,ulong) mov rsi, rbx mov edi, eax call _ZN7__scudo20computeHardwareCRC32Ejm ; __scudo::computeHardwareCRC32(uint,ulong) mov r14w, ax mov rax, r13 mov [r13-10h], r14 ``` After: ``` mov rax, cs:_ZN7__scudoL6CookieE ; __scudo::Cookie lea rcx, [rbx-10h] mov rdx, 0FFFFFFFFFFFFFFh and r14, rdx shl r9, 38h or r14, r9 crc32 eax, rcx mov rdx, r14 xor dx, dx mov eax, eax crc32 eax, rdx mov r14w, ax mov rax, rbx mov [rbx-10h], r14 ``` Reviewers: dvyukov, alekseyshl, kcc Reviewed By: alekseyshl Subscribers: aemerson, rengolin, llvm-commits Differential Revision: https://reviews.llvm.org/D32971 llvm-svn: 302538
2017-05-09 23:12:38 +08:00
// Check if hardware CRC32 is supported in the binary and by the platform, if
// so, opt for the CRC32 hardware version of the checksum.
if (computeHardwareCRC32 && testCPUFeature(CRC32CPUFeature))
[scudo] 32-bit and hardware agnostic support Summary: This update introduces i386 support for the Scudo Hardened Allocator, and offers software alternatives for functions that used to require hardware specific instruction sets. This should make porting to new architectures easier. Among the changes: - The chunk header has been changed to accomodate the size limitations encountered on 32-bit architectures. We now fit everything in 64-bit. This was achieved by storing the amount of unused bytes in an allocation rather than the size itself, as one can be deduced from the other with the help of the GetActuallyAllocatedSize function. As it turns out, this header can be used for both 64 and 32 bit, and as such we dropped the requirement for the 128-bit compare and exchange instruction support (cmpxchg16b). - Add 32-bit support for the checksum and the PRNG functions: if the SSE 4.2 instruction set is supported, use the 32-bit CRC32 instruction, and in the XorShift128, use a 32-bit based state instead of 64-bit. - Add software support for CRC32: if SSE 4.2 is not supported, fallback on a software implementation. - Modify tests that were not 32-bit compliant, and expand them to cover more allocation and alignment sizes. The random shuffle test has been deactivated for linux-i386 & linux-i686 as the 32-bit sanitizer allocator doesn't currently randomize chunks. Reviewers: alekseyshl, kcc Subscribers: filcab, llvm-commits, tberghammer, danalbert, srhines, mgorny, modocache Differential Revision: https://reviews.llvm.org/D26358 llvm-svn: 288255
2016-12-01 01:32:20 +08:00
atomic_store_relaxed(&HashAlgorithm, CRC32Hardware);
initFlags();
AllocatorOptions Options;
Options.setFrom(getFlags(), common_flags());
initScudoInternal(Options);
// TODO(kostyak): determine if MaybeStartBackgroudThread could be of some use.
ScudoInitIsRunning = false;
}
struct QuarantineCallback {
explicit QuarantineCallback(AllocatorCache *Cache)
: Cache_(Cache) {}
// Chunk recycling function, returns a quarantined chunk to the backend,
// first making sure it hasn't been tampered with.
void Recycle(ScudoChunk *Chunk) {
UnpackedHeader Header;
Chunk->loadHeader(&Header);
if (UNLIKELY(Header.State != ChunkQuarantine)) {
dieWithMessage("ERROR: invalid chunk state when recycling address %p\n",
Chunk);
}
Chunk->eraseHeader();
void *Ptr = Chunk->getAllocBeg(&Header);
if (Header.FromPrimary)
getBackendAllocator().deallocatePrimary(Cache_, Ptr);
else
getBackendAllocator().deallocateSecondary(Ptr);
}
// Internal quarantine allocation and deallocation functions. We first check
// that the batches are indeed serviced by the Primary.
// TODO(kostyak): figure out the best way to protect the batches.
COMPILER_CHECK(sizeof(QuarantineBatch) < SizeClassMap::kMaxSize);
void *Allocate(uptr Size) {
return getBackendAllocator().allocatePrimary(Cache_, Size);
}
void Deallocate(void *Ptr) {
getBackendAllocator().deallocatePrimary(Cache_, Ptr);
}
AllocatorCache *Cache_;
};
typedef Quarantine<QuarantineCallback, ScudoChunk> ScudoQuarantine;
typedef ScudoQuarantine::Cache ScudoQuarantineCache;
COMPILER_CHECK(sizeof(ScudoQuarantineCache) <=
sizeof(ScudoThreadContext::QuarantineCachePlaceHolder));
AllocatorCache *getAllocatorCache(ScudoThreadContext *ThreadContext) {
return &ThreadContext->Cache;
}
ScudoQuarantineCache *getQuarantineCache(ScudoThreadContext *ThreadContext) {
return reinterpret_cast<
ScudoQuarantineCache *>(ThreadContext->QuarantineCachePlaceHolder);
}
[scudo] PRNG makeover Summary: This follows the addition of `GetRandom` with D34412. We remove our `/dev/urandom` code and use the new function. Additionally, change the PRNG for a slightly faster version. One of the issues with the old code is that we have 64 full bits of randomness per "next", using only 8 of those for the Salt and discarding the rest. So we add a cached u64 in the PRNG that can serve up to 8 u8 before having to call the "next" function again. During some integration work, I also realized that some very early processes (like `init`) do not benefit from `/dev/urandom` yet. So if there is no `getrandom` syscall as well, we have to fallback to some sort of initialization of the PRNG. Now a few words on why XoRoShiRo and not something else. I have played a while with various PRNGs on 32 & 64 bit platforms. Some results are below. LCG 32 & 64 are usually faster but produce respectively 15 & 31 bits of entropy, meaning that to get a full 64-bit, you would need to call them several times. The simple XorShift is fast, produces 32 bits but is mediocre with regard to PRNG test suites, PCG is slower overall, and XoRoShiRo is faster than XorShift128+ and produces full 64 bits. %%% root@tulip-chiphd:/data # ./randtest.arm [+] starting xs32... [?] xs32 duration: 22431833053ns [+] starting lcg32... [?] lcg32 duration: 14941402090ns [+] starting pcg32... [?] pcg32 duration: 44941973771ns [+] starting xs128p... [?] xs128p duration: 48889786981ns [+] starting lcg64... [?] lcg64 duration: 33831042391ns [+] starting xos128p... [?] xos128p duration: 44850878605ns root@tulip-chiphd:/data # ./randtest.aarch64 [+] starting xs32... [?] xs32 duration: 22425151678ns [+] starting lcg32... [?] lcg32 duration: 14954255257ns [+] starting pcg32... [?] pcg32 duration: 37346265726ns [+] starting xs128p... [?] xs128p duration: 22523807219ns [+] starting lcg64... [?] lcg64 duration: 26141304679ns [+] starting xos128p... [?] xos128p duration: 14937033215ns %%% Reviewers: alekseyshl Reviewed By: alekseyshl Subscribers: aemerson, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D35221 llvm-svn: 307798
2017-07-12 23:29:08 +08:00
ScudoPrng *getPrng(ScudoThreadContext *ThreadContext) {
return &ThreadContext->Prng;
}
struct ScudoAllocator {
static const uptr MaxAllowedMallocSize =
FIRST_32_SECOND_64(2UL << 30, 1ULL << 40);
typedef ReturnNullOrDieOnFailure FailureHandler;
ScudoBackendAllocator BackendAllocator;
ScudoQuarantine AllocatorQuarantine;
StaticSpinMutex GlobalPrngMutex;
ScudoPrng GlobalPrng;
// The fallback caches are used when the thread local caches have been
// 'detroyed' on thread tear-down. They are protected by a Mutex as they can
// be accessed by different threads.
StaticSpinMutex FallbackMutex;
AllocatorCache FallbackAllocatorCache;
ScudoQuarantineCache FallbackQuarantineCache;
[scudo] PRNG makeover Summary: This follows the addition of `GetRandom` with D34412. We remove our `/dev/urandom` code and use the new function. Additionally, change the PRNG for a slightly faster version. One of the issues with the old code is that we have 64 full bits of randomness per "next", using only 8 of those for the Salt and discarding the rest. So we add a cached u64 in the PRNG that can serve up to 8 u8 before having to call the "next" function again. During some integration work, I also realized that some very early processes (like `init`) do not benefit from `/dev/urandom` yet. So if there is no `getrandom` syscall as well, we have to fallback to some sort of initialization of the PRNG. Now a few words on why XoRoShiRo and not something else. I have played a while with various PRNGs on 32 & 64 bit platforms. Some results are below. LCG 32 & 64 are usually faster but produce respectively 15 & 31 bits of entropy, meaning that to get a full 64-bit, you would need to call them several times. The simple XorShift is fast, produces 32 bits but is mediocre with regard to PRNG test suites, PCG is slower overall, and XoRoShiRo is faster than XorShift128+ and produces full 64 bits. %%% root@tulip-chiphd:/data # ./randtest.arm [+] starting xs32... [?] xs32 duration: 22431833053ns [+] starting lcg32... [?] lcg32 duration: 14941402090ns [+] starting pcg32... [?] pcg32 duration: 44941973771ns [+] starting xs128p... [?] xs128p duration: 48889786981ns [+] starting lcg64... [?] lcg64 duration: 33831042391ns [+] starting xos128p... [?] xos128p duration: 44850878605ns root@tulip-chiphd:/data # ./randtest.aarch64 [+] starting xs32... [?] xs32 duration: 22425151678ns [+] starting lcg32... [?] lcg32 duration: 14954255257ns [+] starting pcg32... [?] pcg32 duration: 37346265726ns [+] starting xs128p... [?] xs128p duration: 22523807219ns [+] starting lcg64... [?] lcg64 duration: 26141304679ns [+] starting xos128p... [?] xos128p duration: 14937033215ns %%% Reviewers: alekseyshl Reviewed By: alekseyshl Subscribers: aemerson, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D35221 llvm-svn: 307798
2017-07-12 23:29:08 +08:00
ScudoPrng FallbackPrng;
[scudo] Quarantine overhaul Summary: First, some context. The main feedback we get about the quarantine is that it's too memory hungry. A single MB of quarantine will have an impact of 3 to 4MB of PSS/RSS, and things quickly get out of hand in terms of memory usage, and the quarantine ends up disabled. The main objective of the quarantine is to protect from use-after-free exploitation by making it harder for an attacker to reallocate a controlled chunk in place of the targeted freed chunk. This is achieved by not making it available to the backend right away for reuse, but holding it a little while. Historically, what has usually been the target of such attacks was objects, where vtable pointers or other function pointers could constitute a valuable targeti to replace. Those are usually on the smaller side. There is barely any advantage in putting the quarantine several megabytes of RGB data or the like. Now for the patch. This patch introduces a new way the Quarantine behaves in Scudo. First of all, the size of the Quarantine will be defined in KB instead of MB, then we introduce a new option: the size up to which (lower than or equal to) a chunk will be quarantined. This way, we only quarantine smaller chunks, and the size of the quarantine remains manageable. It also prevents someone from triggering a recycle by allocating something huge. We default to 512 bytes on 32-bit and 2048 bytes on 64-bit platforms. In details, the patches includes the following: - introduce `QuarantineSizeKb`, but honor `QuarantineSizeMb` if set to fall back to the old behavior (meaning no threshold in that case); `QuarantineSizeMb` is described as deprecated in the options descriptios; documentation update will follow; - introduce `QuarantineChunksUpToSize`, the new threshold value; - update the `quarantine.cpp` test, and other tests using `QuarantineSizeMb`; - remove `AllocatorOptions::copyTo`, it wasn't used; - slightly change the logic around `quarantineOrDeallocateChunk` to accomodate for the new logic; rename a couple of variables there as well; Rewriting the tests, I found a somewhat annoying bug where non-default aligned chunks would account for more than needed when placed in the quarantine due to `<< MinAlignment` instead of `<< MinAlignmentLog`. This is fixed and tested for now. Reviewers: alekseyshl, kcc Reviewed By: alekseyshl Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D35694 llvm-svn: 308884
2017-07-24 23:29:38 +08:00
u32 QuarantineChunksUpToSize;
bool DeallocationTypeMismatch;
bool ZeroContents;
bool DeleteSizeMismatch;
explicit ScudoAllocator(LinkerInitialized)
: AllocatorQuarantine(LINKER_INITIALIZED),
FallbackQuarantineCache(LINKER_INITIALIZED) {}
void init(const AllocatorOptions &Options) {
// Verify that the header offset field can hold the maximum offset. In the
// case of the Secondary allocator, it takes care of alignment and the
// offset will always be 0. In the case of the Primary, the worst case
// scenario happens in the last size class, when the backend allocation
// would already be aligned on the requested alignment, which would happen
// to be the maximum alignment that would fit in that size class. As a
// result, the maximum offset will be at most the maximum alignment for the
// last size class minus the header size, in multiples of MinAlignment.
UnpackedHeader Header = {};
uptr MaxPrimaryAlignment =
1 << MostSignificantSetBitIndex(SizeClassMap::kMaxSize - MinAlignment);
uptr MaxOffset =
(MaxPrimaryAlignment - AlignedChunkHeaderSize) >> MinAlignmentLog;
[scudo] 32-bit and hardware agnostic support Summary: This update introduces i386 support for the Scudo Hardened Allocator, and offers software alternatives for functions that used to require hardware specific instruction sets. This should make porting to new architectures easier. Among the changes: - The chunk header has been changed to accomodate the size limitations encountered on 32-bit architectures. We now fit everything in 64-bit. This was achieved by storing the amount of unused bytes in an allocation rather than the size itself, as one can be deduced from the other with the help of the GetActuallyAllocatedSize function. As it turns out, this header can be used for both 64 and 32 bit, and as such we dropped the requirement for the 128-bit compare and exchange instruction support (cmpxchg16b). - Add 32-bit support for the checksum and the PRNG functions: if the SSE 4.2 instruction set is supported, use the 32-bit CRC32 instruction, and in the XorShift128, use a 32-bit based state instead of 64-bit. - Add software support for CRC32: if SSE 4.2 is not supported, fallback on a software implementation. - Modify tests that were not 32-bit compliant, and expand them to cover more allocation and alignment sizes. The random shuffle test has been deactivated for linux-i386 & linux-i686 as the 32-bit sanitizer allocator doesn't currently randomize chunks. Reviewers: alekseyshl, kcc Subscribers: filcab, llvm-commits, tberghammer, danalbert, srhines, mgorny, modocache Differential Revision: https://reviews.llvm.org/D26358 llvm-svn: 288255
2016-12-01 01:32:20 +08:00
Header.Offset = MaxOffset;
if (Header.Offset != MaxOffset) {
dieWithMessage("ERROR: the maximum possible offset doesn't fit in the "
"header\n");
}
// Verify that we can fit the maximum size or amount of unused bytes in the
// header. Given that the Secondary fits the allocation to a page, the worst
// case scenario happens in the Primary. It will depend on the second to
// last and last class sizes, as well as the dynamic base for the Primary.
// The following is an over-approximation that works for our needs.
uptr MaxSizeOrUnusedBytes = SizeClassMap::kMaxSize - 1;
Header.SizeOrUnusedBytes = MaxSizeOrUnusedBytes;
if (Header.SizeOrUnusedBytes != MaxSizeOrUnusedBytes) {
[scudo] 32-bit and hardware agnostic support Summary: This update introduces i386 support for the Scudo Hardened Allocator, and offers software alternatives for functions that used to require hardware specific instruction sets. This should make porting to new architectures easier. Among the changes: - The chunk header has been changed to accomodate the size limitations encountered on 32-bit architectures. We now fit everything in 64-bit. This was achieved by storing the amount of unused bytes in an allocation rather than the size itself, as one can be deduced from the other with the help of the GetActuallyAllocatedSize function. As it turns out, this header can be used for both 64 and 32 bit, and as such we dropped the requirement for the 128-bit compare and exchange instruction support (cmpxchg16b). - Add 32-bit support for the checksum and the PRNG functions: if the SSE 4.2 instruction set is supported, use the 32-bit CRC32 instruction, and in the XorShift128, use a 32-bit based state instead of 64-bit. - Add software support for CRC32: if SSE 4.2 is not supported, fallback on a software implementation. - Modify tests that were not 32-bit compliant, and expand them to cover more allocation and alignment sizes. The random shuffle test has been deactivated for linux-i386 & linux-i686 as the 32-bit sanitizer allocator doesn't currently randomize chunks. Reviewers: alekseyshl, kcc Subscribers: filcab, llvm-commits, tberghammer, danalbert, srhines, mgorny, modocache Differential Revision: https://reviews.llvm.org/D26358 llvm-svn: 288255
2016-12-01 01:32:20 +08:00
dieWithMessage("ERROR: the maximum possible unused bytes doesn't fit in "
"the header\n");
}
DeallocationTypeMismatch = Options.DeallocationTypeMismatch;
DeleteSizeMismatch = Options.DeleteSizeMismatch;
ZeroContents = Options.ZeroContents;
SetAllocatorMayReturnNull(Options.MayReturnNull);
BackendAllocator.init(Options.ReleaseToOSIntervalMs);
AllocatorQuarantine.Init(
[scudo] Quarantine overhaul Summary: First, some context. The main feedback we get about the quarantine is that it's too memory hungry. A single MB of quarantine will have an impact of 3 to 4MB of PSS/RSS, and things quickly get out of hand in terms of memory usage, and the quarantine ends up disabled. The main objective of the quarantine is to protect from use-after-free exploitation by making it harder for an attacker to reallocate a controlled chunk in place of the targeted freed chunk. This is achieved by not making it available to the backend right away for reuse, but holding it a little while. Historically, what has usually been the target of such attacks was objects, where vtable pointers or other function pointers could constitute a valuable targeti to replace. Those are usually on the smaller side. There is barely any advantage in putting the quarantine several megabytes of RGB data or the like. Now for the patch. This patch introduces a new way the Quarantine behaves in Scudo. First of all, the size of the Quarantine will be defined in KB instead of MB, then we introduce a new option: the size up to which (lower than or equal to) a chunk will be quarantined. This way, we only quarantine smaller chunks, and the size of the quarantine remains manageable. It also prevents someone from triggering a recycle by allocating something huge. We default to 512 bytes on 32-bit and 2048 bytes on 64-bit platforms. In details, the patches includes the following: - introduce `QuarantineSizeKb`, but honor `QuarantineSizeMb` if set to fall back to the old behavior (meaning no threshold in that case); `QuarantineSizeMb` is described as deprecated in the options descriptios; documentation update will follow; - introduce `QuarantineChunksUpToSize`, the new threshold value; - update the `quarantine.cpp` test, and other tests using `QuarantineSizeMb`; - remove `AllocatorOptions::copyTo`, it wasn't used; - slightly change the logic around `quarantineOrDeallocateChunk` to accomodate for the new logic; rename a couple of variables there as well; Rewriting the tests, I found a somewhat annoying bug where non-default aligned chunks would account for more than needed when placed in the quarantine due to `<< MinAlignment` instead of `<< MinAlignmentLog`. This is fixed and tested for now. Reviewers: alekseyshl, kcc Reviewed By: alekseyshl Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D35694 llvm-svn: 308884
2017-07-24 23:29:38 +08:00
static_cast<uptr>(Options.QuarantineSizeKb) << 10,
static_cast<uptr>(Options.ThreadLocalQuarantineSizeKb) << 10);
[scudo] Quarantine overhaul Summary: First, some context. The main feedback we get about the quarantine is that it's too memory hungry. A single MB of quarantine will have an impact of 3 to 4MB of PSS/RSS, and things quickly get out of hand in terms of memory usage, and the quarantine ends up disabled. The main objective of the quarantine is to protect from use-after-free exploitation by making it harder for an attacker to reallocate a controlled chunk in place of the targeted freed chunk. This is achieved by not making it available to the backend right away for reuse, but holding it a little while. Historically, what has usually been the target of such attacks was objects, where vtable pointers or other function pointers could constitute a valuable targeti to replace. Those are usually on the smaller side. There is barely any advantage in putting the quarantine several megabytes of RGB data or the like. Now for the patch. This patch introduces a new way the Quarantine behaves in Scudo. First of all, the size of the Quarantine will be defined in KB instead of MB, then we introduce a new option: the size up to which (lower than or equal to) a chunk will be quarantined. This way, we only quarantine smaller chunks, and the size of the quarantine remains manageable. It also prevents someone from triggering a recycle by allocating something huge. We default to 512 bytes on 32-bit and 2048 bytes on 64-bit platforms. In details, the patches includes the following: - introduce `QuarantineSizeKb`, but honor `QuarantineSizeMb` if set to fall back to the old behavior (meaning no threshold in that case); `QuarantineSizeMb` is described as deprecated in the options descriptios; documentation update will follow; - introduce `QuarantineChunksUpToSize`, the new threshold value; - update the `quarantine.cpp` test, and other tests using `QuarantineSizeMb`; - remove `AllocatorOptions::copyTo`, it wasn't used; - slightly change the logic around `quarantineOrDeallocateChunk` to accomodate for the new logic; rename a couple of variables there as well; Rewriting the tests, I found a somewhat annoying bug where non-default aligned chunks would account for more than needed when placed in the quarantine due to `<< MinAlignment` instead of `<< MinAlignmentLog`. This is fixed and tested for now. Reviewers: alekseyshl, kcc Reviewed By: alekseyshl Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D35694 llvm-svn: 308884
2017-07-24 23:29:38 +08:00
QuarantineChunksUpToSize = Options.QuarantineChunksUpToSize;
GlobalPrng.init();
Cookie = GlobalPrng.getU64();
BackendAllocator.initCache(&FallbackAllocatorCache);
[scudo] PRNG makeover Summary: This follows the addition of `GetRandom` with D34412. We remove our `/dev/urandom` code and use the new function. Additionally, change the PRNG for a slightly faster version. One of the issues with the old code is that we have 64 full bits of randomness per "next", using only 8 of those for the Salt and discarding the rest. So we add a cached u64 in the PRNG that can serve up to 8 u8 before having to call the "next" function again. During some integration work, I also realized that some very early processes (like `init`) do not benefit from `/dev/urandom` yet. So if there is no `getrandom` syscall as well, we have to fallback to some sort of initialization of the PRNG. Now a few words on why XoRoShiRo and not something else. I have played a while with various PRNGs on 32 & 64 bit platforms. Some results are below. LCG 32 & 64 are usually faster but produce respectively 15 & 31 bits of entropy, meaning that to get a full 64-bit, you would need to call them several times. The simple XorShift is fast, produces 32 bits but is mediocre with regard to PRNG test suites, PCG is slower overall, and XoRoShiRo is faster than XorShift128+ and produces full 64 bits. %%% root@tulip-chiphd:/data # ./randtest.arm [+] starting xs32... [?] xs32 duration: 22431833053ns [+] starting lcg32... [?] lcg32 duration: 14941402090ns [+] starting pcg32... [?] pcg32 duration: 44941973771ns [+] starting xs128p... [?] xs128p duration: 48889786981ns [+] starting lcg64... [?] lcg64 duration: 33831042391ns [+] starting xos128p... [?] xos128p duration: 44850878605ns root@tulip-chiphd:/data # ./randtest.aarch64 [+] starting xs32... [?] xs32 duration: 22425151678ns [+] starting lcg32... [?] lcg32 duration: 14954255257ns [+] starting pcg32... [?] pcg32 duration: 37346265726ns [+] starting xs128p... [?] xs128p duration: 22523807219ns [+] starting lcg64... [?] lcg64 duration: 26141304679ns [+] starting xos128p... [?] xos128p duration: 14937033215ns %%% Reviewers: alekseyshl Reviewed By: alekseyshl Subscribers: aemerson, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D35221 llvm-svn: 307798
2017-07-12 23:29:08 +08:00
FallbackPrng.init();
}
// Helper function that checks for a valid Scudo chunk. nullptr isn't.
[scudo] 32-bit and hardware agnostic support Summary: This update introduces i386 support for the Scudo Hardened Allocator, and offers software alternatives for functions that used to require hardware specific instruction sets. This should make porting to new architectures easier. Among the changes: - The chunk header has been changed to accomodate the size limitations encountered on 32-bit architectures. We now fit everything in 64-bit. This was achieved by storing the amount of unused bytes in an allocation rather than the size itself, as one can be deduced from the other with the help of the GetActuallyAllocatedSize function. As it turns out, this header can be used for both 64 and 32 bit, and as such we dropped the requirement for the 128-bit compare and exchange instruction support (cmpxchg16b). - Add 32-bit support for the checksum and the PRNG functions: if the SSE 4.2 instruction set is supported, use the 32-bit CRC32 instruction, and in the XorShift128, use a 32-bit based state instead of 64-bit. - Add software support for CRC32: if SSE 4.2 is not supported, fallback on a software implementation. - Modify tests that were not 32-bit compliant, and expand them to cover more allocation and alignment sizes. The random shuffle test has been deactivated for linux-i386 & linux-i686 as the 32-bit sanitizer allocator doesn't currently randomize chunks. Reviewers: alekseyshl, kcc Subscribers: filcab, llvm-commits, tberghammer, danalbert, srhines, mgorny, modocache Differential Revision: https://reviews.llvm.org/D26358 llvm-svn: 288255
2016-12-01 01:32:20 +08:00
bool isValidPointer(const void *UserPtr) {
initThreadMaybe();
if (UNLIKELY(!UserPtr))
[scudo] 32-bit and hardware agnostic support Summary: This update introduces i386 support for the Scudo Hardened Allocator, and offers software alternatives for functions that used to require hardware specific instruction sets. This should make porting to new architectures easier. Among the changes: - The chunk header has been changed to accomodate the size limitations encountered on 32-bit architectures. We now fit everything in 64-bit. This was achieved by storing the amount of unused bytes in an allocation rather than the size itself, as one can be deduced from the other with the help of the GetActuallyAllocatedSize function. As it turns out, this header can be used for both 64 and 32 bit, and as such we dropped the requirement for the 128-bit compare and exchange instruction support (cmpxchg16b). - Add 32-bit support for the checksum and the PRNG functions: if the SSE 4.2 instruction set is supported, use the 32-bit CRC32 instruction, and in the XorShift128, use a 32-bit based state instead of 64-bit. - Add software support for CRC32: if SSE 4.2 is not supported, fallback on a software implementation. - Modify tests that were not 32-bit compliant, and expand them to cover more allocation and alignment sizes. The random shuffle test has been deactivated for linux-i386 & linux-i686 as the 32-bit sanitizer allocator doesn't currently randomize chunks. Reviewers: alekseyshl, kcc Subscribers: filcab, llvm-commits, tberghammer, danalbert, srhines, mgorny, modocache Differential Revision: https://reviews.llvm.org/D26358 llvm-svn: 288255
2016-12-01 01:32:20 +08:00
return false;
uptr UserBeg = reinterpret_cast<uptr>(UserPtr);
if (!IsAligned(UserBeg, MinAlignment))
return false;
return getScudoChunk(UserBeg)->isValid();
[scudo] 32-bit and hardware agnostic support Summary: This update introduces i386 support for the Scudo Hardened Allocator, and offers software alternatives for functions that used to require hardware specific instruction sets. This should make porting to new architectures easier. Among the changes: - The chunk header has been changed to accomodate the size limitations encountered on 32-bit architectures. We now fit everything in 64-bit. This was achieved by storing the amount of unused bytes in an allocation rather than the size itself, as one can be deduced from the other with the help of the GetActuallyAllocatedSize function. As it turns out, this header can be used for both 64 and 32 bit, and as such we dropped the requirement for the 128-bit compare and exchange instruction support (cmpxchg16b). - Add 32-bit support for the checksum and the PRNG functions: if the SSE 4.2 instruction set is supported, use the 32-bit CRC32 instruction, and in the XorShift128, use a 32-bit based state instead of 64-bit. - Add software support for CRC32: if SSE 4.2 is not supported, fallback on a software implementation. - Modify tests that were not 32-bit compliant, and expand them to cover more allocation and alignment sizes. The random shuffle test has been deactivated for linux-i386 & linux-i686 as the 32-bit sanitizer allocator doesn't currently randomize chunks. Reviewers: alekseyshl, kcc Subscribers: filcab, llvm-commits, tberghammer, danalbert, srhines, mgorny, modocache Differential Revision: https://reviews.llvm.org/D26358 llvm-svn: 288255
2016-12-01 01:32:20 +08:00
}
// Allocates a chunk.
void *allocate(uptr Size, uptr Alignment, AllocType Type,
bool ForceZeroContents = false) {
initThreadMaybe();
if (UNLIKELY(Alignment > MaxAlignment))
return FailureHandler::OnBadRequest();
if (UNLIKELY(Alignment < MinAlignment))
Alignment = MinAlignment;
if (UNLIKELY(Size >= MaxAllowedMallocSize))
return FailureHandler::OnBadRequest();
if (UNLIKELY(Size == 0))
Size = 1;
uptr NeededSize = RoundUpTo(Size, MinAlignment) + AlignedChunkHeaderSize;
uptr AlignedSize = (Alignment > MinAlignment) ?
NeededSize + (Alignment - AlignedChunkHeaderSize) : NeededSize;
if (UNLIKELY(AlignedSize >= MaxAllowedMallocSize))
return FailureHandler::OnBadRequest();
// Primary and Secondary backed allocations have a different treatment. We
// deal with alignment requirements of Primary serviced allocations here,
// but the Secondary will take care of its own alignment needs.
bool FromPrimary = PrimaryAllocator::CanAllocate(AlignedSize, MinAlignment);
void *Ptr;
[scudo] PRNG makeover Summary: This follows the addition of `GetRandom` with D34412. We remove our `/dev/urandom` code and use the new function. Additionally, change the PRNG for a slightly faster version. One of the issues with the old code is that we have 64 full bits of randomness per "next", using only 8 of those for the Salt and discarding the rest. So we add a cached u64 in the PRNG that can serve up to 8 u8 before having to call the "next" function again. During some integration work, I also realized that some very early processes (like `init`) do not benefit from `/dev/urandom` yet. So if there is no `getrandom` syscall as well, we have to fallback to some sort of initialization of the PRNG. Now a few words on why XoRoShiRo and not something else. I have played a while with various PRNGs on 32 & 64 bit platforms. Some results are below. LCG 32 & 64 are usually faster but produce respectively 15 & 31 bits of entropy, meaning that to get a full 64-bit, you would need to call them several times. The simple XorShift is fast, produces 32 bits but is mediocre with regard to PRNG test suites, PCG is slower overall, and XoRoShiRo is faster than XorShift128+ and produces full 64 bits. %%% root@tulip-chiphd:/data # ./randtest.arm [+] starting xs32... [?] xs32 duration: 22431833053ns [+] starting lcg32... [?] lcg32 duration: 14941402090ns [+] starting pcg32... [?] pcg32 duration: 44941973771ns [+] starting xs128p... [?] xs128p duration: 48889786981ns [+] starting lcg64... [?] lcg64 duration: 33831042391ns [+] starting xos128p... [?] xos128p duration: 44850878605ns root@tulip-chiphd:/data # ./randtest.aarch64 [+] starting xs32... [?] xs32 duration: 22425151678ns [+] starting lcg32... [?] lcg32 duration: 14954255257ns [+] starting pcg32... [?] pcg32 duration: 37346265726ns [+] starting xs128p... [?] xs128p duration: 22523807219ns [+] starting lcg64... [?] lcg64 duration: 26141304679ns [+] starting xos128p... [?] xos128p duration: 14937033215ns %%% Reviewers: alekseyshl Reviewed By: alekseyshl Subscribers: aemerson, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D35221 llvm-svn: 307798
2017-07-12 23:29:08 +08:00
u8 Salt;
uptr AllocSize;
if (FromPrimary) {
AllocSize = AlignedSize;
ScudoThreadContext *ThreadContext = getThreadContextAndLock();
if (LIKELY(ThreadContext)) {
Salt = getPrng(ThreadContext)->getU8();
Ptr = BackendAllocator.allocatePrimary(getAllocatorCache(ThreadContext),
AllocSize);
ThreadContext->unlock();
} else {
SpinMutexLock l(&FallbackMutex);
Salt = FallbackPrng.getU8();
Ptr = BackendAllocator.allocatePrimary(&FallbackAllocatorCache,
AllocSize);
}
} else {
{
SpinMutexLock l(&GlobalPrngMutex);
Salt = GlobalPrng.getU8();
}
AllocSize = NeededSize;
Ptr = BackendAllocator.allocateSecondary(AllocSize, Alignment);
}
if (UNLIKELY(!Ptr))
return FailureHandler::OnOOM();
[scudo] 32-bit and hardware agnostic support Summary: This update introduces i386 support for the Scudo Hardened Allocator, and offers software alternatives for functions that used to require hardware specific instruction sets. This should make porting to new architectures easier. Among the changes: - The chunk header has been changed to accomodate the size limitations encountered on 32-bit architectures. We now fit everything in 64-bit. This was achieved by storing the amount of unused bytes in an allocation rather than the size itself, as one can be deduced from the other with the help of the GetActuallyAllocatedSize function. As it turns out, this header can be used for both 64 and 32 bit, and as such we dropped the requirement for the 128-bit compare and exchange instruction support (cmpxchg16b). - Add 32-bit support for the checksum and the PRNG functions: if the SSE 4.2 instruction set is supported, use the 32-bit CRC32 instruction, and in the XorShift128, use a 32-bit based state instead of 64-bit. - Add software support for CRC32: if SSE 4.2 is not supported, fallback on a software implementation. - Modify tests that were not 32-bit compliant, and expand them to cover more allocation and alignment sizes. The random shuffle test has been deactivated for linux-i386 & linux-i686 as the 32-bit sanitizer allocator doesn't currently randomize chunks. Reviewers: alekseyshl, kcc Subscribers: filcab, llvm-commits, tberghammer, danalbert, srhines, mgorny, modocache Differential Revision: https://reviews.llvm.org/D26358 llvm-svn: 288255
2016-12-01 01:32:20 +08:00
// If requested, we will zero out the entire contents of the returned chunk.
if ((ForceZeroContents || ZeroContents) && FromPrimary)
memset(Ptr, 0, BackendAllocator.getActuallyAllocatedSize(
Ptr, /*FromPrimary=*/true));
[scudo] 32-bit and hardware agnostic support Summary: This update introduces i386 support for the Scudo Hardened Allocator, and offers software alternatives for functions that used to require hardware specific instruction sets. This should make porting to new architectures easier. Among the changes: - The chunk header has been changed to accomodate the size limitations encountered on 32-bit architectures. We now fit everything in 64-bit. This was achieved by storing the amount of unused bytes in an allocation rather than the size itself, as one can be deduced from the other with the help of the GetActuallyAllocatedSize function. As it turns out, this header can be used for both 64 and 32 bit, and as such we dropped the requirement for the 128-bit compare and exchange instruction support (cmpxchg16b). - Add 32-bit support for the checksum and the PRNG functions: if the SSE 4.2 instruction set is supported, use the 32-bit CRC32 instruction, and in the XorShift128, use a 32-bit based state instead of 64-bit. - Add software support for CRC32: if SSE 4.2 is not supported, fallback on a software implementation. - Modify tests that were not 32-bit compliant, and expand them to cover more allocation and alignment sizes. The random shuffle test has been deactivated for linux-i386 & linux-i686 as the 32-bit sanitizer allocator doesn't currently randomize chunks. Reviewers: alekseyshl, kcc Subscribers: filcab, llvm-commits, tberghammer, danalbert, srhines, mgorny, modocache Differential Revision: https://reviews.llvm.org/D26358 llvm-svn: 288255
2016-12-01 01:32:20 +08:00
UnpackedHeader Header = {};
uptr AllocBeg = reinterpret_cast<uptr>(Ptr);
uptr UserBeg = AllocBeg + AlignedChunkHeaderSize;
if (UNLIKELY(!IsAligned(UserBeg, Alignment))) {
// Since the Secondary takes care of alignment, a non-aligned pointer
// means it is from the Primary. It is also the only case where the offset
// field of the header would be non-zero.
CHECK(FromPrimary);
UserBeg = RoundUpTo(UserBeg, Alignment);
uptr Offset = UserBeg - AlignedChunkHeaderSize - AllocBeg;
Header.Offset = Offset >> MinAlignmentLog;
}
CHECK_LE(UserBeg + Size, AllocBeg + AllocSize);
Header.State = ChunkAllocated;
Header.AllocType = Type;
if (FromPrimary) {
Header.FromPrimary = 1;
Header.SizeOrUnusedBytes = Size;
} else {
// The secondary fits the allocations to a page, so the amount of unused
// bytes is the difference between the end of the user allocation and the
// next page boundary.
uptr PageSize = GetPageSizeCached();
uptr TrailingBytes = (UserBeg + Size) & (PageSize - 1);
if (TrailingBytes)
Header.SizeOrUnusedBytes = PageSize - TrailingBytes;
}
Header.Salt = Salt;
getScudoChunk(UserBeg)->storeHeader(&Header);
void *UserPtr = reinterpret_cast<void *>(UserBeg);
// if (&__sanitizer_malloc_hook) __sanitizer_malloc_hook(UserPtr, Size);
return UserPtr;
}
[scudo] Quarantine overhaul Summary: First, some context. The main feedback we get about the quarantine is that it's too memory hungry. A single MB of quarantine will have an impact of 3 to 4MB of PSS/RSS, and things quickly get out of hand in terms of memory usage, and the quarantine ends up disabled. The main objective of the quarantine is to protect from use-after-free exploitation by making it harder for an attacker to reallocate a controlled chunk in place of the targeted freed chunk. This is achieved by not making it available to the backend right away for reuse, but holding it a little while. Historically, what has usually been the target of such attacks was objects, where vtable pointers or other function pointers could constitute a valuable targeti to replace. Those are usually on the smaller side. There is barely any advantage in putting the quarantine several megabytes of RGB data or the like. Now for the patch. This patch introduces a new way the Quarantine behaves in Scudo. First of all, the size of the Quarantine will be defined in KB instead of MB, then we introduce a new option: the size up to which (lower than or equal to) a chunk will be quarantined. This way, we only quarantine smaller chunks, and the size of the quarantine remains manageable. It also prevents someone from triggering a recycle by allocating something huge. We default to 512 bytes on 32-bit and 2048 bytes on 64-bit platforms. In details, the patches includes the following: - introduce `QuarantineSizeKb`, but honor `QuarantineSizeMb` if set to fall back to the old behavior (meaning no threshold in that case); `QuarantineSizeMb` is described as deprecated in the options descriptios; documentation update will follow; - introduce `QuarantineChunksUpToSize`, the new threshold value; - update the `quarantine.cpp` test, and other tests using `QuarantineSizeMb`; - remove `AllocatorOptions::copyTo`, it wasn't used; - slightly change the logic around `quarantineOrDeallocateChunk` to accomodate for the new logic; rename a couple of variables there as well; Rewriting the tests, I found a somewhat annoying bug where non-default aligned chunks would account for more than needed when placed in the quarantine due to `<< MinAlignment` instead of `<< MinAlignmentLog`. This is fixed and tested for now. Reviewers: alekseyshl, kcc Reviewed By: alekseyshl Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D35694 llvm-svn: 308884
2017-07-24 23:29:38 +08:00
// Place a chunk in the quarantine or directly deallocate it in the event of
// a zero-sized quarantine, or if the size of the chunk is greater than the
// quarantine chunk size threshold.
void quarantineOrDeallocateChunk(ScudoChunk *Chunk, UnpackedHeader *Header,
uptr Size) {
[scudo] Quarantine overhaul Summary: First, some context. The main feedback we get about the quarantine is that it's too memory hungry. A single MB of quarantine will have an impact of 3 to 4MB of PSS/RSS, and things quickly get out of hand in terms of memory usage, and the quarantine ends up disabled. The main objective of the quarantine is to protect from use-after-free exploitation by making it harder for an attacker to reallocate a controlled chunk in place of the targeted freed chunk. This is achieved by not making it available to the backend right away for reuse, but holding it a little while. Historically, what has usually been the target of such attacks was objects, where vtable pointers or other function pointers could constitute a valuable targeti to replace. Those are usually on the smaller side. There is barely any advantage in putting the quarantine several megabytes of RGB data or the like. Now for the patch. This patch introduces a new way the Quarantine behaves in Scudo. First of all, the size of the Quarantine will be defined in KB instead of MB, then we introduce a new option: the size up to which (lower than or equal to) a chunk will be quarantined. This way, we only quarantine smaller chunks, and the size of the quarantine remains manageable. It also prevents someone from triggering a recycle by allocating something huge. We default to 512 bytes on 32-bit and 2048 bytes on 64-bit platforms. In details, the patches includes the following: - introduce `QuarantineSizeKb`, but honor `QuarantineSizeMb` if set to fall back to the old behavior (meaning no threshold in that case); `QuarantineSizeMb` is described as deprecated in the options descriptios; documentation update will follow; - introduce `QuarantineChunksUpToSize`, the new threshold value; - update the `quarantine.cpp` test, and other tests using `QuarantineSizeMb`; - remove `AllocatorOptions::copyTo`, it wasn't used; - slightly change the logic around `quarantineOrDeallocateChunk` to accomodate for the new logic; rename a couple of variables there as well; Rewriting the tests, I found a somewhat annoying bug where non-default aligned chunks would account for more than needed when placed in the quarantine due to `<< MinAlignment` instead of `<< MinAlignmentLog`. This is fixed and tested for now. Reviewers: alekseyshl, kcc Reviewed By: alekseyshl Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D35694 llvm-svn: 308884
2017-07-24 23:29:38 +08:00
const bool BypassQuarantine = (AllocatorQuarantine.GetCacheSize() == 0) ||
(Size > QuarantineChunksUpToSize);
if (BypassQuarantine) {
Chunk->eraseHeader();
void *Ptr = Chunk->getAllocBeg(Header);
[scudo] Quarantine overhaul Summary: First, some context. The main feedback we get about the quarantine is that it's too memory hungry. A single MB of quarantine will have an impact of 3 to 4MB of PSS/RSS, and things quickly get out of hand in terms of memory usage, and the quarantine ends up disabled. The main objective of the quarantine is to protect from use-after-free exploitation by making it harder for an attacker to reallocate a controlled chunk in place of the targeted freed chunk. This is achieved by not making it available to the backend right away for reuse, but holding it a little while. Historically, what has usually been the target of such attacks was objects, where vtable pointers or other function pointers could constitute a valuable targeti to replace. Those are usually on the smaller side. There is barely any advantage in putting the quarantine several megabytes of RGB data or the like. Now for the patch. This patch introduces a new way the Quarantine behaves in Scudo. First of all, the size of the Quarantine will be defined in KB instead of MB, then we introduce a new option: the size up to which (lower than or equal to) a chunk will be quarantined. This way, we only quarantine smaller chunks, and the size of the quarantine remains manageable. It also prevents someone from triggering a recycle by allocating something huge. We default to 512 bytes on 32-bit and 2048 bytes on 64-bit platforms. In details, the patches includes the following: - introduce `QuarantineSizeKb`, but honor `QuarantineSizeMb` if set to fall back to the old behavior (meaning no threshold in that case); `QuarantineSizeMb` is described as deprecated in the options descriptios; documentation update will follow; - introduce `QuarantineChunksUpToSize`, the new threshold value; - update the `quarantine.cpp` test, and other tests using `QuarantineSizeMb`; - remove `AllocatorOptions::copyTo`, it wasn't used; - slightly change the logic around `quarantineOrDeallocateChunk` to accomodate for the new logic; rename a couple of variables there as well; Rewriting the tests, I found a somewhat annoying bug where non-default aligned chunks would account for more than needed when placed in the quarantine due to `<< MinAlignment` instead of `<< MinAlignmentLog`. This is fixed and tested for now. Reviewers: alekseyshl, kcc Reviewed By: alekseyshl Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D35694 llvm-svn: 308884
2017-07-24 23:29:38 +08:00
if (Header->FromPrimary) {
ScudoThreadContext *ThreadContext = getThreadContextAndLock();
if (LIKELY(ThreadContext)) {
getBackendAllocator().deallocatePrimary(
getAllocatorCache(ThreadContext), Ptr);
ThreadContext->unlock();
} else {
SpinMutexLock Lock(&FallbackMutex);
getBackendAllocator().deallocatePrimary(&FallbackAllocatorCache, Ptr);
}
} else {
getBackendAllocator().deallocateSecondary(Ptr);
}
} else {
[scudo] Quarantine overhaul Summary: First, some context. The main feedback we get about the quarantine is that it's too memory hungry. A single MB of quarantine will have an impact of 3 to 4MB of PSS/RSS, and things quickly get out of hand in terms of memory usage, and the quarantine ends up disabled. The main objective of the quarantine is to protect from use-after-free exploitation by making it harder for an attacker to reallocate a controlled chunk in place of the targeted freed chunk. This is achieved by not making it available to the backend right away for reuse, but holding it a little while. Historically, what has usually been the target of such attacks was objects, where vtable pointers or other function pointers could constitute a valuable targeti to replace. Those are usually on the smaller side. There is barely any advantage in putting the quarantine several megabytes of RGB data or the like. Now for the patch. This patch introduces a new way the Quarantine behaves in Scudo. First of all, the size of the Quarantine will be defined in KB instead of MB, then we introduce a new option: the size up to which (lower than or equal to) a chunk will be quarantined. This way, we only quarantine smaller chunks, and the size of the quarantine remains manageable. It also prevents someone from triggering a recycle by allocating something huge. We default to 512 bytes on 32-bit and 2048 bytes on 64-bit platforms. In details, the patches includes the following: - introduce `QuarantineSizeKb`, but honor `QuarantineSizeMb` if set to fall back to the old behavior (meaning no threshold in that case); `QuarantineSizeMb` is described as deprecated in the options descriptios; documentation update will follow; - introduce `QuarantineChunksUpToSize`, the new threshold value; - update the `quarantine.cpp` test, and other tests using `QuarantineSizeMb`; - remove `AllocatorOptions::copyTo`, it wasn't used; - slightly change the logic around `quarantineOrDeallocateChunk` to accomodate for the new logic; rename a couple of variables there as well; Rewriting the tests, I found a somewhat annoying bug where non-default aligned chunks would account for more than needed when placed in the quarantine due to `<< MinAlignment` instead of `<< MinAlignmentLog`. This is fixed and tested for now. Reviewers: alekseyshl, kcc Reviewed By: alekseyshl Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D35694 llvm-svn: 308884
2017-07-24 23:29:38 +08:00
// If a small memory amount was allocated with a larger alignment, we want
// to take that into account. Otherwise the Quarantine would be filled
// with tiny chunks, taking a lot of VA memory. This is an approximation
// of the usable size, that allows us to not call
// GetActuallyAllocatedSize.
uptr EstimatedSize = Size + (Header->Offset << MinAlignmentLog);
UnpackedHeader NewHeader = *Header;
NewHeader.State = ChunkQuarantine;
Chunk->compareExchangeHeader(&NewHeader, Header);
ScudoThreadContext *ThreadContext = getThreadContextAndLock();
if (LIKELY(ThreadContext)) {
AllocatorQuarantine.Put(getQuarantineCache(ThreadContext),
QuarantineCallback(
getAllocatorCache(ThreadContext)),
[scudo] Quarantine overhaul Summary: First, some context. The main feedback we get about the quarantine is that it's too memory hungry. A single MB of quarantine will have an impact of 3 to 4MB of PSS/RSS, and things quickly get out of hand in terms of memory usage, and the quarantine ends up disabled. The main objective of the quarantine is to protect from use-after-free exploitation by making it harder for an attacker to reallocate a controlled chunk in place of the targeted freed chunk. This is achieved by not making it available to the backend right away for reuse, but holding it a little while. Historically, what has usually been the target of such attacks was objects, where vtable pointers or other function pointers could constitute a valuable targeti to replace. Those are usually on the smaller side. There is barely any advantage in putting the quarantine several megabytes of RGB data or the like. Now for the patch. This patch introduces a new way the Quarantine behaves in Scudo. First of all, the size of the Quarantine will be defined in KB instead of MB, then we introduce a new option: the size up to which (lower than or equal to) a chunk will be quarantined. This way, we only quarantine smaller chunks, and the size of the quarantine remains manageable. It also prevents someone from triggering a recycle by allocating something huge. We default to 512 bytes on 32-bit and 2048 bytes on 64-bit platforms. In details, the patches includes the following: - introduce `QuarantineSizeKb`, but honor `QuarantineSizeMb` if set to fall back to the old behavior (meaning no threshold in that case); `QuarantineSizeMb` is described as deprecated in the options descriptios; documentation update will follow; - introduce `QuarantineChunksUpToSize`, the new threshold value; - update the `quarantine.cpp` test, and other tests using `QuarantineSizeMb`; - remove `AllocatorOptions::copyTo`, it wasn't used; - slightly change the logic around `quarantineOrDeallocateChunk` to accomodate for the new logic; rename a couple of variables there as well; Rewriting the tests, I found a somewhat annoying bug where non-default aligned chunks would account for more than needed when placed in the quarantine due to `<< MinAlignment` instead of `<< MinAlignmentLog`. This is fixed and tested for now. Reviewers: alekseyshl, kcc Reviewed By: alekseyshl Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D35694 llvm-svn: 308884
2017-07-24 23:29:38 +08:00
Chunk, EstimatedSize);
ThreadContext->unlock();
} else {
SpinMutexLock l(&FallbackMutex);
AllocatorQuarantine.Put(&FallbackQuarantineCache,
QuarantineCallback(&FallbackAllocatorCache),
[scudo] Quarantine overhaul Summary: First, some context. The main feedback we get about the quarantine is that it's too memory hungry. A single MB of quarantine will have an impact of 3 to 4MB of PSS/RSS, and things quickly get out of hand in terms of memory usage, and the quarantine ends up disabled. The main objective of the quarantine is to protect from use-after-free exploitation by making it harder for an attacker to reallocate a controlled chunk in place of the targeted freed chunk. This is achieved by not making it available to the backend right away for reuse, but holding it a little while. Historically, what has usually been the target of such attacks was objects, where vtable pointers or other function pointers could constitute a valuable targeti to replace. Those are usually on the smaller side. There is barely any advantage in putting the quarantine several megabytes of RGB data or the like. Now for the patch. This patch introduces a new way the Quarantine behaves in Scudo. First of all, the size of the Quarantine will be defined in KB instead of MB, then we introduce a new option: the size up to which (lower than or equal to) a chunk will be quarantined. This way, we only quarantine smaller chunks, and the size of the quarantine remains manageable. It also prevents someone from triggering a recycle by allocating something huge. We default to 512 bytes on 32-bit and 2048 bytes on 64-bit platforms. In details, the patches includes the following: - introduce `QuarantineSizeKb`, but honor `QuarantineSizeMb` if set to fall back to the old behavior (meaning no threshold in that case); `QuarantineSizeMb` is described as deprecated in the options descriptios; documentation update will follow; - introduce `QuarantineChunksUpToSize`, the new threshold value; - update the `quarantine.cpp` test, and other tests using `QuarantineSizeMb`; - remove `AllocatorOptions::copyTo`, it wasn't used; - slightly change the logic around `quarantineOrDeallocateChunk` to accomodate for the new logic; rename a couple of variables there as well; Rewriting the tests, I found a somewhat annoying bug where non-default aligned chunks would account for more than needed when placed in the quarantine due to `<< MinAlignment` instead of `<< MinAlignmentLog`. This is fixed and tested for now. Reviewers: alekseyshl, kcc Reviewed By: alekseyshl Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D35694 llvm-svn: 308884
2017-07-24 23:29:38 +08:00
Chunk, EstimatedSize);
}
}
}
// Deallocates a Chunk, which means adding it to the delayed free list (or
// Quarantine).
void deallocate(void *UserPtr, uptr DeleteSize, AllocType Type) {
// For a deallocation, we only ensure minimal initialization, meaning thread
// local data will be left uninitialized for now (when using ELF TLS). The
// fallback cache will be used instead. This is a workaround for a situation
// where the only heap operation performed in a thread would be a free past
// the TLS destructors, ending up in initialized thread specific data never
// being destroyed properly. Any other heap operation will do a full init.
initThreadMaybe(/*MinimalInit=*/true);
// if (&__sanitizer_free_hook) __sanitizer_free_hook(UserPtr);
if (UNLIKELY(!UserPtr))
return;
uptr UserBeg = reinterpret_cast<uptr>(UserPtr);
if (UNLIKELY(!IsAligned(UserBeg, MinAlignment))) {
dieWithMessage("ERROR: attempted to deallocate a chunk not properly "
"aligned at address %p\n", UserPtr);
}
ScudoChunk *Chunk = getScudoChunk(UserBeg);
[scudo] Quarantine overhaul Summary: First, some context. The main feedback we get about the quarantine is that it's too memory hungry. A single MB of quarantine will have an impact of 3 to 4MB of PSS/RSS, and things quickly get out of hand in terms of memory usage, and the quarantine ends up disabled. The main objective of the quarantine is to protect from use-after-free exploitation by making it harder for an attacker to reallocate a controlled chunk in place of the targeted freed chunk. This is achieved by not making it available to the backend right away for reuse, but holding it a little while. Historically, what has usually been the target of such attacks was objects, where vtable pointers or other function pointers could constitute a valuable targeti to replace. Those are usually on the smaller side. There is barely any advantage in putting the quarantine several megabytes of RGB data or the like. Now for the patch. This patch introduces a new way the Quarantine behaves in Scudo. First of all, the size of the Quarantine will be defined in KB instead of MB, then we introduce a new option: the size up to which (lower than or equal to) a chunk will be quarantined. This way, we only quarantine smaller chunks, and the size of the quarantine remains manageable. It also prevents someone from triggering a recycle by allocating something huge. We default to 512 bytes on 32-bit and 2048 bytes on 64-bit platforms. In details, the patches includes the following: - introduce `QuarantineSizeKb`, but honor `QuarantineSizeMb` if set to fall back to the old behavior (meaning no threshold in that case); `QuarantineSizeMb` is described as deprecated in the options descriptios; documentation update will follow; - introduce `QuarantineChunksUpToSize`, the new threshold value; - update the `quarantine.cpp` test, and other tests using `QuarantineSizeMb`; - remove `AllocatorOptions::copyTo`, it wasn't used; - slightly change the logic around `quarantineOrDeallocateChunk` to accomodate for the new logic; rename a couple of variables there as well; Rewriting the tests, I found a somewhat annoying bug where non-default aligned chunks would account for more than needed when placed in the quarantine due to `<< MinAlignment` instead of `<< MinAlignmentLog`. This is fixed and tested for now. Reviewers: alekseyshl, kcc Reviewed By: alekseyshl Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D35694 llvm-svn: 308884
2017-07-24 23:29:38 +08:00
UnpackedHeader Header;
Chunk->loadHeader(&Header);
if (UNLIKELY(Header.State != ChunkAllocated)) {
dieWithMessage("ERROR: invalid chunk state when deallocating address "
[scudo] 32-bit and hardware agnostic support Summary: This update introduces i386 support for the Scudo Hardened Allocator, and offers software alternatives for functions that used to require hardware specific instruction sets. This should make porting to new architectures easier. Among the changes: - The chunk header has been changed to accomodate the size limitations encountered on 32-bit architectures. We now fit everything in 64-bit. This was achieved by storing the amount of unused bytes in an allocation rather than the size itself, as one can be deduced from the other with the help of the GetActuallyAllocatedSize function. As it turns out, this header can be used for both 64 and 32 bit, and as such we dropped the requirement for the 128-bit compare and exchange instruction support (cmpxchg16b). - Add 32-bit support for the checksum and the PRNG functions: if the SSE 4.2 instruction set is supported, use the 32-bit CRC32 instruction, and in the XorShift128, use a 32-bit based state instead of 64-bit. - Add software support for CRC32: if SSE 4.2 is not supported, fallback on a software implementation. - Modify tests that were not 32-bit compliant, and expand them to cover more allocation and alignment sizes. The random shuffle test has been deactivated for linux-i386 & linux-i686 as the 32-bit sanitizer allocator doesn't currently randomize chunks. Reviewers: alekseyshl, kcc Subscribers: filcab, llvm-commits, tberghammer, danalbert, srhines, mgorny, modocache Differential Revision: https://reviews.llvm.org/D26358 llvm-svn: 288255
2016-12-01 01:32:20 +08:00
"%p\n", UserPtr);
}
if (DeallocationTypeMismatch) {
// The deallocation type has to match the allocation one.
[scudo] Quarantine overhaul Summary: First, some context. The main feedback we get about the quarantine is that it's too memory hungry. A single MB of quarantine will have an impact of 3 to 4MB of PSS/RSS, and things quickly get out of hand in terms of memory usage, and the quarantine ends up disabled. The main objective of the quarantine is to protect from use-after-free exploitation by making it harder for an attacker to reallocate a controlled chunk in place of the targeted freed chunk. This is achieved by not making it available to the backend right away for reuse, but holding it a little while. Historically, what has usually been the target of such attacks was objects, where vtable pointers or other function pointers could constitute a valuable targeti to replace. Those are usually on the smaller side. There is barely any advantage in putting the quarantine several megabytes of RGB data or the like. Now for the patch. This patch introduces a new way the Quarantine behaves in Scudo. First of all, the size of the Quarantine will be defined in KB instead of MB, then we introduce a new option: the size up to which (lower than or equal to) a chunk will be quarantined. This way, we only quarantine smaller chunks, and the size of the quarantine remains manageable. It also prevents someone from triggering a recycle by allocating something huge. We default to 512 bytes on 32-bit and 2048 bytes on 64-bit platforms. In details, the patches includes the following: - introduce `QuarantineSizeKb`, but honor `QuarantineSizeMb` if set to fall back to the old behavior (meaning no threshold in that case); `QuarantineSizeMb` is described as deprecated in the options descriptios; documentation update will follow; - introduce `QuarantineChunksUpToSize`, the new threshold value; - update the `quarantine.cpp` test, and other tests using `QuarantineSizeMb`; - remove `AllocatorOptions::copyTo`, it wasn't used; - slightly change the logic around `quarantineOrDeallocateChunk` to accomodate for the new logic; rename a couple of variables there as well; Rewriting the tests, I found a somewhat annoying bug where non-default aligned chunks would account for more than needed when placed in the quarantine due to `<< MinAlignment` instead of `<< MinAlignmentLog`. This is fixed and tested for now. Reviewers: alekseyshl, kcc Reviewed By: alekseyshl Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D35694 llvm-svn: 308884
2017-07-24 23:29:38 +08:00
if (Header.AllocType != Type) {
// With the exception of memalign'd Chunks, that can be still be free'd.
[scudo] Quarantine overhaul Summary: First, some context. The main feedback we get about the quarantine is that it's too memory hungry. A single MB of quarantine will have an impact of 3 to 4MB of PSS/RSS, and things quickly get out of hand in terms of memory usage, and the quarantine ends up disabled. The main objective of the quarantine is to protect from use-after-free exploitation by making it harder for an attacker to reallocate a controlled chunk in place of the targeted freed chunk. This is achieved by not making it available to the backend right away for reuse, but holding it a little while. Historically, what has usually been the target of such attacks was objects, where vtable pointers or other function pointers could constitute a valuable targeti to replace. Those are usually on the smaller side. There is barely any advantage in putting the quarantine several megabytes of RGB data or the like. Now for the patch. This patch introduces a new way the Quarantine behaves in Scudo. First of all, the size of the Quarantine will be defined in KB instead of MB, then we introduce a new option: the size up to which (lower than or equal to) a chunk will be quarantined. This way, we only quarantine smaller chunks, and the size of the quarantine remains manageable. It also prevents someone from triggering a recycle by allocating something huge. We default to 512 bytes on 32-bit and 2048 bytes on 64-bit platforms. In details, the patches includes the following: - introduce `QuarantineSizeKb`, but honor `QuarantineSizeMb` if set to fall back to the old behavior (meaning no threshold in that case); `QuarantineSizeMb` is described as deprecated in the options descriptios; documentation update will follow; - introduce `QuarantineChunksUpToSize`, the new threshold value; - update the `quarantine.cpp` test, and other tests using `QuarantineSizeMb`; - remove `AllocatorOptions::copyTo`, it wasn't used; - slightly change the logic around `quarantineOrDeallocateChunk` to accomodate for the new logic; rename a couple of variables there as well; Rewriting the tests, I found a somewhat annoying bug where non-default aligned chunks would account for more than needed when placed in the quarantine due to `<< MinAlignment` instead of `<< MinAlignmentLog`. This is fixed and tested for now. Reviewers: alekseyshl, kcc Reviewed By: alekseyshl Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D35694 llvm-svn: 308884
2017-07-24 23:29:38 +08:00
if (Header.AllocType != FromMemalign || Type != FromMalloc) {
[scudo] Application & platform compatibility changes Summary: This patch changes a few (small) things around for compatibility purposes for the current Android & Fuchsia work: - `realloc`'ing some memory that was not allocated with `malloc`, `calloc` or `realloc`, while UB according to http://pubs.opengroup.org/onlinepubs/009695399/functions/realloc.html is more common that one would think. We now only check this if `DeallocationTypeMismatch` is set; change the "mismatch" error messages to be more homogeneous; - some sketchily written but widely used libraries expect a call to `realloc` to copy the usable size of the old chunk to the new one instead of the requested size. We have to begrundingly abide by this de-facto standard. This doesn't seem to impact security either way, unless someone comes up with something we didn't think about; - the CRC32 intrinsics for 64-bit take a 64-bit first argument. This is misleading as the upper 32 bits end up being ignored. This was also raising `-Wconversion` errors. Change things to take a `u32` as first argument. This also means we were (and are) only using 32 bits of the Cookie - not a big thing, but worth mentioning. - Includes-wise: prefer `stddef.h` to `cstddef`, move `scudo_flags.h` where it is actually needed. - Add tests for the memalign-realloc case, and the realloc-usable-size one. (Edited typos) Reviewers: alekseyshl Reviewed By: alekseyshl Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D36754 llvm-svn: 311018
2017-08-17 00:40:48 +08:00
dieWithMessage("ERROR: allocation type mismatch when deallocating "
"address %p\n", UserPtr);
}
}
}
[scudo] Quarantine overhaul Summary: First, some context. The main feedback we get about the quarantine is that it's too memory hungry. A single MB of quarantine will have an impact of 3 to 4MB of PSS/RSS, and things quickly get out of hand in terms of memory usage, and the quarantine ends up disabled. The main objective of the quarantine is to protect from use-after-free exploitation by making it harder for an attacker to reallocate a controlled chunk in place of the targeted freed chunk. This is achieved by not making it available to the backend right away for reuse, but holding it a little while. Historically, what has usually been the target of such attacks was objects, where vtable pointers or other function pointers could constitute a valuable targeti to replace. Those are usually on the smaller side. There is barely any advantage in putting the quarantine several megabytes of RGB data or the like. Now for the patch. This patch introduces a new way the Quarantine behaves in Scudo. First of all, the size of the Quarantine will be defined in KB instead of MB, then we introduce a new option: the size up to which (lower than or equal to) a chunk will be quarantined. This way, we only quarantine smaller chunks, and the size of the quarantine remains manageable. It also prevents someone from triggering a recycle by allocating something huge. We default to 512 bytes on 32-bit and 2048 bytes on 64-bit platforms. In details, the patches includes the following: - introduce `QuarantineSizeKb`, but honor `QuarantineSizeMb` if set to fall back to the old behavior (meaning no threshold in that case); `QuarantineSizeMb` is described as deprecated in the options descriptios; documentation update will follow; - introduce `QuarantineChunksUpToSize`, the new threshold value; - update the `quarantine.cpp` test, and other tests using `QuarantineSizeMb`; - remove `AllocatorOptions::copyTo`, it wasn't used; - slightly change the logic around `quarantineOrDeallocateChunk` to accomodate for the new logic; rename a couple of variables there as well; Rewriting the tests, I found a somewhat annoying bug where non-default aligned chunks would account for more than needed when placed in the quarantine due to `<< MinAlignment` instead of `<< MinAlignmentLog`. This is fixed and tested for now. Reviewers: alekseyshl, kcc Reviewed By: alekseyshl Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D35694 llvm-svn: 308884
2017-07-24 23:29:38 +08:00
uptr Size = Header.FromPrimary ? Header.SizeOrUnusedBytes :
Chunk->getUsableSize(&Header) - Header.SizeOrUnusedBytes;
if (DeleteSizeMismatch) {
if (DeleteSize && DeleteSize != Size) {
dieWithMessage("ERROR: invalid sized delete on chunk at address %p\n",
UserPtr);
}
}
[scudo] Quarantine overhaul Summary: First, some context. The main feedback we get about the quarantine is that it's too memory hungry. A single MB of quarantine will have an impact of 3 to 4MB of PSS/RSS, and things quickly get out of hand in terms of memory usage, and the quarantine ends up disabled. The main objective of the quarantine is to protect from use-after-free exploitation by making it harder for an attacker to reallocate a controlled chunk in place of the targeted freed chunk. This is achieved by not making it available to the backend right away for reuse, but holding it a little while. Historically, what has usually been the target of such attacks was objects, where vtable pointers or other function pointers could constitute a valuable targeti to replace. Those are usually on the smaller side. There is barely any advantage in putting the quarantine several megabytes of RGB data or the like. Now for the patch. This patch introduces a new way the Quarantine behaves in Scudo. First of all, the size of the Quarantine will be defined in KB instead of MB, then we introduce a new option: the size up to which (lower than or equal to) a chunk will be quarantined. This way, we only quarantine smaller chunks, and the size of the quarantine remains manageable. It also prevents someone from triggering a recycle by allocating something huge. We default to 512 bytes on 32-bit and 2048 bytes on 64-bit platforms. In details, the patches includes the following: - introduce `QuarantineSizeKb`, but honor `QuarantineSizeMb` if set to fall back to the old behavior (meaning no threshold in that case); `QuarantineSizeMb` is described as deprecated in the options descriptios; documentation update will follow; - introduce `QuarantineChunksUpToSize`, the new threshold value; - update the `quarantine.cpp` test, and other tests using `QuarantineSizeMb`; - remove `AllocatorOptions::copyTo`, it wasn't used; - slightly change the logic around `quarantineOrDeallocateChunk` to accomodate for the new logic; rename a couple of variables there as well; Rewriting the tests, I found a somewhat annoying bug where non-default aligned chunks would account for more than needed when placed in the quarantine due to `<< MinAlignment` instead of `<< MinAlignmentLog`. This is fixed and tested for now. Reviewers: alekseyshl, kcc Reviewed By: alekseyshl Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D35694 llvm-svn: 308884
2017-07-24 23:29:38 +08:00
quarantineOrDeallocateChunk(Chunk, &Header, Size);
}
// Reallocates a chunk. We can save on a new allocation if the new requested
// size still fits in the chunk.
void *reallocate(void *OldPtr, uptr NewSize) {
initThreadMaybe();
uptr UserBeg = reinterpret_cast<uptr>(OldPtr);
if (UNLIKELY(!IsAligned(UserBeg, MinAlignment))) {
dieWithMessage("ERROR: attempted to reallocate a chunk not properly "
"aligned at address %p\n", OldPtr);
}
ScudoChunk *Chunk = getScudoChunk(UserBeg);
[scudo] 32-bit and hardware agnostic support Summary: This update introduces i386 support for the Scudo Hardened Allocator, and offers software alternatives for functions that used to require hardware specific instruction sets. This should make porting to new architectures easier. Among the changes: - The chunk header has been changed to accomodate the size limitations encountered on 32-bit architectures. We now fit everything in 64-bit. This was achieved by storing the amount of unused bytes in an allocation rather than the size itself, as one can be deduced from the other with the help of the GetActuallyAllocatedSize function. As it turns out, this header can be used for both 64 and 32 bit, and as such we dropped the requirement for the 128-bit compare and exchange instruction support (cmpxchg16b). - Add 32-bit support for the checksum and the PRNG functions: if the SSE 4.2 instruction set is supported, use the 32-bit CRC32 instruction, and in the XorShift128, use a 32-bit based state instead of 64-bit. - Add software support for CRC32: if SSE 4.2 is not supported, fallback on a software implementation. - Modify tests that were not 32-bit compliant, and expand them to cover more allocation and alignment sizes. The random shuffle test has been deactivated for linux-i386 & linux-i686 as the 32-bit sanitizer allocator doesn't currently randomize chunks. Reviewers: alekseyshl, kcc Subscribers: filcab, llvm-commits, tberghammer, danalbert, srhines, mgorny, modocache Differential Revision: https://reviews.llvm.org/D26358 llvm-svn: 288255
2016-12-01 01:32:20 +08:00
UnpackedHeader OldHeader;
Chunk->loadHeader(&OldHeader);
if (UNLIKELY(OldHeader.State != ChunkAllocated)) {
[scudo] 32-bit and hardware agnostic support Summary: This update introduces i386 support for the Scudo Hardened Allocator, and offers software alternatives for functions that used to require hardware specific instruction sets. This should make porting to new architectures easier. Among the changes: - The chunk header has been changed to accomodate the size limitations encountered on 32-bit architectures. We now fit everything in 64-bit. This was achieved by storing the amount of unused bytes in an allocation rather than the size itself, as one can be deduced from the other with the help of the GetActuallyAllocatedSize function. As it turns out, this header can be used for both 64 and 32 bit, and as such we dropped the requirement for the 128-bit compare and exchange instruction support (cmpxchg16b). - Add 32-bit support for the checksum and the PRNG functions: if the SSE 4.2 instruction set is supported, use the 32-bit CRC32 instruction, and in the XorShift128, use a 32-bit based state instead of 64-bit. - Add software support for CRC32: if SSE 4.2 is not supported, fallback on a software implementation. - Modify tests that were not 32-bit compliant, and expand them to cover more allocation and alignment sizes. The random shuffle test has been deactivated for linux-i386 & linux-i686 as the 32-bit sanitizer allocator doesn't currently randomize chunks. Reviewers: alekseyshl, kcc Subscribers: filcab, llvm-commits, tberghammer, danalbert, srhines, mgorny, modocache Differential Revision: https://reviews.llvm.org/D26358 llvm-svn: 288255
2016-12-01 01:32:20 +08:00
dieWithMessage("ERROR: invalid chunk state when reallocating address "
"%p\n", OldPtr);
}
[scudo] Application & platform compatibility changes Summary: This patch changes a few (small) things around for compatibility purposes for the current Android & Fuchsia work: - `realloc`'ing some memory that was not allocated with `malloc`, `calloc` or `realloc`, while UB according to http://pubs.opengroup.org/onlinepubs/009695399/functions/realloc.html is more common that one would think. We now only check this if `DeallocationTypeMismatch` is set; change the "mismatch" error messages to be more homogeneous; - some sketchily written but widely used libraries expect a call to `realloc` to copy the usable size of the old chunk to the new one instead of the requested size. We have to begrundingly abide by this de-facto standard. This doesn't seem to impact security either way, unless someone comes up with something we didn't think about; - the CRC32 intrinsics for 64-bit take a 64-bit first argument. This is misleading as the upper 32 bits end up being ignored. This was also raising `-Wconversion` errors. Change things to take a `u32` as first argument. This also means we were (and are) only using 32 bits of the Cookie - not a big thing, but worth mentioning. - Includes-wise: prefer `stddef.h` to `cstddef`, move `scudo_flags.h` where it is actually needed. - Add tests for the memalign-realloc case, and the realloc-usable-size one. (Edited typos) Reviewers: alekseyshl Reviewed By: alekseyshl Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D36754 llvm-svn: 311018
2017-08-17 00:40:48 +08:00
if (DeallocationTypeMismatch) {
if (UNLIKELY(OldHeader.AllocType != FromMalloc)) {
dieWithMessage("ERROR: allocation type mismatch when reallocating "
"address %p\n", OldPtr);
}
}
uptr UsableSize = Chunk->getUsableSize(&OldHeader);
// The new size still fits in the current chunk, and the size difference
// is reasonable.
if (NewSize <= UsableSize &&
(UsableSize - NewSize) < (SizeClassMap::kMaxSize / 2)) {
UnpackedHeader NewHeader = OldHeader;
NewHeader.SizeOrUnusedBytes =
OldHeader.FromPrimary ? NewSize : UsableSize - NewSize;
Chunk->compareExchangeHeader(&NewHeader, &OldHeader);
return OldPtr;
}
// Otherwise, we have to allocate a new chunk and copy the contents of the
// old one.
void *NewPtr = allocate(NewSize, MinAlignment, FromMalloc);
if (NewPtr) {
uptr OldSize = OldHeader.FromPrimary ? OldHeader.SizeOrUnusedBytes :
UsableSize - OldHeader.SizeOrUnusedBytes;
[scudo] Application & platform compatibility changes Summary: This patch changes a few (small) things around for compatibility purposes for the current Android & Fuchsia work: - `realloc`'ing some memory that was not allocated with `malloc`, `calloc` or `realloc`, while UB according to http://pubs.opengroup.org/onlinepubs/009695399/functions/realloc.html is more common that one would think. We now only check this if `DeallocationTypeMismatch` is set; change the "mismatch" error messages to be more homogeneous; - some sketchily written but widely used libraries expect a call to `realloc` to copy the usable size of the old chunk to the new one instead of the requested size. We have to begrundingly abide by this de-facto standard. This doesn't seem to impact security either way, unless someone comes up with something we didn't think about; - the CRC32 intrinsics for 64-bit take a 64-bit first argument. This is misleading as the upper 32 bits end up being ignored. This was also raising `-Wconversion` errors. Change things to take a `u32` as first argument. This also means we were (and are) only using 32 bits of the Cookie - not a big thing, but worth mentioning. - Includes-wise: prefer `stddef.h` to `cstddef`, move `scudo_flags.h` where it is actually needed. - Add tests for the memalign-realloc case, and the realloc-usable-size one. (Edited typos) Reviewers: alekseyshl Reviewed By: alekseyshl Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D36754 llvm-svn: 311018
2017-08-17 00:40:48 +08:00
memcpy(NewPtr, OldPtr, Min(NewSize, UsableSize));
[scudo] Quarantine overhaul Summary: First, some context. The main feedback we get about the quarantine is that it's too memory hungry. A single MB of quarantine will have an impact of 3 to 4MB of PSS/RSS, and things quickly get out of hand in terms of memory usage, and the quarantine ends up disabled. The main objective of the quarantine is to protect from use-after-free exploitation by making it harder for an attacker to reallocate a controlled chunk in place of the targeted freed chunk. This is achieved by not making it available to the backend right away for reuse, but holding it a little while. Historically, what has usually been the target of such attacks was objects, where vtable pointers or other function pointers could constitute a valuable targeti to replace. Those are usually on the smaller side. There is barely any advantage in putting the quarantine several megabytes of RGB data or the like. Now for the patch. This patch introduces a new way the Quarantine behaves in Scudo. First of all, the size of the Quarantine will be defined in KB instead of MB, then we introduce a new option: the size up to which (lower than or equal to) a chunk will be quarantined. This way, we only quarantine smaller chunks, and the size of the quarantine remains manageable. It also prevents someone from triggering a recycle by allocating something huge. We default to 512 bytes on 32-bit and 2048 bytes on 64-bit platforms. In details, the patches includes the following: - introduce `QuarantineSizeKb`, but honor `QuarantineSizeMb` if set to fall back to the old behavior (meaning no threshold in that case); `QuarantineSizeMb` is described as deprecated in the options descriptios; documentation update will follow; - introduce `QuarantineChunksUpToSize`, the new threshold value; - update the `quarantine.cpp` test, and other tests using `QuarantineSizeMb`; - remove `AllocatorOptions::copyTo`, it wasn't used; - slightly change the logic around `quarantineOrDeallocateChunk` to accomodate for the new logic; rename a couple of variables there as well; Rewriting the tests, I found a somewhat annoying bug where non-default aligned chunks would account for more than needed when placed in the quarantine due to `<< MinAlignment` instead of `<< MinAlignmentLog`. This is fixed and tested for now. Reviewers: alekseyshl, kcc Reviewed By: alekseyshl Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D35694 llvm-svn: 308884
2017-07-24 23:29:38 +08:00
quarantineOrDeallocateChunk(Chunk, &OldHeader, OldSize);
}
return NewPtr;
}
[scudo] 32-bit and hardware agnostic support Summary: This update introduces i386 support for the Scudo Hardened Allocator, and offers software alternatives for functions that used to require hardware specific instruction sets. This should make porting to new architectures easier. Among the changes: - The chunk header has been changed to accomodate the size limitations encountered on 32-bit architectures. We now fit everything in 64-bit. This was achieved by storing the amount of unused bytes in an allocation rather than the size itself, as one can be deduced from the other with the help of the GetActuallyAllocatedSize function. As it turns out, this header can be used for both 64 and 32 bit, and as such we dropped the requirement for the 128-bit compare and exchange instruction support (cmpxchg16b). - Add 32-bit support for the checksum and the PRNG functions: if the SSE 4.2 instruction set is supported, use the 32-bit CRC32 instruction, and in the XorShift128, use a 32-bit based state instead of 64-bit. - Add software support for CRC32: if SSE 4.2 is not supported, fallback on a software implementation. - Modify tests that were not 32-bit compliant, and expand them to cover more allocation and alignment sizes. The random shuffle test has been deactivated for linux-i386 & linux-i686 as the 32-bit sanitizer allocator doesn't currently randomize chunks. Reviewers: alekseyshl, kcc Subscribers: filcab, llvm-commits, tberghammer, danalbert, srhines, mgorny, modocache Differential Revision: https://reviews.llvm.org/D26358 llvm-svn: 288255
2016-12-01 01:32:20 +08:00
// Helper function that returns the actual usable size of a chunk.
uptr getUsableSize(const void *Ptr) {
initThreadMaybe();
if (UNLIKELY(!Ptr))
[scudo] 32-bit and hardware agnostic support Summary: This update introduces i386 support for the Scudo Hardened Allocator, and offers software alternatives for functions that used to require hardware specific instruction sets. This should make porting to new architectures easier. Among the changes: - The chunk header has been changed to accomodate the size limitations encountered on 32-bit architectures. We now fit everything in 64-bit. This was achieved by storing the amount of unused bytes in an allocation rather than the size itself, as one can be deduced from the other with the help of the GetActuallyAllocatedSize function. As it turns out, this header can be used for both 64 and 32 bit, and as such we dropped the requirement for the 128-bit compare and exchange instruction support (cmpxchg16b). - Add 32-bit support for the checksum and the PRNG functions: if the SSE 4.2 instruction set is supported, use the 32-bit CRC32 instruction, and in the XorShift128, use a 32-bit based state instead of 64-bit. - Add software support for CRC32: if SSE 4.2 is not supported, fallback on a software implementation. - Modify tests that were not 32-bit compliant, and expand them to cover more allocation and alignment sizes. The random shuffle test has been deactivated for linux-i386 & linux-i686 as the 32-bit sanitizer allocator doesn't currently randomize chunks. Reviewers: alekseyshl, kcc Subscribers: filcab, llvm-commits, tberghammer, danalbert, srhines, mgorny, modocache Differential Revision: https://reviews.llvm.org/D26358 llvm-svn: 288255
2016-12-01 01:32:20 +08:00
return 0;
uptr UserBeg = reinterpret_cast<uptr>(Ptr);
ScudoChunk *Chunk = getScudoChunk(UserBeg);
[scudo] 32-bit and hardware agnostic support Summary: This update introduces i386 support for the Scudo Hardened Allocator, and offers software alternatives for functions that used to require hardware specific instruction sets. This should make porting to new architectures easier. Among the changes: - The chunk header has been changed to accomodate the size limitations encountered on 32-bit architectures. We now fit everything in 64-bit. This was achieved by storing the amount of unused bytes in an allocation rather than the size itself, as one can be deduced from the other with the help of the GetActuallyAllocatedSize function. As it turns out, this header can be used for both 64 and 32 bit, and as such we dropped the requirement for the 128-bit compare and exchange instruction support (cmpxchg16b). - Add 32-bit support for the checksum and the PRNG functions: if the SSE 4.2 instruction set is supported, use the 32-bit CRC32 instruction, and in the XorShift128, use a 32-bit based state instead of 64-bit. - Add software support for CRC32: if SSE 4.2 is not supported, fallback on a software implementation. - Modify tests that were not 32-bit compliant, and expand them to cover more allocation and alignment sizes. The random shuffle test has been deactivated for linux-i386 & linux-i686 as the 32-bit sanitizer allocator doesn't currently randomize chunks. Reviewers: alekseyshl, kcc Subscribers: filcab, llvm-commits, tberghammer, danalbert, srhines, mgorny, modocache Differential Revision: https://reviews.llvm.org/D26358 llvm-svn: 288255
2016-12-01 01:32:20 +08:00
UnpackedHeader Header;
Chunk->loadHeader(&Header);
// Getting the usable size of a chunk only makes sense if it's allocated.
if (UNLIKELY(Header.State != ChunkAllocated)) {
[scudo] 32-bit and hardware agnostic support Summary: This update introduces i386 support for the Scudo Hardened Allocator, and offers software alternatives for functions that used to require hardware specific instruction sets. This should make porting to new architectures easier. Among the changes: - The chunk header has been changed to accomodate the size limitations encountered on 32-bit architectures. We now fit everything in 64-bit. This was achieved by storing the amount of unused bytes in an allocation rather than the size itself, as one can be deduced from the other with the help of the GetActuallyAllocatedSize function. As it turns out, this header can be used for both 64 and 32 bit, and as such we dropped the requirement for the 128-bit compare and exchange instruction support (cmpxchg16b). - Add 32-bit support for the checksum and the PRNG functions: if the SSE 4.2 instruction set is supported, use the 32-bit CRC32 instruction, and in the XorShift128, use a 32-bit based state instead of 64-bit. - Add software support for CRC32: if SSE 4.2 is not supported, fallback on a software implementation. - Modify tests that were not 32-bit compliant, and expand them to cover more allocation and alignment sizes. The random shuffle test has been deactivated for linux-i386 & linux-i686 as the 32-bit sanitizer allocator doesn't currently randomize chunks. Reviewers: alekseyshl, kcc Subscribers: filcab, llvm-commits, tberghammer, danalbert, srhines, mgorny, modocache Differential Revision: https://reviews.llvm.org/D26358 llvm-svn: 288255
2016-12-01 01:32:20 +08:00
dieWithMessage("ERROR: invalid chunk state when sizing address %p\n",
Ptr);
}
return Chunk->getUsableSize(&Header);
}
void *calloc(uptr NMemB, uptr Size) {
initThreadMaybe();
if (UNLIKELY(CheckForCallocOverflow(NMemB, Size)))
return FailureHandler::OnBadRequest();
return allocate(NMemB * Size, MinAlignment, FromMalloc, true);
}
void commitBack(ScudoThreadContext *ThreadContext) {
AllocatorCache *Cache = getAllocatorCache(ThreadContext);
AllocatorQuarantine.Drain(getQuarantineCache(ThreadContext),
QuarantineCallback(Cache));
BackendAllocator.destroyCache(Cache);
}
uptr getStats(AllocatorStat StatType) {
initThreadMaybe();
uptr stats[AllocatorStatCount];
BackendAllocator.getStats(stats);
return stats[StatType];
}
void *handleBadRequest() {
initThreadMaybe();
return FailureHandler::OnBadRequest();
}
};
static ScudoAllocator Instance(LINKER_INITIALIZED);
static ScudoBackendAllocator &getBackendAllocator() {
return Instance.BackendAllocator;
}
static void initScudoInternal(const AllocatorOptions &Options) {
Instance.init(Options);
}
void ScudoThreadContext::init() {
getBackendAllocator().initCache(&Cache);
[scudo] PRNG makeover Summary: This follows the addition of `GetRandom` with D34412. We remove our `/dev/urandom` code and use the new function. Additionally, change the PRNG for a slightly faster version. One of the issues with the old code is that we have 64 full bits of randomness per "next", using only 8 of those for the Salt and discarding the rest. So we add a cached u64 in the PRNG that can serve up to 8 u8 before having to call the "next" function again. During some integration work, I also realized that some very early processes (like `init`) do not benefit from `/dev/urandom` yet. So if there is no `getrandom` syscall as well, we have to fallback to some sort of initialization of the PRNG. Now a few words on why XoRoShiRo and not something else. I have played a while with various PRNGs on 32 & 64 bit platforms. Some results are below. LCG 32 & 64 are usually faster but produce respectively 15 & 31 bits of entropy, meaning that to get a full 64-bit, you would need to call them several times. The simple XorShift is fast, produces 32 bits but is mediocre with regard to PRNG test suites, PCG is slower overall, and XoRoShiRo is faster than XorShift128+ and produces full 64 bits. %%% root@tulip-chiphd:/data # ./randtest.arm [+] starting xs32... [?] xs32 duration: 22431833053ns [+] starting lcg32... [?] lcg32 duration: 14941402090ns [+] starting pcg32... [?] pcg32 duration: 44941973771ns [+] starting xs128p... [?] xs128p duration: 48889786981ns [+] starting lcg64... [?] lcg64 duration: 33831042391ns [+] starting xos128p... [?] xos128p duration: 44850878605ns root@tulip-chiphd:/data # ./randtest.aarch64 [+] starting xs32... [?] xs32 duration: 22425151678ns [+] starting lcg32... [?] lcg32 duration: 14954255257ns [+] starting pcg32... [?] pcg32 duration: 37346265726ns [+] starting xs128p... [?] xs128p duration: 22523807219ns [+] starting lcg64... [?] lcg64 duration: 26141304679ns [+] starting xos128p... [?] xos128p duration: 14937033215ns %%% Reviewers: alekseyshl Reviewed By: alekseyshl Subscribers: aemerson, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D35221 llvm-svn: 307798
2017-07-12 23:29:08 +08:00
Prng.init();
memset(QuarantineCachePlaceHolder, 0, sizeof(QuarantineCachePlaceHolder));
}
void ScudoThreadContext::commitBack() {
Instance.commitBack(this);
}
void *scudoMalloc(uptr Size, AllocType Type) {
return SetErrnoOnNull(Instance.allocate(Size, MinAlignment, Type));
}
void scudoFree(void *Ptr, AllocType Type) {
Instance.deallocate(Ptr, 0, Type);
}
void scudoSizedFree(void *Ptr, uptr Size, AllocType Type) {
Instance.deallocate(Ptr, Size, Type);
}
void *scudoRealloc(void *Ptr, uptr Size) {
if (!Ptr)
return SetErrnoOnNull(Instance.allocate(Size, MinAlignment, FromMalloc));
if (Size == 0) {
Instance.deallocate(Ptr, 0, FromMalloc);
return nullptr;
}
return SetErrnoOnNull(Instance.reallocate(Ptr, Size));
}
void *scudoCalloc(uptr NMemB, uptr Size) {
return SetErrnoOnNull(Instance.calloc(NMemB, Size));
}
void *scudoValloc(uptr Size) {
return SetErrnoOnNull(
Instance.allocate(Size, GetPageSizeCached(), FromMemalign));
}
void *scudoPvalloc(uptr Size) {
uptr PageSize = GetPageSizeCached();
if (UNLIKELY(CheckForPvallocOverflow(Size, PageSize))) {
errno = errno_ENOMEM;
return Instance.handleBadRequest();
}
// pvalloc(0) should allocate one page.
Size = Size ? RoundUpTo(Size, PageSize) : PageSize;
return SetErrnoOnNull(Instance.allocate(Size, PageSize, FromMemalign));
}
void *scudoMemalign(uptr Alignment, uptr Size) {
if (UNLIKELY(!IsPowerOfTwo(Alignment))) {
errno = errno_EINVAL;
return Instance.handleBadRequest();
}
return SetErrnoOnNull(Instance.allocate(Size, Alignment, FromMemalign));
}
int scudoPosixMemalign(void **MemPtr, uptr Alignment, uptr Size) {
if (UNLIKELY(!CheckPosixMemalignAlignment(Alignment))) {
Instance.handleBadRequest();
return errno_EINVAL;
}
void *Ptr = Instance.allocate(Size, Alignment, FromMemalign);
if (UNLIKELY(!Ptr))
return errno_ENOMEM;
*MemPtr = Ptr;
return 0;
}
void *scudoAlignedAlloc(uptr Alignment, uptr Size) {
if (UNLIKELY(!CheckAlignedAllocAlignmentAndSize(Alignment, Size))) {
errno = errno_EINVAL;
return Instance.handleBadRequest();
}
return SetErrnoOnNull(Instance.allocate(Size, Alignment, FromMalloc));
}
uptr scudoMallocUsableSize(void *Ptr) {
return Instance.getUsableSize(Ptr);
}
[scudo] 32-bit and hardware agnostic support Summary: This update introduces i386 support for the Scudo Hardened Allocator, and offers software alternatives for functions that used to require hardware specific instruction sets. This should make porting to new architectures easier. Among the changes: - The chunk header has been changed to accomodate the size limitations encountered on 32-bit architectures. We now fit everything in 64-bit. This was achieved by storing the amount of unused bytes in an allocation rather than the size itself, as one can be deduced from the other with the help of the GetActuallyAllocatedSize function. As it turns out, this header can be used for both 64 and 32 bit, and as such we dropped the requirement for the 128-bit compare and exchange instruction support (cmpxchg16b). - Add 32-bit support for the checksum and the PRNG functions: if the SSE 4.2 instruction set is supported, use the 32-bit CRC32 instruction, and in the XorShift128, use a 32-bit based state instead of 64-bit. - Add software support for CRC32: if SSE 4.2 is not supported, fallback on a software implementation. - Modify tests that were not 32-bit compliant, and expand them to cover more allocation and alignment sizes. The random shuffle test has been deactivated for linux-i386 & linux-i686 as the 32-bit sanitizer allocator doesn't currently randomize chunks. Reviewers: alekseyshl, kcc Subscribers: filcab, llvm-commits, tberghammer, danalbert, srhines, mgorny, modocache Differential Revision: https://reviews.llvm.org/D26358 llvm-svn: 288255
2016-12-01 01:32:20 +08:00
} // namespace __scudo
using namespace __scudo;
// MallocExtension helper functions
uptr __sanitizer_get_current_allocated_bytes() {
return Instance.getStats(AllocatorStatAllocated);
}
uptr __sanitizer_get_heap_size() {
return Instance.getStats(AllocatorStatMapped);
}
uptr __sanitizer_get_free_bytes() {
return 1;
}
uptr __sanitizer_get_unmapped_bytes() {
return 1;
}
uptr __sanitizer_get_estimated_allocated_size(uptr size) {
return size;
}
[scudo] 32-bit and hardware agnostic support Summary: This update introduces i386 support for the Scudo Hardened Allocator, and offers software alternatives for functions that used to require hardware specific instruction sets. This should make porting to new architectures easier. Among the changes: - The chunk header has been changed to accomodate the size limitations encountered on 32-bit architectures. We now fit everything in 64-bit. This was achieved by storing the amount of unused bytes in an allocation rather than the size itself, as one can be deduced from the other with the help of the GetActuallyAllocatedSize function. As it turns out, this header can be used for both 64 and 32 bit, and as such we dropped the requirement for the 128-bit compare and exchange instruction support (cmpxchg16b). - Add 32-bit support for the checksum and the PRNG functions: if the SSE 4.2 instruction set is supported, use the 32-bit CRC32 instruction, and in the XorShift128, use a 32-bit based state instead of 64-bit. - Add software support for CRC32: if SSE 4.2 is not supported, fallback on a software implementation. - Modify tests that were not 32-bit compliant, and expand them to cover more allocation and alignment sizes. The random shuffle test has been deactivated for linux-i386 & linux-i686 as the 32-bit sanitizer allocator doesn't currently randomize chunks. Reviewers: alekseyshl, kcc Subscribers: filcab, llvm-commits, tberghammer, danalbert, srhines, mgorny, modocache Differential Revision: https://reviews.llvm.org/D26358 llvm-svn: 288255
2016-12-01 01:32:20 +08:00
int __sanitizer_get_ownership(const void *Ptr) {
return Instance.isValidPointer(Ptr);
}
[scudo] 32-bit and hardware agnostic support Summary: This update introduces i386 support for the Scudo Hardened Allocator, and offers software alternatives for functions that used to require hardware specific instruction sets. This should make porting to new architectures easier. Among the changes: - The chunk header has been changed to accomodate the size limitations encountered on 32-bit architectures. We now fit everything in 64-bit. This was achieved by storing the amount of unused bytes in an allocation rather than the size itself, as one can be deduced from the other with the help of the GetActuallyAllocatedSize function. As it turns out, this header can be used for both 64 and 32 bit, and as such we dropped the requirement for the 128-bit compare and exchange instruction support (cmpxchg16b). - Add 32-bit support for the checksum and the PRNG functions: if the SSE 4.2 instruction set is supported, use the 32-bit CRC32 instruction, and in the XorShift128, use a 32-bit based state instead of 64-bit. - Add software support for CRC32: if SSE 4.2 is not supported, fallback on a software implementation. - Modify tests that were not 32-bit compliant, and expand them to cover more allocation and alignment sizes. The random shuffle test has been deactivated for linux-i386 & linux-i686 as the 32-bit sanitizer allocator doesn't currently randomize chunks. Reviewers: alekseyshl, kcc Subscribers: filcab, llvm-commits, tberghammer, danalbert, srhines, mgorny, modocache Differential Revision: https://reviews.llvm.org/D26358 llvm-svn: 288255
2016-12-01 01:32:20 +08:00
uptr __sanitizer_get_allocated_size(const void *Ptr) {
return Instance.getUsableSize(Ptr);
}