[sanitizer] Initial implementation of a Hardened Allocator
Summary:
This is an initial implementation of a Hardened Allocator based on Sanitizer Common's CombinedAllocator.
It aims at mitigating heap based vulnerabilities by adding several features to the base allocator, while staying relatively fast.
The following were implemented:
- additional consistency checks on the allocation function parameters and on the heap chunks;
- use of checksum protected chunk header, to detect corruption;
- randomness to the allocator base;
- delayed freelist (quarantine), to mitigate use after free and overall determinism.
Additional mitigations are in the works.
Reviewers: eugenis, aizatsky, pcc, krasin, vitalybuka, glider, dvyukov, kcc
Subscribers: kubabrecka, filcab, llvm-commits
Differential Revision: http://reviews.llvm.org/D20084
llvm-svn: 271968
2016-06-07 09:20:26 +08:00
|
|
|
//===-- scudo_utils.h -------------------------------------------*- C++ -*-===//
|
|
|
|
//
|
|
|
|
// The LLVM Compiler Infrastructure
|
|
|
|
//
|
|
|
|
// This file is distributed under the University of Illinois Open Source
|
|
|
|
// License. See LICENSE.TXT for details.
|
|
|
|
//
|
|
|
|
//===----------------------------------------------------------------------===//
|
|
|
|
///
|
|
|
|
/// Header for scudo_utils.cpp.
|
|
|
|
///
|
|
|
|
//===----------------------------------------------------------------------===//
|
|
|
|
|
|
|
|
#ifndef SCUDO_UTILS_H_
|
|
|
|
#define SCUDO_UTILS_H_
|
|
|
|
|
|
|
|
#include <string.h>
|
|
|
|
|
|
|
|
#include "sanitizer_common/sanitizer_common.h"
|
|
|
|
|
|
|
|
namespace __scudo {
|
|
|
|
|
|
|
|
template <class Dest, class Source>
|
|
|
|
inline Dest bit_cast(const Source& source) {
|
|
|
|
static_assert(sizeof(Dest) == sizeof(Source), "Sizes are not equal!");
|
|
|
|
Dest dest;
|
|
|
|
memcpy(&dest, &source, sizeof(dest));
|
|
|
|
return dest;
|
|
|
|
}
|
|
|
|
|
2016-08-03 07:23:13 +08:00
|
|
|
void NORETURN dieWithMessage(const char *Format, ...);
|
[sanitizer] Initial implementation of a Hardened Allocator
Summary:
This is an initial implementation of a Hardened Allocator based on Sanitizer Common's CombinedAllocator.
It aims at mitigating heap based vulnerabilities by adding several features to the base allocator, while staying relatively fast.
The following were implemented:
- additional consistency checks on the allocation function parameters and on the heap chunks;
- use of checksum protected chunk header, to detect corruption;
- randomness to the allocator base;
- delayed freelist (quarantine), to mitigate use after free and overall determinism.
Additional mitigations are in the works.
Reviewers: eugenis, aizatsky, pcc, krasin, vitalybuka, glider, dvyukov, kcc
Subscribers: kubabrecka, filcab, llvm-commits
Differential Revision: http://reviews.llvm.org/D20084
llvm-svn: 271968
2016-06-07 09:20:26 +08:00
|
|
|
|
[scudo] 32-bit and hardware agnostic support
Summary:
This update introduces i386 support for the Scudo Hardened Allocator, and
offers software alternatives for functions that used to require hardware
specific instruction sets. This should make porting to new architectures
easier.
Among the changes:
- The chunk header has been changed to accomodate the size limitations
encountered on 32-bit architectures. We now fit everything in 64-bit. This
was achieved by storing the amount of unused bytes in an allocation rather
than the size itself, as one can be deduced from the other with the help
of the GetActuallyAllocatedSize function. As it turns out, this header can
be used for both 64 and 32 bit, and as such we dropped the requirement for
the 128-bit compare and exchange instruction support (cmpxchg16b).
- Add 32-bit support for the checksum and the PRNG functions: if the SSE 4.2
instruction set is supported, use the 32-bit CRC32 instruction, and in the
XorShift128, use a 32-bit based state instead of 64-bit.
- Add software support for CRC32: if SSE 4.2 is not supported, fallback on a
software implementation.
- Modify tests that were not 32-bit compliant, and expand them to cover more
allocation and alignment sizes. The random shuffle test has been deactivated
for linux-i386 & linux-i686 as the 32-bit sanitizer allocator doesn't
currently randomize chunks.
Reviewers: alekseyshl, kcc
Subscribers: filcab, llvm-commits, tberghammer, danalbert, srhines, mgorny, modocache
Differential Revision: https://reviews.llvm.org/D26358
llvm-svn: 288255
2016-12-01 01:32:20 +08:00
|
|
|
enum CPUFeature {
|
|
|
|
CRC32CPUFeature = 0,
|
|
|
|
MaxCPUFeature,
|
[sanitizer] Initial implementation of a Hardened Allocator
Summary:
This is an initial implementation of a Hardened Allocator based on Sanitizer Common's CombinedAllocator.
It aims at mitigating heap based vulnerabilities by adding several features to the base allocator, while staying relatively fast.
The following were implemented:
- additional consistency checks on the allocation function parameters and on the heap chunks;
- use of checksum protected chunk header, to detect corruption;
- randomness to the allocator base;
- delayed freelist (quarantine), to mitigate use after free and overall determinism.
Additional mitigations are in the works.
Reviewers: eugenis, aizatsky, pcc, krasin, vitalybuka, glider, dvyukov, kcc
Subscribers: kubabrecka, filcab, llvm-commits
Differential Revision: http://reviews.llvm.org/D20084
llvm-svn: 271968
2016-06-07 09:20:26 +08:00
|
|
|
};
|
|
|
|
bool testCPUFeature(CPUFeature feature);
|
|
|
|
|
[scudo] PRNG makeover
Summary:
This follows the addition of `GetRandom` with D34412. We remove our
`/dev/urandom` code and use the new function. Additionally, change the PRNG for
a slightly faster version. One of the issues with the old code is that we have
64 full bits of randomness per "next", using only 8 of those for the Salt and
discarding the rest. So we add a cached u64 in the PRNG that can serve up to
8 u8 before having to call the "next" function again.
During some integration work, I also realized that some very early processes
(like `init`) do not benefit from `/dev/urandom` yet. So if there is no
`getrandom` syscall as well, we have to fallback to some sort of initialization
of the PRNG.
Now a few words on why XoRoShiRo and not something else. I have played a while
with various PRNGs on 32 & 64 bit platforms. Some results are below. LCG 32 & 64
are usually faster but produce respectively 15 & 31 bits of entropy, meaning
that to get a full 64-bit, you would need to call them several times. The simple
XorShift is fast, produces 32 bits but is mediocre with regard to PRNG test
suites, PCG is slower overall, and XoRoShiRo is faster than XorShift128+ and
produces full 64 bits.
%%%
root@tulip-chiphd:/data # ./randtest.arm
[+] starting xs32...
[?] xs32 duration: 22431833053ns
[+] starting lcg32...
[?] lcg32 duration: 14941402090ns
[+] starting pcg32...
[?] pcg32 duration: 44941973771ns
[+] starting xs128p...
[?] xs128p duration: 48889786981ns
[+] starting lcg64...
[?] lcg64 duration: 33831042391ns
[+] starting xos128p...
[?] xos128p duration: 44850878605ns
root@tulip-chiphd:/data # ./randtest.aarch64
[+] starting xs32...
[?] xs32 duration: 22425151678ns
[+] starting lcg32...
[?] lcg32 duration: 14954255257ns
[+] starting pcg32...
[?] pcg32 duration: 37346265726ns
[+] starting xs128p...
[?] xs128p duration: 22523807219ns
[+] starting lcg64...
[?] lcg64 duration: 26141304679ns
[+] starting xos128p...
[?] xos128p duration: 14937033215ns
%%%
Reviewers: alekseyshl
Reviewed By: alekseyshl
Subscribers: aemerson, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D35221
llvm-svn: 307798
2017-07-12 23:29:08 +08:00
|
|
|
INLINE u64 rotl(const u64 X, int K) {
|
|
|
|
return (X << K) | (X >> (64 - K));
|
|
|
|
}
|
|
|
|
|
|
|
|
// XoRoShiRo128+ PRNG (http://xoroshiro.di.unimi.it/).
|
|
|
|
struct XoRoShiRo128Plus {
|
[sanitizer] Initial implementation of a Hardened Allocator
Summary:
This is an initial implementation of a Hardened Allocator based on Sanitizer Common's CombinedAllocator.
It aims at mitigating heap based vulnerabilities by adding several features to the base allocator, while staying relatively fast.
The following were implemented:
- additional consistency checks on the allocation function parameters and on the heap chunks;
- use of checksum protected chunk header, to detect corruption;
- randomness to the allocator base;
- delayed freelist (quarantine), to mitigate use after free and overall determinism.
Additional mitigations are in the works.
Reviewers: eugenis, aizatsky, pcc, krasin, vitalybuka, glider, dvyukov, kcc
Subscribers: kubabrecka, filcab, llvm-commits
Differential Revision: http://reviews.llvm.org/D20084
llvm-svn: 271968
2016-06-07 09:20:26 +08:00
|
|
|
public:
|
[scudo] PRNG makeover
Summary:
This follows the addition of `GetRandom` with D34412. We remove our
`/dev/urandom` code and use the new function. Additionally, change the PRNG for
a slightly faster version. One of the issues with the old code is that we have
64 full bits of randomness per "next", using only 8 of those for the Salt and
discarding the rest. So we add a cached u64 in the PRNG that can serve up to
8 u8 before having to call the "next" function again.
During some integration work, I also realized that some very early processes
(like `init`) do not benefit from `/dev/urandom` yet. So if there is no
`getrandom` syscall as well, we have to fallback to some sort of initialization
of the PRNG.
Now a few words on why XoRoShiRo and not something else. I have played a while
with various PRNGs on 32 & 64 bit platforms. Some results are below. LCG 32 & 64
are usually faster but produce respectively 15 & 31 bits of entropy, meaning
that to get a full 64-bit, you would need to call them several times. The simple
XorShift is fast, produces 32 bits but is mediocre with regard to PRNG test
suites, PCG is slower overall, and XoRoShiRo is faster than XorShift128+ and
produces full 64 bits.
%%%
root@tulip-chiphd:/data # ./randtest.arm
[+] starting xs32...
[?] xs32 duration: 22431833053ns
[+] starting lcg32...
[?] lcg32 duration: 14941402090ns
[+] starting pcg32...
[?] pcg32 duration: 44941973771ns
[+] starting xs128p...
[?] xs128p duration: 48889786981ns
[+] starting lcg64...
[?] lcg64 duration: 33831042391ns
[+] starting xos128p...
[?] xos128p duration: 44850878605ns
root@tulip-chiphd:/data # ./randtest.aarch64
[+] starting xs32...
[?] xs32 duration: 22425151678ns
[+] starting lcg32...
[?] lcg32 duration: 14954255257ns
[+] starting pcg32...
[?] pcg32 duration: 37346265726ns
[+] starting xs128p...
[?] xs128p duration: 22523807219ns
[+] starting lcg64...
[?] lcg64 duration: 26141304679ns
[+] starting xos128p...
[?] xos128p duration: 14937033215ns
%%%
Reviewers: alekseyshl
Reviewed By: alekseyshl
Subscribers: aemerson, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D35221
llvm-svn: 307798
2017-07-12 23:29:08 +08:00
|
|
|
void init() {
|
2017-08-14 22:53:47 +08:00
|
|
|
if (UNLIKELY(!GetRandom(reinterpret_cast<void *>(State), sizeof(State),
|
|
|
|
/*blocking=*/false))) {
|
|
|
|
// On some platforms, early processes like `init` do not have an
|
|
|
|
// initialized random pool (getrandom blocks and /dev/urandom doesn't
|
|
|
|
// exist yet), but we still have to provide them with some degree of
|
|
|
|
// entropy. Not having a secure seed is not as problematic for them, as
|
|
|
|
// they are less likely to be the target of heap based vulnerabilities
|
|
|
|
// exploitation attempts.
|
[scudo] PRNG makeover
Summary:
This follows the addition of `GetRandom` with D34412. We remove our
`/dev/urandom` code and use the new function. Additionally, change the PRNG for
a slightly faster version. One of the issues with the old code is that we have
64 full bits of randomness per "next", using only 8 of those for the Salt and
discarding the rest. So we add a cached u64 in the PRNG that can serve up to
8 u8 before having to call the "next" function again.
During some integration work, I also realized that some very early processes
(like `init`) do not benefit from `/dev/urandom` yet. So if there is no
`getrandom` syscall as well, we have to fallback to some sort of initialization
of the PRNG.
Now a few words on why XoRoShiRo and not something else. I have played a while
with various PRNGs on 32 & 64 bit platforms. Some results are below. LCG 32 & 64
are usually faster but produce respectively 15 & 31 bits of entropy, meaning
that to get a full 64-bit, you would need to call them several times. The simple
XorShift is fast, produces 32 bits but is mediocre with regard to PRNG test
suites, PCG is slower overall, and XoRoShiRo is faster than XorShift128+ and
produces full 64 bits.
%%%
root@tulip-chiphd:/data # ./randtest.arm
[+] starting xs32...
[?] xs32 duration: 22431833053ns
[+] starting lcg32...
[?] lcg32 duration: 14941402090ns
[+] starting pcg32...
[?] pcg32 duration: 44941973771ns
[+] starting xs128p...
[?] xs128p duration: 48889786981ns
[+] starting lcg64...
[?] lcg64 duration: 33831042391ns
[+] starting xos128p...
[?] xos128p duration: 44850878605ns
root@tulip-chiphd:/data # ./randtest.aarch64
[+] starting xs32...
[?] xs32 duration: 22425151678ns
[+] starting lcg32...
[?] lcg32 duration: 14954255257ns
[+] starting pcg32...
[?] pcg32 duration: 37346265726ns
[+] starting xs128p...
[?] xs128p duration: 22523807219ns
[+] starting lcg64...
[?] lcg64 duration: 26141304679ns
[+] starting xos128p...
[?] xos128p duration: 14937033215ns
%%%
Reviewers: alekseyshl
Reviewed By: alekseyshl
Subscribers: aemerson, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D35221
llvm-svn: 307798
2017-07-12 23:29:08 +08:00
|
|
|
State[0] = NanoTime();
|
|
|
|
State[1] = 0;
|
|
|
|
}
|
|
|
|
fillCache();
|
[sanitizer] Initial implementation of a Hardened Allocator
Summary:
This is an initial implementation of a Hardened Allocator based on Sanitizer Common's CombinedAllocator.
It aims at mitigating heap based vulnerabilities by adding several features to the base allocator, while staying relatively fast.
The following were implemented:
- additional consistency checks on the allocation function parameters and on the heap chunks;
- use of checksum protected chunk header, to detect corruption;
- randomness to the allocator base;
- delayed freelist (quarantine), to mitigate use after free and overall determinism.
Additional mitigations are in the works.
Reviewers: eugenis, aizatsky, pcc, krasin, vitalybuka, glider, dvyukov, kcc
Subscribers: kubabrecka, filcab, llvm-commits
Differential Revision: http://reviews.llvm.org/D20084
llvm-svn: 271968
2016-06-07 09:20:26 +08:00
|
|
|
}
|
[scudo] PRNG makeover
Summary:
This follows the addition of `GetRandom` with D34412. We remove our
`/dev/urandom` code and use the new function. Additionally, change the PRNG for
a slightly faster version. One of the issues with the old code is that we have
64 full bits of randomness per "next", using only 8 of those for the Salt and
discarding the rest. So we add a cached u64 in the PRNG that can serve up to
8 u8 before having to call the "next" function again.
During some integration work, I also realized that some very early processes
(like `init`) do not benefit from `/dev/urandom` yet. So if there is no
`getrandom` syscall as well, we have to fallback to some sort of initialization
of the PRNG.
Now a few words on why XoRoShiRo and not something else. I have played a while
with various PRNGs on 32 & 64 bit platforms. Some results are below. LCG 32 & 64
are usually faster but produce respectively 15 & 31 bits of entropy, meaning
that to get a full 64-bit, you would need to call them several times. The simple
XorShift is fast, produces 32 bits but is mediocre with regard to PRNG test
suites, PCG is slower overall, and XoRoShiRo is faster than XorShift128+ and
produces full 64 bits.
%%%
root@tulip-chiphd:/data # ./randtest.arm
[+] starting xs32...
[?] xs32 duration: 22431833053ns
[+] starting lcg32...
[?] lcg32 duration: 14941402090ns
[+] starting pcg32...
[?] pcg32 duration: 44941973771ns
[+] starting xs128p...
[?] xs128p duration: 48889786981ns
[+] starting lcg64...
[?] lcg64 duration: 33831042391ns
[+] starting xos128p...
[?] xos128p duration: 44850878605ns
root@tulip-chiphd:/data # ./randtest.aarch64
[+] starting xs32...
[?] xs32 duration: 22425151678ns
[+] starting lcg32...
[?] lcg32 duration: 14954255257ns
[+] starting pcg32...
[?] pcg32 duration: 37346265726ns
[+] starting xs128p...
[?] xs128p duration: 22523807219ns
[+] starting lcg64...
[?] lcg64 duration: 26141304679ns
[+] starting xos128p...
[?] xos128p duration: 14937033215ns
%%%
Reviewers: alekseyshl
Reviewed By: alekseyshl
Subscribers: aemerson, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D35221
llvm-svn: 307798
2017-07-12 23:29:08 +08:00
|
|
|
u8 getU8() {
|
|
|
|
if (UNLIKELY(isCacheEmpty()))
|
|
|
|
fillCache();
|
|
|
|
const u8 Result = static_cast<u8>(CachedBytes & 0xff);
|
|
|
|
CachedBytes >>= 8;
|
|
|
|
CachedBytesAvailable--;
|
|
|
|
return Result;
|
|
|
|
}
|
|
|
|
u64 getU64() { return next(); }
|
|
|
|
|
[sanitizer] Initial implementation of a Hardened Allocator
Summary:
This is an initial implementation of a Hardened Allocator based on Sanitizer Common's CombinedAllocator.
It aims at mitigating heap based vulnerabilities by adding several features to the base allocator, while staying relatively fast.
The following were implemented:
- additional consistency checks on the allocation function parameters and on the heap chunks;
- use of checksum protected chunk header, to detect corruption;
- randomness to the allocator base;
- delayed freelist (quarantine), to mitigate use after free and overall determinism.
Additional mitigations are in the works.
Reviewers: eugenis, aizatsky, pcc, krasin, vitalybuka, glider, dvyukov, kcc
Subscribers: kubabrecka, filcab, llvm-commits
Differential Revision: http://reviews.llvm.org/D20084
llvm-svn: 271968
2016-06-07 09:20:26 +08:00
|
|
|
private:
|
[scudo] PRNG makeover
Summary:
This follows the addition of `GetRandom` with D34412. We remove our
`/dev/urandom` code and use the new function. Additionally, change the PRNG for
a slightly faster version. One of the issues with the old code is that we have
64 full bits of randomness per "next", using only 8 of those for the Salt and
discarding the rest. So we add a cached u64 in the PRNG that can serve up to
8 u8 before having to call the "next" function again.
During some integration work, I also realized that some very early processes
(like `init`) do not benefit from `/dev/urandom` yet. So if there is no
`getrandom` syscall as well, we have to fallback to some sort of initialization
of the PRNG.
Now a few words on why XoRoShiRo and not something else. I have played a while
with various PRNGs on 32 & 64 bit platforms. Some results are below. LCG 32 & 64
are usually faster but produce respectively 15 & 31 bits of entropy, meaning
that to get a full 64-bit, you would need to call them several times. The simple
XorShift is fast, produces 32 bits but is mediocre with regard to PRNG test
suites, PCG is slower overall, and XoRoShiRo is faster than XorShift128+ and
produces full 64 bits.
%%%
root@tulip-chiphd:/data # ./randtest.arm
[+] starting xs32...
[?] xs32 duration: 22431833053ns
[+] starting lcg32...
[?] lcg32 duration: 14941402090ns
[+] starting pcg32...
[?] pcg32 duration: 44941973771ns
[+] starting xs128p...
[?] xs128p duration: 48889786981ns
[+] starting lcg64...
[?] lcg64 duration: 33831042391ns
[+] starting xos128p...
[?] xos128p duration: 44850878605ns
root@tulip-chiphd:/data # ./randtest.aarch64
[+] starting xs32...
[?] xs32 duration: 22425151678ns
[+] starting lcg32...
[?] lcg32 duration: 14954255257ns
[+] starting pcg32...
[?] pcg32 duration: 37346265726ns
[+] starting xs128p...
[?] xs128p duration: 22523807219ns
[+] starting lcg64...
[?] lcg64 duration: 26141304679ns
[+] starting xos128p...
[?] xos128p duration: 14937033215ns
%%%
Reviewers: alekseyshl
Reviewed By: alekseyshl
Subscribers: aemerson, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D35221
llvm-svn: 307798
2017-07-12 23:29:08 +08:00
|
|
|
u8 CachedBytesAvailable;
|
|
|
|
u64 CachedBytes;
|
[scudo] 32-bit and hardware agnostic support
Summary:
This update introduces i386 support for the Scudo Hardened Allocator, and
offers software alternatives for functions that used to require hardware
specific instruction sets. This should make porting to new architectures
easier.
Among the changes:
- The chunk header has been changed to accomodate the size limitations
encountered on 32-bit architectures. We now fit everything in 64-bit. This
was achieved by storing the amount of unused bytes in an allocation rather
than the size itself, as one can be deduced from the other with the help
of the GetActuallyAllocatedSize function. As it turns out, this header can
be used for both 64 and 32 bit, and as such we dropped the requirement for
the 128-bit compare and exchange instruction support (cmpxchg16b).
- Add 32-bit support for the checksum and the PRNG functions: if the SSE 4.2
instruction set is supported, use the 32-bit CRC32 instruction, and in the
XorShift128, use a 32-bit based state instead of 64-bit.
- Add software support for CRC32: if SSE 4.2 is not supported, fallback on a
software implementation.
- Modify tests that were not 32-bit compliant, and expand them to cover more
allocation and alignment sizes. The random shuffle test has been deactivated
for linux-i386 & linux-i686 as the 32-bit sanitizer allocator doesn't
currently randomize chunks.
Reviewers: alekseyshl, kcc
Subscribers: filcab, llvm-commits, tberghammer, danalbert, srhines, mgorny, modocache
Differential Revision: https://reviews.llvm.org/D26358
llvm-svn: 288255
2016-12-01 01:32:20 +08:00
|
|
|
u64 State[2];
|
[scudo] PRNG makeover
Summary:
This follows the addition of `GetRandom` with D34412. We remove our
`/dev/urandom` code and use the new function. Additionally, change the PRNG for
a slightly faster version. One of the issues with the old code is that we have
64 full bits of randomness per "next", using only 8 of those for the Salt and
discarding the rest. So we add a cached u64 in the PRNG that can serve up to
8 u8 before having to call the "next" function again.
During some integration work, I also realized that some very early processes
(like `init`) do not benefit from `/dev/urandom` yet. So if there is no
`getrandom` syscall as well, we have to fallback to some sort of initialization
of the PRNG.
Now a few words on why XoRoShiRo and not something else. I have played a while
with various PRNGs on 32 & 64 bit platforms. Some results are below. LCG 32 & 64
are usually faster but produce respectively 15 & 31 bits of entropy, meaning
that to get a full 64-bit, you would need to call them several times. The simple
XorShift is fast, produces 32 bits but is mediocre with regard to PRNG test
suites, PCG is slower overall, and XoRoShiRo is faster than XorShift128+ and
produces full 64 bits.
%%%
root@tulip-chiphd:/data # ./randtest.arm
[+] starting xs32...
[?] xs32 duration: 22431833053ns
[+] starting lcg32...
[?] lcg32 duration: 14941402090ns
[+] starting pcg32...
[?] pcg32 duration: 44941973771ns
[+] starting xs128p...
[?] xs128p duration: 48889786981ns
[+] starting lcg64...
[?] lcg64 duration: 33831042391ns
[+] starting xos128p...
[?] xos128p duration: 44850878605ns
root@tulip-chiphd:/data # ./randtest.aarch64
[+] starting xs32...
[?] xs32 duration: 22425151678ns
[+] starting lcg32...
[?] lcg32 duration: 14954255257ns
[+] starting pcg32...
[?] pcg32 duration: 37346265726ns
[+] starting xs128p...
[?] xs128p duration: 22523807219ns
[+] starting lcg64...
[?] lcg64 duration: 26141304679ns
[+] starting xos128p...
[?] xos128p duration: 14937033215ns
%%%
Reviewers: alekseyshl
Reviewed By: alekseyshl
Subscribers: aemerson, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D35221
llvm-svn: 307798
2017-07-12 23:29:08 +08:00
|
|
|
u64 next() {
|
|
|
|
const u64 S0 = State[0];
|
|
|
|
u64 S1 = State[1];
|
|
|
|
const u64 Result = S0 + S1;
|
|
|
|
S1 ^= S0;
|
|
|
|
State[0] = rotl(S0, 55) ^ S1 ^ (S1 << 14);
|
|
|
|
State[1] = rotl(S1, 36);
|
|
|
|
return Result;
|
|
|
|
}
|
|
|
|
bool isCacheEmpty() {
|
|
|
|
return CachedBytesAvailable == 0;
|
|
|
|
}
|
|
|
|
void fillCache() {
|
|
|
|
CachedBytes = next();
|
|
|
|
CachedBytesAvailable = sizeof(CachedBytes);
|
|
|
|
}
|
[sanitizer] Initial implementation of a Hardened Allocator
Summary:
This is an initial implementation of a Hardened Allocator based on Sanitizer Common's CombinedAllocator.
It aims at mitigating heap based vulnerabilities by adding several features to the base allocator, while staying relatively fast.
The following were implemented:
- additional consistency checks on the allocation function parameters and on the heap chunks;
- use of checksum protected chunk header, to detect corruption;
- randomness to the allocator base;
- delayed freelist (quarantine), to mitigate use after free and overall determinism.
Additional mitigations are in the works.
Reviewers: eugenis, aizatsky, pcc, krasin, vitalybuka, glider, dvyukov, kcc
Subscribers: kubabrecka, filcab, llvm-commits
Differential Revision: http://reviews.llvm.org/D20084
llvm-svn: 271968
2016-06-07 09:20:26 +08:00
|
|
|
};
|
|
|
|
|
[scudo] PRNG makeover
Summary:
This follows the addition of `GetRandom` with D34412. We remove our
`/dev/urandom` code and use the new function. Additionally, change the PRNG for
a slightly faster version. One of the issues with the old code is that we have
64 full bits of randomness per "next", using only 8 of those for the Salt and
discarding the rest. So we add a cached u64 in the PRNG that can serve up to
8 u8 before having to call the "next" function again.
During some integration work, I also realized that some very early processes
(like `init`) do not benefit from `/dev/urandom` yet. So if there is no
`getrandom` syscall as well, we have to fallback to some sort of initialization
of the PRNG.
Now a few words on why XoRoShiRo and not something else. I have played a while
with various PRNGs on 32 & 64 bit platforms. Some results are below. LCG 32 & 64
are usually faster but produce respectively 15 & 31 bits of entropy, meaning
that to get a full 64-bit, you would need to call them several times. The simple
XorShift is fast, produces 32 bits but is mediocre with regard to PRNG test
suites, PCG is slower overall, and XoRoShiRo is faster than XorShift128+ and
produces full 64 bits.
%%%
root@tulip-chiphd:/data # ./randtest.arm
[+] starting xs32...
[?] xs32 duration: 22431833053ns
[+] starting lcg32...
[?] lcg32 duration: 14941402090ns
[+] starting pcg32...
[?] pcg32 duration: 44941973771ns
[+] starting xs128p...
[?] xs128p duration: 48889786981ns
[+] starting lcg64...
[?] lcg64 duration: 33831042391ns
[+] starting xos128p...
[?] xos128p duration: 44850878605ns
root@tulip-chiphd:/data # ./randtest.aarch64
[+] starting xs32...
[?] xs32 duration: 22425151678ns
[+] starting lcg32...
[?] lcg32 duration: 14954255257ns
[+] starting pcg32...
[?] pcg32 duration: 37346265726ns
[+] starting xs128p...
[?] xs128p duration: 22523807219ns
[+] starting lcg64...
[?] lcg64 duration: 26141304679ns
[+] starting xos128p...
[?] xos128p duration: 14937033215ns
%%%
Reviewers: alekseyshl
Reviewed By: alekseyshl
Subscribers: aemerson, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D35221
llvm-svn: 307798
2017-07-12 23:29:08 +08:00
|
|
|
typedef XoRoShiRo128Plus ScudoPrng;
|
|
|
|
|
[scudo] 32-bit and hardware agnostic support
Summary:
This update introduces i386 support for the Scudo Hardened Allocator, and
offers software alternatives for functions that used to require hardware
specific instruction sets. This should make porting to new architectures
easier.
Among the changes:
- The chunk header has been changed to accomodate the size limitations
encountered on 32-bit architectures. We now fit everything in 64-bit. This
was achieved by storing the amount of unused bytes in an allocation rather
than the size itself, as one can be deduced from the other with the help
of the GetActuallyAllocatedSize function. As it turns out, this header can
be used for both 64 and 32 bit, and as such we dropped the requirement for
the 128-bit compare and exchange instruction support (cmpxchg16b).
- Add 32-bit support for the checksum and the PRNG functions: if the SSE 4.2
instruction set is supported, use the 32-bit CRC32 instruction, and in the
XorShift128, use a 32-bit based state instead of 64-bit.
- Add software support for CRC32: if SSE 4.2 is not supported, fallback on a
software implementation.
- Modify tests that were not 32-bit compliant, and expand them to cover more
allocation and alignment sizes. The random shuffle test has been deactivated
for linux-i386 & linux-i686 as the 32-bit sanitizer allocator doesn't
currently randomize chunks.
Reviewers: alekseyshl, kcc
Subscribers: filcab, llvm-commits, tberghammer, danalbert, srhines, mgorny, modocache
Differential Revision: https://reviews.llvm.org/D26358
llvm-svn: 288255
2016-12-01 01:32:20 +08:00
|
|
|
} // namespace __scudo
|
[sanitizer] Initial implementation of a Hardened Allocator
Summary:
This is an initial implementation of a Hardened Allocator based on Sanitizer Common's CombinedAllocator.
It aims at mitigating heap based vulnerabilities by adding several features to the base allocator, while staying relatively fast.
The following were implemented:
- additional consistency checks on the allocation function parameters and on the heap chunks;
- use of checksum protected chunk header, to detect corruption;
- randomness to the allocator base;
- delayed freelist (quarantine), to mitigate use after free and overall determinism.
Additional mitigations are in the works.
Reviewers: eugenis, aizatsky, pcc, krasin, vitalybuka, glider, dvyukov, kcc
Subscribers: kubabrecka, filcab, llvm-commits
Differential Revision: http://reviews.llvm.org/D20084
llvm-svn: 271968
2016-06-07 09:20:26 +08:00
|
|
|
|
|
|
|
#endif // SCUDO_UTILS_H_
|