llvm-project/compiler-rt/lib/dfsan/dfsan_interceptors.cpp

Ignoring revisions in .git-blame-ignore-revs. Click here to bypass and see the normal blame view.

246 lines
7.9 KiB
C++
Raw Normal View History

//===-- dfsan_interceptors.cpp --------------------------------------------===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//
//
// This file is a part of DataFlowSanitizer.
//
// Interceptors for standard library functions.
//===----------------------------------------------------------------------===//
#include <sys/syscall.h>
#include <unistd.h>
#include "dfsan/dfsan.h"
[dfsan] Use the sanitizer allocator to reduce memory cost dfsan does not use sanitizer allocator as others. In practice, we let it use glibc's allocator since tcmalloc needs more work to be working with dfsan well. With glibc, we observe large memory leakage. This could relate to two things: 1) glibc allocator has limitation: for example, tcmalloc can reduce memory footprint 2x easily 2) glibc may call unmmap directly as an internal system call by using system call number. so DFSan has no way to release shadow spaces for those unmmap. Using sanitizer allocator addresses the above issues 1) its memory management is close to tcmalloc 2) we can register callback when sanitizer allocator calls unmmap, so dfsan can release shadow spaces correctly. Our experiment with internal server-based application proved that with the change, in a-few-day run, memory usage leakage is close to what tcmalloc does w/o dfsan. This change mainly follows MSan's code. 1) define allocator callbacks at dfsan_allocator.h|cpp 2) mark allocator APIs to be discard 3) intercept allocator APIs 4) make dfsan_set_label consistent with MSan's SetShadow when setting 0 labels, define dfsan_release_meta_memory when unmap is called 5) add flags about whether zeroing memory after malloc/free. dfsan works at byte-level, so bit-level oparations can cause reading undefined shadow. See D96842. zeroing memory after malloc helps this. About zeroing after free, reading after free is definitely UB, but if user code does so, it is hard to debug an overtainting caused by this w/o running MSan. So we add the flag to help debugging. This change will be split to small changes for review. Before that, a question is "this code shares a lot of with MSan, for example, dfsan_allocator.* and dfsan_new_delete.*. Does it make sense to unify the code at sanitizer_common? will that introduce some maintenance issue?" Reviewed By: morehouse Differential Revision: https://reviews.llvm.org/D101204
2021-04-24 06:35:25 +08:00
#include "dfsan/dfsan_thread.h"
#include "interception/interception.h"
[dfsan] Use the sanitizer allocator to reduce memory cost dfsan does not use sanitizer allocator as others. In practice, we let it use glibc's allocator since tcmalloc needs more work to be working with dfsan well. With glibc, we observe large memory leakage. This could relate to two things: 1) glibc allocator has limitation: for example, tcmalloc can reduce memory footprint 2x easily 2) glibc may call unmmap directly as an internal system call by using system call number. so DFSan has no way to release shadow spaces for those unmmap. Using sanitizer allocator addresses the above issues 1) its memory management is close to tcmalloc 2) we can register callback when sanitizer allocator calls unmmap, so dfsan can release shadow spaces correctly. Our experiment with internal server-based application proved that with the change, in a-few-day run, memory usage leakage is close to what tcmalloc does w/o dfsan. This change mainly follows MSan's code. 1) define allocator callbacks at dfsan_allocator.h|cpp 2) mark allocator APIs to be discard 3) intercept allocator APIs 4) make dfsan_set_label consistent with MSan's SetShadow when setting 0 labels, define dfsan_release_meta_memory when unmap is called 5) add flags about whether zeroing memory after malloc/free. dfsan works at byte-level, so bit-level oparations can cause reading undefined shadow. See D96842. zeroing memory after malloc helps this. About zeroing after free, reading after free is definitely UB, but if user code does so, it is hard to debug an overtainting caused by this w/o running MSan. So we add the flag to help debugging. This change will be split to small changes for review. Before that, a question is "this code shares a lot of with MSan, for example, dfsan_allocator.* and dfsan_new_delete.*. Does it make sense to unify the code at sanitizer_common? will that introduce some maintenance issue?" Reviewed By: morehouse Differential Revision: https://reviews.llvm.org/D101204
2021-04-24 06:35:25 +08:00
#include "sanitizer_common/sanitizer_allocator_interface.h"
#include "sanitizer_common/sanitizer_common.h"
[dfsan] Use the sanitizer allocator to reduce memory cost dfsan does not use sanitizer allocator as others. In practice, we let it use glibc's allocator since tcmalloc needs more work to be working with dfsan well. With glibc, we observe large memory leakage. This could relate to two things: 1) glibc allocator has limitation: for example, tcmalloc can reduce memory footprint 2x easily 2) glibc may call unmmap directly as an internal system call by using system call number. so DFSan has no way to release shadow spaces for those unmmap. Using sanitizer allocator addresses the above issues 1) its memory management is close to tcmalloc 2) we can register callback when sanitizer allocator calls unmmap, so dfsan can release shadow spaces correctly. Our experiment with internal server-based application proved that with the change, in a-few-day run, memory usage leakage is close to what tcmalloc does w/o dfsan. This change mainly follows MSan's code. 1) define allocator callbacks at dfsan_allocator.h|cpp 2) mark allocator APIs to be discard 3) intercept allocator APIs 4) make dfsan_set_label consistent with MSan's SetShadow when setting 0 labels, define dfsan_release_meta_memory when unmap is called 5) add flags about whether zeroing memory after malloc/free. dfsan works at byte-level, so bit-level oparations can cause reading undefined shadow. See D96842. zeroing memory after malloc helps this. About zeroing after free, reading after free is definitely UB, but if user code does so, it is hard to debug an overtainting caused by this w/o running MSan. So we add the flag to help debugging. This change will be split to small changes for review. Before that, a question is "this code shares a lot of with MSan, for example, dfsan_allocator.* and dfsan_new_delete.*. Does it make sense to unify the code at sanitizer_common? will that introduce some maintenance issue?" Reviewed By: morehouse Differential Revision: https://reviews.llvm.org/D101204
2021-04-24 06:35:25 +08:00
#include "sanitizer_common/sanitizer_errno.h"
#include "sanitizer_common/sanitizer_platform_limits_posix.h"
#include "sanitizer_common/sanitizer_posix.h"
#include "sanitizer_common/sanitizer_tls_get_addr.h"
using namespace __sanitizer;
namespace {
bool interceptors_initialized;
} // namespace
[dfsan] Use the sanitizer allocator to reduce memory cost dfsan does not use sanitizer allocator as others. In practice, we let it use glibc's allocator since tcmalloc needs more work to be working with dfsan well. With glibc, we observe large memory leakage. This could relate to two things: 1) glibc allocator has limitation: for example, tcmalloc can reduce memory footprint 2x easily 2) glibc may call unmmap directly as an internal system call by using system call number. so DFSan has no way to release shadow spaces for those unmmap. Using sanitizer allocator addresses the above issues 1) its memory management is close to tcmalloc 2) we can register callback when sanitizer allocator calls unmmap, so dfsan can release shadow spaces correctly. Our experiment with internal server-based application proved that with the change, in a-few-day run, memory usage leakage is close to what tcmalloc does w/o dfsan. This change mainly follows MSan's code. 1) define allocator callbacks at dfsan_allocator.h|cpp 2) mark allocator APIs to be discard 3) intercept allocator APIs 4) make dfsan_set_label consistent with MSan's SetShadow when setting 0 labels, define dfsan_release_meta_memory when unmap is called 5) add flags about whether zeroing memory after malloc/free. dfsan works at byte-level, so bit-level oparations can cause reading undefined shadow. See D96842. zeroing memory after malloc helps this. About zeroing after free, reading after free is definitely UB, but if user code does so, it is hard to debug an overtainting caused by this w/o running MSan. So we add the flag to help debugging. This change will be split to small changes for review. Before that, a question is "this code shares a lot of with MSan, for example, dfsan_allocator.* and dfsan_new_delete.*. Does it make sense to unify the code at sanitizer_common? will that introduce some maintenance issue?" Reviewed By: morehouse Differential Revision: https://reviews.llvm.org/D101204
2021-04-24 06:35:25 +08:00
INTERCEPTOR(void *, reallocarray, void *ptr, SIZE_T nmemb, SIZE_T size) {
return __dfsan::dfsan_reallocarray(ptr, nmemb, size);
}
INTERCEPTOR(void *, __libc_memalign, SIZE_T alignment, SIZE_T size) {
void *ptr = __dfsan::dfsan_memalign(alignment, size);
if (ptr)
DTLS_on_libc_memalign(ptr, size);
return ptr;
}
INTERCEPTOR(void *, aligned_alloc, SIZE_T alignment, SIZE_T size) {
return __dfsan::dfsan_aligned_alloc(alignment, size);
}
static uptr allocated_for_dlsym;
static const uptr kDlsymAllocPoolSize = 1024;
static uptr alloc_memory_for_dlsym[kDlsymAllocPoolSize];
static bool IsInDlsymAllocPool(const void *ptr) {
uptr off = (uptr)ptr - (uptr)alloc_memory_for_dlsym;
return off < sizeof(alloc_memory_for_dlsym);
}
static void *AllocateFromLocalPool(uptr size_in_bytes) {
uptr size_in_words = RoundUpTo(size_in_bytes, kWordSize) / kWordSize;
void *mem = (void *)&alloc_memory_for_dlsym[allocated_for_dlsym];
allocated_for_dlsym += size_in_words;
CHECK_LT(allocated_for_dlsym, kDlsymAllocPoolSize);
return mem;
}
INTERCEPTOR(void *, calloc, SIZE_T nmemb, SIZE_T size) {
if (UNLIKELY(!__dfsan::dfsan_inited))
// Hack: dlsym calls calloc before REAL(calloc) is retrieved from dlsym.
return AllocateFromLocalPool(nmemb * size);
return __dfsan::dfsan_calloc(nmemb, size);
}
INTERCEPTOR(void *, realloc, void *ptr, SIZE_T size) {
if (UNLIKELY(IsInDlsymAllocPool(ptr))) {
uptr offset = (uptr)ptr - (uptr)alloc_memory_for_dlsym;
uptr copy_size = Min(size, kDlsymAllocPoolSize - offset);
void *new_ptr;
if (UNLIKELY(!__dfsan::dfsan_inited)) {
new_ptr = AllocateFromLocalPool(copy_size);
} else {
copy_size = size;
new_ptr = __dfsan::dfsan_malloc(copy_size);
}
internal_memcpy(new_ptr, ptr, copy_size);
return new_ptr;
}
return __dfsan::dfsan_realloc(ptr, size);
}
INTERCEPTOR(void *, malloc, SIZE_T size) {
if (UNLIKELY(!__dfsan::dfsan_inited))
// Hack: dlsym calls malloc before REAL(malloc) is retrieved from dlsym.
return AllocateFromLocalPool(size);
return __dfsan::dfsan_malloc(size);
}
INTERCEPTOR(void, free, void *ptr) {
if (!ptr || UNLIKELY(IsInDlsymAllocPool(ptr)))
return;
return __dfsan::dfsan_deallocate(ptr);
}
INTERCEPTOR(void, cfree, void *ptr) {
if (!ptr || UNLIKELY(IsInDlsymAllocPool(ptr)))
return;
return __dfsan::dfsan_deallocate(ptr);
}
[dfsan] Use the sanitizer allocator to reduce memory cost dfsan does not use sanitizer allocator as others. In practice, we let it use glibc's allocator since tcmalloc needs more work to be working with dfsan well. With glibc, we observe large memory leakage. This could relate to two things: 1) glibc allocator has limitation: for example, tcmalloc can reduce memory footprint 2x easily 2) glibc may call unmmap directly as an internal system call by using system call number. so DFSan has no way to release shadow spaces for those unmmap. Using sanitizer allocator addresses the above issues 1) its memory management is close to tcmalloc 2) we can register callback when sanitizer allocator calls unmmap, so dfsan can release shadow spaces correctly. Our experiment with internal server-based application proved that with the change, in a-few-day run, memory usage leakage is close to what tcmalloc does w/o dfsan. This change mainly follows MSan's code. 1) define allocator callbacks at dfsan_allocator.h|cpp 2) mark allocator APIs to be discard 3) intercept allocator APIs 4) make dfsan_set_label consistent with MSan's SetShadow when setting 0 labels, define dfsan_release_meta_memory when unmap is called 5) add flags about whether zeroing memory after malloc/free. dfsan works at byte-level, so bit-level oparations can cause reading undefined shadow. See D96842. zeroing memory after malloc helps this. About zeroing after free, reading after free is definitely UB, but if user code does so, it is hard to debug an overtainting caused by this w/o running MSan. So we add the flag to help debugging. This change will be split to small changes for review. Before that, a question is "this code shares a lot of with MSan, for example, dfsan_allocator.* and dfsan_new_delete.*. Does it make sense to unify the code at sanitizer_common? will that introduce some maintenance issue?" Reviewed By: morehouse Differential Revision: https://reviews.llvm.org/D101204
2021-04-24 06:35:25 +08:00
INTERCEPTOR(int, posix_memalign, void **memptr, SIZE_T alignment, SIZE_T size) {
CHECK_NE(memptr, 0);
int res = __dfsan::dfsan_posix_memalign(memptr, alignment, size);
if (!res)
dfsan_set_label(0, memptr, sizeof(*memptr));
return res;
}
INTERCEPTOR(void *, memalign, SIZE_T alignment, SIZE_T size) {
return __dfsan::dfsan_memalign(alignment, size);
}
INTERCEPTOR(void *, valloc, SIZE_T size) { return __dfsan::dfsan_valloc(size); }
INTERCEPTOR(void *, pvalloc, SIZE_T size) {
return __dfsan::dfsan_pvalloc(size);
}
INTERCEPTOR(void, mallinfo, __sanitizer_struct_mallinfo *sret) {
internal_memset(sret, 0, sizeof(*sret));
dfsan_set_label(0, sret, sizeof(*sret));
}
INTERCEPTOR(int, mallopt, int cmd, int value) { return 0; }
INTERCEPTOR(void, malloc_stats, void) {
// FIXME: implement, but don't call REAL(malloc_stats)!
}
INTERCEPTOR(uptr, malloc_usable_size, void *ptr) {
return __sanitizer_get_allocated_size(ptr);
}
[dfsan] Use the sanitizer allocator to reduce memory cost dfsan does not use sanitizer allocator as others. In practice, we let it use glibc's allocator since tcmalloc needs more work to be working with dfsan well. With glibc, we observe large memory leakage. This could relate to two things: 1) glibc allocator has limitation: for example, tcmalloc can reduce memory footprint 2x easily 2) glibc may call unmmap directly as an internal system call by using system call number. so DFSan has no way to release shadow spaces for those unmmap. Using sanitizer allocator addresses the above issues 1) its memory management is close to tcmalloc 2) we can register callback when sanitizer allocator calls unmmap, so dfsan can release shadow spaces correctly. Our experiment with internal server-based application proved that with the change, in a-few-day run, memory usage leakage is close to what tcmalloc does w/o dfsan. This change mainly follows MSan's code. 1) define allocator callbacks at dfsan_allocator.h|cpp 2) mark allocator APIs to be discard 3) intercept allocator APIs 4) make dfsan_set_label consistent with MSan's SetShadow when setting 0 labels, define dfsan_release_meta_memory when unmap is called 5) add flags about whether zeroing memory after malloc/free. dfsan works at byte-level, so bit-level oparations can cause reading undefined shadow. See D96842. zeroing memory after malloc helps this. About zeroing after free, reading after free is definitely UB, but if user code does so, it is hard to debug an overtainting caused by this w/o running MSan. So we add the flag to help debugging. This change will be split to small changes for review. Before that, a question is "this code shares a lot of with MSan, for example, dfsan_allocator.* and dfsan_new_delete.*. Does it make sense to unify the code at sanitizer_common? will that introduce some maintenance issue?" Reviewed By: morehouse Differential Revision: https://reviews.llvm.org/D101204
2021-04-24 06:35:25 +08:00
#define ENSURE_DFSAN_INITED() \
do { \
CHECK(!__dfsan::dfsan_init_is_running); \
if (!__dfsan::dfsan_inited) { \
__dfsan::dfsan_init(); \
} \
} while (0)
#define COMMON_INTERCEPTOR_ENTER(func, ...) \
if (__dfsan::dfsan_init_is_running) \
return REAL(func)(__VA_ARGS__); \
ENSURE_DFSAN_INITED(); \
dfsan_set_label(0, __errno_location(), sizeof(int));
[dfsan] Use the sanitizer allocator to reduce memory cost dfsan does not use sanitizer allocator as others. In practice, we let it use glibc's allocator since tcmalloc needs more work to be working with dfsan well. With glibc, we observe large memory leakage. This could relate to two things: 1) glibc allocator has limitation: for example, tcmalloc can reduce memory footprint 2x easily 2) glibc may call unmmap directly as an internal system call by using system call number. so DFSan has no way to release shadow spaces for those unmmap. Using sanitizer allocator addresses the above issues 1) its memory management is close to tcmalloc 2) we can register callback when sanitizer allocator calls unmmap, so dfsan can release shadow spaces correctly. Our experiment with internal server-based application proved that with the change, in a-few-day run, memory usage leakage is close to what tcmalloc does w/o dfsan. This change mainly follows MSan's code. 1) define allocator callbacks at dfsan_allocator.h|cpp 2) mark allocator APIs to be discard 3) intercept allocator APIs 4) make dfsan_set_label consistent with MSan's SetShadow when setting 0 labels, define dfsan_release_meta_memory when unmap is called 5) add flags about whether zeroing memory after malloc/free. dfsan works at byte-level, so bit-level oparations can cause reading undefined shadow. See D96842. zeroing memory after malloc helps this. About zeroing after free, reading after free is definitely UB, but if user code does so, it is hard to debug an overtainting caused by this w/o running MSan. So we add the flag to help debugging. This change will be split to small changes for review. Before that, a question is "this code shares a lot of with MSan, for example, dfsan_allocator.* and dfsan_new_delete.*. Does it make sense to unify the code at sanitizer_common? will that introduce some maintenance issue?" Reviewed By: morehouse Differential Revision: https://reviews.llvm.org/D101204
2021-04-24 06:35:25 +08:00
INTERCEPTOR(void *, mmap, void *addr, SIZE_T length, int prot, int flags,
int fd, OFF_T offset) {
if (common_flags()->detect_write_exec)
ReportMmapWriteExec(prot);
if (!__dfsan::dfsan_inited)
return (void *)internal_mmap(addr, length, prot, flags, fd, offset);
COMMON_INTERCEPTOR_ENTER(mmap, addr, length, prot, flags, fd, offset);
void *res = REAL(mmap)(addr, length, prot, flags, fd, offset);
if (res != (void *)-1) {
dfsan_set_label(0, res, RoundUpTo(length, GetPageSizeCached()));
[dfsan] Use the sanitizer allocator to reduce memory cost dfsan does not use sanitizer allocator as others. In practice, we let it use glibc's allocator since tcmalloc needs more work to be working with dfsan well. With glibc, we observe large memory leakage. This could relate to two things: 1) glibc allocator has limitation: for example, tcmalloc can reduce memory footprint 2x easily 2) glibc may call unmmap directly as an internal system call by using system call number. so DFSan has no way to release shadow spaces for those unmmap. Using sanitizer allocator addresses the above issues 1) its memory management is close to tcmalloc 2) we can register callback when sanitizer allocator calls unmmap, so dfsan can release shadow spaces correctly. Our experiment with internal server-based application proved that with the change, in a-few-day run, memory usage leakage is close to what tcmalloc does w/o dfsan. This change mainly follows MSan's code. 1) define allocator callbacks at dfsan_allocator.h|cpp 2) mark allocator APIs to be discard 3) intercept allocator APIs 4) make dfsan_set_label consistent with MSan's SetShadow when setting 0 labels, define dfsan_release_meta_memory when unmap is called 5) add flags about whether zeroing memory after malloc/free. dfsan works at byte-level, so bit-level oparations can cause reading undefined shadow. See D96842. zeroing memory after malloc helps this. About zeroing after free, reading after free is definitely UB, but if user code does so, it is hard to debug an overtainting caused by this w/o running MSan. So we add the flag to help debugging. This change will be split to small changes for review. Before that, a question is "this code shares a lot of with MSan, for example, dfsan_allocator.* and dfsan_new_delete.*. Does it make sense to unify the code at sanitizer_common? will that introduce some maintenance issue?" Reviewed By: morehouse Differential Revision: https://reviews.llvm.org/D101204
2021-04-24 06:35:25 +08:00
}
return res;
}
INTERCEPTOR(void *, mmap64, void *addr, SIZE_T length, int prot, int flags,
int fd, OFF64_T offset) {
[dfsan] Use the sanitizer allocator to reduce memory cost dfsan does not use sanitizer allocator as others. In practice, we let it use glibc's allocator since tcmalloc needs more work to be working with dfsan well. With glibc, we observe large memory leakage. This could relate to two things: 1) glibc allocator has limitation: for example, tcmalloc can reduce memory footprint 2x easily 2) glibc may call unmmap directly as an internal system call by using system call number. so DFSan has no way to release shadow spaces for those unmmap. Using sanitizer allocator addresses the above issues 1) its memory management is close to tcmalloc 2) we can register callback when sanitizer allocator calls unmmap, so dfsan can release shadow spaces correctly. Our experiment with internal server-based application proved that with the change, in a-few-day run, memory usage leakage is close to what tcmalloc does w/o dfsan. This change mainly follows MSan's code. 1) define allocator callbacks at dfsan_allocator.h|cpp 2) mark allocator APIs to be discard 3) intercept allocator APIs 4) make dfsan_set_label consistent with MSan's SetShadow when setting 0 labels, define dfsan_release_meta_memory when unmap is called 5) add flags about whether zeroing memory after malloc/free. dfsan works at byte-level, so bit-level oparations can cause reading undefined shadow. See D96842. zeroing memory after malloc helps this. About zeroing after free, reading after free is definitely UB, but if user code does so, it is hard to debug an overtainting caused by this w/o running MSan. So we add the flag to help debugging. This change will be split to small changes for review. Before that, a question is "this code shares a lot of with MSan, for example, dfsan_allocator.* and dfsan_new_delete.*. Does it make sense to unify the code at sanitizer_common? will that introduce some maintenance issue?" Reviewed By: morehouse Differential Revision: https://reviews.llvm.org/D101204
2021-04-24 06:35:25 +08:00
if (common_flags()->detect_write_exec)
ReportMmapWriteExec(prot);
if (!__dfsan::dfsan_inited)
return (void *)internal_mmap(addr, length, prot, flags, fd, offset);
COMMON_INTERCEPTOR_ENTER(mmap64, addr, length, prot, flags, fd, offset);
void *res = REAL(mmap64)(addr, length, prot, flags, fd, offset);
[dfsan] Use the sanitizer allocator to reduce memory cost dfsan does not use sanitizer allocator as others. In practice, we let it use glibc's allocator since tcmalloc needs more work to be working with dfsan well. With glibc, we observe large memory leakage. This could relate to two things: 1) glibc allocator has limitation: for example, tcmalloc can reduce memory footprint 2x easily 2) glibc may call unmmap directly as an internal system call by using system call number. so DFSan has no way to release shadow spaces for those unmmap. Using sanitizer allocator addresses the above issues 1) its memory management is close to tcmalloc 2) we can register callback when sanitizer allocator calls unmmap, so dfsan can release shadow spaces correctly. Our experiment with internal server-based application proved that with the change, in a-few-day run, memory usage leakage is close to what tcmalloc does w/o dfsan. This change mainly follows MSan's code. 1) define allocator callbacks at dfsan_allocator.h|cpp 2) mark allocator APIs to be discard 3) intercept allocator APIs 4) make dfsan_set_label consistent with MSan's SetShadow when setting 0 labels, define dfsan_release_meta_memory when unmap is called 5) add flags about whether zeroing memory after malloc/free. dfsan works at byte-level, so bit-level oparations can cause reading undefined shadow. See D96842. zeroing memory after malloc helps this. About zeroing after free, reading after free is definitely UB, but if user code does so, it is hard to debug an overtainting caused by this w/o running MSan. So we add the flag to help debugging. This change will be split to small changes for review. Before that, a question is "this code shares a lot of with MSan, for example, dfsan_allocator.* and dfsan_new_delete.*. Does it make sense to unify the code at sanitizer_common? will that introduce some maintenance issue?" Reviewed By: morehouse Differential Revision: https://reviews.llvm.org/D101204
2021-04-24 06:35:25 +08:00
if (res != (void *)-1) {
dfsan_set_label(0, res, RoundUpTo(length, GetPageSizeCached()));
[dfsan] Use the sanitizer allocator to reduce memory cost dfsan does not use sanitizer allocator as others. In practice, we let it use glibc's allocator since tcmalloc needs more work to be working with dfsan well. With glibc, we observe large memory leakage. This could relate to two things: 1) glibc allocator has limitation: for example, tcmalloc can reduce memory footprint 2x easily 2) glibc may call unmmap directly as an internal system call by using system call number. so DFSan has no way to release shadow spaces for those unmmap. Using sanitizer allocator addresses the above issues 1) its memory management is close to tcmalloc 2) we can register callback when sanitizer allocator calls unmmap, so dfsan can release shadow spaces correctly. Our experiment with internal server-based application proved that with the change, in a-few-day run, memory usage leakage is close to what tcmalloc does w/o dfsan. This change mainly follows MSan's code. 1) define allocator callbacks at dfsan_allocator.h|cpp 2) mark allocator APIs to be discard 3) intercept allocator APIs 4) make dfsan_set_label consistent with MSan's SetShadow when setting 0 labels, define dfsan_release_meta_memory when unmap is called 5) add flags about whether zeroing memory after malloc/free. dfsan works at byte-level, so bit-level oparations can cause reading undefined shadow. See D96842. zeroing memory after malloc helps this. About zeroing after free, reading after free is definitely UB, but if user code does so, it is hard to debug an overtainting caused by this w/o running MSan. So we add the flag to help debugging. This change will be split to small changes for review. Before that, a question is "this code shares a lot of with MSan, for example, dfsan_allocator.* and dfsan_new_delete.*. Does it make sense to unify the code at sanitizer_common? will that introduce some maintenance issue?" Reviewed By: morehouse Differential Revision: https://reviews.llvm.org/D101204
2021-04-24 06:35:25 +08:00
}
return res;
}
INTERCEPTOR(int, munmap, void *addr, SIZE_T length) {
[dfsan] Use the sanitizer allocator to reduce memory cost dfsan does not use sanitizer allocator as others. In practice, we let it use glibc's allocator since tcmalloc needs more work to be working with dfsan well. With glibc, we observe large memory leakage. This could relate to two things: 1) glibc allocator has limitation: for example, tcmalloc can reduce memory footprint 2x easily 2) glibc may call unmmap directly as an internal system call by using system call number. so DFSan has no way to release shadow spaces for those unmmap. Using sanitizer allocator addresses the above issues 1) its memory management is close to tcmalloc 2) we can register callback when sanitizer allocator calls unmmap, so dfsan can release shadow spaces correctly. Our experiment with internal server-based application proved that with the change, in a-few-day run, memory usage leakage is close to what tcmalloc does w/o dfsan. This change mainly follows MSan's code. 1) define allocator callbacks at dfsan_allocator.h|cpp 2) mark allocator APIs to be discard 3) intercept allocator APIs 4) make dfsan_set_label consistent with MSan's SetShadow when setting 0 labels, define dfsan_release_meta_memory when unmap is called 5) add flags about whether zeroing memory after malloc/free. dfsan works at byte-level, so bit-level oparations can cause reading undefined shadow. See D96842. zeroing memory after malloc helps this. About zeroing after free, reading after free is definitely UB, but if user code does so, it is hard to debug an overtainting caused by this w/o running MSan. So we add the flag to help debugging. This change will be split to small changes for review. Before that, a question is "this code shares a lot of with MSan, for example, dfsan_allocator.* and dfsan_new_delete.*. Does it make sense to unify the code at sanitizer_common? will that introduce some maintenance issue?" Reviewed By: morehouse Differential Revision: https://reviews.llvm.org/D101204
2021-04-24 06:35:25 +08:00
if (!__dfsan::dfsan_inited)
return internal_munmap(addr, length);
COMMON_INTERCEPTOR_ENTER(munmap, addr, length);
int res = REAL(munmap)(addr, length);
if (res != -1)
dfsan_set_label(0, addr, RoundUpTo(length, GetPageSizeCached()));
return res;
}
[dfsan] Use the sanitizer allocator to reduce memory cost dfsan does not use sanitizer allocator as others. In practice, we let it use glibc's allocator since tcmalloc needs more work to be working with dfsan well. With glibc, we observe large memory leakage. This could relate to two things: 1) glibc allocator has limitation: for example, tcmalloc can reduce memory footprint 2x easily 2) glibc may call unmmap directly as an internal system call by using system call number. so DFSan has no way to release shadow spaces for those unmmap. Using sanitizer allocator addresses the above issues 1) its memory management is close to tcmalloc 2) we can register callback when sanitizer allocator calls unmmap, so dfsan can release shadow spaces correctly. Our experiment with internal server-based application proved that with the change, in a-few-day run, memory usage leakage is close to what tcmalloc does w/o dfsan. This change mainly follows MSan's code. 1) define allocator callbacks at dfsan_allocator.h|cpp 2) mark allocator APIs to be discard 3) intercept allocator APIs 4) make dfsan_set_label consistent with MSan's SetShadow when setting 0 labels, define dfsan_release_meta_memory when unmap is called 5) add flags about whether zeroing memory after malloc/free. dfsan works at byte-level, so bit-level oparations can cause reading undefined shadow. See D96842. zeroing memory after malloc helps this. About zeroing after free, reading after free is definitely UB, but if user code does so, it is hard to debug an overtainting caused by this w/o running MSan. So we add the flag to help debugging. This change will be split to small changes for review. Before that, a question is "this code shares a lot of with MSan, for example, dfsan_allocator.* and dfsan_new_delete.*. Does it make sense to unify the code at sanitizer_common? will that introduce some maintenance issue?" Reviewed By: morehouse Differential Revision: https://reviews.llvm.org/D101204
2021-04-24 06:35:25 +08:00
#define COMMON_INTERCEPTOR_GET_TLS_RANGE(begin, end) \
if (__dfsan::DFsanThread *t = __dfsan::GetCurrentThread()) { \
*begin = t->tls_begin(); \
*end = t->tls_end(); \
} else { \
*begin = *end = 0; \
}
#define COMMON_INTERCEPTOR_INITIALIZE_RANGE(ptr, size) \
dfsan_set_label(0, ptr, size)
INTERCEPTOR(void *, __tls_get_addr, void *arg) {
COMMON_INTERCEPTOR_ENTER(__tls_get_addr, arg);
void *res = REAL(__tls_get_addr)(arg);
uptr tls_begin, tls_end;
COMMON_INTERCEPTOR_GET_TLS_RANGE(&tls_begin, &tls_end);
DTLS::DTV *dtv = DTLS_on_tls_get_addr(arg, res, tls_begin, tls_end);
if (dtv) {
// New DTLS block has been allocated.
COMMON_INTERCEPTOR_INITIALIZE_RANGE((void *)dtv->beg, dtv->size);
}
return res;
}
namespace __dfsan {
[dfsan] Use the sanitizer allocator to reduce memory cost dfsan does not use sanitizer allocator as others. In practice, we let it use glibc's allocator since tcmalloc needs more work to be working with dfsan well. With glibc, we observe large memory leakage. This could relate to two things: 1) glibc allocator has limitation: for example, tcmalloc can reduce memory footprint 2x easily 2) glibc may call unmmap directly as an internal system call by using system call number. so DFSan has no way to release shadow spaces for those unmmap. Using sanitizer allocator addresses the above issues 1) its memory management is close to tcmalloc 2) we can register callback when sanitizer allocator calls unmmap, so dfsan can release shadow spaces correctly. Our experiment with internal server-based application proved that with the change, in a-few-day run, memory usage leakage is close to what tcmalloc does w/o dfsan. This change mainly follows MSan's code. 1) define allocator callbacks at dfsan_allocator.h|cpp 2) mark allocator APIs to be discard 3) intercept allocator APIs 4) make dfsan_set_label consistent with MSan's SetShadow when setting 0 labels, define dfsan_release_meta_memory when unmap is called 5) add flags about whether zeroing memory after malloc/free. dfsan works at byte-level, so bit-level oparations can cause reading undefined shadow. See D96842. zeroing memory after malloc helps this. About zeroing after free, reading after free is definitely UB, but if user code does so, it is hard to debug an overtainting caused by this w/o running MSan. So we add the flag to help debugging. This change will be split to small changes for review. Before that, a question is "this code shares a lot of with MSan, for example, dfsan_allocator.* and dfsan_new_delete.*. Does it make sense to unify the code at sanitizer_common? will that introduce some maintenance issue?" Reviewed By: morehouse Differential Revision: https://reviews.llvm.org/D101204
2021-04-24 06:35:25 +08:00
void initialize_interceptors() {
CHECK(!interceptors_initialized);
[dfsan] Use the sanitizer allocator to reduce memory cost dfsan does not use sanitizer allocator as others. In practice, we let it use glibc's allocator since tcmalloc needs more work to be working with dfsan well. With glibc, we observe large memory leakage. This could relate to two things: 1) glibc allocator has limitation: for example, tcmalloc can reduce memory footprint 2x easily 2) glibc may call unmmap directly as an internal system call by using system call number. so DFSan has no way to release shadow spaces for those unmmap. Using sanitizer allocator addresses the above issues 1) its memory management is close to tcmalloc 2) we can register callback when sanitizer allocator calls unmmap, so dfsan can release shadow spaces correctly. Our experiment with internal server-based application proved that with the change, in a-few-day run, memory usage leakage is close to what tcmalloc does w/o dfsan. This change mainly follows MSan's code. 1) define allocator callbacks at dfsan_allocator.h|cpp 2) mark allocator APIs to be discard 3) intercept allocator APIs 4) make dfsan_set_label consistent with MSan's SetShadow when setting 0 labels, define dfsan_release_meta_memory when unmap is called 5) add flags about whether zeroing memory after malloc/free. dfsan works at byte-level, so bit-level oparations can cause reading undefined shadow. See D96842. zeroing memory after malloc helps this. About zeroing after free, reading after free is definitely UB, but if user code does so, it is hard to debug an overtainting caused by this w/o running MSan. So we add the flag to help debugging. This change will be split to small changes for review. Before that, a question is "this code shares a lot of with MSan, for example, dfsan_allocator.* and dfsan_new_delete.*. Does it make sense to unify the code at sanitizer_common? will that introduce some maintenance issue?" Reviewed By: morehouse Differential Revision: https://reviews.llvm.org/D101204
2021-04-24 06:35:25 +08:00
INTERCEPT_FUNCTION(aligned_alloc);
INTERCEPT_FUNCTION(calloc);
INTERCEPT_FUNCTION(cfree);
INTERCEPT_FUNCTION(free);
INTERCEPT_FUNCTION(mallinfo);
INTERCEPT_FUNCTION(malloc);
INTERCEPT_FUNCTION(malloc_stats);
INTERCEPT_FUNCTION(malloc_usable_size);
INTERCEPT_FUNCTION(mallopt);
INTERCEPT_FUNCTION(memalign);
INTERCEPT_FUNCTION(mmap);
INTERCEPT_FUNCTION(mmap64);
INTERCEPT_FUNCTION(munmap);
[dfsan] Use the sanitizer allocator to reduce memory cost dfsan does not use sanitizer allocator as others. In practice, we let it use glibc's allocator since tcmalloc needs more work to be working with dfsan well. With glibc, we observe large memory leakage. This could relate to two things: 1) glibc allocator has limitation: for example, tcmalloc can reduce memory footprint 2x easily 2) glibc may call unmmap directly as an internal system call by using system call number. so DFSan has no way to release shadow spaces for those unmmap. Using sanitizer allocator addresses the above issues 1) its memory management is close to tcmalloc 2) we can register callback when sanitizer allocator calls unmmap, so dfsan can release shadow spaces correctly. Our experiment with internal server-based application proved that with the change, in a-few-day run, memory usage leakage is close to what tcmalloc does w/o dfsan. This change mainly follows MSan's code. 1) define allocator callbacks at dfsan_allocator.h|cpp 2) mark allocator APIs to be discard 3) intercept allocator APIs 4) make dfsan_set_label consistent with MSan's SetShadow when setting 0 labels, define dfsan_release_meta_memory when unmap is called 5) add flags about whether zeroing memory after malloc/free. dfsan works at byte-level, so bit-level oparations can cause reading undefined shadow. See D96842. zeroing memory after malloc helps this. About zeroing after free, reading after free is definitely UB, but if user code does so, it is hard to debug an overtainting caused by this w/o running MSan. So we add the flag to help debugging. This change will be split to small changes for review. Before that, a question is "this code shares a lot of with MSan, for example, dfsan_allocator.* and dfsan_new_delete.*. Does it make sense to unify the code at sanitizer_common? will that introduce some maintenance issue?" Reviewed By: morehouse Differential Revision: https://reviews.llvm.org/D101204
2021-04-24 06:35:25 +08:00
INTERCEPT_FUNCTION(posix_memalign);
INTERCEPT_FUNCTION(pvalloc);
INTERCEPT_FUNCTION(realloc);
INTERCEPT_FUNCTION(reallocarray);
INTERCEPT_FUNCTION(valloc);
INTERCEPT_FUNCTION(__tls_get_addr);
INTERCEPT_FUNCTION(__libc_memalign);
interceptors_initialized = true;
}
} // namespace __dfsan