Supported ctime_r, fgets, getcwd, get_current_dir_name, gethostname,
getrlimit, getrusage, strcpy, time, inet_pton, localtime_r,
getpwuid_r, epoll_wait, poll, select, sched_getaffinity
Most of them work as calling their non-origin verision directly.
This is a part of https://reviews.llvm.org/D95835.
Reviewed By: morehouse
Differential Revision: https://reviews.llvm.org/D98966
Supported strrchr, strrstr, strto*, recvmmsg, recrmsg, nanosleep,
memchr, snprintf, socketpair, sprintf, getocketname, getsocketopt,
gettimeofday, getpeername.
strcpy was added because the test of sprintf need it. It will be
committed by D98966. Please ignore it when reviewing.
This is a part of https://reviews.llvm.org/D95835.
Reviewed By: gbalats
Differential Revision: https://reviews.llvm.org/D99109
This would cause linking errors after https://reviews.llvm.org/D97483
that introduced new prefixes for ABI wrappers with origin tracking mode.
We will renable this after the full origin tracking is checked in.
DFSan at store does store shadow data; store app data; and at load does
load shadow data; load app data.
When an application data is atomic, one overtainting case is
thread A: load shadow
thread B: store shadow
thread B: store app
thread A: load app
If the application address had been used by other flows, thread A reads
previous shadow, causing overtainting.
The change is similar to MSan's solution.
1) enforce ordering of app load/store
2) load shadow after load app; store shadow before shadow app
3) do not track atomic store by reseting its shadow to be 0.
The last one is to address a case like this.
Thread A: load app
Thread B: store shadow
Thread A: load shadow
Thread B: store app
This approach eliminates overtainting as a trade-off between undertainting
flows via shadow data race.
Note that this change addresses only native atomic instructions, but
does not support builtin libcalls yet.
https://llvm.org/docs/Atomics.html#libcalls-atomic
Reviewed-by: morehouse
Differential Revision: https://reviews.llvm.org/D97310
This is a part of https://reviews.llvm.org/D95835.
This change is to address two problems
1) When recording stacks in origin tracking, libunwind is not async signal safe. Inside signal callbacks, we need
to use fast unwind. Fast unwind needs threads
2) StackDepot used by origin tracking is not async signal safe, we set a flag per thread inside
a signal callback to prevent from using it.
The thread registration is similar to ASan and MSan.
Related MSan changes are
* 98f5ea0dba
* f653cda269
* 5a7c364343
Some changes in the diff are used in the next diffs
1) The test case pthread.c is not very interesting for now. It will be
extended to test origin tracking later.
2) DFsanThread::InSignalHandler will be used by origin tracking later.
Reviewed-by: morehouse
Differential Revision: https://reviews.llvm.org/D95963
DFSan uses TLS to pass metadata of arguments and return values. When an
instrumented function accesses the TLS, if a signal callback happens, and
the callback calls other instrumented functions with updating the same TLS,
the TLS is in an inconsistent state after the callback ends. This may cause
either under-tainting or over-tainting.
This fix follows MSan's workaround.
cb22c67a21
It simply resets TLS at restore. This prevents from over-tainting. Although
under-tainting may still happen, a taint flow can be found eventually if we
run a DFSan-instrumented program multiple times. The alternative option is
saving the entire TLS. However the TLS storage takes 2k bytes, and signal calls
could be nested. So it does not seem worth.
This diff fixes sigaction. A following diff will be fixing signal.
Reviewed-by: morehouse
Differential Revision: https://reviews.llvm.org/D95642
The wrapper clears shadow for addr and addrlen when written to.
Reviewed By: stephan.yichao.zhao
Differential Revision: https://reviews.llvm.org/D93046
The wrapper clears shadow for any bytes written to addr or addrlen.
Reviewed By: stephan.yichao.zhao
Differential Revision: https://reviews.llvm.org/D92964
The wrapper clears shadow for optval and optlen when written.
Reviewed By: stephan.yichao.zhao, vitalybuka
Differential Revision: https://reviews.llvm.org/D92961
*************
* The problem
*************
See motivation examples in compiler-rt/test/dfsan/pair.cpp. The current
DFSan always uses a 16bit shadow value for a variable with any type by
combining all shadow values of all bytes of the variable. So it cannot
distinguish two fields of a struct: each field's shadow value equals the
combined shadow value of all fields. This introduces an overtaint issue.
Consider a parsing function
std::pair<char*, int> get_token(char* p);
where p points to a buffer to parse, the returned pair includes the next
token and the pointer to the position in the buffer after the token.
If the token is tainted, then both the returned pointer and int ar
tainted. If the parser keeps on using get_token for the rest parsing,
all the following outputs are tainted because of the tainted pointer.
The CL is the first change to address the issue.
**************************
* The proposed improvement
**************************
Eventually all fields and indices have their own shadow values in
variables and memory.
For example, variables with type {i1, i3}, [2 x i1], {[2 x i4], i8},
[2 x {i1, i1}] have shadow values with type {i16, i16}, [2 x i16],
{[2 x i16], i16}, [2 x {i16, i16}] correspondingly; variables with
primary type still have shadow values i16.
***************************
* An potential implementation plan
***************************
The idea is to adopt the change incrementially.
1) This CL
Support field-level accuracy at variables/args/ret in TLS mode,
load/store/alloca still use combined shadow values.
After the alloca promotion and SSA construction phases (>=-O1), we
assume alloca and memory operations are reduced. So if struct
variables do not relate to memory, their tracking is accurate at
field level.
2) Support field-level accuracy at alloca
3) Support field-level accuracy at load/store
These two should make O0 and real memory access work.
4) Support vector if necessary.
5) Support Args mode if necessary.
6) Support passing more accurate shadow values via custom functions if
necessary.
***************
* About this CL.
***************
The CL did the following
1) extended TLS arg/ret to work with aggregate types. This is similar
to what MSan does.
2) implemented how to map between an original type/value/zero-const to
its shadow type/value/zero-const.
3) extended (insert|extract)value to use field/index-level progagation.
4) for other instructions, propagation rules are combining inputs by or.
The CL converts between aggragate and primary shadow values at the
cases.
5) Custom function interfaces also need such a conversion because
all existing custom functions use i16. It is unclear whether custome
functions need more accurate shadow propagation yet.
6) Added test cases for aggregate type related cases.
Reviewed-by: morehouse
Differential Revision: https://reviews.llvm.org/D92261
On aarch64 with kernel 4.12.13 the test sporadically fails with
RSS at start: 1564, after mmap: 103964, after mmap+set label: 308768, \
after fixed map: 206368, after another mmap+set label: 308768, after \
munmap: 206368
release_shadow_space.c.tmp: [...]/release_shadow_space.c:80: int \
main(int, char **): Assertion `after_fixed_mmap <= before + delta' failed.
It seems on some executions the memory is not fully released, even
after munmap. And it also seems that ASLR is hurting it by adding
some fragmentation, by disabling it I could not reproduce the issue
in multiple runs.
It turned out that at dynamic shared library mode, the memory access
pattern can increase memory footprint significantly on OS when transparent
hugepages (THP) are enabled. This could cause >70x memory overhead than
running a static linked binary. For example, a static binary with RSS
overhead 300M can use > 23G RSS if it is built dynamically.
/proc/../smaps shows in 6204552 kB RSS 6141952 kB relates to
AnonHugePages.
Also such a high RSS happens in some rate: around 25% runs may use > 23G RSS, the
rest uses in between 6-23G. I guess this may relate to how user memory
is allocated and distributted across huge pages.
THP is a trade-off between time and space. We have a flag
no_huge_pages_for_shadow for sanitizer. It is true by default but DFSan
did not follow this. Depending on if a target is built statically or
dynamically, maybe Clang can set no_huge_pages_for_shadow accordingly
after this change. But it still seems fine to follow the default setting of
no_huge_pages_for_shadow. If time is an issue, and users are fine with
high RSS, this flag can be set to false selectively.
After D88686, munmap uses MADV_DONTNEED to ensure zero-out before the
next access. Because the entire shadow space is created by MAP_PRIVATE
and MAP_ANONYMOUS, the first access is also on zero-filled values.
So it is fine to not zero-out data, but use madvise(MADV_DONTNEED) at
mmap. This reduces runtime
overhead.
Reviewed-by: morehouse
Differential Revision: https://reviews.llvm.org/D88755
When an application does a lot of pairs of mmap and munmap, if we did
not release shadoe memory used by mmap addresses, this would increase
memory usage.
Reviewed-by: morehouse
Differential Revision: https://reviews.llvm.org/D88686
InitializeInterceptors() calls dlsym(), which calls calloc(). Depending
on the allocator implementation, calloc() may invoke mmap(), which
results in a segfault since REAL(mmap) is still being resolved.
We fix this by doing a direct syscall if interceptors haven't been fully
resolved yet.
Reviewed By: vitalybuka
Differential Revision: https://reviews.llvm.org/D86168
While the instrumentation never calls dfsan_union in fast16labels mode,
the custom wrappers do. We detect fast16labels mode by checking whether
any labels have been created. If not, we must be using fast16labels
mode.
Reviewed By: vitalybuka
Differential Revision: https://reviews.llvm.org/D86012
Unmapping and remapping is dangerous since another thread could touch
the shadow memory while it is unmapped. But there is really no need to
unmap anyway, since mmap(MAP_FIXED) will happily clobber the existing
mapping with zeroes. This is thread-safe since the mmap() is done under
the same kernel lock as page faults are done.
Reviewed By: vitalybuka
Differential Revision: https://reviews.llvm.org/D85947
base and nptr_label were swapped, which meant we were passing nptr's
shadow as the base to the operation. Usually, the shadow is 0, which
causes strtoull to guess the correct base from the string prefix (e.g.,
0x means base-16 and 0 means base-8), hiding this bug. Adjust the test
case to expose the bug.
Reviewed By: vitalybuka
Differential Revision: https://reviews.llvm.org/D85935
Adds the -fast-16-labels flag, which enables efficient instrumentation
for DFSan when the user needs <=16 labels. The instrumentation
eliminates most branches and most calls to __dfsan_union or
__dfsan_union_load.
Reviewed By: vitalybuka
Differential Revision: https://reviews.llvm.org/D84371