Commit Graph

202 Commits

Author SHA1 Message Date
Jianzhou Zhao 1fb612d060 [dfsan] Add a DFSan allocator
This is a part of https://reviews.llvm.org/D101204

Reviewed By: morehouse

Differential Revision: https://reviews.llvm.org/D101666
2021-05-05 00:51:45 +00:00
Jianzhou Zhao 36cec26b38 [dfsan] move dfsan_flags.h to cc files
D101666 needs this change.

Reviewed By: morehouse

Differential Revision: https://reviews.llvm.org/D101857
2021-05-04 22:54:02 +00:00
Fangrui Song 2fec8860d8 [sanitizer] Set IndentPPDirectives: AfterHash in .clang-format
Code patterns like this are common, `#` at the line beginning
(https://google.github.io/styleguide/cppguide.html#Preprocessor_Directives),
one space indentation for if/elif/else directives.
```
#if SANITIZER_LINUX
# if defined(__aarch64__)
# endif
#endif
```

However, currently clang-format wants to reformat the code to
```
#if SANITIZER_LINUX
#if defined(__aarch64__)
#endif
#endif
```

This significantly harms readability in my review.  Use `IndentPPDirectives:
AfterHash` to defeat the diagnostic. clang-format will now suggest:

```
#if SANITIZER_LINUX
#  if defined(__aarch64__)
#  endif
#endif
```

Unfortunately there is no clang-format option using indent with 1 for
just preprocessor directives. However, this is still one step forward
from the current behavior.

Reviewed By: #sanitizers, vitalybuka

Differential Revision: https://reviews.llvm.org/D100238
2021-05-03 13:49:41 -07:00
Jianzhou Zhao 7fdf270965 [dfsan] Track origin at loads
The first version of origin tracking tracks only memory stores. Although
    this is sufficient for understanding correct flows, it is hard to figure
    out where an undefined value is read from. To find reading undefined values,
    we still have to do a reverse binary search from the last store in the chain
    with printing and logging at possible code paths. This is
    quite inefficient.

    Tracking memory load instructions can help this case. The main issues of
    tracking loads are performance and code size overheads.

    With tracking only stores, the code size overhead is 38%,
    memory overhead is 1x, and cpu overhead is 3x. In practice #load is much
    larger than #store, so both code size and cpu overhead increases. The
    first blocker is code size overhead: link fails if we inline tracking
    loads. The workaround is using external function calls to propagate
    metadata. This is also the workaround ASan uses. The cpu overhead
    is ~10x. This is a trade off between debuggability and performance,
    and will be used only when debugging cases that tracking only stores
    is not enough.

Reviewed By: gbalats

Differential Revision: https://reviews.llvm.org/D100967
2021-04-22 16:25:24 +00:00
Jianzhou Zhao e4701471d6 [dfsan] Set sigemptyset's return label to be 0
This was not set from when the wrapper was introduced.

Reviewed By: gbalats

Differential Revision: https://reviews.llvm.org/D99678
2021-03-31 21:07:44 +00:00
Jianzhou Zhao aaab444179 [dfsan] Ignore dfsan origin wrappers when instrumenting code 2021-03-29 00:15:26 +00:00
Jianzhou Zhao 4950695eba [dfsan] Add Origin ABI Wrappers
Supported ctime_r, fgets, getcwd, get_current_dir_name, gethostname,
getrlimit, getrusage, strcpy, time, inet_pton, localtime_r,
getpwuid_r, epoll_wait, poll, select, sched_getaffinity

Most of them work as calling their non-origin verision directly.

This is a part of https://reviews.llvm.org/D95835.

Reviewed By: morehouse

Differential Revision: https://reviews.llvm.org/D98966
2021-03-24 18:22:03 +00:00
Jianzhou Zhao 91516925dd [dfsan] Add Origin ABI Wrappers
Supported strrchr, strrstr, strto*, recvmmsg, recrmsg, nanosleep,
    memchr, snprintf, socketpair, sprintf, getocketname, getsocketopt,
    gettimeofday, getpeername.

    strcpy was added because the test of sprintf need it. It will be
    committed by D98966. Please ignore it when reviewing.

    This is a part of https://reviews.llvm.org/D95835.

    Reviewed By: gbalats

    Differential Revision: https://reviews.llvm.org/D99109
2021-03-24 16:13:09 +00:00
Jianzhou Zhao 1fe042041c [dfsan] Add origin ABI wrappers
supported: dl_get_tls_static_info, calloc, clock_gettime,
dfsan_set_write_callback, dl_iterato_phdr, dlopen, memcpy,
memmove, memset, pread, read, strcat, strdup, strncpy

This is a part of https://reviews.llvm.org/D95835.

Reviewed By: morehouse

Differential Revision: https://reviews.llvm.org/D98790
2021-03-19 16:23:25 +00:00
Jianzhou Zhao ec5ed66cee [dfsan] Add origin ABI wrappers
supported: bcmp, fstat, memcmp, stat, strcasecmp, strchr, strcmp,
strncasecmp, strncp, strpbrk

This is a part of https://reviews.llvm.org/D95835.

Reviewed By: morehouse

Differential Revision: https://reviews.llvm.org/D98636
2021-03-17 02:22:35 +00:00
Jianzhou Zhao 9cf5220c5c [dfsan] Updated check_custom_wrappers.sh to dedup function names
The origin wrappers added by https://reviews.llvm.org/D98359 reuse
those __dfsw_ functions.
2021-03-15 19:12:08 +00:00
Jianzhou Zhao 57a532b3ac [dfsan] Do not check dfsan_get_origin by check_custom_wrappers.sh
It is implemented like dfsan_get_label, and does not any code
in dfsan_custome.cpp.
2021-03-15 18:55:34 +00:00
Jianzhou Zhao 4e67ae7b6b [dfsan] Add origin ABI wrappers for thread/signal/fork
This is a part of https://reviews.llvm.org/D95835.

See bb91e02efd about the similar issue of fork in MSan's origin tracking.

Reviewed By: morehouse

Differential Revision: https://reviews.llvm.org/D98359
2021-03-15 16:18:00 +00:00
Vitaly Buka 2fcd872d8a [dfsan] Remove dfsan_get_origin from done_abilist.txt
Followup for D95835
2021-03-05 17:59:39 -08:00
Jianzhou Zhao c20db7ea6a [dfsan] Add utils to get and print origin paths and some test cases
This is a part of https://reviews.llvm.org/D95835.

Reviewed By: morehouse, gbalats

Differential Revision: https://reviews.llvm.org/D97962
2021-03-06 00:11:35 +00:00
Vitaly Buka 657a58a571 [dfsan,NFC] Suppress cpplint warning 2021-03-04 20:42:18 -08:00
Jianzhou Zhao a47d435bc4 [dfsan] Propagate origins for callsites
This is a part of https://reviews.llvm.org/D95835.

Each customized function has two wrappers. The
first one dfsw is for the normal shadow propagation. The second one dfso is used
when origin tracking is on. It calls the first one, and does additional
origin propagation. Which one to use can be decided at instrumentation
time. This is to ensure minimal additional overhead when origin tracking
is off.

Reviewed-by: morehouse

Differential Revision: https://reviews.llvm.org/D97483
2021-02-26 19:12:03 +00:00
Jianzhou Zhao a05aa0dd5e [dfsan] Update memset and dfsan_(set|add)_label with origin tracking
This is a part of https://reviews.llvm.org/D95835.

Reviewed-by: morehouse

Differential Revision: https://reviews.llvm.org/D97302
2021-02-23 23:16:33 +00:00
Jianzhou Zhao 063a6fa87e [dfsan] Add origin tls/move/read APIs
This is a part of https://reviews.llvm.org/D95835.

Added
1) TLS storage
2) a weak global used to set by instrumented code
3) move origins

These APIs are similar to MSan's APIs
  https://github.com/llvm/llvm-project/blob/main/compiler-rt/lib/msan/msan_poisoning.cpp
We first improved MSan's by https://reviews.llvm.org/D94572 and https://reviews.llvm.org/D94552.
So the correctness has been verified by MSan.
After the DFSan instrument code is ready, we wil be adding more test
cases

4) read

To reduce origin tracking cost, some of the read APIs return only
the origin from the first taint data.

Note that we did not add origin set APIs here because they are related
to code instrumentation, will be added later with IR transformation
code.

Reviewed-by: morehouse

Differential Revision: https://reviews.llvm.org/D96564
2021-02-18 17:48:20 +00:00
Jianzhou Zhao a7538fee3a [dfsan] Comment out ChainOrigin temporarily
It was added by D96160, will be used by D96564.
Some OS got errors if it is not used.
Comment it out for the time being.
2021-02-12 18:13:24 +00:00
Jianzhou Zhao 7590c0078d [dfsan] Turn off THP at dfsan_flush
https://reviews.llvm.org/D89662 turned this off at dfsan_init.
dfsan_flush also needs to turn it off.
W/o this a program may get more and more memory usage after hours.

Reviewed-by: morehouse

Differential Revision: https://reviews.llvm.org/D96569
2021-02-12 17:10:09 +00:00
Jianzhou Zhao 083d45b21c [dfsan] Fix building OriginAddr at non-linux OS
Fix the broken build by D96545
2021-02-12 05:02:14 +00:00
Jianzhou Zhao 5ebbc5802f [dfsan] Introduce memory mapping for origin tracking
Reviewed-by: morehouse

Differential Revision: https://reviews.llvm.org/D96545
2021-02-11 22:33:16 +00:00
Jianzhou Zhao 2d9c6e10e9 [dfsan] Add origin chain utils
This is a part of https://reviews.llvm.org/D95835.

The design is based on MSan origin chains.

An 4-byte origin is a hash of an origin chain. An origin chain is a
pair of a stack hash id and a hash to its previous origin chain. 0 means
no previous origin chains exist. We limit the length of a chain to be
16. With origin_history_size = 0, the limit is removed.

The change does not have any test cases yet. The following change
will be adding test cases when the APIs are used.

Reviewed-by: morehouse

Differential Revision: https://reviews.llvm.org/D96160
2021-02-11 19:10:11 +00:00
Jianzhou Zhao 0f3fd3b281 [dfsan] Add thread registration
This is a part of https://reviews.llvm.org/D95835.

This change is to address two problems
1) When recording stacks in origin tracking, libunwind is not async signal safe. Inside signal callbacks, we need
to use fast unwind. Fast unwind needs threads
2) StackDepot used by origin tracking is not async signal safe, we set a flag per thread inside
a signal callback to prevent from using it.

The thread registration is similar to ASan and MSan.

Related MSan changes are
* 98f5ea0dba
* f653cda269
* 5a7c364343

Some changes in the diff are used in the next diffs
1) The test case pthread.c is not very interesting for now. It will be
  extended to test origin tracking later.
2) DFsanThread::InSignalHandler will be used by origin tracking later.

Reviewed-by: morehouse

Differential Revision: https://reviews.llvm.org/D95963
2021-02-05 17:38:59 +00:00
Jianzhou Zhao 15f26c5f51 [dfsan] Wrap strcat
Reviewed-by: morehouse

Differential Revision: https://reviews.llvm.org/D95923
2021-02-03 18:50:29 +00:00
Jianzhou Zhao 93afc3452c [dfsan] Clean TLS after signal callbacks
Similar to https://reviews.llvm.org/D95642, this diff fixes signal.

Reviewed-by: morehouse

Differential Revision: https://reviews.llvm.org/D95896
2021-02-03 17:21:28 +00:00
Jianzhou Zhao 3f568e1fbb [dfsan] Wrap memmove
Reviewed-by: morehouse

Differential Revision: https://reviews.llvm.org/D95883
2021-02-03 05:15:56 +00:00
Jianzhou Zhao e1a4322f81 [dfsan] Clean TLS after sigaction callbacks
DFSan uses TLS to pass metadata of arguments and return values. When an
instrumented function accesses the TLS, if a signal callback happens, and
the callback calls other instrumented functions with updating the same TLS,
the TLS is in an inconsistent state after the callback ends. This may cause
either under-tainting or over-tainting.

This fix follows MSan's workaround.
  cb22c67a21
It simply resets TLS at restore. This prevents from over-tainting. Although
under-tainting may still happen, a taint flow can be found eventually if we
run a DFSan-instrumented program multiple times. The alternative option is
saving the entire TLS. However the TLS storage takes 2k bytes, and signal calls
could be nested. So it does not seem worth.

This diff fixes sigaction. A following diff will be fixing signal.

Reviewed-by: morehouse

Differential Revision: https://reviews.llvm.org/D95642
2021-02-02 22:07:17 +00:00
Matt Morehouse 7bc7501ac1 [DFSan] Add custom wrapper for recvmmsg.
Uses the recvmsg wrapper logic in a loop.

Reviewed By: stephan.yichao.zhao

Differential Revision: https://reviews.llvm.org/D93059
2020-12-11 06:24:56 -08:00
Matt Morehouse 009931644a [DFSan] Add custom wrapper for pthread_join.
The wrapper clears shadow for retval.

Reviewed By: stephan.yichao.zhao

Differential Revision: https://reviews.llvm.org/D93047
2020-12-10 13:41:24 -08:00
Matt Morehouse fa4bd4b338 [DFSan] Add custom wrapper for getpeername.
The wrapper clears shadow for addr and addrlen when written to.

Reviewed By: stephan.yichao.zhao

Differential Revision: https://reviews.llvm.org/D93046
2020-12-10 12:26:06 -08:00
Matt Morehouse 72fd47b93d [DFSan] Add custom wrapper for _dl_get_tls_static_info.
Implementation is here:
https://code.woboq.org/userspace/glibc/elf/dl-tls.c.html#307

We use weak symbols to avoid linking issues with glibcs older than 2.27.

Reviewed By: stephan.yichao.zhao

Differential Revision: https://reviews.llvm.org/D93053
2020-12-10 11:03:28 -08:00
Matt Morehouse bdaeb82a5f [DFSan] Add custom wrapper for sigaltstack.
The wrapper clears shadow for old_ss.

Reviewed By: stephan.yichao.zhao

Differential Revision: https://reviews.llvm.org/D93041
2020-12-10 10:16:36 -08:00
Matt Morehouse 8a874a4277 [DFSan] Add custom wrapper for getsockname.
The wrapper clears shadow for any bytes written to addr or addrlen.

Reviewed By: stephan.yichao.zhao

Differential Revision: https://reviews.llvm.org/D92964
2020-12-10 08:13:05 -08:00
Matt Morehouse 4eedc2e3af [DFSan] Add custom wrapper for getsockopt.
The wrapper clears shadow for optval and optlen when written.

Reviewed By: stephan.yichao.zhao, vitalybuka

Differential Revision: https://reviews.llvm.org/D92961
2020-12-09 14:29:38 -08:00
Matt Morehouse a3eb2fb247 [DFSan] Add custom wrapper for recvmsg.
The wrapper clears shadow for anything written by recvmsg.

Reviewed By: stephan.yichao.zhao

Differential Revision: https://reviews.llvm.org/D92949
2020-12-09 13:07:51 -08:00
Matt Morehouse 6f13445fb6 [DFSan] Add custom wrapper for epoll_wait.
The wrapper clears shadow for any events written.

Reviewed By: stephan.yichao.zhao

Differential Revision: https://reviews.llvm.org/D92891
2020-12-09 06:05:29 -08:00
Matt Morehouse 483fb33360 [DFSan] Add pthread and other functions to ABI list.
The non-pthread functions are all clear discard functions.

Some of the pthread ones could clear shadow, but aren't worth writing
custom wrappers for.  I can't think of any reasonable scenario where we
would pass tainted memory to these pthread functions.

Reviewed By: stephan.yichao.zhao

Differential Revision: https://reviews.llvm.org/D92877
2020-12-08 13:55:35 -08:00
Matt Morehouse 3bd2ad5a08 [DFSan] Add several math functions to ABI list.
These are all straightforward functional entries.

Reviewed By: stephan.yichao.zhao

Differential Revision: https://reviews.llvm.org/D92791
2020-12-08 10:51:05 -08:00
Jianzhou Zhao 80e326a8c4 [dfsan] Support passing non-i16 shadow values in TLS mode
This is a child diff of D92261.

It extended TLS arg/ret to work with aggregate types.

For a function
  t foo(t1 a1, t2 a2, ... tn an)
Its arguments shadow are saved in TLS args like
  a1_s, a2_s, ..., an_s
TLS ret simply includes r_s. By calculating the type size of each shadow
value, we can get their offset.

This is similar to what MSan does. See __msan_retval_tls and __msan_param_tls
from llvm/lib/Transforms/Instrumentation/MemorySanitizer.cpp.

Note that this change does not add test cases for overflowed TLS
arg/ret because this is hard to test w/o supporting aggregate shdow
types. We will be adding them after supporting that.

Reviewed-by: morehouse

Differential Revision: https://reviews.llvm.org/D92440
2020-12-04 02:45:07 +00:00
Jianzhou Zhao b4ac05d763 Replace the equivalent code by UnionTableAddr
UnionTableAddr is always inlined.

Reviewed-by: morehouse

Differential Revision: https://reviews.llvm.org/DD91758
2020-11-19 20:15:25 +00:00
Jianzhou Zhao 3597fba4e5 Add a simple stack trace printer for DFSan
Reviewed-by: morehouse
Differential Revision: https://reviews.llvm.org/D91235
2020-11-11 19:00:59 +00:00
Jianzhou Zhao 91dc545bf2 Set Huge Page mode on shadow regions based on no_huge_pages_for_shadow
It turned out that at dynamic shared library mode, the memory access
pattern can increase memory footprint significantly on OS when transparent
hugepages (THP) are enabled. This could cause >70x memory overhead than
running a static linked binary. For example, a static binary with RSS
overhead 300M can use > 23G RSS if it is built dynamically.
/proc/../smaps shows in 6204552 kB RSS 6141952 kB relates to
AnonHugePages.

Also such a high RSS happens in some rate: around 25% runs may use > 23G RSS, the
rest uses in between 6-23G. I guess this may relate to how user memory
is allocated and distributted across huge pages.

THP is a trade-off between time and space. We have a flag
no_huge_pages_for_shadow for sanitizer. It is true by default but DFSan
did not follow this. Depending on if a target is built statically or
dynamically, maybe Clang can set no_huge_pages_for_shadow accordingly
after this change. But it still seems fine to follow the default setting of
no_huge_pages_for_shadow. If time is an issue, and users are fine with
high RSS, this flag can be set to false selectively.
2020-10-20 16:50:59 +00:00
Jianzhou Zhao cc07fbe37d Release pages to OS when setting 0 label
This is a follow up patch of https://reviews.llvm.org/D88755.

When set 0 label for an address range, we can release pages within the
corresponding shadow address range to OS, and set only addresses outside
the pages to be 0.

Reviewed-by: morehouse, eugenis
Differential Revision: https://reviews.llvm.org/D89199
2020-10-20 16:22:11 +00:00
Jianzhou Zhao 4d1d8ae710 Replace shadow space zero-out by madvise at mmap
After D88686, munmap uses MADV_DONTNEED to ensure zero-out before the
next access. Because the entire shadow space is created by MAP_PRIVATE
and MAP_ANONYMOUS, the first access is also on zero-filled values.

So it is fine to not zero-out data, but use madvise(MADV_DONTNEED) at
mmap. This reduces runtime
overhead.

Reviewed-by: morehouse

Differential Revision: https://reviews.llvm.org/D88755
2020-10-06 21:29:50 +00:00
Jianzhou Zhao 045a620c45 Release the shadow memory used by the mmap range at munmap
When an application does a lot of pairs of mmap and munmap, if we did
not release shadoe memory used by mmap addresses, this would increase
memory usage.

Reviewed-by: morehouse

Differential Revision: https://reviews.llvm.org/D88686
2020-10-02 20:17:22 +00:00
Matt Morehouse 23bab1eb43 [DFSan] Add strpbrk wrapper.
Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D87849
2020-09-18 08:54:14 -07:00
Matt Morehouse 50dd545b00 [DFSan] Add bcmp wrapper.
Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D87801
2020-09-17 09:23:49 -07:00
Matt Morehouse df017fd906 Revert "[DFSan] Add bcmp wrapper."
This reverts commit 559f919812 due to bot
failure.
2020-09-17 08:43:45 -07:00