From a code size perspective it turns out to be better to use a
callee-saved register to pass the shadow base. For non-leaf functions
it avoids the need to reload the shadow base into x9 after each
function call, at the cost of an additional stack slot to save the
caller's x20. But with x9 there is also a stack size cost, either
as a result of copying x9 to a callee-saved register across calls or
by spilling it to stack, so for the non-leaf functions the change to
stack usage is largely neutral.
It is also code size (and stack size) neutral for many leaf functions.
Although they now need to save/restore x20 this can typically be
combined via LDP/STP into the x30 save/restore. In the case where
the function needs callee-saved registers or stack spills we end up
needing, on average, 8 more bytes of stack and 1 more instruction
but given the improvements to other functions this seems like the
right tradeoff.
Unfortunately we cannot change the register for the v1 (non short
granules) check because the runtime assumes that the shadow base
register is stored in x9, so the v1 check still uses x9.
Aside from that there is no change to the ABI because the choice
of shadow base register is a contract between the caller and the
outlined check function, both of which are compiler generated. We do
need to rename the v2 check functions though because the functions
are deduplicated based on their names, not on their contents, and we
need to make sure that when object files from old and new compilers
are linked together we don't end up with a function that uses x9
calling an outlined check that uses x20 or vice versa.
With this change code size of /system/lib64/*.so in an Android build
with HWASan goes from 200066976 bytes to 194085912 bytes, or a 3%
decrease.
Differential Revision: https://reviews.llvm.org/D90422
CallInst::updateProfWeight() creates branch_weights with i64 instead of i32.
To be more consistent everywhere and remove lots of casts from uint64_t
to uint32_t, use i64 for branch_weights.
Reviewed By: davidxl
Differential Revision: https://reviews.llvm.org/D88609
On aarch64 with kernel 4.12.13 the test sporadically fails with
RSS at start: 1564, after mmap: 103964, after mmap+set label: 308768, \
after fixed map: 206368, after another mmap+set label: 308768, after \
munmap: 206368
release_shadow_space.c.tmp: [...]/release_shadow_space.c:80: int \
main(int, char **): Assertion `after_fixed_mmap <= before + delta' failed.
It seems on some executions the memory is not fully released, even
after munmap. And it also seems that ASLR is hurting it by adding
some fragmentation, by disabling it I could not reproduce the issue
in multiple runs.
I finally see why this test is failing (on now 2 bots). Somehow the path
name is getting messed up, and the "linux" converted to "1". I suspect
there is something in the environment causing the macro expansion in the
test to get messed up:
http://lab.llvm.org:8011/#/builders/112/builds/555/steps/5/logs/FAIL__MemProfiler-x86_64-linux__log_path_test_cpphttp://lab.llvm.org:8011/#/builders/37/builds/275/steps/31/logs/stdio
On the avr bot:
-DPROFILE_NAME_VAR="/home/buildbot/llvm-avr-linux/llvm-avr-linux/stage1/projects/compiler-rt/test/memprof/X86_64LinuxConfig/TestCases/Output/log_path_test.cpp.tmp.log2"
after macros expansions becomes:
/home/buildbot/llvm-avr-1/llvm-avr-1/stage1/projects/compiler-rt/test/memprof/X86_64LinuxConfig/TestCases/Output/log_path_test.cpp.tmp.log2
Similar (s/linux/1/) on the other bot.
Disable it while I investigate
After 81f7b96ed0, I can see that the
reason this test is failing on llvm-avr-linux is that it doesn't think
the directory exists (error comes during file open for write command).
Not sure why since this is the main test Output directory and we created
a different file there earlier in the test from the same file open
invocation. Print directory contents in an attempt to debug.
CallInst::updateProfWeight() creates branch_weights with i64 instead of i32.
To be more consistent everywhere and remove lots of casts from uint64_t
to uint32_t, use i64 for branch_weights.
Reviewed By: davidxl
Differential Revision: https://reviews.llvm.org/D88609
Disable the part of this test that started failing only on the
llvm-avr-linux bot after 5c20d7db9f.
Unfortunately, "XFAIL: avr" does not work. Still in the process of
trying to figure out how to debug.
-print_full_coverage=1 produces a detailed branch coverage dump when run on a single file.
Uses same infrastructure as -print_coverage flag, but prints all branches (regardless of coverage status) in an easy-to-parse format.
Usage: For internal use with machine learning fuzzing models which require detailed coverage information on seed files to generate mutations.
Differential Revision: https://reviews.llvm.org/D85928
Reverts the XFAIL added in b67a2aef8a,
which had no effect.
Adjust the test to make sure all output is dumped to stderr, so that
hopefully I can get a better idea of where/why this is failing.
Remove some redundant checking while here.
For unknown reasons, this test started failing only on the
llvm-avr-linux bot after 5c20d7db9f2791367b9311130eb44afecb16829c:
http://lab.llvm.org:8011/#/builders/112/builds/365
The error message is not helpful, and I have an email out to the bot
owner to help with debugging. XFAIL it on avr for now.
These compiler-rt tests should be UNSUPPORTED instead of XFAIL, which seems to be the real intent of the authors.
Reviewed By: vvereschaka
Differential Revision: https://reviews.llvm.org/D89840
This will allow the output directory to be specified by a build time
option, similar to the directory specified for regular PGO profiles via
-fprofile-generate=. The memory profiling instrumentation pass will
set up the variable. This is the same mechanism used by the PGO
instrumentation and runtime.
Depends on D87120 and D89629.
Differential Revision: https://reviews.llvm.org/D89086
The RISC-V implementations of the `__mulsi3`, `__muldi3` builtins were
conditionally compiling the actual function definitions depending on whether
the M extension was present or not. This caused Compiler-RT testing failures
for RISC-V targets with the M extension, as when these sources were included
the `librt_has_mul*i3` features were still being defined. These `librt_has_*`
definitions are used to conditionally run the respective tests. Since the
actual functions were not being compiled-in, the generic test for `__muldi3`
would fail. This patch makes these implementations follow the normal
Compiler-RT convention of always including the definition, and conditionally
running the respective tests by using the lit conditional
`REQUIRES: librt_has_*`.
Since the `mulsi3_test.c` wasn't actually RISC-V-specific, this patch also
moves it out of the `riscv` directory. It now only depends on
`librt_has_mulsi3` to run.
Differential Revision: https://reviews.llvm.org/D86457
Do not crash when AsanThread::GetStackVariableShadowStart does not find
a variable for a pointer on a shadow stack.
Differential Revision: https://reviews.llvm.org/D89552
It turned out that at dynamic shared library mode, the memory access
pattern can increase memory footprint significantly on OS when transparent
hugepages (THP) are enabled. This could cause >70x memory overhead than
running a static linked binary. For example, a static binary with RSS
overhead 300M can use > 23G RSS if it is built dynamically.
/proc/../smaps shows in 6204552 kB RSS 6141952 kB relates to
AnonHugePages.
Also such a high RSS happens in some rate: around 25% runs may use > 23G RSS, the
rest uses in between 6-23G. I guess this may relate to how user memory
is allocated and distributted across huge pages.
THP is a trade-off between time and space. We have a flag
no_huge_pages_for_shadow for sanitizer. It is true by default but DFSan
did not follow this. Depending on if a target is built statically or
dynamically, maybe Clang can set no_huge_pages_for_shadow accordingly
after this change. But it still seems fine to follow the default setting of
no_huge_pages_for_shadow. If time is an issue, and users are fine with
high RSS, this flag can be set to false selectively.
Adds some simple sanity checks that the support functions for the atomic
builtins do the right thing. This doesn't test concurrency and memory model
issues.
Differential Revision: https://reviews.llvm.org/D86278
See RFC for background:
http://lists.llvm.org/pipermail/llvm-dev/2020-June/142744.html
Follow on companion to the clang/llvm instrumentation support in D85948
and committed earlier.
This patch adds the compiler-rt runtime support for the memory
profiling.
Note that much of this support was cloned from asan (and then greatly
simplified and renamed). For example the interactions with the
sanitizer_common allocators, error handling, interception, etc.
The bulk of the memory profiling specific code can be found in the
MemInfoBlock, MemInfoBlockCache, and related classes defined and used
in memprof_allocator.cpp.
For now, the memory profile is dumped to text (stderr by default, but
honors the sanitizer_common log_path flag). It is dumped in either a
default verbose format, or an optional terse format.
This patch also adds a set of tests for the core functionality.
Differential Revision: https://reviews.llvm.org/D87120
Currently the 'emulator' value is fixed at build time. This patch allows changing the emulator
at testing time and enables us to run the tests on different board or simulators without needing
to run CMake again to change the value of emulator.
With this patch in place, the value of 'emulator' can be changed at test time from the command
line like this:
$ llvm-lit --param=emulator="..."
Reviewed By: delcypher
Differential Revision: https://reviews.llvm.org/D84708
ARM thumb/thumb2 frame pointer is inconsistent on GCC and Clang [1]
and fast-unwider is also unreliable when mixing arm and thumb code [2].
The fast unwinder on ARM tries to probe and compare the frame-pointer
at different stack layout positions and it works reliable only on
systems where all the libraries were built in arm mode (either with
gcc or clang) or with clang in thmb mode (which uses the same stack
frame pointer layout in arm and thumb).
However when mixing objects built with different abi modes the
fast unwinder is still problematic as shown by the failures on the
AddressSanitizer.ThreadStackReuseTest. For these failures, the
malloc is called by the loader itself and since it has been built
with a thum enabled gcc, the stack frame is not correctly obtained
and the suppression rule is not applied (resulting in a leak warning).
The check for fast-unwinder-works is also changed: instead of checking
f it is explicit enabled in the compiler flags, it now checks if
compiler defined thumb pre-processor.
This should fix BZ#44158.
[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92172
[2] https://bugs.llvm.org/show_bug.cgi?id=44158
Reviewed By: eugenis
Differential Revision: https://reviews.llvm.org/D88958
Adds a check to avoid symbolization when printing stack traces if the
stack_trace_format flag does not need it. While there is a symbolize
flag that can be turned off to skip some of the symbolization,
SymbolizePC() still unconditionally looks up the module name and offset.
Avoid invoking SymbolizePC() at all if not needed.
This is an efficiency improvement when dumping all stack traces as part
of the memory profiler in D87120, for large stripped apps where we want
to symbolize as a post pass.
Differential Revision: https://reviews.llvm.org/D88361
After D88686, munmap uses MADV_DONTNEED to ensure zero-out before the
next access. Because the entire shadow space is created by MAP_PRIVATE
and MAP_ANONYMOUS, the first access is also on zero-filled values.
So it is fine to not zero-out data, but use madvise(MADV_DONTNEED) at
mmap. This reduces runtime
overhead.
Reviewed-by: morehouse
Differential Revision: https://reviews.llvm.org/D88755
[11/11] patch series to port ASAN for riscv64
These changes allow using ASAN on RISCV64 architecture.
The majority of existing tests are passing. With few exceptions (see below).
Tests we run on qemu and on "HiFive Unleashed" board.
Tests run:
```
Asan-riscv64-inline-Test - pass
Asan-riscv64-inline-Noinst-Test - pass
Asan-riscv64-calls-Noinst-Test - pass
Asan-riscv64-calls-Test - pass
```
Lit tests:
```
RISCV64LinuxConfig (282 supported, few failures)
RISCV64LinuxDynamicConfig (289 supported, few failures)
```
Lit failures:
```
TestCases/malloc_context_size.cpp - asan works, but backtrace misses some calls
TestCases/Linux/malloc_delete_mismatch.cpp - asan works, but backtrace misses some calls
TestCases/Linux/static_tls.cpp - "Can't guess glibc version" (under debugging)
TestCases/asan_and_llvm_coverage_test.cpp - missing libclang_rt.profile-riscv64.a
```
These failures are under debugging currently and shall be addressed in a
subsequent commits.
Depends On D87581
Reviewed By: eugenis, vitalybuka
Differential Revision: https://reviews.llvm.org/D87582
When an application does a lot of pairs of mmap and munmap, if we did
not release shadoe memory used by mmap addresses, this would increase
memory usage.
Reviewed-by: morehouse
Differential Revision: https://reviews.llvm.org/D88686
`Posix/no_asan_gen_globals.c` currently `FAIL`s on Solaris:
$ nm no_asan_gen_globals.c.tmp.exe | grep ___asan_gen_
0809696a r .L___asan_gen_.1
0809a4cd r .L___asan_gen_.2
080908e2 r .L___asan_gen_.4
0809a4cd r .L___asan_gen_.5
0809a529 r .L___asan_gen_.7
0809a4cd r .L___asan_gen_.8
As detailed in Bug 47607, there are two factors here:
- `clang` plays games by emitting some local labels into the symbol
table. When instead one uses `-fno-integrated-as` to have `gas` create
the object files, they don't land in the objects in the first place.
- Unlike GNU `ld`, the Solaris `ld` doesn't support support
`-X`/`--discard-locals` but instead relies on the assembler to follow its
specification and not emit local labels.
Therefore this patch `XFAIL`s the test on Solaris.
Tested on `amd64-pc-solaris2.11` and `x86_64-pc-linux-gnu`.
Differential Revision: https://reviews.llvm.org/D88218
`Posix/unpoison-alternate-stack.cpp` currently `FAIL`s on Solaris/i386.
Some of the problems are generic:
- `clang` warns compiling the testcase:
compiler-rt/test/asan/TestCases/Posix/unpoison-alternate-stack.cpp:83:7: warning: nested designators are a C99 extension [-Wc99-designator]
.sa_sigaction = signalHandler,
^~~~~~~~~~~~~
compiler-rt/test/asan/TestCases/Posix/unpoison-alternate-stack.cpp:84:7: warning: ISO C++ requires field designators to be specified in declaration order; field '_funcptr' will be initialized after field 'sa_flags' [-Wreorder-init-list]
.sa_flags = SA_SIGINFO | SA_NODEFER | SA_ONSTACK,
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
and some more instances. This can all easily be avoided by initializing
each field separately.
- The test `SEGV`s in `__asan_memcpy`. The default Solaris/i386 stack size
is only 4 kB, while `__asan_memcpy` tries to allocate either 5436
(32-bit) or 10688 bytes (64-bit) on the stack. This patch avoids this by
requiring at least 16 kB stack size.
- Even without `-fsanitize=address` I get an assertion failure:
Assertion failed: !isOnSignalStack(), file compiler-rt/test/asan/TestCases/Posix/unpoison-alternate-stack.cpp, line 117
The fundamental problem with this testcase is that `longjmp` from a
signal handler is highly unportable; XPG7 strongly warns against it and
it is thus unspecified which stack is used when `longjmp`ing from a
signal handler running on an alternative stack.
So I'm `XFAIL`ing this testcase on Solaris.
Tested on `amd64-pc-solaris2.11` and `x86_64-pc-linux-gnu`.
Differential Revision: https://reviews.llvm.org/D88501
This commit adds an interceptor for the pthread_detach function,
calling into ThreadRegistry::DetachThread, allowing for thread contexts
to be reused.
Without this change, programs may fail when they create more than 8K
threads.
Fixes: https://bugs.llvm.org/show_bug.cgi?id=47389
Reviewed By: vitalybuka
Differential Revision: https://reviews.llvm.org/D88184
Add support for expanding the %t filename specifier in LLVM_PROFILE_FILE
to the TMPDIR environment variable. This is supported on all platforms.
On Darwin, TMPDIR is used to specify a temporary application-specific
scratch directory. When testing apps on remote devices, it can be
challenging for the host device to determine the correct TMPDIR, so it's
helpful to have the runtime do this work.
rdar://68524185
Differential Revision: https://reviews.llvm.org/D87332