Summary:
Set the proper errno code on allocation failures and change some
implementations to satisfy their man-specified requirements:
LSan: valloc and memalign
ASan: pvalloc, memalign and posix_memalign
Changing both allocators in one patch since LSan depends on the ASan allocator in some configurations.
Reviewers: vitalybuka
Subscribers: kubamracek, llvm-commits
Differential Revision: https://reviews.llvm.org/D35440
llvm-svn: 308064
Set the proper errno code on allocation failures and change the valloc and
memalign implementations to satisfy their man-specified requirements.
llvm-svn: 308063
Summary:
Set the proper errno code on allocation failures and change the pvalloc and
posix_memalign implementations to satisfy their man-specified
requirements.
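For reference, a minimal sketch of the man-specified behavior these interceptors now follow (illustrative only, not the sanitizer code; the helper names and the use of the real posix_memalign as the underlying allocator are assumptions for the sketch):
```
#include <errno.h>
#include <stdlib.h>
#include <unistd.h>

// posix_memalign: the alignment must be a power of two and a multiple of
// sizeof(void*); errors are reported via the return value and errno is left
// untouched.
static int posix_memalign_like(void **memptr, size_t alignment, size_t size) {
  if (alignment == 0 || (alignment & (alignment - 1)) != 0 ||
      alignment % sizeof(void *) != 0)
    return EINVAL;
  void *p = nullptr;
  if (posix_memalign(&p, alignment, size) != 0)  // underlying allocation
    return ENOMEM;
  *memptr = p;
  return 0;
}

// pvalloc: round the size up to a whole number of pages (treating 0 as one
// page) and set errno to ENOMEM on failure, as memalign/valloc also must.
static void *pvalloc_like(size_t size) {
  const size_t page = (size_t)sysconf(_SC_PAGESIZE);
  size = size ? ((size + page - 1) / page) * page : page;
  void *p = nullptr;
  if (posix_memalign(&p, page, size) != 0) {
    errno = ENOMEM;
    return nullptr;
  }
  return p;
}
```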
Reviewers: cryptoad
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D35429
llvm-svn: 308053
This change implements 2 optimizations of sync clocks that reduce memory consumption:
Use previously unused first level block space to store clock elements.
Currently a clock for 100 threads consumes 3 512-byte blocks:
2 second-level blocks holding the 64-bit clock elements,
plus 1 first-level block holding 32-bit indices into the second-level blocks.
Only 8 bytes of the first-level block are actually used.
With this change such a clock consumes only 2 blocks.
Share similar clocks differing only by a single clock entry for the current thread.
When a thread does several release operations on fresh sync objects without
intervening acquire operations (e.g. initialization of several fields in a ctor),
the resulting clocks differ only by a single entry for the current thread.
This change reuses a single clock for such release operations. The current thread's
time (which differs between the clocks) is stored in dirty entries.
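Conceptually, the shared clock plus dirty entries look roughly like this (an illustrative sketch only; the names and layout are assumptions and do not match the actual tsan data structures):
```
// Illustrative sketch, not the real tsan code.
#include <cstdint>

struct ClockElem {
  uint64_t epoch;  // last observed time of one thread
};

struct SyncClockSketch {
  // Shared table of per-thread epochs; several sync objects released by the
  // same thread can point at one table instead of each owning a copy.
  const ClockElem *shared_elems;
  uint32_t size;
  // A couple of "dirty" slots override shared_elems for the releasing thread,
  // so its own (constantly advancing) time does not force a full copy.
  struct Dirty { int32_t tid; uint64_t epoch; } dirty[2];
};
```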
We are experiencing issues with a large program that eats all 64M clock blocks
(32GB of non-flushable memory) and crashes with dense allocator overflow.
The max number of threads in the program is ~170, which is quite unfortunate
under the current scheme (4 blocks consumed per clock). Currently it crashes
after consuming 60+ GB of memory.
The first optimization brings clock block consumption down to ~40M and
allows the program to work. The second optimization further reduces block consumption
to a "modest" 16M blocks (~8GB of RAM) and reduces overall RAM consumption to ~30GB.
Measurements on another real-world C++ RPC benchmark show an RSS reduction
from 3.491G to 3.186G and a modest speedup of ~5%.
A Go parallel client/server HTTP benchmark,
https://github.com/golang/benchmarks/blob/master/http/http.go,
shows an RSS reduction from 320MB to 240MB and a few percent speedup.
Reviewed in https://reviews.llvm.org/D35323
llvm-svn: 308018
Summary:
libsanitizer doesn't build against the latest glibc anymore; see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81066 for details.
One of the changes is that stack_t changed from typedef struct sigaltstack { ... } stack_t; to typedef struct { ... } stack_t; for conformance reasons.
And the other change is that the glibc internal __need_res_state macro is now ignored, so when doing
```
#define __need_res_state
#include <resolv.h>
```
the effect is now the same as just
```
#include <resolv.h>
```
and thus one doesn't get just the
```
struct __res_state { ... };
```
definition, but now also the
```
extern struct __res_state *__res_state(void) __attribute__ ((__const__));
```
prototype. So __res_state is no longer a type, but a function.
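A minimal illustration of that second breakage (not the libsanitizer code itself; this assumes the new glibc headers and is only meant to show the spelling change):
```
#include <resolv.h>

// Previously, code could use __res_state directly as a type name.
// __res_state statp;            // now an error: __res_state() is a function
struct __res_state statp;        // spelling that works with old and new glibc
res_state rs = __res_state();    // the identifier now names a function
```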
Reviewers: kcc, ygribov
Reviewed By: kcc
Subscribers: kubamracek
Differential Revision: https://reviews.llvm.org/D35246
llvm-svn: 307969
Summary:
Secondary backed allocations do not require a cache. While it's not necessarily
an issue when each thread has its own cache, it becomes one with a shared pool of
caches (Android), as a Secondary backed allocation or deallocation holds a
cache that could be useful to another thread doing a Primary backed allocation.
We introduce an additional PRNG and its mutex (to avoid contention with the
Fallback one for Primary allocations) that will provide the `Salt` needed for
Secondary backed allocations.
I changed some of the code in a way that feels more readable to me (e.g. using
some values directly rather than going through ternary-assigned variables, and
using `true`/`false` directly rather than `FromPrimary`). I will let the
reviewers decide whether it actually is.
An additional change is to mark `CheckForCallocOverflow` as `UNLIKELY`.
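A rough sketch of the idea (illustrative only; the names, the PRNG choice, and the use of std::mutex are assumptions for the sketch, not the Scudo code):
```
// A dedicated PRNG + mutex provide the Salt for Secondary backed allocations,
// so they no longer contend on the Fallback PRNG used for Primary allocations.
#include <cstdint>
#include <mutex>
#include <random>

static std::mutex SecondarySaltMutex;
static std::mt19937_64 SecondarySaltPrng{0x5eed5eed5eed5eedULL};  // placeholder seed

static uint8_t getSecondarySalt() {
  std::lock_guard<std::mutex> Lock(SecondarySaltMutex);
  return static_cast<uint8_t>(SecondarySaltPrng());
}
```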
Reviewers: alekseyshl
Reviewed By: alekseyshl
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D35358
llvm-svn: 307958
Summary:
We were missing many feature flags that newer gcc supports, and we had our own set of overlapping feature flags that gcc didn't support. Clang's implementation assumes gcc's feature list, so a mismatch here is problematic.
I've also matched the cpu type/subtype lists with gcc and removed all the cpus that gcc doesn't support. I've also removed the fallback autodetection logic that was taken from Host.cpp. It was the main reason we had extra feature flags relative to gcc. I don't think gcc does this in libgcc.
Once this support is in place we can consider implementing __builtin_cpu_is in clang. This could also be needed for function dispatching that Erich Keane is working on.
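For context, this runtime backs the GCC-compatible CPU-detection builtins; typical usage looks like this (an illustrative example, not part of the patch):
```
#include <cstdio>

int main() {
  __builtin_cpu_init();                // populate the CPU model/feature data
  if (__builtin_cpu_supports("avx2"))  // query a feature flag
    std::printf("AVX2 is available\n");
  else
    std::printf("AVX2 is not available\n");
  return 0;
}
```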
Reviewers: echristo, asbirlea, RKSimon, erichkeane, zvi
Reviewed By: asbirlea
Subscribers: dberris, llvm-commits
Differential Revision: https://reviews.llvm.org/D35214
llvm-svn: 307878
On iOS/AArch64, the address space is very limited and has a dynamic maximum address based on the configuration of the device. We're already using a dynamic shadow, and we find a large-enough "gap" in the VM where we place the shadow memory. In some cases and device configurations, we might not be able to find a large-enough gap: e.g., if the main executable is linked against a large number of libraries that are not part of the system, these libraries can fragment the address space, and this happens before ASan starts initializing.
This patch adds a "backup plan" for when we cannot find a large-enough gap: we restrict the address space (via MmapFixedNoAccess) to a limit low enough that the corresponding shadow fits.
Differential Revision: https://reviews.llvm.org/D35098
llvm-svn: 307865
Add Fuchsia support to some builtins and avoid building builtins
that are not and never will be used on Fuchsia.
Differential Revision: https://reviews.llvm.org/D34075
llvm-svn: 307832
Summary:
This follows the addition of `GetRandom` with D34412. We remove our
`/dev/urandom` code and use the new function. Additionally, change the PRNG to
a slightly faster version. One of the issues with the old code is that we get
64 full bits of randomness per "next" call, use only 8 of those for the Salt,
and discard the rest. So we add a cached u64 in the PRNG that can serve up to
8 u8 values before having to call the "next" function again.
During some integration work, I also realized that some very early processes
(like `init`) do not benefit from `/dev/urandom` yet. So if there is no
`getrandom` syscall either, we have to fall back to some sort of initialization
of the PRNG.
Now a few words on why XoRoShiRo and not something else. I have played a while
with various PRNGs on 32- & 64-bit platforms. Some results are below. LCG 32 & 64
are usually faster but produce only 15 & 31 bits of entropy respectively, meaning
that to get a full 64-bit value you would need to call them several times. The
simple XorShift is fast and produces 32 bits but is mediocre with regard to PRNG
test suites; PCG is slower overall; and XoRoShiRo is faster than XorShift128+ and
produces a full 64 bits.
%%%
root@tulip-chiphd:/data # ./randtest.arm
[+] starting xs32...
[?] xs32 duration: 22431833053ns
[+] starting lcg32...
[?] lcg32 duration: 14941402090ns
[+] starting pcg32...
[?] pcg32 duration: 44941973771ns
[+] starting xs128p...
[?] xs128p duration: 48889786981ns
[+] starting lcg64...
[?] lcg64 duration: 33831042391ns
[+] starting xos128p...
[?] xos128p duration: 44850878605ns
root@tulip-chiphd:/data # ./randtest.aarch64
[+] starting xs32...
[?] xs32 duration: 22425151678ns
[+] starting lcg32...
[?] lcg32 duration: 14954255257ns
[+] starting pcg32...
[?] pcg32 duration: 37346265726ns
[+] starting xs128p...
[?] xs128p duration: 22523807219ns
[+] starting lcg64...
[?] lcg64 duration: 26141304679ns
[+] starting xos128p...
[?] xos128p duration: 14937033215ns
%%%
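For reference, a minimal sketch of the approach: a xoroshiro128+ step (using the published constants of that algorithm) with one 64-bit result cached and handed out as eight 8-bit salts. The struct name and seed constants are placeholders; this is not the exact Scudo implementation.
```
#include <cstdint>

struct Xoroshiro128Plus {
  uint64_t State[2] = {0x9E3779B97F4A7C15ULL, 0xD1B54A32D192ED03ULL};  // non-zero seed
  uint64_t Cached = 0;
  unsigned CachedBytes = 0;

  static uint64_t rotl(uint64_t X, int K) { return (X << K) | (X >> (64 - K)); }

  uint64_t next() {  // standard xoroshiro128+ step
    const uint64_t S0 = State[0];
    uint64_t S1 = State[1];
    const uint64_t Result = S0 + S1;
    S1 ^= S0;
    State[0] = rotl(S0, 55) ^ S1 ^ (S1 << 14);
    State[1] = rotl(S1, 36);
    return Result;
  }

  uint8_t getU8() {  // serve up to 8 salts per next() call
    if (CachedBytes == 0) {
      Cached = next();
      CachedBytes = 8;
    }
    const uint8_t Salt = static_cast<uint8_t>(Cached);
    Cached >>= 8;
    --CachedBytes;
    return Salt;
  }
};
```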
Reviewers: alekseyshl
Reviewed By: alekseyshl
Subscribers: aemerson, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D35221
llvm-svn: 307798
1. Add SyncClock::ResetImpl, which removes code
duplication between the ctor and Reset.
2. Move SyncClock::Resize next to the other SyncClock methods;
it is currently defined among the ThreadClock methods.
llvm-svn: 307785
Don't create a sync object if it does not exist yet. For example, an atomic
pointer is initialized to nullptr and then periodically acquire-loaded.
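An illustrative example of that access pattern (not taken from the tsan tests):
```
// A pointer that is published once and then only acquire-loaded; the loads
// alone should not force creation of a sync object for the address.
#include <atomic>

std::atomic<int *> Ptr{nullptr};

int *Consumer() {
  // Periodic acquire-loads; these mostly observe nullptr before publication.
  return Ptr.load(std::memory_order_acquire);
}

void Producer(int *P) {
  Ptr.store(P, std::memory_order_release);  // synchronization needed only here
}
```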
llvm-svn: 307778
The test should have been added in r289682
"tsan: allow Java VM iterate over allocated objects"
but I forgot to svn add it.
Author: Alexander Smundak (asmundak)
Reviewed in https://reviews.llvm.org/D27720
llvm-svn: 307776
r307338 enabled a new optimization that reduces the number of operations in the tested functions.
There is no performance regression detectable with the TsanRtlTest DISABLED_BENCH.Mop* tests.
llvm-svn: 307739
Cleaner than using a while loop to copy the string character by character.
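The change is of this shape (an illustrative before/after only; the exact sanitizer helper is not named here, so libc strncpy stands in for whichever bounded copy routine the runtime uses):
```
#include <string.h>

// Before: manual character-by-character copy.
void CopyNameOld(char *dst, const char *src, size_t n) {
  size_t i = 0;
  while (i + 1 < n && src[i]) { dst[i] = src[i]; ++i; }
  dst[i] = '\0';
}

// After: a bounded string-copy helper, plus explicit termination.
void CopyNameNew(char *dst, const char *src, size_t n) {
  strncpy(dst, src, n - 1);
  dst[n - 1] = '\0';
}
```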
Reviewers: alekseyshl, glider
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D35136
llvm-svn: 307696
Summary:
This function is only called once and is fairly simple. Inline it to
keep the API simple.
Reviewers: alekseyshl, kubamracek
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D35270
llvm-svn: 307695
Summary:
This is the first in a series of patches to refactor sanitizer_procmaps
to allow MachO section information to be exposed on darwin.
In addition, grouping all segment information in a single struct is
cleaner than passing it through a large set of output parameters, and
avoids the need for annotations of NULL parameters for unneeded
information.
The filename string is optional and must be managed and supplied by the
calling function. This is to allow the MemoryMappedSegment struct to be
stored on the stack without causing overly large stack sizes.
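A rough sketch of the intended shape (hypothetical; the field names and types here are assumptions and do not necessarily match the final sanitizer_procmaps definition):
```
#include <cstdint>

typedef uintptr_t uptr;

struct MemoryMappedSegmentSketch {
  uptr start = 0;
  uptr end = 0;
  uptr offset = 0;
  bool read = false, write = false, execute = false;
  // Optional, caller-provided buffer; may be nullptr when the name is not
  // needed, so the struct itself stays small enough to live on the stack.
  char *filename = nullptr;
  uptr filename_size = 0;
};
```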
Reviewers: alekseyshl, kubamracek, glider
Subscribers: emaste, llvm-commits
Differential Revision: https://reviews.llvm.org/D35135
llvm-svn: 307688
Printing a stack trace from ASan crashes with a segfault in DEDUP mode when
symbolication is missing.
Differential Revision: https://reviews.llvm.org/D34914
llvm-svn: 307577
This patch ports the assembly file implementing TSan's setjmp support to AArch64 on Darwin.
Differential Revision: https://reviews.llvm.org/D35143
llvm-svn: 307541