We need the compiler generated variable to override the weak symbol of
the same name inside the profile runtime, but using LinkOnceODRLinkage
results in weak symbol being emitted in which case the symbol selected
by the linker is going to depend on the order of inputs which can be
fragile.
This change replaces the use of weak definition inside the runtime with
a weak alias. We place the compiler generated symbol inside a COMDAT
group so dead definition can be garbage collected by the linker.
We also disable the use of runtime counter relocation on Darwin since
Mach-O doesn't support weak external references, but Darwin already uses
a different continous mode that relies on overmapping so runtime counter
relocation isn't needed there.
Differential Revision: https://reviews.llvm.org/D105176
For some reason we have 2 switches on arch and add
half of arch flags in one place and half in another.
Merge these 2 switches.
Reviewed By: melver
Differential Revision: https://reviews.llvm.org/D106274
We obtain the current PC is all interceptors and collectively
common interceptor code contributes to overall slowdown
(in particular cheaper str/mem* functions).
The current way to obtain the current PC involves:
4493e1: e8 3a f3 fe ff callq 438720 <_ZN11__sanitizer10StackTrace12GetCurrentPcEv>
4493e9: 48 89 c6 mov %rax,%rsi
and the called function is:
uptr StackTrace::GetCurrentPc() {
438720: 48 8b 04 24 mov (%rsp),%rax
438724: c3 retq
The new way uses address of a local label and involves just:
44a888: 48 8d 35 fa ff ff ff lea -0x6(%rip),%rsi
I am not switching all uses of StackTrace::GetCurrentPc to GET_CURRENT_PC
because it may lead some differences in produced reports and break tests.
The difference comes from the fact that currently we have PC pointing
to the CALL instruction, but the new way does not yield any code on its own
so the PC points to a random instruction in the function and symbolizing
that instruction can produce additional inlined frames (if the random
instruction happen to relate to some inlined function).
Reviewed By: vitalybuka, melver
Differential Revision: https://reviews.llvm.org/D106046
This member is now only used when storage is heap-allocated so it does not
need to be const. Dropping 'const' eliminates cast warnings on many builders.
These have been failing on our bots for a while due to
incomplete backtraces. (you don't get the names of the
functions that did the access, just the reporter frames)
See:
https://lab.llvm.org/buildbot/#/builders/170/builds/180
Adds support for MachO static initializers/deinitializers and eh-frame
registration via the ORC runtime.
This commit introduces cooperative support code into the ORC runtime and ORC
LLVM libraries (especially the MachOPlatform class) to support macho runtime
features for JIT'd code. This commit introduces support for static
initializers, static destructors (via cxa_atexit interposition), and eh-frame
registration. Near-future commits will add support for MachO native
thread-local variables, and language runtime registration (e.g. for Objective-C
and Swift).
The llvm-jitlink tool is updated to use the ORC runtime where available, and
regression tests for the new MachOPlatform support are added to compiler-rt.
Notable changes on the ORC runtime side:
1. The new macho_platform.h / macho_platform.cpp files contain the bulk of the
runtime-side support. This includes eh-frame registration; jit versions of
dlopen, dlsym, and dlclose; a cxa_atexit interpose to record static destructors,
and an '__orc_rt_macho_run_program' function that defines running a JIT'd MachO
program in terms of the jit- dlopen/dlsym/dlclose functions.
2. Replaces JITTargetAddress (and casting operations) with ExecutorAddress
(copied from LLVM) to improve type-safety of address management.
3. Adds serialization support for ExecutorAddress and unordered_map types to
the runtime-side Simple Packed Serialization code.
4. Adds orc-runtime regression tests to ensure that static initializers and
cxa-atexit interposes work as expected.
Notable changes on the LLVM side:
1. The MachOPlatform class is updated to:
1.1. Load the ORC runtime into the ExecutionSession.
1.2. Set up standard aliases for macho-specific runtime functions. E.g.
___cxa_atexit -> ___orc_rt_macho_cxa_atexit.
1.3. Install the MachOPlatformPlugin to scrape LinkGraphs for information
needed to support MachO features (e.g. eh-frames, mod-inits), and
communicate this information to the runtime.
1.4. Provide entry-points that the runtime can call to request initializers,
perform symbol lookup, and request deinitialiers (the latter is
implemented as an empty placeholder as macho object deinits are rarely
used).
1.5. Create a MachO header object for each JITDylib (defining the __mh_header
and __dso_handle symbols).
2. The llvm-jitlink tool (and llvm-jitlink-executor) are updated to use the
runtime when available.
3. A `lookupInitSymbolsAsync` method is added to the Platform base class. This
can be used to issue an async lookup for initializer symbols. The existing
`lookupInitSymbols` method is retained (the GenericIRPlatform code is still
using it), but is deprecated and will be removed soon.
4. JIT-dispatch support code is added to ExecutorProcessControl.
The JIT-dispatch system allows handlers in the JIT process to be associated with
'tag' symbols in the executor, and allows the executor to make remote procedure
calls back to the JIT process (via __orc_rt_jit_dispatch) using those tags.
The primary use case is ORC runtime code that needs to call bakc to handlers in
orc::Platform subclasses. E.g. __orc_rt_macho_jit_dlopen calling back to
MachOPlatform::rt_getInitializers using __orc_rt_macho_get_initializers_tag.
(The system is generic however, and could be used by non-runtime code).
The new ExecutorProcessControl::JITDispatchInfo struct provides the address
(in the executor) of the jit-dispatch function and a jit-dispatch context
object, and implementations of the dispatch function are added to
SelfExecutorProcessControl and OrcRPCExecutorProcessControl.
5. OrcRPCTPCServer is updated to support JIT-dispatch calls over ORC-RPC.
6. Serialization support for StringMap is added to the LLVM-side Simple Packed
Serialization code.
7. A JITLink::allocateBuffer operation is introduced to allocate writable memory
attached to the graph. This is used by the MachO header synthesis code, and will
be generically useful for other clients who want to create new graph content
from scratch.
ptrauth stores info in the address of functions, so it's not the right address we should check if poisoned
rdar://75246928
Differential Revision: https://reviews.llvm.org/D106199
Windows bot failed with:
sanitizer_win.cpp.obj : error LNK2019: unresolved external symbol WaitOnAddress referenced in function "void __cdecl __sanitizer::FutexWait(struct __sanitizer::atomic_uint32_t *,unsigned int)" (?FutexWait@__sanitizer@@YAXPEAUatomic_uint32_t@1@I@Z)
sanitizer_win.cpp.obj : error LNK2019: unresolved external symbol WakeByAddressSingle referenced in function "void __cdecl __sanitizer::FutexWake(struct __sanitizer::atomic_uint32_t *,unsigned int)" (?FutexWake@__sanitizer@@YAXPEAUatomic_uint32_t@1@I@Z)
sanitizer_win.cpp.obj : error LNK2019: unresolved external symbol WakeByAddressAll referenced in function "void __cdecl __sanitizer::FutexWake(struct __sanitizer::atomic_uint32_t *,unsigned int)" (?FutexWake@__sanitizer@@YAXPEAUatomic_uint32_t@1@I@Z)
https://lab.llvm.org/buildbot/#/builders/127/builds/14046
According to MSDN we need to link Synchronization.lib:
https://docs.microsoft.com/en-us/windows/win32/api/synchapi/nf-synchapi-waitonaddress
Differential Revision: https://reviews.llvm.org/D106167
This was fixed in the past for `frexp`, but was not made for `frexpl` & `frexpf` https://github.com/google/sanitizers/issues/321
This patch copies the fix over to `frexpl` because it caused `frexp_interceptor.cpp` test to fail on iPhone and `frexpf` for consistency.
rdar://79652161
Reviewed By: delcypher, vitalybuka
Differential Revision: https://reviews.llvm.org/D104948
Semaphore is a portable way to park/unpark threads.
The plan is to use it to implement a portable blocking
mutex in subsequent changes. Semaphore can also be used
to efficiently wait for other things (e.g. we currently
spin to synchronize thread creation and start).
Reviewed By: vitalybuka, melver
Differential Revision: https://reviews.llvm.org/D106071
After we relocate counters, we no longer need to keep the original copy
around so we can return the memory back to the operating system.
Differential Revision: https://reviews.llvm.org/D104839
Currently we don't lock ScopedErrorReportLock around fork
and it mostly works becuase tsan has own report_mtx that
is locked around fork and tsan reports.
However, sanitizer_common code prints some own reports
which are not protected by tsan's report_mtx. So it's better
to lock ScopedErrorReportLock explicitly.
Reviewed By: melver
Differential Revision: https://reviews.llvm.org/D106048
The new GET_CURRENT_PC() can lead to spurious top inlined internal frames.
Here are 2 examples from bots, in both cases the malloc is supposed to be
the top frame (#0):
WARNING: ThreadSanitizer: signal-unsafe call inside of a signal
#0 __sanitizer::StackTrace::GetNextInstructionPc(unsigned long)
#1 malloc
Location is heap block of size 99 at 0xbe3800003800 allocated by thread T1:
#0 __sanitizer::StackTrace::GetNextInstructionPc(unsigned long)
#1 malloc
Let's strip these internal top frames from reports.
With other code changes I also observed some top frames
from __tsan::ScopedInterceptor, proactively remove these as well.
Differential Revision: https://reviews.llvm.org/D106081
We obtain the current PC is all interceptors and collectively
common interceptor code contributes to overall slowdown
(in particular cheaper str/mem* functions).
The current way to obtain the current PC involves:
4493e1: e8 3a f3 fe ff callq 438720 <_ZN11__sanitizer10StackTrace12GetCurrentPcEv>
4493e9: 48 89 c6 mov %rax,%rsi
and the called function is:
uptr StackTrace::GetCurrentPc() {
438720: 48 8b 04 24 mov (%rsp),%rax
438724: c3 retq
The new way uses address of a local label and involves just:
44a888: 48 8d 35 fa ff ff ff lea -0x6(%rip),%rsi
I am not switching all uses of StackTrace::GetCurrentPc to GET_CURRENT_PC
because it may lead some differences in produced reports and break tests.
The difference comes from the fact that currently we have PC pointing
to the CALL instruction, but the new way does not yield any code on its own
so the PC points to a random instruction in the function and symbolizing
that instruction can produce additional inlined frames (if the random
instruction happen to relate to some inlined function).
Reviewed By: melver
Differential Revision: https://reviews.llvm.org/D106046
Define the address ranges (similar to the C/C++ ones, but with the heap
range merged into the app range) and enable the sanity check.
Reviewed By: dvyukov
Differential Revision: https://reviews.llvm.org/D105629
XFAIL map32bit, define the maximum possible allocation size in
mmap_large.cpp.
Reviewed By: dvyukov
Differential Revision: https://reviews.llvm.org/D105629
Reuse the assembly glue code from sanitizer_common_interceptors.inc and
the handling logic from the __tls_get_addr interceptor.
Reviewed By: dvyukov
Differential Revision: https://reviews.llvm.org/D105629
The kernel supports a full 64-bit VMA, but we can use only 48 bits due
to the limitation imposed by SyncVar::GetId(). So define the address
ranges similar to the other architectures, except that the address
space "tail" needs to be made inaccessible in CheckAndProtect(). Since
it's for only one architecture, don't make an abstraction for this.
Reviewed By: dvyukov
Differential Revision: https://reviews.llvm.org/D105629
These tests depend on TSan seeing the intercepted memcpy(), so they
break when the compiler chooses the builtin version.
Reviewed By: dvyukov
Differential Revision: https://reviews.llvm.org/D105629
s390x requires ThreadRegistry.mtx_.opaque_storage_ to be 4-byte
aligned. Since other architectures may have similar requirements, use
the maximum thread_registry_placeholder alignment from other
sanitizers, which is 64 (LSan).
Reviewed By: dvyukov
Differential Revision: https://reviews.llvm.org/D105629
When running with an old glibc, CollectStaticTlsBlocks() calls
__tls_get_addr() in order to force TLS allocation. This function is not
available on s390 and the code simply does nothing in this case,
so all the resulting static TLS blocks end up being incorrect.
Fix by calling __tls_get_offset() on s390.
Reviewed By: dvyukov
Differential Revision: https://reviews.llvm.org/D105629
setuid(0) hangs on SystemZ under TSan because TSan's BackgroundThread
ignores SIGSETXID. This in turn happens because internal_sigdelset()
messes up the mask bits on big-endian system due to how
__sanitizer_kernel_sigset_t is defined.
Commit d9a1a53b8d ("[ESan] [MIPS] Fix workingset-signal-posix.cpp on
MIPS") fixed this for MIPS by adjusting the __sanitizer_kernel_sigset_t
definition. Generalize this by defining __SANITIZER_KERNEL_NSIG based
on kernel's _NSIG and using uptr[] for __sanitizer_kernel_sigset_t.sig
on all platforms.
Reviewed By: dvyukov
Differential Revision: https://reviews.llvm.org/D105629
Currently ThreadRegistry is overcomplicated because of tsan,
it needs tid quarantine and reuse counters. Other sanitizers
don't need that. It also seems that no other sanitizer now
needs max number of threads. Asan used to need 2^24 limit,
but it does not seem to be needed now. Other sanitizers blindly
copy-pasted that without reasons. Lsan also uses quarantine,
but I don't see why that may be potentially needed.
Add a ThreadRegistry ctor that does not require any sizes
and use it in all sanitizers except for tsan.
In preparation for new tsan runtime, which won't need
any of these parameters as well.
Reviewed By: vitalybuka
Differential Revision: https://reviews.llvm.org/D105713