Summary:
This change is part of the larger XRay Profiling Mode effort.
Here we implement an arena allocator, for fixed sized buffers used in a
segmented array implementation. This change adds the segmented array
data structure, which relies on the allocator to provide and maintain
the storage for the segmented array.
Key features of the `Allocator` type:
* It uses cache-aligned blocks, intended to host the actual data. These
blocks are cache-line-size multiples of contiguous bytes.
* The `Allocator` has a maximum memory budget, set at construction
time. This allows us to cap the amount of data each specific
`Allocator` instance is responsible for.
* Upon destruction, the `Allocator` will clean up the storage it's
used, handing it back to the internal allocator used in
sanitizer_common.
Key features of the `Array` type:
* Each segmented array is always backed by an `Allocator`, which is
either user-provided or uses a global allocator.
* When an `Array` grows, it grows by appending a segment that's
fixed-sized. The size of each segment is computed by the number of
elements of type `T` that can fit into cache line multiples.
* An `Array` does not return memory to the `Allocator`, but it can keep
track of the current number of "live" objects it stores.
* When an `Array` is destroyed, it will not return memory to the
`Allocator`. Users should clean up the `Allocator` independently of
the `Array`.
* The `Array` type keeps a freelist of the chunks it's used before, so
that trimming and growing will re-use previously allocated chunks.
These basic data structures are used by the XRay Profiling Mode
implementation to implement efficient and cache-aware storage for data
that's typically read-and-write heavy for tracking latency information.
We're relying on the cache line characteristics of the architecture to
provide us good data isolation and cache friendliness, when we're
performing operations like searching for elements and/or updating data
hosted in these cache lines.
Reviewers: echristo, pelikan, kpw
Subscribers: mgorny, llvm-commits
Differential Revision: https://reviews.llvm.org/D45756
llvm-svn: 331141
This code is ill-formed, but under -fno-exceptions compilers generally accept it (at least, prior to C++17). This allows this code to be built by Clang in C++17 mode.
llvm-svn: 330765
Summary:
Typed event patching is implemented for x86-64, but functions must
be defined for other arches.
Reviewers: dberris, pelikan
Subscribers: nemanjai, javed.absar, delcypher, #sanitizers, llvm-commits
Differential Revision: https://reviews.llvm.org/D45749
llvm-svn: 330231
Summary:
Compiler-rt support first before defining the __xray_typedevent() lowering in
llvm. I'm looking for some early feedback before I touch much more code.
Reviewers: dberris
Subscribers: delcypher, llvm-commits, #sanitizers
Differential Revision: https://reviews.llvm.org/D43668
llvm-svn: 330218
Summary:
- last change (+ the Apple support change) missed a lot of indentation
- shorten architecture SOURCES definitions as most fit 1 line/arch
- comment in English what's where, and where the different .a come from
(using only the word "runtime" in the comment isn't useful, since the
CMake primitive itself says "runtime" in its name)
- skip unsupported architectures quickly, to avoid extra indentation
Reviewers: dberris, eizan, kpw
Subscribers: mgorny, delcypher, #sanitizers, llvm-commits
Differential Revision: https://reviews.llvm.org/D45568
llvm-svn: 329998
Summary:
This patch implements the `-fxray-modes=` flag which allows users
building with XRay instrumentation to decide which modes to pre-package
into the binary being linked. The default is the status quo, which will
link all the available modes.
For this to work we're also breaking apart the mode implementations
(xray-fdr and xray-basic) from the main xray runtime. This gives more
granular control of which modes are pre-packaged, and picked from
clang's invocation.
This fixes llvm.org/PR37066.
Note that in the future, we may change the default for clang to only
contain the profiling implementation under development in D44620, when
that implementation is ready.
Reviewers: echristo, eizan, chandlerc
Reviewed By: echristo
Subscribers: mgorny, mgrang, cfe-commits, llvm-commits
Differential Revision: https://reviews.llvm.org/D45474
llvm-svn: 329772
Summary:
This is D45125; the patch enables the build of XRay on OpenBSD. We also
introduce some OpenBSD specific changes to the runtime implementation,
involving how we get the TSC rate through the syscall interface specific
to OpenBSD.
Reviewers: dberris
Authored by: devnexen
Subscribers: dberris, mgorny, krytarowski, llvm-commits
Differential Revision: https://reviews.llvm.org/D45125
llvm-svn: 329189
Summary:
This change adds APIs to allow logging implementations to provide a
function for iterating through in-memory buffers (if they hold in-memory
buffers) and a way for users to generically deal with these buffers
in-process. These APIs are:
- __xray_log_set_buffer_iterator(...) and
__xray_log_remove_buffer_iterator(): installs and removes an
iterator function that takes an XRayBuffer and yields the next one.
- __xray_log_process_buffers(...): takes a function pointer that can
take a mode identifier (string) and an XRayBuffer to process this
data as they see fit.
The intent is to have the FDR mode implementation's buffers be
available through this `__xray_log_process_buffers(...)` API, so that
they can be streamed from memory instead of flushed to disk (useful for
getting the data to a network, or doing in-process analysis).
Basic mode logging will not support this mechanism as it's designed to
write the data mostly to disk.
Future implementations will may depend on this API as well, to allow for
programmatically working through the XRay buffers exposed to the
users in some fashion.
Reviewers: eizan, kpw, pelikan
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D43495
llvm-svn: 326866
Summary:
This change makes changes to XRay implementation files trigger re-builds
of the unit tests. Prior to this change, the unit tests were not built
and run properly if the implementation files were changed during the
development process. This change forces the dependency on all files in
the XRay include and lib hosted files in compiler-rt.
Caveat is, that new files added to the director(ies) will need a re-run
of CMake to re-generate the fileset.
We think this is an OK compromise, since adding new files may
necessitate editing (or adding) new unit tests. It's also less likely
that we're adding new files without updating the CMake configuration to
include the functionality in the XRay runtime implementation anyway.
Reviewers: pelikan, kpw, nglevin
Subscribers: mgorny, llvm-commits
Differential Revision: https://reviews.llvm.org/D44080
llvm-svn: 326842
Summary:
- Enabling the build.
- Using assembly for the cpuid parts.
- Using thr_self FreeBSD call to get the thread id
Patch by: David CARLIER
Reviewers: dberris, rnk, krytarowski
Reviewed By: dberris, krytarowski
Subscribers: emaste, stevecheckoway, nglevin, srhines, kubamracek, dberris, mgorny, krytarowski, llvm-commits, #sanitizers
Differential Revision: https://reviews.llvm.org/D43278
llvm-svn: 325240
Summary:
Allow for options to be defined at compile time, like is already the case for
other sanitizers, via `SCUDO_DEFAULT_OPTIONS`.
Reviewers: alekseyshl, dberris
Reviewed By: alekseyshl, dberris
Subscribers: kubamracek, delcypher, llvm-commits, #sanitizers
Differential Revision: https://reviews.llvm.org/D42980
llvm-svn: 324620
Summary:
This change expands the amount of registers stashed by the entry and
`__xray_CustomEvent` trampolines.
We've found that since the `__xray_CustomEvent` trampoline calls can show up in
situations where the scratch registers are being used, and since we don't
typically want to affect the code-gen around the disabled
`__xray_customevent(...)` intrinsic calls, that we need to save and restore the
state of even the scratch registers in the handling of these custom events.
Reviewers: pcc, pelikan, dblaikie, eizan, kpw, echristo, chandlerc
Reviewed By: echristo
Subscribers: chandlerc, echristo, hiraditya, davide, dblaikie, llvm-commits
Differential Revision: https://reviews.llvm.org/D40894
llvm-svn: 323940
Summary:
While there, unify InMemoryRawLog and InMemoryRawLogWithArg's coding style:
- swap libc's memcpy(3) for sanitizer's internal memcpy
- use basic pointer arithmetics to compute offsets from the first record
entry in the pre-allocated buffer, which is always the appropriate type
for the given function
- lose the local variable references as the TLD.* names fit just as well
Reviewers: eizan, kpw, dberris, dblaikie
Subscribers: #sanitizers, llvm-commits
Differential Revision: https://reviews.llvm.org/D42289
llvm-svn: 322941
FDRLoggingTest::MultiThreadedCycling uses std::array so we need to
include the right C++ header and not rely on transitive dependencies.
llvm-svn: 321485
Summary:
Before this change, XRay would conservatively patch sections of the code
one sled at a time. Upon testing/profiling, this turns out to take an
inordinate amount of time and cycles. For an instrumented clang binary,
the cycles spent both in the patching/unpatching routine constituted 4%
of the cycles -- this didn't count the time spent in the kernel while
performing the mprotect calls in quick succession.
With this change, we're coalescing the number of calls to mprotect from
being linear to the number of instrumentation points, to now being a
lower constant when patching all the sleds through `__xray_patch()` or
`__xray_unpatch()`. In the case of calling `__xray_patch_function()` or
`__xray_unpatch_function()` we're now doing an mprotect call once for
all the sleds for that function (reduction of at least 2x calls to
mprotect).
Reviewers: kpw, eizan
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D41153
llvm-svn: 320664
This change makes XRay print the log file output only when the verbosity
level is higher than 0. It reduces the log spam in the default case when
we want XRay running silently, except when there are actual
fatal/serious errors.
We also update the documentation to show how to get the information
after the change to the default behaviour.
llvm-svn: 320550
r319165 introduced a change to CMakeLists.txt for xray where the set of supported
architectures for XRay was iterated over, tested if they could be targeted then
passed to add_compiler_rt_object_libraries. However all targets were passed,
rather than the architecture that was just tested. For cases such as MIPS, where
mips and mips64 are supported, cmake would then test if mips64 could be targetted
resulting in an attempt to produce multiple identical logical target names, falling
afowl of CMP0002.
Reviewers: dberris
Differential Revision: https://reviews.llvm.org/D40890
llvm-svn: 319893
Summary:
This change implements the basic mode filtering similar to what we do in
FDR mode. The implementation is slightly simpler in basic-mode filtering
because we have less details to remember, but the idea is the same. At a
high level, we do the following to decide when to filter function call
records:
- We maintain a per-thread "shadow stack" which keeps track of the
XRay instrumented functions we've encountered in a thread's
execution.
- We push an entry onto the stack when we enter an XRay instrumented
function, and note the CPU, TSC, and type of entry (whether we have
payload or not when entering).
- When we encounter an exit event, we determine whether the function
being exited is the same function we've entered recently, was
executing in the same CPU, and the delta of the recent TSC and the
recorded TSC at the top of the stack is less than the equivalent
amount of microseconds we're configured to ignore -- then we un-wind
the record offset an appropriate number of times (so we can
overwrite the records later).
We also support limiting the stack depth of the recorded functions,
so that we don't arbitrarily write deep function call stacks.
Reviewers: eizan, pelikan, kpw, dblaikie
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D40828
llvm-svn: 319762
Summary:
This change allows for registration of multiple logging implementations
through a central mechanism in XRay, mapping an implementation to a
"mode". Modes are strings that are used as keys to determine which
implementation to install through a single API. This mechanism allows
users to choose which implementation to install either from the
environment variable 'XRAY_OPTIONS' with the `xray_mode=` flag, or
programmatically using the `__xray_select_mode(...)` function.
Here, we introduce two API functions for the XRay logging:
__xray_log_register_mode(Mode, Impl): Associates an XRayLogImpl to a
string Mode. We can only have one implementation associated with a given
Mode.
__xray_log_select_mode(Mode): Finds the associated Impl for Mode and
installs it as if by calling `__xray_set_log_impl(...)`.
Along with these changes, we also deprecate the xray_naive_log and
xray_fdr_log flags and encourage users to instead use the xray_mode
flag.
Reviewers: kpw, dblaikie, eizan, pelikan
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D40703
llvm-svn: 319759
Summary:
In cases where we can't use the .preinit_array section (as in Darwin for
example) we instead use dynamic initialisation. We know that this
alternative approach will race with the initializers of other objects at
global scope, but this is strictly better than nothing.
Reviewers: kubamracek, nglevin
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D40599
llvm-svn: 319366
This renames ASM_TSAN_SYMBOL and ASM_TSAN_SYMBOL_INTERCEPTOR to just ASM_SYMBOL and ASM_SYMBOL_INTERCEPTOR, because they can be useful in more places than just TSan. Also introduce a CMake function to add ASM sources to a target.
Differential Revision: https://reviews.llvm.org/D40143
llvm-svn: 319339
rL319241 was a bit too aggressive removing sources dependencies. This
restores the actual required dependency for armhf.
Follow-up to D39114.
llvm-svn: 319255
This isolates the per-architecture files from the common files
implementing the XRay facilities. Because of the refactoring done in
D39114, we were including the definition of the sources in the archive
twice, causing link-time failures.
Follow-up to D39114.
llvm-svn: 319241
This change is the first in a series of changes to get the XRay runtime
building on macOS. This first allows us to build the minimal parts of
XRay to get us started on supporting macOS development. These include:
- CMake changes to allow targeting x86_64 initially.
- Allowing for building the initialisation routines without
`.preinit_array` support.
- Use __sanitizer::SleepForMillis() to work around the lack of
clock_nanosleep on macOS.
- Deprecate the xray_fdr_log_grace_period_us flag, and introduce
the xray_fdr_log_grace_period_ms flag instead, to use
milliseconds across platforms.
Reviewers: kubamracek
Subscribers: llvm-commits, krytarowski, nglevin, mgorny
Differential Review: https://reviews.llvm.org/D39114
llvm-svn: 319165
Summary:
Before this patch, XRay's basic (naive mode) logging would be
initialised and installed in an adhoc manner. This patch ports the
implementation of the basic (naive mode) logging implementation to use
the common XRay framework.
We also make the following changes to reduce the variance between the
usage model of basic mode from FDR (flight data recorder) mode:
- Allow programmatic control of the size of the buffers dedicated to
per-thread records. This removes some hard-coded constants and turns
them into runtime-controllable flags and through an Options
structure.
- Default the `xray_naive_log` option to false. For now, the only way
to start basic mode is to set the environment variable, or set the
default at build-time compiler options. Because of this change we've
had to update a couple of tests relying on basic mode being always
on.
- Removed the reliance on a non-trivially destructible per-thread
resource manager. We use a similar trick done in D39526 to use
pthread_key_create() and pthread_setspecific() to ensure that the
per-thread cleanup handling is performed at thread-exit time.
We also radically simplify the code structure for basic mode, to move
most of the implementation in the `__xray` namespace.
Reviewers: pelikan, eizan, kpw
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D40164
llvm-svn: 318734
Summary:
Before this change, the FDR mode implementation relied on at thread-exit
handling to return buffers back to the (global) buffer queue. This
introduces issues with the initialisation of the thread_local objects
which, even through the use of pthread_setspecific(...) may eventually
call into an allocation function. Similar to previous changes in this
line, we're finding that there is a huge potential for deadlocks when
initialising these thread-locals when the memory allocation
implementation is also xray-instrumented.
In this change, we limit the call to pthread_setspecific(...) to provide
a non-null value to associate to the key created with
pthread_key_create(...). While this doesn't completely eliminate the
potential for the deadlock(s), it does allow us to still clean up at
thread exit when we need to. The change is that we don't need to do more
work when starting and ending a thread's lifetime. We also have a test
to make sure that we actually can safely recycle the buffers in case we
end up re-using the buffer(s) available from the queue on multiple
thread entry/exits.
This change cuts across both LLVM and compiler-rt to allow us to update
both the XRay runtime implementation as well as the library support for
loading these new versions of the FDR mode logging. Version 2 of the FDR
logging implementation makes the following changes:
* Introduction of a new 'BufferExtents' metadata record that's outside
of the buffer's contents but are written before the actual buffer.
This data is associated to the Buffer handed out by the BufferQueue
rather than a record that occupies bytes in the actual buffer.
* Removal of the "end of buffer" records. This is in-line with the
changes we described above, to allow for optimistic logging without
explicit record writing at thread exit.
The optimistic logging model operates under the following assumptions:
* Threads writing to the buffers will potentially race with the thread
attempting to flush the log. To avoid this situation from occuring,
we make sure that when we've finalized the logging implementation,
that threads will see this finalization state on the next write, and
either choose to not write records the thread would have written or
write the record(s) in two phases -- first write the record(s), then
update the extents metadata.
* We change the buffer queue implementation so that once it's handed
out a buffer to a thread, that we assume that buffer is marked
"used" to be able to capture partial writes. None of this will be
safe to handle if threads are racing to write the extents records
and the reader thread is attempting to flush the log. The optimism
comes from the finalization routine being required to complete
before we attempt to flush the log.
This is a fairly significant semantics change for the FDR
implementation. This is why we've decided to update the version number
for FDR mode logs. The tools, however, still need to be able to support
older versions of the log until we finally deprecate those earlier
versions.
Reviewers: dblaikie, pelikan, kpw
Subscribers: llvm-commits, hiraditya
Differential Revision: https://reviews.llvm.org/D39526
llvm-svn: 318733
Summary:
This change fixes the XRay trampolines aside from the __xray_CustomEvent
trampoline to align the stack to 16-byte boundaries before calling the
handler. Before this change we've not been explicitly aligning the stack
to 16-byte boundaries, which makes it dangerous when calling handlers
that leave the stack in a state that isn't strictly 16-byte aligned
after calling the handlers.
We add a test that makes sure we can handle these cases appropriately
after the changes, and prevents us from regressing the state moving
forward.
Fixes http://llvm.org/PR35294.
Reviewers: pelikan, pcc
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D40004
llvm-svn: 318261
Summary:
This change implements the changes required in both clang and
compiler-rt to allow building XRay-instrumented binaries in Darwin. For
now we limit this to x86_64. We also start building the XRay runtime
library in compiler-rt for osx.
A caveat to this is that we don't have the tests set up and running
yet, which we'll do in a set of follow-on changes.
This patch uses the monorepo layout for the coordinated change across
multiple projects.
Reviewers: kubamracek
Subscribers: mgorny, cfe-commits, llvm-commits
Differential Revision: https://reviews.llvm.org/D39114
llvm-svn: 317875
Summary:
This change removes dependencies on STL types:
- std::aligned_storage -- we're using manually-aligned character
buffers instead for metadata and function records.
- std::tuple -- use a plain old struct instead.
This is an incremental step in removing all STL references from the
compiler-rt implementation of XRay (llvm.org/PR32274).
Reviewers: dblaikie, pelikan, kpw
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D39277
llvm-svn: 316816
Summary:
This change removes the dependency on C++ standard library
types/functions in the implementation of the buffer queue. This is an
incremental step in resolving llvm.org/PR32274.
Reviewers: dblaikie, pelikan
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D39175
llvm-svn: 316406
Breaks build:
http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux/builds/4677/steps/build%20with%20ninja/logs/stdio
In file included from compiler-rt/lib/xray/xray_fdr_logging.cc:34:
In file included from compiler-rt/lib/xray/xray_fdr_logging_impl.h:36:
In file included from compiler-rt/lib/xray/xray_flags.h:18:
compiler-rt/lib/xray/../sanitizer_common/sanitizer_flag_parser.h:23:7: error: '__sanitizer::FlagHandlerBase' has virtual functions but non-virtual destructor [-Werror,-Wnon-virtual-dtor]
class FlagHandlerBase {
llvm-svn: 316348
Summary:
In FDR Mode, when we set up a new buffer for a thread that's just
overflowed, we must place the CPU identifier with the TSC record as the
first record. This is so that we can reconstruct all the function
entry/exit with deltas rooted on a TSC record for the CPU at the
beginning of the buffer.
Without doing this, the tools are rejecting the log for cases when we've
overflown and have different buffers that don't have the CPU and TSC
records as the first entry in the buffers.
Reviewers: pelikan
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D38995
llvm-svn: 315987
Summary:
This change allows the XRay basic (naive) mode logging implementation to
start writing the payload entries through the arg1 logging handler. This
implementation writes out the records that the llvm-xray tool and the
trace reader library will start processing in D38550.
This introduces a new payload record type which logs the data through
the in-memory buffer. It uses the same size/alignment that the normal
XRay record entries use. We use a new record type to indicate these new
entries, so that the trace reader library in LLVM can start reading
these entries.
Depends on D38550.
Reviewers: pelikan
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D38551
llvm-svn: 314968
Summary:
This change removes the dependency on using a std::deque<...> for the
storage of the buffers in the buffer queue. We instead implement a
fixed-size circular buffer that's resilient to exhaustion, and preserves
the semantics of the BufferQueue.
We're moving away from using std::deque<...> for two reasons:
- We want to remove dependencies on the STL for data structures.
- We want the data structure we use to not require re-allocation in
the normal course of operation.
The internal implementation of the buffer queue uses heap-allocated
arrays that are initialized once when the BufferQueue is created, and
re-uses slots in the buffer array as buffers are returned in order.
We also change the lock used in the implementation to a spinlock
instead of a blocking mutex. We reason that since the release operations
now take very little time in the critical section, that a spinlock would
be appropriate.
This change is related to D38073.
This change is a re-submit with the following changes:
- Keeping track of the live buffers with a counter independent of the
pointers keeping track of the extents of the circular buffer.
- Additional documentation of what the data members are meant to
represent.
Reviewers: dblaikie, kpw, pelikan
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D38119
llvm-svn: 314877
Summary:
When the XRay user calls the API to finish writing the log, the thread
which is calling the API still hasn't finished and therefore won't get
its trace written. Add a test for only the main thread to check this.
Reviewers: dberris
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D38493
llvm-svn: 314875