[XRay][profiler] Part 1: XRay Allocator and Array Implementations
Summary:
This change is part of the larger XRay Profiling Mode effort.
Here we implement an arena allocator, for fixed sized buffers used in a
segmented array implementation. This change adds the segmented array
data structure, which relies on the allocator to provide and maintain
the storage for the segmented array.
Key features of the `Allocator` type:
* It uses cache-aligned blocks, intended to host the actual data. These
blocks are cache-line-size multiples of contiguous bytes.
* The `Allocator` has a maximum memory budget, set at construction
time. This allows us to cap the amount of data each specific
`Allocator` instance is responsible for.
* Upon destruction, the `Allocator` will clean up the storage it's
used, handing it back to the internal allocator used in
sanitizer_common.
Key features of the `Array` type:
* Each segmented array is always backed by an `Allocator`, which is
either user-provided or uses a global allocator.
* When an `Array` grows, it grows by appending a segment that's
fixed-sized. The size of each segment is computed by the number of
elements of type `T` that can fit into cache line multiples.
* An `Array` does not return memory to the `Allocator`, but it can keep
track of the current number of "live" objects it stores.
* When an `Array` is destroyed, it will not return memory to the
`Allocator`. Users should clean up the `Allocator` independently of
the `Array`.
* The `Array` type keeps a freelist of the chunks it's used before, so
that trimming and growing will re-use previously allocated chunks.
These basic data structures are used by the XRay Profiling Mode
implementation to implement efficient and cache-aware storage for data
that's typically read-and-write heavy for tracking latency information.
We're relying on the cache line characteristics of the architecture to
provide us good data isolation and cache friendliness, when we're
performing operations like searching for elements and/or updating data
hosted in these cache lines.
Reviewers: echristo, pelikan, kpw
Subscribers: mgorny, llvm-commits
Differential Revision: https://reviews.llvm.org/D45756
llvm-svn: 331141
2018-04-29 21:46:30 +08:00
|
|
|
//===-- xray_segmented_array.h ---------------------------------*- C++ -*-===//
|
|
|
|
//
|
2019-01-19 16:50:56 +08:00
|
|
|
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
|
|
|
|
// See https://llvm.org/LICENSE.txt for license information.
|
|
|
|
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
|
[XRay][profiler] Part 1: XRay Allocator and Array Implementations
Summary:
This change is part of the larger XRay Profiling Mode effort.
Here we implement an arena allocator, for fixed sized buffers used in a
segmented array implementation. This change adds the segmented array
data structure, which relies on the allocator to provide and maintain
the storage for the segmented array.
Key features of the `Allocator` type:
* It uses cache-aligned blocks, intended to host the actual data. These
blocks are cache-line-size multiples of contiguous bytes.
* The `Allocator` has a maximum memory budget, set at construction
time. This allows us to cap the amount of data each specific
`Allocator` instance is responsible for.
* Upon destruction, the `Allocator` will clean up the storage it's
used, handing it back to the internal allocator used in
sanitizer_common.
Key features of the `Array` type:
* Each segmented array is always backed by an `Allocator`, which is
either user-provided or uses a global allocator.
* When an `Array` grows, it grows by appending a segment that's
fixed-sized. The size of each segment is computed by the number of
elements of type `T` that can fit into cache line multiples.
* An `Array` does not return memory to the `Allocator`, but it can keep
track of the current number of "live" objects it stores.
* When an `Array` is destroyed, it will not return memory to the
`Allocator`. Users should clean up the `Allocator` independently of
the `Array`.
* The `Array` type keeps a freelist of the chunks it's used before, so
that trimming and growing will re-use previously allocated chunks.
These basic data structures are used by the XRay Profiling Mode
implementation to implement efficient and cache-aware storage for data
that's typically read-and-write heavy for tracking latency information.
We're relying on the cache line characteristics of the architecture to
provide us good data isolation and cache friendliness, when we're
performing operations like searching for elements and/or updating data
hosted in these cache lines.
Reviewers: echristo, pelikan, kpw
Subscribers: mgorny, llvm-commits
Differential Revision: https://reviews.llvm.org/D45756
llvm-svn: 331141
2018-04-29 21:46:30 +08:00
|
|
|
//
|
|
|
|
//===----------------------------------------------------------------------===//
|
|
|
|
//
|
|
|
|
// This file is a part of XRay, a dynamic runtime instrumentation system.
|
|
|
|
//
|
2018-07-18 10:08:39 +08:00
|
|
|
// Defines the implementation of a segmented array, with fixed-size segments
|
[XRay][profiler] Part 1: XRay Allocator and Array Implementations
Summary:
This change is part of the larger XRay Profiling Mode effort.
Here we implement an arena allocator, for fixed sized buffers used in a
segmented array implementation. This change adds the segmented array
data structure, which relies on the allocator to provide and maintain
the storage for the segmented array.
Key features of the `Allocator` type:
* It uses cache-aligned blocks, intended to host the actual data. These
blocks are cache-line-size multiples of contiguous bytes.
* The `Allocator` has a maximum memory budget, set at construction
time. This allows us to cap the amount of data each specific
`Allocator` instance is responsible for.
* Upon destruction, the `Allocator` will clean up the storage it's
used, handing it back to the internal allocator used in
sanitizer_common.
Key features of the `Array` type:
* Each segmented array is always backed by an `Allocator`, which is
either user-provided or uses a global allocator.
* When an `Array` grows, it grows by appending a segment that's
fixed-sized. The size of each segment is computed by the number of
elements of type `T` that can fit into cache line multiples.
* An `Array` does not return memory to the `Allocator`, but it can keep
track of the current number of "live" objects it stores.
* When an `Array` is destroyed, it will not return memory to the
`Allocator`. Users should clean up the `Allocator` independently of
the `Array`.
* The `Array` type keeps a freelist of the chunks it's used before, so
that trimming and growing will re-use previously allocated chunks.
These basic data structures are used by the XRay Profiling Mode
implementation to implement efficient and cache-aware storage for data
that's typically read-and-write heavy for tracking latency information.
We're relying on the cache line characteristics of the architecture to
provide us good data isolation and cache friendliness, when we're
performing operations like searching for elements and/or updating data
hosted in these cache lines.
Reviewers: echristo, pelikan, kpw
Subscribers: mgorny, llvm-commits
Differential Revision: https://reviews.llvm.org/D45756
llvm-svn: 331141
2018-04-29 21:46:30 +08:00
|
|
|
// backing the segments.
|
|
|
|
//
|
|
|
|
//===----------------------------------------------------------------------===//
|
|
|
|
#ifndef XRAY_SEGMENTED_ARRAY_H
|
|
|
|
#define XRAY_SEGMENTED_ARRAY_H
|
|
|
|
|
|
|
|
#include "sanitizer_common/sanitizer_allocator.h"
|
|
|
|
#include "xray_allocator.h"
|
2018-07-10 16:25:44 +08:00
|
|
|
#include "xray_utils.h"
|
|
|
|
#include <cassert>
|
[XRay][profiler] Part 1: XRay Allocator and Array Implementations
Summary:
This change is part of the larger XRay Profiling Mode effort.
Here we implement an arena allocator, for fixed sized buffers used in a
segmented array implementation. This change adds the segmented array
data structure, which relies on the allocator to provide and maintain
the storage for the segmented array.
Key features of the `Allocator` type:
* It uses cache-aligned blocks, intended to host the actual data. These
blocks are cache-line-size multiples of contiguous bytes.
* The `Allocator` has a maximum memory budget, set at construction
time. This allows us to cap the amount of data each specific
`Allocator` instance is responsible for.
* Upon destruction, the `Allocator` will clean up the storage it's
used, handing it back to the internal allocator used in
sanitizer_common.
Key features of the `Array` type:
* Each segmented array is always backed by an `Allocator`, which is
either user-provided or uses a global allocator.
* When an `Array` grows, it grows by appending a segment that's
fixed-sized. The size of each segment is computed by the number of
elements of type `T` that can fit into cache line multiples.
* An `Array` does not return memory to the `Allocator`, but it can keep
track of the current number of "live" objects it stores.
* When an `Array` is destroyed, it will not return memory to the
`Allocator`. Users should clean up the `Allocator` independently of
the `Array`.
* The `Array` type keeps a freelist of the chunks it's used before, so
that trimming and growing will re-use previously allocated chunks.
These basic data structures are used by the XRay Profiling Mode
implementation to implement efficient and cache-aware storage for data
that's typically read-and-write heavy for tracking latency information.
We're relying on the cache line characteristics of the architecture to
provide us good data isolation and cache friendliness, when we're
performing operations like searching for elements and/or updating data
hosted in these cache lines.
Reviewers: echristo, pelikan, kpw
Subscribers: mgorny, llvm-commits
Differential Revision: https://reviews.llvm.org/D45756
llvm-svn: 331141
2018-04-29 21:46:30 +08:00
|
|
|
#include <type_traits>
|
|
|
|
#include <utility>
|
|
|
|
|
|
|
|
namespace __xray {
|
|
|
|
|
|
|
|
/// The Array type provides an interface similar to std::vector<...> but does
|
|
|
|
/// not shrink in size. Once constructed, elements can be appended but cannot be
|
|
|
|
/// removed. The implementation is heavily dependent on the contract provided by
|
|
|
|
/// the Allocator type, in that all memory will be released when the Allocator
|
|
|
|
/// is destroyed. When an Array is destroyed, it will destroy elements in the
|
2018-07-18 10:08:39 +08:00
|
|
|
/// backing store but will not free the memory.
|
|
|
|
template <class T> class Array {
|
2018-12-07 11:19:13 +08:00
|
|
|
struct Segment {
|
|
|
|
Segment *Prev;
|
|
|
|
Segment *Next;
|
2018-07-20 17:22:22 +08:00
|
|
|
char Data[1];
|
2018-07-18 10:08:39 +08:00
|
|
|
};
|
|
|
|
|
|
|
|
public:
|
|
|
|
// Each segment of the array will be laid out with the following assumptions:
|
|
|
|
//
|
|
|
|
// - Each segment will be on a cache-line address boundary (kCacheLineSize
|
|
|
|
// aligned).
|
|
|
|
//
|
|
|
|
// - The elements will be accessed through an aligned pointer, dependent on
|
|
|
|
// the alignment of T.
|
|
|
|
//
|
|
|
|
// - Each element is at least two-pointers worth from the beginning of the
|
|
|
|
// Segment, aligned properly, and the rest of the elements are accessed
|
|
|
|
// through appropriate alignment.
|
|
|
|
//
|
|
|
|
// We then compute the size of the segment to follow this logic:
|
|
|
|
//
|
|
|
|
// - Compute the number of elements that can fit within
|
|
|
|
// kCacheLineSize-multiple segments, minus the size of two pointers.
|
|
|
|
//
|
|
|
|
// - Request cacheline-multiple sized elements from the allocator.
|
2018-12-07 11:19:13 +08:00
|
|
|
static constexpr uint64_t AlignedElementStorageSize =
|
2018-07-18 10:08:39 +08:00
|
|
|
sizeof(typename std::aligned_storage<sizeof(T), alignof(T)>::type);
|
|
|
|
|
2018-12-07 11:19:13 +08:00
|
|
|
static constexpr uint64_t SegmentControlBlockSize = sizeof(Segment *) * 2;
|
|
|
|
|
|
|
|
static constexpr uint64_t SegmentSize = nearest_boundary(
|
|
|
|
SegmentControlBlockSize + next_pow2(sizeof(T)), kCacheLineSize);
|
2018-07-18 10:08:39 +08:00
|
|
|
|
|
|
|
using AllocatorType = Allocator<SegmentSize>;
|
|
|
|
|
2018-12-07 11:19:13 +08:00
|
|
|
static constexpr uint64_t ElementsPerSegment =
|
|
|
|
(SegmentSize - SegmentControlBlockSize) / next_pow2(sizeof(T));
|
2018-07-18 10:08:39 +08:00
|
|
|
|
|
|
|
static_assert(ElementsPerSegment > 0,
|
|
|
|
"Must have at least 1 element per segment.");
|
|
|
|
|
2018-12-07 11:19:13 +08:00
|
|
|
static Segment SentinelSegment;
|
[XRay][profiler] Part 1: XRay Allocator and Array Implementations
Summary:
This change is part of the larger XRay Profiling Mode effort.
Here we implement an arena allocator, for fixed sized buffers used in a
segmented array implementation. This change adds the segmented array
data structure, which relies on the allocator to provide and maintain
the storage for the segmented array.
Key features of the `Allocator` type:
* It uses cache-aligned blocks, intended to host the actual data. These
blocks are cache-line-size multiples of contiguous bytes.
* The `Allocator` has a maximum memory budget, set at construction
time. This allows us to cap the amount of data each specific
`Allocator` instance is responsible for.
* Upon destruction, the `Allocator` will clean up the storage it's
used, handing it back to the internal allocator used in
sanitizer_common.
Key features of the `Array` type:
* Each segmented array is always backed by an `Allocator`, which is
either user-provided or uses a global allocator.
* When an `Array` grows, it grows by appending a segment that's
fixed-sized. The size of each segment is computed by the number of
elements of type `T` that can fit into cache line multiples.
* An `Array` does not return memory to the `Allocator`, but it can keep
track of the current number of "live" objects it stores.
* When an `Array` is destroyed, it will not return memory to the
`Allocator`. Users should clean up the `Allocator` independently of
the `Array`.
* The `Array` type keeps a freelist of the chunks it's used before, so
that trimming and growing will re-use previously allocated chunks.
These basic data structures are used by the XRay Profiling Mode
implementation to implement efficient and cache-aware storage for data
that's typically read-and-write heavy for tracking latency information.
We're relying on the cache line characteristics of the architecture to
provide us good data isolation and cache friendliness, when we're
performing operations like searching for elements and/or updating data
hosted in these cache lines.
Reviewers: echristo, pelikan, kpw
Subscribers: mgorny, llvm-commits
Differential Revision: https://reviews.llvm.org/D45756
llvm-svn: 331141
2018-04-29 21:46:30 +08:00
|
|
|
|
2018-12-07 11:19:13 +08:00
|
|
|
using size_type = uint64_t;
|
2018-10-22 10:11:27 +08:00
|
|
|
|
2018-07-18 10:08:39 +08:00
|
|
|
private:
|
[XRay][profiler] Part 1: XRay Allocator and Array Implementations
Summary:
This change is part of the larger XRay Profiling Mode effort.
Here we implement an arena allocator, for fixed sized buffers used in a
segmented array implementation. This change adds the segmented array
data structure, which relies on the allocator to provide and maintain
the storage for the segmented array.
Key features of the `Allocator` type:
* It uses cache-aligned blocks, intended to host the actual data. These
blocks are cache-line-size multiples of contiguous bytes.
* The `Allocator` has a maximum memory budget, set at construction
time. This allows us to cap the amount of data each specific
`Allocator` instance is responsible for.
* Upon destruction, the `Allocator` will clean up the storage it's
used, handing it back to the internal allocator used in
sanitizer_common.
Key features of the `Array` type:
* Each segmented array is always backed by an `Allocator`, which is
either user-provided or uses a global allocator.
* When an `Array` grows, it grows by appending a segment that's
fixed-sized. The size of each segment is computed by the number of
elements of type `T` that can fit into cache line multiples.
* An `Array` does not return memory to the `Allocator`, but it can keep
track of the current number of "live" objects it stores.
* When an `Array` is destroyed, it will not return memory to the
`Allocator`. Users should clean up the `Allocator` independently of
the `Array`.
* The `Array` type keeps a freelist of the chunks it's used before, so
that trimming and growing will re-use previously allocated chunks.
These basic data structures are used by the XRay Profiling Mode
implementation to implement efficient and cache-aware storage for data
that's typically read-and-write heavy for tracking latency information.
We're relying on the cache line characteristics of the architecture to
provide us good data isolation and cache friendliness, when we're
performing operations like searching for elements and/or updating data
hosted in these cache lines.
Reviewers: echristo, pelikan, kpw
Subscribers: mgorny, llvm-commits
Differential Revision: https://reviews.llvm.org/D45756
llvm-svn: 331141
2018-04-29 21:46:30 +08:00
|
|
|
// This Iterator models a BidirectionalIterator.
|
|
|
|
template <class U> class Iterator {
|
2018-12-07 11:19:13 +08:00
|
|
|
Segment *S = &SentinelSegment;
|
|
|
|
uint64_t Offset = 0;
|
|
|
|
uint64_t Size = 0;
|
[XRay][profiler] Part 1: XRay Allocator and Array Implementations
Summary:
This change is part of the larger XRay Profiling Mode effort.
Here we implement an arena allocator, for fixed sized buffers used in a
segmented array implementation. This change adds the segmented array
data structure, which relies on the allocator to provide and maintain
the storage for the segmented array.
Key features of the `Allocator` type:
* It uses cache-aligned blocks, intended to host the actual data. These
blocks are cache-line-size multiples of contiguous bytes.
* The `Allocator` has a maximum memory budget, set at construction
time. This allows us to cap the amount of data each specific
`Allocator` instance is responsible for.
* Upon destruction, the `Allocator` will clean up the storage it's
used, handing it back to the internal allocator used in
sanitizer_common.
Key features of the `Array` type:
* Each segmented array is always backed by an `Allocator`, which is
either user-provided or uses a global allocator.
* When an `Array` grows, it grows by appending a segment that's
fixed-sized. The size of each segment is computed by the number of
elements of type `T` that can fit into cache line multiples.
* An `Array` does not return memory to the `Allocator`, but it can keep
track of the current number of "live" objects it stores.
* When an `Array` is destroyed, it will not return memory to the
`Allocator`. Users should clean up the `Allocator` independently of
the `Array`.
* The `Array` type keeps a freelist of the chunks it's used before, so
that trimming and growing will re-use previously allocated chunks.
These basic data structures are used by the XRay Profiling Mode
implementation to implement efficient and cache-aware storage for data
that's typically read-and-write heavy for tracking latency information.
We're relying on the cache line characteristics of the architecture to
provide us good data isolation and cache friendliness, when we're
performing operations like searching for elements and/or updating data
hosted in these cache lines.
Reviewers: echristo, pelikan, kpw
Subscribers: mgorny, llvm-commits
Differential Revision: https://reviews.llvm.org/D45756
llvm-svn: 331141
2018-04-29 21:46:30 +08:00
|
|
|
|
|
|
|
public:
|
2018-12-07 11:19:13 +08:00
|
|
|
Iterator(Segment *IS, uint64_t Off, uint64_t S) XRAY_NEVER_INSTRUMENT
|
2018-09-07 18:16:14 +08:00
|
|
|
: S(IS),
|
|
|
|
Offset(Off),
|
|
|
|
Size(S) {}
|
|
|
|
Iterator(const Iterator &) NOEXCEPT XRAY_NEVER_INSTRUMENT = default;
|
|
|
|
Iterator() NOEXCEPT XRAY_NEVER_INSTRUMENT = default;
|
|
|
|
Iterator(Iterator &&) NOEXCEPT XRAY_NEVER_INSTRUMENT = default;
|
|
|
|
Iterator &operator=(const Iterator &) XRAY_NEVER_INSTRUMENT = default;
|
|
|
|
Iterator &operator=(Iterator &&) XRAY_NEVER_INSTRUMENT = default;
|
|
|
|
~Iterator() XRAY_NEVER_INSTRUMENT = default;
|
|
|
|
|
|
|
|
Iterator &operator++() XRAY_NEVER_INSTRUMENT {
|
2018-07-18 10:08:39 +08:00
|
|
|
if (++Offset % ElementsPerSegment || Offset == Size)
|
[XRay][profiler] Part 1: XRay Allocator and Array Implementations
Summary:
This change is part of the larger XRay Profiling Mode effort.
Here we implement an arena allocator, for fixed sized buffers used in a
segmented array implementation. This change adds the segmented array
data structure, which relies on the allocator to provide and maintain
the storage for the segmented array.
Key features of the `Allocator` type:
* It uses cache-aligned blocks, intended to host the actual data. These
blocks are cache-line-size multiples of contiguous bytes.
* The `Allocator` has a maximum memory budget, set at construction
time. This allows us to cap the amount of data each specific
`Allocator` instance is responsible for.
* Upon destruction, the `Allocator` will clean up the storage it's
used, handing it back to the internal allocator used in
sanitizer_common.
Key features of the `Array` type:
* Each segmented array is always backed by an `Allocator`, which is
either user-provided or uses a global allocator.
* When an `Array` grows, it grows by appending a segment that's
fixed-sized. The size of each segment is computed by the number of
elements of type `T` that can fit into cache line multiples.
* An `Array` does not return memory to the `Allocator`, but it can keep
track of the current number of "live" objects it stores.
* When an `Array` is destroyed, it will not return memory to the
`Allocator`. Users should clean up the `Allocator` independently of
the `Array`.
* The `Array` type keeps a freelist of the chunks it's used before, so
that trimming and growing will re-use previously allocated chunks.
These basic data structures are used by the XRay Profiling Mode
implementation to implement efficient and cache-aware storage for data
that's typically read-and-write heavy for tracking latency information.
We're relying on the cache line characteristics of the architecture to
provide us good data isolation and cache friendliness, when we're
performing operations like searching for elements and/or updating data
hosted in these cache lines.
Reviewers: echristo, pelikan, kpw
Subscribers: mgorny, llvm-commits
Differential Revision: https://reviews.llvm.org/D45756
llvm-svn: 331141
2018-04-29 21:46:30 +08:00
|
|
|
return *this;
|
|
|
|
|
|
|
|
// At this point, we know that Offset % N == 0, so we must advance the
|
2018-07-18 10:08:39 +08:00
|
|
|
// segment pointer.
|
|
|
|
DCHECK_EQ(Offset % ElementsPerSegment, 0);
|
2018-07-10 16:25:44 +08:00
|
|
|
DCHECK_NE(Offset, Size);
|
2018-07-18 10:08:39 +08:00
|
|
|
DCHECK_NE(S, &SentinelSegment);
|
|
|
|
DCHECK_NE(S->Next, &SentinelSegment);
|
|
|
|
S = S->Next;
|
|
|
|
DCHECK_NE(S, &SentinelSegment);
|
[XRay][profiler] Part 1: XRay Allocator and Array Implementations
Summary:
This change is part of the larger XRay Profiling Mode effort.
Here we implement an arena allocator, for fixed sized buffers used in a
segmented array implementation. This change adds the segmented array
data structure, which relies on the allocator to provide and maintain
the storage for the segmented array.
Key features of the `Allocator` type:
* It uses cache-aligned blocks, intended to host the actual data. These
blocks are cache-line-size multiples of contiguous bytes.
* The `Allocator` has a maximum memory budget, set at construction
time. This allows us to cap the amount of data each specific
`Allocator` instance is responsible for.
* Upon destruction, the `Allocator` will clean up the storage it's
used, handing it back to the internal allocator used in
sanitizer_common.
Key features of the `Array` type:
* Each segmented array is always backed by an `Allocator`, which is
either user-provided or uses a global allocator.
* When an `Array` grows, it grows by appending a segment that's
fixed-sized. The size of each segment is computed by the number of
elements of type `T` that can fit into cache line multiples.
* An `Array` does not return memory to the `Allocator`, but it can keep
track of the current number of "live" objects it stores.
* When an `Array` is destroyed, it will not return memory to the
`Allocator`. Users should clean up the `Allocator` independently of
the `Array`.
* The `Array` type keeps a freelist of the chunks it's used before, so
that trimming and growing will re-use previously allocated chunks.
These basic data structures are used by the XRay Profiling Mode
implementation to implement efficient and cache-aware storage for data
that's typically read-and-write heavy for tracking latency information.
We're relying on the cache line characteristics of the architecture to
provide us good data isolation and cache friendliness, when we're
performing operations like searching for elements and/or updating data
hosted in these cache lines.
Reviewers: echristo, pelikan, kpw
Subscribers: mgorny, llvm-commits
Differential Revision: https://reviews.llvm.org/D45756
llvm-svn: 331141
2018-04-29 21:46:30 +08:00
|
|
|
return *this;
|
|
|
|
}
|
|
|
|
|
2018-09-07 18:16:14 +08:00
|
|
|
Iterator &operator--() XRAY_NEVER_INSTRUMENT {
|
2018-07-18 10:08:39 +08:00
|
|
|
DCHECK_NE(S, &SentinelSegment);
|
[XRay][profiler] Part 1: XRay Allocator and Array Implementations
Summary:
This change is part of the larger XRay Profiling Mode effort.
Here we implement an arena allocator, for fixed sized buffers used in a
segmented array implementation. This change adds the segmented array
data structure, which relies on the allocator to provide and maintain
the storage for the segmented array.
Key features of the `Allocator` type:
* It uses cache-aligned blocks, intended to host the actual data. These
blocks are cache-line-size multiples of contiguous bytes.
* The `Allocator` has a maximum memory budget, set at construction
time. This allows us to cap the amount of data each specific
`Allocator` instance is responsible for.
* Upon destruction, the `Allocator` will clean up the storage it's
used, handing it back to the internal allocator used in
sanitizer_common.
Key features of the `Array` type:
* Each segmented array is always backed by an `Allocator`, which is
either user-provided or uses a global allocator.
* When an `Array` grows, it grows by appending a segment that's
fixed-sized. The size of each segment is computed by the number of
elements of type `T` that can fit into cache line multiples.
* An `Array` does not return memory to the `Allocator`, but it can keep
track of the current number of "live" objects it stores.
* When an `Array` is destroyed, it will not return memory to the
`Allocator`. Users should clean up the `Allocator` independently of
the `Array`.
* The `Array` type keeps a freelist of the chunks it's used before, so
that trimming and growing will re-use previously allocated chunks.
These basic data structures are used by the XRay Profiling Mode
implementation to implement efficient and cache-aware storage for data
that's typically read-and-write heavy for tracking latency information.
We're relying on the cache line characteristics of the architecture to
provide us good data isolation and cache friendliness, when we're
performing operations like searching for elements and/or updating data
hosted in these cache lines.
Reviewers: echristo, pelikan, kpw
Subscribers: mgorny, llvm-commits
Differential Revision: https://reviews.llvm.org/D45756
llvm-svn: 331141
2018-04-29 21:46:30 +08:00
|
|
|
DCHECK_GT(Offset, 0);
|
|
|
|
|
2018-07-10 16:25:44 +08:00
|
|
|
auto PreviousOffset = Offset--;
|
2018-07-18 10:08:39 +08:00
|
|
|
if (PreviousOffset != Size && PreviousOffset % ElementsPerSegment == 0) {
|
|
|
|
DCHECK_NE(S->Prev, &SentinelSegment);
|
|
|
|
S = S->Prev;
|
2018-07-10 16:25:44 +08:00
|
|
|
}
|
|
|
|
|
[XRay][profiler] Part 1: XRay Allocator and Array Implementations
Summary:
This change is part of the larger XRay Profiling Mode effort.
Here we implement an arena allocator, for fixed sized buffers used in a
segmented array implementation. This change adds the segmented array
data structure, which relies on the allocator to provide and maintain
the storage for the segmented array.
Key features of the `Allocator` type:
* It uses cache-aligned blocks, intended to host the actual data. These
blocks are cache-line-size multiples of contiguous bytes.
* The `Allocator` has a maximum memory budget, set at construction
time. This allows us to cap the amount of data each specific
`Allocator` instance is responsible for.
* Upon destruction, the `Allocator` will clean up the storage it's
used, handing it back to the internal allocator used in
sanitizer_common.
Key features of the `Array` type:
* Each segmented array is always backed by an `Allocator`, which is
either user-provided or uses a global allocator.
* When an `Array` grows, it grows by appending a segment that's
fixed-sized. The size of each segment is computed by the number of
elements of type `T` that can fit into cache line multiples.
* An `Array` does not return memory to the `Allocator`, but it can keep
track of the current number of "live" objects it stores.
* When an `Array` is destroyed, it will not return memory to the
`Allocator`. Users should clean up the `Allocator` independently of
the `Array`.
* The `Array` type keeps a freelist of the chunks it's used before, so
that trimming and growing will re-use previously allocated chunks.
These basic data structures are used by the XRay Profiling Mode
implementation to implement efficient and cache-aware storage for data
that's typically read-and-write heavy for tracking latency information.
We're relying on the cache line characteristics of the architecture to
provide us good data isolation and cache friendliness, when we're
performing operations like searching for elements and/or updating data
hosted in these cache lines.
Reviewers: echristo, pelikan, kpw
Subscribers: mgorny, llvm-commits
Differential Revision: https://reviews.llvm.org/D45756
llvm-svn: 331141
2018-04-29 21:46:30 +08:00
|
|
|
return *this;
|
|
|
|
}
|
|
|
|
|
2018-09-07 18:16:14 +08:00
|
|
|
Iterator operator++(int) XRAY_NEVER_INSTRUMENT {
|
[XRay][profiler] Part 1: XRay Allocator and Array Implementations
Summary:
This change is part of the larger XRay Profiling Mode effort.
Here we implement an arena allocator, for fixed sized buffers used in a
segmented array implementation. This change adds the segmented array
data structure, which relies on the allocator to provide and maintain
the storage for the segmented array.
Key features of the `Allocator` type:
* It uses cache-aligned blocks, intended to host the actual data. These
blocks are cache-line-size multiples of contiguous bytes.
* The `Allocator` has a maximum memory budget, set at construction
time. This allows us to cap the amount of data each specific
`Allocator` instance is responsible for.
* Upon destruction, the `Allocator` will clean up the storage it's
used, handing it back to the internal allocator used in
sanitizer_common.
Key features of the `Array` type:
* Each segmented array is always backed by an `Allocator`, which is
either user-provided or uses a global allocator.
* When an `Array` grows, it grows by appending a segment that's
fixed-sized. The size of each segment is computed by the number of
elements of type `T` that can fit into cache line multiples.
* An `Array` does not return memory to the `Allocator`, but it can keep
track of the current number of "live" objects it stores.
* When an `Array` is destroyed, it will not return memory to the
`Allocator`. Users should clean up the `Allocator` independently of
the `Array`.
* The `Array` type keeps a freelist of the chunks it's used before, so
that trimming and growing will re-use previously allocated chunks.
These basic data structures are used by the XRay Profiling Mode
implementation to implement efficient and cache-aware storage for data
that's typically read-and-write heavy for tracking latency information.
We're relying on the cache line characteristics of the architecture to
provide us good data isolation and cache friendliness, when we're
performing operations like searching for elements and/or updating data
hosted in these cache lines.
Reviewers: echristo, pelikan, kpw
Subscribers: mgorny, llvm-commits
Differential Revision: https://reviews.llvm.org/D45756
llvm-svn: 331141
2018-04-29 21:46:30 +08:00
|
|
|
Iterator Copy(*this);
|
|
|
|
++(*this);
|
|
|
|
return Copy;
|
|
|
|
}
|
|
|
|
|
2018-09-07 18:16:14 +08:00
|
|
|
Iterator operator--(int) XRAY_NEVER_INSTRUMENT {
|
[XRay][profiler] Part 1: XRay Allocator and Array Implementations
Summary:
This change is part of the larger XRay Profiling Mode effort.
Here we implement an arena allocator, for fixed sized buffers used in a
segmented array implementation. This change adds the segmented array
data structure, which relies on the allocator to provide and maintain
the storage for the segmented array.
Key features of the `Allocator` type:
* It uses cache-aligned blocks, intended to host the actual data. These
blocks are cache-line-size multiples of contiguous bytes.
* The `Allocator` has a maximum memory budget, set at construction
time. This allows us to cap the amount of data each specific
`Allocator` instance is responsible for.
* Upon destruction, the `Allocator` will clean up the storage it's
used, handing it back to the internal allocator used in
sanitizer_common.
Key features of the `Array` type:
* Each segmented array is always backed by an `Allocator`, which is
either user-provided or uses a global allocator.
* When an `Array` grows, it grows by appending a segment that's
fixed-sized. The size of each segment is computed by the number of
elements of type `T` that can fit into cache line multiples.
* An `Array` does not return memory to the `Allocator`, but it can keep
track of the current number of "live" objects it stores.
* When an `Array` is destroyed, it will not return memory to the
`Allocator`. Users should clean up the `Allocator` independently of
the `Array`.
* The `Array` type keeps a freelist of the chunks it's used before, so
that trimming and growing will re-use previously allocated chunks.
These basic data structures are used by the XRay Profiling Mode
implementation to implement efficient and cache-aware storage for data
that's typically read-and-write heavy for tracking latency information.
We're relying on the cache line characteristics of the architecture to
provide us good data isolation and cache friendliness, when we're
performing operations like searching for elements and/or updating data
hosted in these cache lines.
Reviewers: echristo, pelikan, kpw
Subscribers: mgorny, llvm-commits
Differential Revision: https://reviews.llvm.org/D45756
llvm-svn: 331141
2018-04-29 21:46:30 +08:00
|
|
|
Iterator Copy(*this);
|
|
|
|
--(*this);
|
|
|
|
return Copy;
|
|
|
|
}
|
|
|
|
|
|
|
|
template <class V, class W>
|
2018-09-07 18:16:14 +08:00
|
|
|
friend bool operator==(const Iterator<V> &L,
|
|
|
|
const Iterator<W> &R) XRAY_NEVER_INSTRUMENT {
|
2018-07-18 10:08:39 +08:00
|
|
|
return L.S == R.S && L.Offset == R.Offset;
|
[XRay][profiler] Part 1: XRay Allocator and Array Implementations
Summary:
This change is part of the larger XRay Profiling Mode effort.
Here we implement an arena allocator, for fixed sized buffers used in a
segmented array implementation. This change adds the segmented array
data structure, which relies on the allocator to provide and maintain
the storage for the segmented array.
Key features of the `Allocator` type:
* It uses cache-aligned blocks, intended to host the actual data. These
blocks are cache-line-size multiples of contiguous bytes.
* The `Allocator` has a maximum memory budget, set at construction
time. This allows us to cap the amount of data each specific
`Allocator` instance is responsible for.
* Upon destruction, the `Allocator` will clean up the storage it's
used, handing it back to the internal allocator used in
sanitizer_common.
Key features of the `Array` type:
* Each segmented array is always backed by an `Allocator`, which is
either user-provided or uses a global allocator.
* When an `Array` grows, it grows by appending a segment that's
fixed-sized. The size of each segment is computed by the number of
elements of type `T` that can fit into cache line multiples.
* An `Array` does not return memory to the `Allocator`, but it can keep
track of the current number of "live" objects it stores.
* When an `Array` is destroyed, it will not return memory to the
`Allocator`. Users should clean up the `Allocator` independently of
the `Array`.
* The `Array` type keeps a freelist of the chunks it's used before, so
that trimming and growing will re-use previously allocated chunks.
These basic data structures are used by the XRay Profiling Mode
implementation to implement efficient and cache-aware storage for data
that's typically read-and-write heavy for tracking latency information.
We're relying on the cache line characteristics of the architecture to
provide us good data isolation and cache friendliness, when we're
performing operations like searching for elements and/or updating data
hosted in these cache lines.
Reviewers: echristo, pelikan, kpw
Subscribers: mgorny, llvm-commits
Differential Revision: https://reviews.llvm.org/D45756
llvm-svn: 331141
2018-04-29 21:46:30 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
template <class V, class W>
|
2018-09-07 18:16:14 +08:00
|
|
|
friend bool operator!=(const Iterator<V> &L,
|
|
|
|
const Iterator<W> &R) XRAY_NEVER_INSTRUMENT {
|
[XRay][profiler] Part 1: XRay Allocator and Array Implementations
Summary:
This change is part of the larger XRay Profiling Mode effort.
Here we implement an arena allocator, for fixed sized buffers used in a
segmented array implementation. This change adds the segmented array
data structure, which relies on the allocator to provide and maintain
the storage for the segmented array.
Key features of the `Allocator` type:
* It uses cache-aligned blocks, intended to host the actual data. These
blocks are cache-line-size multiples of contiguous bytes.
* The `Allocator` has a maximum memory budget, set at construction
time. This allows us to cap the amount of data each specific
`Allocator` instance is responsible for.
* Upon destruction, the `Allocator` will clean up the storage it's
used, handing it back to the internal allocator used in
sanitizer_common.
Key features of the `Array` type:
* Each segmented array is always backed by an `Allocator`, which is
either user-provided or uses a global allocator.
* When an `Array` grows, it grows by appending a segment that's
fixed-sized. The size of each segment is computed by the number of
elements of type `T` that can fit into cache line multiples.
* An `Array` does not return memory to the `Allocator`, but it can keep
track of the current number of "live" objects it stores.
* When an `Array` is destroyed, it will not return memory to the
`Allocator`. Users should clean up the `Allocator` independently of
the `Array`.
* The `Array` type keeps a freelist of the chunks it's used before, so
that trimming and growing will re-use previously allocated chunks.
These basic data structures are used by the XRay Profiling Mode
implementation to implement efficient and cache-aware storage for data
that's typically read-and-write heavy for tracking latency information.
We're relying on the cache line characteristics of the architecture to
provide us good data isolation and cache friendliness, when we're
performing operations like searching for elements and/or updating data
hosted in these cache lines.
Reviewers: echristo, pelikan, kpw
Subscribers: mgorny, llvm-commits
Differential Revision: https://reviews.llvm.org/D45756
llvm-svn: 331141
2018-04-29 21:46:30 +08:00
|
|
|
return !(L == R);
|
|
|
|
}
|
|
|
|
|
2018-09-07 18:16:14 +08:00
|
|
|
U &operator*() const XRAY_NEVER_INSTRUMENT {
|
2018-07-18 10:08:39 +08:00
|
|
|
DCHECK_NE(S, &SentinelSegment);
|
|
|
|
auto RelOff = Offset % ElementsPerSegment;
|
|
|
|
|
|
|
|
// We need to compute the character-aligned pointer, offset from the
|
|
|
|
// segment's Data location to get the element in the position of Offset.
|
2018-12-07 11:19:13 +08:00
|
|
|
auto Base = &S->Data;
|
2018-07-18 10:08:39 +08:00
|
|
|
auto AlignedOffset = Base + (RelOff * AlignedElementStorageSize);
|
|
|
|
return *reinterpret_cast<U *>(AlignedOffset);
|
[XRay][profiler] Part 1: XRay Allocator and Array Implementations
Summary:
This change is part of the larger XRay Profiling Mode effort.
Here we implement an arena allocator, for fixed sized buffers used in a
segmented array implementation. This change adds the segmented array
data structure, which relies on the allocator to provide and maintain
the storage for the segmented array.
Key features of the `Allocator` type:
* It uses cache-aligned blocks, intended to host the actual data. These
blocks are cache-line-size multiples of contiguous bytes.
* The `Allocator` has a maximum memory budget, set at construction
time. This allows us to cap the amount of data each specific
`Allocator` instance is responsible for.
* Upon destruction, the `Allocator` will clean up the storage it's
used, handing it back to the internal allocator used in
sanitizer_common.
Key features of the `Array` type:
* Each segmented array is always backed by an `Allocator`, which is
either user-provided or uses a global allocator.
* When an `Array` grows, it grows by appending a segment that's
fixed-sized. The size of each segment is computed by the number of
elements of type `T` that can fit into cache line multiples.
* An `Array` does not return memory to the `Allocator`, but it can keep
track of the current number of "live" objects it stores.
* When an `Array` is destroyed, it will not return memory to the
`Allocator`. Users should clean up the `Allocator` independently of
the `Array`.
* The `Array` type keeps a freelist of the chunks it's used before, so
that trimming and growing will re-use previously allocated chunks.
These basic data structures are used by the XRay Profiling Mode
implementation to implement efficient and cache-aware storage for data
that's typically read-and-write heavy for tracking latency information.
We're relying on the cache line characteristics of the architecture to
provide us good data isolation and cache friendliness, when we're
performing operations like searching for elements and/or updating data
hosted in these cache lines.
Reviewers: echristo, pelikan, kpw
Subscribers: mgorny, llvm-commits
Differential Revision: https://reviews.llvm.org/D45756
llvm-svn: 331141
2018-04-29 21:46:30 +08:00
|
|
|
}
|
|
|
|
|
2018-09-07 18:16:14 +08:00
|
|
|
U *operator->() const XRAY_NEVER_INSTRUMENT { return &(**this); }
|
[XRay][profiler] Part 1: XRay Allocator and Array Implementations
Summary:
This change is part of the larger XRay Profiling Mode effort.
Here we implement an arena allocator, for fixed sized buffers used in a
segmented array implementation. This change adds the segmented array
data structure, which relies on the allocator to provide and maintain
the storage for the segmented array.
Key features of the `Allocator` type:
* It uses cache-aligned blocks, intended to host the actual data. These
blocks are cache-line-size multiples of contiguous bytes.
* The `Allocator` has a maximum memory budget, set at construction
time. This allows us to cap the amount of data each specific
`Allocator` instance is responsible for.
* Upon destruction, the `Allocator` will clean up the storage it's
used, handing it back to the internal allocator used in
sanitizer_common.
Key features of the `Array` type:
* Each segmented array is always backed by an `Allocator`, which is
either user-provided or uses a global allocator.
* When an `Array` grows, it grows by appending a segment that's
fixed-sized. The size of each segment is computed by the number of
elements of type `T` that can fit into cache line multiples.
* An `Array` does not return memory to the `Allocator`, but it can keep
track of the current number of "live" objects it stores.
* When an `Array` is destroyed, it will not return memory to the
`Allocator`. Users should clean up the `Allocator` independently of
the `Array`.
* The `Array` type keeps a freelist of the chunks it's used before, so
that trimming and growing will re-use previously allocated chunks.
These basic data structures are used by the XRay Profiling Mode
implementation to implement efficient and cache-aware storage for data
that's typically read-and-write heavy for tracking latency information.
We're relying on the cache line characteristics of the architecture to
provide us good data isolation and cache friendliness, when we're
performing operations like searching for elements and/or updating data
hosted in these cache lines.
Reviewers: echristo, pelikan, kpw
Subscribers: mgorny, llvm-commits
Differential Revision: https://reviews.llvm.org/D45756
llvm-svn: 331141
2018-04-29 21:46:30 +08:00
|
|
|
};
|
|
|
|
|
2018-12-07 11:19:13 +08:00
|
|
|
AllocatorType *Alloc;
|
|
|
|
Segment *Head;
|
|
|
|
Segment *Tail;
|
|
|
|
|
|
|
|
// Here we keep track of segments in the freelist, to allow us to re-use
|
|
|
|
// segments when elements are trimmed off the end.
|
|
|
|
Segment *Freelist;
|
|
|
|
uint64_t Size;
|
|
|
|
|
|
|
|
// ===============================
|
|
|
|
// In the following implementation, we work through the algorithms and the
|
|
|
|
// list operations using the following notation:
|
|
|
|
//
|
|
|
|
// - pred(s) is the predecessor (previous node accessor) and succ(s) is
|
|
|
|
// the successor (next node accessor).
|
|
|
|
//
|
|
|
|
// - S is a sentinel segment, which has the following property:
|
|
|
|
//
|
|
|
|
// pred(S) == succ(S) == S
|
|
|
|
//
|
|
|
|
// - @ is a loop operator, which can imply pred(s) == s if it appears on
|
|
|
|
// the left of s, or succ(s) == S if it appears on the right of s.
|
|
|
|
//
|
|
|
|
// - sL <-> sR : means a bidirectional relation between sL and sR, which
|
|
|
|
// means:
|
|
|
|
//
|
|
|
|
// succ(sL) == sR && pred(SR) == sL
|
|
|
|
//
|
|
|
|
// - sL -> sR : implies a unidirectional relation between sL and SR,
|
|
|
|
// with the following properties:
|
|
|
|
//
|
|
|
|
// succ(sL) == sR
|
|
|
|
//
|
|
|
|
// sL <- sR : implies a unidirectional relation between sR and sL,
|
|
|
|
// with the following properties:
|
|
|
|
//
|
|
|
|
// pred(sR) == sL
|
|
|
|
//
|
|
|
|
// ===============================
|
|
|
|
|
|
|
|
Segment *NewSegment() XRAY_NEVER_INSTRUMENT {
|
|
|
|
// We need to handle the case in which enough elements have been trimmed to
|
|
|
|
// allow us to re-use segments we've allocated before. For this we look into
|
|
|
|
// the Freelist, to see whether we need to actually allocate new blocks or
|
|
|
|
// just re-use blocks we've already seen before.
|
|
|
|
if (Freelist != &SentinelSegment) {
|
|
|
|
// The current state of lists resemble something like this at this point:
|
|
|
|
//
|
|
|
|
// Freelist: @S@<-f0->...<->fN->@S@
|
|
|
|
// ^ Freelist
|
|
|
|
//
|
|
|
|
// We want to perform a splice of `f0` from Freelist to a temporary list,
|
|
|
|
// which looks like:
|
|
|
|
//
|
|
|
|
// Templist: @S@<-f0->@S@
|
|
|
|
// ^ FreeSegment
|
|
|
|
//
|
|
|
|
// Our algorithm preconditions are:
|
|
|
|
DCHECK_EQ(Freelist->Prev, &SentinelSegment);
|
|
|
|
|
|
|
|
// Then the algorithm we implement is:
|
|
|
|
//
|
|
|
|
// SFS = Freelist
|
|
|
|
// Freelist = succ(Freelist)
|
|
|
|
// if (Freelist != S)
|
|
|
|
// pred(Freelist) = S
|
|
|
|
// succ(SFS) = S
|
|
|
|
// pred(SFS) = S
|
|
|
|
//
|
|
|
|
auto *FreeSegment = Freelist;
|
|
|
|
Freelist = Freelist->Next;
|
|
|
|
|
|
|
|
// Note that we need to handle the case where Freelist is now pointing to
|
|
|
|
// S, which we don't want to be overwriting.
|
|
|
|
// TODO: Determine whether the cost of the branch is higher than the cost
|
|
|
|
// of the blind assignment.
|
|
|
|
if (Freelist != &SentinelSegment)
|
|
|
|
Freelist->Prev = &SentinelSegment;
|
|
|
|
|
|
|
|
FreeSegment->Next = &SentinelSegment;
|
|
|
|
FreeSegment->Prev = &SentinelSegment;
|
|
|
|
|
|
|
|
// Our postconditions are:
|
|
|
|
DCHECK_EQ(Freelist->Prev, &SentinelSegment);
|
|
|
|
DCHECK_NE(FreeSegment, &SentinelSegment);
|
|
|
|
return FreeSegment;
|
|
|
|
}
|
|
|
|
|
|
|
|
auto SegmentBlock = Alloc->Allocate();
|
|
|
|
if (SegmentBlock.Data == nullptr)
|
|
|
|
return nullptr;
|
|
|
|
|
|
|
|
// Placement-new the Segment element at the beginning of the SegmentBlock.
|
|
|
|
new (SegmentBlock.Data) Segment{&SentinelSegment, &SentinelSegment, {0}};
|
|
|
|
auto SB = reinterpret_cast<Segment *>(SegmentBlock.Data);
|
|
|
|
return SB;
|
|
|
|
}
|
|
|
|
|
|
|
|
Segment *InitHeadAndTail() XRAY_NEVER_INSTRUMENT {
|
|
|
|
DCHECK_EQ(Head, &SentinelSegment);
|
|
|
|
DCHECK_EQ(Tail, &SentinelSegment);
|
|
|
|
auto S = NewSegment();
|
|
|
|
if (S == nullptr)
|
|
|
|
return nullptr;
|
|
|
|
DCHECK_EQ(S->Next, &SentinelSegment);
|
|
|
|
DCHECK_EQ(S->Prev, &SentinelSegment);
|
|
|
|
DCHECK_NE(S, &SentinelSegment);
|
|
|
|
Head = S;
|
|
|
|
Tail = S;
|
|
|
|
DCHECK_EQ(Head, Tail);
|
|
|
|
DCHECK_EQ(Tail->Next, &SentinelSegment);
|
|
|
|
DCHECK_EQ(Tail->Prev, &SentinelSegment);
|
|
|
|
return S;
|
|
|
|
}
|
|
|
|
|
|
|
|
Segment *AppendNewSegment() XRAY_NEVER_INSTRUMENT {
|
|
|
|
auto S = NewSegment();
|
|
|
|
if (S == nullptr)
|
|
|
|
return nullptr;
|
|
|
|
DCHECK_NE(Tail, &SentinelSegment);
|
|
|
|
DCHECK_EQ(Tail->Next, &SentinelSegment);
|
|
|
|
DCHECK_EQ(S->Prev, &SentinelSegment);
|
|
|
|
DCHECK_EQ(S->Next, &SentinelSegment);
|
|
|
|
S->Prev = Tail;
|
|
|
|
Tail->Next = S;
|
|
|
|
Tail = S;
|
|
|
|
DCHECK_EQ(S, S->Prev->Next);
|
|
|
|
DCHECK_EQ(Tail->Next, &SentinelSegment);
|
|
|
|
return S;
|
|
|
|
}
|
|
|
|
|
[XRay][profiler] Part 1: XRay Allocator and Array Implementations
Summary:
This change is part of the larger XRay Profiling Mode effort.
Here we implement an arena allocator, for fixed sized buffers used in a
segmented array implementation. This change adds the segmented array
data structure, which relies on the allocator to provide and maintain
the storage for the segmented array.
Key features of the `Allocator` type:
* It uses cache-aligned blocks, intended to host the actual data. These
blocks are cache-line-size multiples of contiguous bytes.
* The `Allocator` has a maximum memory budget, set at construction
time. This allows us to cap the amount of data each specific
`Allocator` instance is responsible for.
* Upon destruction, the `Allocator` will clean up the storage it's
used, handing it back to the internal allocator used in
sanitizer_common.
Key features of the `Array` type:
* Each segmented array is always backed by an `Allocator`, which is
either user-provided or uses a global allocator.
* When an `Array` grows, it grows by appending a segment that's
fixed-sized. The size of each segment is computed by the number of
elements of type `T` that can fit into cache line multiples.
* An `Array` does not return memory to the `Allocator`, but it can keep
track of the current number of "live" objects it stores.
* When an `Array` is destroyed, it will not return memory to the
`Allocator`. Users should clean up the `Allocator` independently of
the `Array`.
* The `Array` type keeps a freelist of the chunks it's used before, so
that trimming and growing will re-use previously allocated chunks.
These basic data structures are used by the XRay Profiling Mode
implementation to implement efficient and cache-aware storage for data
that's typically read-and-write heavy for tracking latency information.
We're relying on the cache line characteristics of the architecture to
provide us good data isolation and cache friendliness, when we're
performing operations like searching for elements and/or updating data
hosted in these cache lines.
Reviewers: echristo, pelikan, kpw
Subscribers: mgorny, llvm-commits
Differential Revision: https://reviews.llvm.org/D45756
llvm-svn: 331141
2018-04-29 21:46:30 +08:00
|
|
|
public:
|
2018-12-07 11:19:13 +08:00
|
|
|
explicit Array(AllocatorType &A) XRAY_NEVER_INSTRUMENT
|
|
|
|
: Alloc(&A),
|
|
|
|
Head(&SentinelSegment),
|
|
|
|
Tail(&SentinelSegment),
|
|
|
|
Freelist(&SentinelSegment),
|
|
|
|
Size(0) {}
|
|
|
|
|
|
|
|
Array() XRAY_NEVER_INSTRUMENT : Alloc(nullptr),
|
|
|
|
Head(&SentinelSegment),
|
|
|
|
Tail(&SentinelSegment),
|
|
|
|
Freelist(&SentinelSegment),
|
|
|
|
Size(0) {}
|
[XRay][profiler] Part 1: XRay Allocator and Array Implementations
Summary:
This change is part of the larger XRay Profiling Mode effort.
Here we implement an arena allocator, for fixed sized buffers used in a
segmented array implementation. This change adds the segmented array
data structure, which relies on the allocator to provide and maintain
the storage for the segmented array.
Key features of the `Allocator` type:
* It uses cache-aligned blocks, intended to host the actual data. These
blocks are cache-line-size multiples of contiguous bytes.
* The `Allocator` has a maximum memory budget, set at construction
time. This allows us to cap the amount of data each specific
`Allocator` instance is responsible for.
* Upon destruction, the `Allocator` will clean up the storage it's
used, handing it back to the internal allocator used in
sanitizer_common.
Key features of the `Array` type:
* Each segmented array is always backed by an `Allocator`, which is
either user-provided or uses a global allocator.
* When an `Array` grows, it grows by appending a segment that's
fixed-sized. The size of each segment is computed by the number of
elements of type `T` that can fit into cache line multiples.
* An `Array` does not return memory to the `Allocator`, but it can keep
track of the current number of "live" objects it stores.
* When an `Array` is destroyed, it will not return memory to the
`Allocator`. Users should clean up the `Allocator` independently of
the `Array`.
* The `Array` type keeps a freelist of the chunks it's used before, so
that trimming and growing will re-use previously allocated chunks.
These basic data structures are used by the XRay Profiling Mode
implementation to implement efficient and cache-aware storage for data
that's typically read-and-write heavy for tracking latency information.
We're relying on the cache line characteristics of the architecture to
provide us good data isolation and cache friendliness, when we're
performing operations like searching for elements and/or updating data
hosted in these cache lines.
Reviewers: echristo, pelikan, kpw
Subscribers: mgorny, llvm-commits
Differential Revision: https://reviews.llvm.org/D45756
llvm-svn: 331141
2018-04-29 21:46:30 +08:00
|
|
|
|
|
|
|
Array(const Array &) = delete;
|
2018-12-07 11:19:13 +08:00
|
|
|
Array &operator=(const Array &) = delete;
|
|
|
|
|
|
|
|
Array(Array &&O) XRAY_NEVER_INSTRUMENT : Alloc(O.Alloc),
|
|
|
|
Head(O.Head),
|
|
|
|
Tail(O.Tail),
|
|
|
|
Freelist(O.Freelist),
|
|
|
|
Size(O.Size) {
|
|
|
|
O.Alloc = nullptr;
|
|
|
|
O.Head = &SentinelSegment;
|
|
|
|
O.Tail = &SentinelSegment;
|
|
|
|
O.Size = 0;
|
|
|
|
O.Freelist = &SentinelSegment;
|
|
|
|
}
|
|
|
|
|
|
|
|
Array &operator=(Array &&O) XRAY_NEVER_INSTRUMENT {
|
|
|
|
Alloc = O.Alloc;
|
|
|
|
O.Alloc = nullptr;
|
|
|
|
Head = O.Head;
|
2018-07-18 10:08:39 +08:00
|
|
|
O.Head = &SentinelSegment;
|
2018-12-07 11:19:13 +08:00
|
|
|
Tail = O.Tail;
|
2018-07-18 10:08:39 +08:00
|
|
|
O.Tail = &SentinelSegment;
|
2018-12-07 11:19:13 +08:00
|
|
|
Freelist = O.Freelist;
|
|
|
|
O.Freelist = &SentinelSegment;
|
|
|
|
Size = O.Size;
|
[XRay][profiler] Part 1: XRay Allocator and Array Implementations
Summary:
This change is part of the larger XRay Profiling Mode effort.
Here we implement an arena allocator, for fixed sized buffers used in a
segmented array implementation. This change adds the segmented array
data structure, which relies on the allocator to provide and maintain
the storage for the segmented array.
Key features of the `Allocator` type:
* It uses cache-aligned blocks, intended to host the actual data. These
blocks are cache-line-size multiples of contiguous bytes.
* The `Allocator` has a maximum memory budget, set at construction
time. This allows us to cap the amount of data each specific
`Allocator` instance is responsible for.
* Upon destruction, the `Allocator` will clean up the storage it's
used, handing it back to the internal allocator used in
sanitizer_common.
Key features of the `Array` type:
* Each segmented array is always backed by an `Allocator`, which is
either user-provided or uses a global allocator.
* When an `Array` grows, it grows by appending a segment that's
fixed-sized. The size of each segment is computed by the number of
elements of type `T` that can fit into cache line multiples.
* An `Array` does not return memory to the `Allocator`, but it can keep
track of the current number of "live" objects it stores.
* When an `Array` is destroyed, it will not return memory to the
`Allocator`. Users should clean up the `Allocator` independently of
the `Array`.
* The `Array` type keeps a freelist of the chunks it's used before, so
that trimming and growing will re-use previously allocated chunks.
These basic data structures are used by the XRay Profiling Mode
implementation to implement efficient and cache-aware storage for data
that's typically read-and-write heavy for tracking latency information.
We're relying on the cache line characteristics of the architecture to
provide us good data isolation and cache friendliness, when we're
performing operations like searching for elements and/or updating data
hosted in these cache lines.
Reviewers: echristo, pelikan, kpw
Subscribers: mgorny, llvm-commits
Differential Revision: https://reviews.llvm.org/D45756
llvm-svn: 331141
2018-04-29 21:46:30 +08:00
|
|
|
O.Size = 0;
|
2018-12-07 11:19:13 +08:00
|
|
|
return *this;
|
|
|
|
}
|
|
|
|
|
|
|
|
~Array() XRAY_NEVER_INSTRUMENT {
|
|
|
|
for (auto &E : *this)
|
|
|
|
(&E)->~T();
|
[XRay][profiler] Part 1: XRay Allocator and Array Implementations
Summary:
This change is part of the larger XRay Profiling Mode effort.
Here we implement an arena allocator, for fixed sized buffers used in a
segmented array implementation. This change adds the segmented array
data structure, which relies on the allocator to provide and maintain
the storage for the segmented array.
Key features of the `Allocator` type:
* It uses cache-aligned blocks, intended to host the actual data. These
blocks are cache-line-size multiples of contiguous bytes.
* The `Allocator` has a maximum memory budget, set at construction
time. This allows us to cap the amount of data each specific
`Allocator` instance is responsible for.
* Upon destruction, the `Allocator` will clean up the storage it's
used, handing it back to the internal allocator used in
sanitizer_common.
Key features of the `Array` type:
* Each segmented array is always backed by an `Allocator`, which is
either user-provided or uses a global allocator.
* When an `Array` grows, it grows by appending a segment that's
fixed-sized. The size of each segment is computed by the number of
elements of type `T` that can fit into cache line multiples.
* An `Array` does not return memory to the `Allocator`, but it can keep
track of the current number of "live" objects it stores.
* When an `Array` is destroyed, it will not return memory to the
`Allocator`. Users should clean up the `Allocator` independently of
the `Array`.
* The `Array` type keeps a freelist of the chunks it's used before, so
that trimming and growing will re-use previously allocated chunks.
These basic data structures are used by the XRay Profiling Mode
implementation to implement efficient and cache-aware storage for data
that's typically read-and-write heavy for tracking latency information.
We're relying on the cache line characteristics of the architecture to
provide us good data isolation and cache friendliness, when we're
performing operations like searching for elements and/or updating data
hosted in these cache lines.
Reviewers: echristo, pelikan, kpw
Subscribers: mgorny, llvm-commits
Differential Revision: https://reviews.llvm.org/D45756
llvm-svn: 331141
2018-04-29 21:46:30 +08:00
|
|
|
}
|
|
|
|
|
2018-09-07 18:16:14 +08:00
|
|
|
bool empty() const XRAY_NEVER_INSTRUMENT { return Size == 0; }
|
[XRay][profiler] Part 1: XRay Allocator and Array Implementations
Summary:
This change is part of the larger XRay Profiling Mode effort.
Here we implement an arena allocator, for fixed sized buffers used in a
segmented array implementation. This change adds the segmented array
data structure, which relies on the allocator to provide and maintain
the storage for the segmented array.
Key features of the `Allocator` type:
* It uses cache-aligned blocks, intended to host the actual data. These
blocks are cache-line-size multiples of contiguous bytes.
* The `Allocator` has a maximum memory budget, set at construction
time. This allows us to cap the amount of data each specific
`Allocator` instance is responsible for.
* Upon destruction, the `Allocator` will clean up the storage it's
used, handing it back to the internal allocator used in
sanitizer_common.
Key features of the `Array` type:
* Each segmented array is always backed by an `Allocator`, which is
either user-provided or uses a global allocator.
* When an `Array` grows, it grows by appending a segment that's
fixed-sized. The size of each segment is computed by the number of
elements of type `T` that can fit into cache line multiples.
* An `Array` does not return memory to the `Allocator`, but it can keep
track of the current number of "live" objects it stores.
* When an `Array` is destroyed, it will not return memory to the
`Allocator`. Users should clean up the `Allocator` independently of
the `Array`.
* The `Array` type keeps a freelist of the chunks it's used before, so
that trimming and growing will re-use previously allocated chunks.
These basic data structures are used by the XRay Profiling Mode
implementation to implement efficient and cache-aware storage for data
that's typically read-and-write heavy for tracking latency information.
We're relying on the cache line characteristics of the architecture to
provide us good data isolation and cache friendliness, when we're
performing operations like searching for elements and/or updating data
hosted in these cache lines.
Reviewers: echristo, pelikan, kpw
Subscribers: mgorny, llvm-commits
Differential Revision: https://reviews.llvm.org/D45756
llvm-svn: 331141
2018-04-29 21:46:30 +08:00
|
|
|
|
2018-09-07 18:16:14 +08:00
|
|
|
AllocatorType &allocator() const XRAY_NEVER_INSTRUMENT {
|
2018-05-31 12:55:11 +08:00
|
|
|
DCHECK_NE(Alloc, nullptr);
|
|
|
|
return *Alloc;
|
[XRay][profiler] Part 1: XRay Allocator and Array Implementations
Summary:
This change is part of the larger XRay Profiling Mode effort.
Here we implement an arena allocator, for fixed sized buffers used in a
segmented array implementation. This change adds the segmented array
data structure, which relies on the allocator to provide and maintain
the storage for the segmented array.
Key features of the `Allocator` type:
* It uses cache-aligned blocks, intended to host the actual data. These
blocks are cache-line-size multiples of contiguous bytes.
* The `Allocator` has a maximum memory budget, set at construction
time. This allows us to cap the amount of data each specific
`Allocator` instance is responsible for.
* Upon destruction, the `Allocator` will clean up the storage it's
used, handing it back to the internal allocator used in
sanitizer_common.
Key features of the `Array` type:
* Each segmented array is always backed by an `Allocator`, which is
either user-provided or uses a global allocator.
* When an `Array` grows, it grows by appending a segment that's
fixed-sized. The size of each segment is computed by the number of
elements of type `T` that can fit into cache line multiples.
* An `Array` does not return memory to the `Allocator`, but it can keep
track of the current number of "live" objects it stores.
* When an `Array` is destroyed, it will not return memory to the
`Allocator`. Users should clean up the `Allocator` independently of
the `Array`.
* The `Array` type keeps a freelist of the chunks it's used before, so
that trimming and growing will re-use previously allocated chunks.
These basic data structures are used by the XRay Profiling Mode
implementation to implement efficient and cache-aware storage for data
that's typically read-and-write heavy for tracking latency information.
We're relying on the cache line characteristics of the architecture to
provide us good data isolation and cache friendliness, when we're
performing operations like searching for elements and/or updating data
hosted in these cache lines.
Reviewers: echristo, pelikan, kpw
Subscribers: mgorny, llvm-commits
Differential Revision: https://reviews.llvm.org/D45756
llvm-svn: 331141
2018-04-29 21:46:30 +08:00
|
|
|
}
|
|
|
|
|
2018-12-07 11:19:13 +08:00
|
|
|
uint64_t size() const XRAY_NEVER_INSTRUMENT { return Size; }
|
Revert r348335 "[XRay] Move-only Allocator, FunctionCallTrie, and Array"
.. and also the follow-ups r348336 r348338.
It broke stand-alone compiler-rt builds with GCC 4.8:
In file included from /work/llvm/projects/compiler-rt/lib/xray/xray_function_call_trie.h:20:0,
from /work/llvm/projects/compiler-rt/lib/xray/xray_profile_collector.h:21,
from /work/llvm/projects/compiler-rt/lib/xray/xray_profile_collector.cc:15:
/work/llvm/projects/compiler-rt/lib/xray/xray_segmented_array.h: In instantiation of ‘T* __xray::Array<T>::AppendEmplace(Args&& ...) [with Args = {const __xray::FunctionCallTrie::mergeInto(__xray::FunctionCallTrie&) const::NodeAndTarget&}; T = __xray::FunctionCallTrie::mergeInto(__xray::FunctionCallTrie&) const::NodeAndTarget]’:
/work/llvm/projects/compiler-rt/lib/xray/xray_segmented_array.h:383:71: required from ‘T* __xray::Array<T>::Append(const T&) [with T = __xray::FunctionCallTrie::mergeInto(__xray::FunctionCallTrie&) const::NodeAndTarget]’
/work/llvm/projects/compiler-rt/lib/xray/xray_function_call_trie.h:517:54: required from here
/work/llvm/projects/compiler-rt/lib/xray/xray_segmented_array.h:378:5: error: could not convert ‘{std::forward<const __xray::FunctionCallTrie::mergeInto(__xray::FunctionCallTrie&) const::NodeAndTarget&>((* & args#0))}’ from ‘<brace-enclosed initializer list>’ to ‘__xray::FunctionCallTrie::mergeInto(__xray::FunctionCallTrie&) const::NodeAndTarget’
new (AlignedOffset) T{std::forward<Args>(args)...};
^
/work/llvm/projects/compiler-rt/lib/xray/xray_segmented_array.h: In instantiation of ‘T* __xray::Array<T>::AppendEmplace(Args&& ...) [with Args = {const __xray::profileCollectorService::{anonymous}::ThreadTrie&}; T = __xray::profileCollectorService::{anonymous}::ThreadTrie]’:
/work/llvm/projects/compiler-rt/lib/xray/xray_segmented_array.h:383:71: required from ‘T* __xray::Array<T>::Append(const T&) [with T = __xray::profileCollectorService::{anonymous}::ThreadTrie]’
/work/llvm/projects/compiler-rt/lib/xray/xray_profile_collector.cc:98:34: required from here
/work/llvm/projects/compiler-rt/lib/xray/xray_segmented_array.h:378:5: error: could not convert ‘{std::forward<const __xray::profileCollectorService::{anonymous}::ThreadTrie&>((* & args#0))}’ from
‘<brace-enclosed initializer list>’ to ‘__xray::profileCollectorService::{anonymous}::ThreadTrie’
/work/llvm/projects/compiler-rt/lib/xray/xray_segmented_array.h: In instantiation of ‘T* __xray::Array<T>::AppendEmplace(Args&& ...) [with Args = {const __xray::profileCollectorService::{anonymous}::ProfileBuffer&}; T = __xray::profileCollectorService::{anonymous}::ProfileBuffer]’:
/work/llvm/projects/compiler-rt/lib/xray/xray_segmented_array.h:383:71: required from ‘T* __xray::Array<T>::Append(const T&) [with T = __xray::profileCollectorService::{anonymous}::ProfileBuffer]
’
/work/llvm/projects/compiler-rt/lib/xray/xray_profile_collector.cc:244:44: required from here
/work/llvm/projects/compiler-rt/lib/xray/xray_segmented_array.h:378:5: error: could not convert ‘{std::forward<const __xray::profileCollectorService::{anonymous}::ProfileBuffer&>((* & args#0))}’ from ‘<brace-enclosed initializer list>’ to ‘__xray::profileCollectorService::{anonymous}::ProfileBuffer’
> Summary:
> This change makes the allocator and function call trie implementations
> move-aware and remove the FunctionCallTrie's reliance on a
> heap-allocated set of allocators.
>
> The change makes it possible to always have storage associated with
> Allocator instances, not necessarily having heap-allocated memory
> obtainable from these allocator instances. We also use thread-local
> uninitialised storage.
>
> We've also re-worked the segmented array implementation to have more
> precondition and post-condition checks when built in debug mode. This
> enables us to better implement some of the operations with surrounding
> documentation as well. The `trim` algorithm now has more documentation
> on the implementation, reducing the requirement to handle special
> conditions, and being more rigorous on the computations involved.
>
> In this change we also introduce an initialisation guard, through which
> we prevent an initialisation operation from racing with a cleanup
> operation.
>
> We also ensure that the ThreadTries array is not destroyed while copies
> into the elements are still being performed by other threads submitting
> profiles.
>
> Note that this change still has an issue with accessing thread-local
> storage from signal handlers that are instrumented with XRay. We also
> learn that with the testing of this patch, that there will be cases
> where calls to mmap(...) (through internal_mmap(...)) might be called in
> signal handlers, but are not async-signal-safe. Subsequent patches will
> address this, by re-using the `BufferQueue` type used in the FDR mode
> implementation for pre-allocated memory segments per active, tracing
> thread.
>
> We still want to land this change despite the known issues, with fixes
> forthcoming.
>
> Reviewers: mboerger, jfb
>
> Subscribers: jfb, llvm-commits
>
> Differential Revision: https://reviews.llvm.org/D54989
llvm-svn: 348346
2018-12-05 18:19:55 +08:00
|
|
|
|
2018-12-07 11:19:13 +08:00
|
|
|
template <class... Args>
|
|
|
|
T *AppendEmplace(Args &&... args) XRAY_NEVER_INSTRUMENT {
|
|
|
|
DCHECK((Size == 0 && Head == &SentinelSegment && Head == Tail) ||
|
|
|
|
(Size != 0 && Head != &SentinelSegment && Tail != &SentinelSegment));
|
|
|
|
if (UNLIKELY(Head == &SentinelSegment)) {
|
|
|
|
auto R = InitHeadAndTail();
|
|
|
|
if (R == nullptr)
|
Revert r348335 "[XRay] Move-only Allocator, FunctionCallTrie, and Array"
.. and also the follow-ups r348336 r348338.
It broke stand-alone compiler-rt builds with GCC 4.8:
In file included from /work/llvm/projects/compiler-rt/lib/xray/xray_function_call_trie.h:20:0,
from /work/llvm/projects/compiler-rt/lib/xray/xray_profile_collector.h:21,
from /work/llvm/projects/compiler-rt/lib/xray/xray_profile_collector.cc:15:
/work/llvm/projects/compiler-rt/lib/xray/xray_segmented_array.h: In instantiation of ‘T* __xray::Array<T>::AppendEmplace(Args&& ...) [with Args = {const __xray::FunctionCallTrie::mergeInto(__xray::FunctionCallTrie&) const::NodeAndTarget&}; T = __xray::FunctionCallTrie::mergeInto(__xray::FunctionCallTrie&) const::NodeAndTarget]’:
/work/llvm/projects/compiler-rt/lib/xray/xray_segmented_array.h:383:71: required from ‘T* __xray::Array<T>::Append(const T&) [with T = __xray::FunctionCallTrie::mergeInto(__xray::FunctionCallTrie&) const::NodeAndTarget]’
/work/llvm/projects/compiler-rt/lib/xray/xray_function_call_trie.h:517:54: required from here
/work/llvm/projects/compiler-rt/lib/xray/xray_segmented_array.h:378:5: error: could not convert ‘{std::forward<const __xray::FunctionCallTrie::mergeInto(__xray::FunctionCallTrie&) const::NodeAndTarget&>((* & args#0))}’ from ‘<brace-enclosed initializer list>’ to ‘__xray::FunctionCallTrie::mergeInto(__xray::FunctionCallTrie&) const::NodeAndTarget’
new (AlignedOffset) T{std::forward<Args>(args)...};
^
/work/llvm/projects/compiler-rt/lib/xray/xray_segmented_array.h: In instantiation of ‘T* __xray::Array<T>::AppendEmplace(Args&& ...) [with Args = {const __xray::profileCollectorService::{anonymous}::ThreadTrie&}; T = __xray::profileCollectorService::{anonymous}::ThreadTrie]’:
/work/llvm/projects/compiler-rt/lib/xray/xray_segmented_array.h:383:71: required from ‘T* __xray::Array<T>::Append(const T&) [with T = __xray::profileCollectorService::{anonymous}::ThreadTrie]’
/work/llvm/projects/compiler-rt/lib/xray/xray_profile_collector.cc:98:34: required from here
/work/llvm/projects/compiler-rt/lib/xray/xray_segmented_array.h:378:5: error: could not convert ‘{std::forward<const __xray::profileCollectorService::{anonymous}::ThreadTrie&>((* & args#0))}’ from
‘<brace-enclosed initializer list>’ to ‘__xray::profileCollectorService::{anonymous}::ThreadTrie’
/work/llvm/projects/compiler-rt/lib/xray/xray_segmented_array.h: In instantiation of ‘T* __xray::Array<T>::AppendEmplace(Args&& ...) [with Args = {const __xray::profileCollectorService::{anonymous}::ProfileBuffer&}; T = __xray::profileCollectorService::{anonymous}::ProfileBuffer]’:
/work/llvm/projects/compiler-rt/lib/xray/xray_segmented_array.h:383:71: required from ‘T* __xray::Array<T>::Append(const T&) [with T = __xray::profileCollectorService::{anonymous}::ProfileBuffer]
’
/work/llvm/projects/compiler-rt/lib/xray/xray_profile_collector.cc:244:44: required from here
/work/llvm/projects/compiler-rt/lib/xray/xray_segmented_array.h:378:5: error: could not convert ‘{std::forward<const __xray::profileCollectorService::{anonymous}::ProfileBuffer&>((* & args#0))}’ from ‘<brace-enclosed initializer list>’ to ‘__xray::profileCollectorService::{anonymous}::ProfileBuffer’
> Summary:
> This change makes the allocator and function call trie implementations
> move-aware and remove the FunctionCallTrie's reliance on a
> heap-allocated set of allocators.
>
> The change makes it possible to always have storage associated with
> Allocator instances, not necessarily having heap-allocated memory
> obtainable from these allocator instances. We also use thread-local
> uninitialised storage.
>
> We've also re-worked the segmented array implementation to have more
> precondition and post-condition checks when built in debug mode. This
> enables us to better implement some of the operations with surrounding
> documentation as well. The `trim` algorithm now has more documentation
> on the implementation, reducing the requirement to handle special
> conditions, and being more rigorous on the computations involved.
>
> In this change we also introduce an initialisation guard, through which
> we prevent an initialisation operation from racing with a cleanup
> operation.
>
> We also ensure that the ThreadTries array is not destroyed while copies
> into the elements are still being performed by other threads submitting
> profiles.
>
> Note that this change still has an issue with accessing thread-local
> storage from signal handlers that are instrumented with XRay. We also
> learn that with the testing of this patch, that there will be cases
> where calls to mmap(...) (through internal_mmap(...)) might be called in
> signal handlers, but are not async-signal-safe. Subsequent patches will
> address this, by re-using the `BufferQueue` type used in the FDR mode
> implementation for pre-allocated memory segments per active, tracing
> thread.
>
> We still want to land this change despite the known issues, with fixes
> forthcoming.
>
> Reviewers: mboerger, jfb
>
> Subscribers: jfb, llvm-commits
>
> Differential Revision: https://reviews.llvm.org/D54989
llvm-svn: 348346
2018-12-05 18:19:55 +08:00
|
|
|
return nullptr;
|
2018-12-07 11:19:13 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
DCHECK_NE(Head, &SentinelSegment);
|
|
|
|
DCHECK_NE(Tail, &SentinelSegment);
|
Revert r348335 "[XRay] Move-only Allocator, FunctionCallTrie, and Array"
.. and also the follow-ups r348336 r348338.
It broke stand-alone compiler-rt builds with GCC 4.8:
In file included from /work/llvm/projects/compiler-rt/lib/xray/xray_function_call_trie.h:20:0,
from /work/llvm/projects/compiler-rt/lib/xray/xray_profile_collector.h:21,
from /work/llvm/projects/compiler-rt/lib/xray/xray_profile_collector.cc:15:
/work/llvm/projects/compiler-rt/lib/xray/xray_segmented_array.h: In instantiation of ‘T* __xray::Array<T>::AppendEmplace(Args&& ...) [with Args = {const __xray::FunctionCallTrie::mergeInto(__xray::FunctionCallTrie&) const::NodeAndTarget&}; T = __xray::FunctionCallTrie::mergeInto(__xray::FunctionCallTrie&) const::NodeAndTarget]’:
/work/llvm/projects/compiler-rt/lib/xray/xray_segmented_array.h:383:71: required from ‘T* __xray::Array<T>::Append(const T&) [with T = __xray::FunctionCallTrie::mergeInto(__xray::FunctionCallTrie&) const::NodeAndTarget]’
/work/llvm/projects/compiler-rt/lib/xray/xray_function_call_trie.h:517:54: required from here
/work/llvm/projects/compiler-rt/lib/xray/xray_segmented_array.h:378:5: error: could not convert ‘{std::forward<const __xray::FunctionCallTrie::mergeInto(__xray::FunctionCallTrie&) const::NodeAndTarget&>((* & args#0))}’ from ‘<brace-enclosed initializer list>’ to ‘__xray::FunctionCallTrie::mergeInto(__xray::FunctionCallTrie&) const::NodeAndTarget’
new (AlignedOffset) T{std::forward<Args>(args)...};
^
/work/llvm/projects/compiler-rt/lib/xray/xray_segmented_array.h: In instantiation of ‘T* __xray::Array<T>::AppendEmplace(Args&& ...) [with Args = {const __xray::profileCollectorService::{anonymous}::ThreadTrie&}; T = __xray::profileCollectorService::{anonymous}::ThreadTrie]’:
/work/llvm/projects/compiler-rt/lib/xray/xray_segmented_array.h:383:71: required from ‘T* __xray::Array<T>::Append(const T&) [with T = __xray::profileCollectorService::{anonymous}::ThreadTrie]’
/work/llvm/projects/compiler-rt/lib/xray/xray_profile_collector.cc:98:34: required from here
/work/llvm/projects/compiler-rt/lib/xray/xray_segmented_array.h:378:5: error: could not convert ‘{std::forward<const __xray::profileCollectorService::{anonymous}::ThreadTrie&>((* & args#0))}’ from
‘<brace-enclosed initializer list>’ to ‘__xray::profileCollectorService::{anonymous}::ThreadTrie’
/work/llvm/projects/compiler-rt/lib/xray/xray_segmented_array.h: In instantiation of ‘T* __xray::Array<T>::AppendEmplace(Args&& ...) [with Args = {const __xray::profileCollectorService::{anonymous}::ProfileBuffer&}; T = __xray::profileCollectorService::{anonymous}::ProfileBuffer]’:
/work/llvm/projects/compiler-rt/lib/xray/xray_segmented_array.h:383:71: required from ‘T* __xray::Array<T>::Append(const T&) [with T = __xray::profileCollectorService::{anonymous}::ProfileBuffer]
’
/work/llvm/projects/compiler-rt/lib/xray/xray_profile_collector.cc:244:44: required from here
/work/llvm/projects/compiler-rt/lib/xray/xray_segmented_array.h:378:5: error: could not convert ‘{std::forward<const __xray::profileCollectorService::{anonymous}::ProfileBuffer&>((* & args#0))}’ from ‘<brace-enclosed initializer list>’ to ‘__xray::profileCollectorService::{anonymous}::ProfileBuffer’
> Summary:
> This change makes the allocator and function call trie implementations
> move-aware and remove the FunctionCallTrie's reliance on a
> heap-allocated set of allocators.
>
> The change makes it possible to always have storage associated with
> Allocator instances, not necessarily having heap-allocated memory
> obtainable from these allocator instances. We also use thread-local
> uninitialised storage.
>
> We've also re-worked the segmented array implementation to have more
> precondition and post-condition checks when built in debug mode. This
> enables us to better implement some of the operations with surrounding
> documentation as well. The `trim` algorithm now has more documentation
> on the implementation, reducing the requirement to handle special
> conditions, and being more rigorous on the computations involved.
>
> In this change we also introduce an initialisation guard, through which
> we prevent an initialisation operation from racing with a cleanup
> operation.
>
> We also ensure that the ThreadTries array is not destroyed while copies
> into the elements are still being performed by other threads submitting
> profiles.
>
> Note that this change still has an issue with accessing thread-local
> storage from signal handlers that are instrumented with XRay. We also
> learn that with the testing of this patch, that there will be cases
> where calls to mmap(...) (through internal_mmap(...)) might be called in
> signal handlers, but are not async-signal-safe. Subsequent patches will
> address this, by re-using the `BufferQueue` type used in the FDR mode
> implementation for pre-allocated memory segments per active, tracing
> thread.
>
> We still want to land this change despite the known issues, with fixes
> forthcoming.
>
> Reviewers: mboerger, jfb
>
> Subscribers: jfb, llvm-commits
>
> Differential Revision: https://reviews.llvm.org/D54989
llvm-svn: 348346
2018-12-05 18:19:55 +08:00
|
|
|
|
|
|
|
auto Offset = Size % ElementsPerSegment;
|
2018-12-06 08:25:56 +08:00
|
|
|
if (UNLIKELY(Size != 0 && Offset == 0))
|
|
|
|
if (AppendNewSegment() == nullptr)
|
Revert r348335 "[XRay] Move-only Allocator, FunctionCallTrie, and Array"
.. and also the follow-ups r348336 r348338.
It broke stand-alone compiler-rt builds with GCC 4.8:
In file included from /work/llvm/projects/compiler-rt/lib/xray/xray_function_call_trie.h:20:0,
from /work/llvm/projects/compiler-rt/lib/xray/xray_profile_collector.h:21,
from /work/llvm/projects/compiler-rt/lib/xray/xray_profile_collector.cc:15:
/work/llvm/projects/compiler-rt/lib/xray/xray_segmented_array.h: In instantiation of ‘T* __xray::Array<T>::AppendEmplace(Args&& ...) [with Args = {const __xray::FunctionCallTrie::mergeInto(__xray::FunctionCallTrie&) const::NodeAndTarget&}; T = __xray::FunctionCallTrie::mergeInto(__xray::FunctionCallTrie&) const::NodeAndTarget]’:
/work/llvm/projects/compiler-rt/lib/xray/xray_segmented_array.h:383:71: required from ‘T* __xray::Array<T>::Append(const T&) [with T = __xray::FunctionCallTrie::mergeInto(__xray::FunctionCallTrie&) const::NodeAndTarget]’
/work/llvm/projects/compiler-rt/lib/xray/xray_function_call_trie.h:517:54: required from here
/work/llvm/projects/compiler-rt/lib/xray/xray_segmented_array.h:378:5: error: could not convert ‘{std::forward<const __xray::FunctionCallTrie::mergeInto(__xray::FunctionCallTrie&) const::NodeAndTarget&>((* & args#0))}’ from ‘<brace-enclosed initializer list>’ to ‘__xray::FunctionCallTrie::mergeInto(__xray::FunctionCallTrie&) const::NodeAndTarget’
new (AlignedOffset) T{std::forward<Args>(args)...};
^
/work/llvm/projects/compiler-rt/lib/xray/xray_segmented_array.h: In instantiation of ‘T* __xray::Array<T>::AppendEmplace(Args&& ...) [with Args = {const __xray::profileCollectorService::{anonymous}::ThreadTrie&}; T = __xray::profileCollectorService::{anonymous}::ThreadTrie]’:
/work/llvm/projects/compiler-rt/lib/xray/xray_segmented_array.h:383:71: required from ‘T* __xray::Array<T>::Append(const T&) [with T = __xray::profileCollectorService::{anonymous}::ThreadTrie]’
/work/llvm/projects/compiler-rt/lib/xray/xray_profile_collector.cc:98:34: required from here
/work/llvm/projects/compiler-rt/lib/xray/xray_segmented_array.h:378:5: error: could not convert ‘{std::forward<const __xray::profileCollectorService::{anonymous}::ThreadTrie&>((* & args#0))}’ from
‘<brace-enclosed initializer list>’ to ‘__xray::profileCollectorService::{anonymous}::ThreadTrie’
/work/llvm/projects/compiler-rt/lib/xray/xray_segmented_array.h: In instantiation of ‘T* __xray::Array<T>::AppendEmplace(Args&& ...) [with Args = {const __xray::profileCollectorService::{anonymous}::ProfileBuffer&}; T = __xray::profileCollectorService::{anonymous}::ProfileBuffer]’:
/work/llvm/projects/compiler-rt/lib/xray/xray_segmented_array.h:383:71: required from ‘T* __xray::Array<T>::Append(const T&) [with T = __xray::profileCollectorService::{anonymous}::ProfileBuffer]
’
/work/llvm/projects/compiler-rt/lib/xray/xray_profile_collector.cc:244:44: required from here
/work/llvm/projects/compiler-rt/lib/xray/xray_segmented_array.h:378:5: error: could not convert ‘{std::forward<const __xray::profileCollectorService::{anonymous}::ProfileBuffer&>((* & args#0))}’ from ‘<brace-enclosed initializer list>’ to ‘__xray::profileCollectorService::{anonymous}::ProfileBuffer’
> Summary:
> This change makes the allocator and function call trie implementations
> move-aware and remove the FunctionCallTrie's reliance on a
> heap-allocated set of allocators.
>
> The change makes it possible to always have storage associated with
> Allocator instances, not necessarily having heap-allocated memory
> obtainable from these allocator instances. We also use thread-local
> uninitialised storage.
>
> We've also re-worked the segmented array implementation to have more
> precondition and post-condition checks when built in debug mode. This
> enables us to better implement some of the operations with surrounding
> documentation as well. The `trim` algorithm now has more documentation
> on the implementation, reducing the requirement to handle special
> conditions, and being more rigorous on the computations involved.
>
> In this change we also introduce an initialisation guard, through which
> we prevent an initialisation operation from racing with a cleanup
> operation.
>
> We also ensure that the ThreadTries array is not destroyed while copies
> into the elements are still being performed by other threads submitting
> profiles.
>
> Note that this change still has an issue with accessing thread-local
> storage from signal handlers that are instrumented with XRay. We also
> learn that with the testing of this patch, that there will be cases
> where calls to mmap(...) (through internal_mmap(...)) might be called in
> signal handlers, but are not async-signal-safe. Subsequent patches will
> address this, by re-using the `BufferQueue` type used in the FDR mode
> implementation for pre-allocated memory segments per active, tracing
> thread.
>
> We still want to land this change despite the known issues, with fixes
> forthcoming.
>
> Reviewers: mboerger, jfb
>
> Subscribers: jfb, llvm-commits
>
> Differential Revision: https://reviews.llvm.org/D54989
llvm-svn: 348346
2018-12-05 18:19:55 +08:00
|
|
|
return nullptr;
|
|
|
|
|
2018-12-07 11:19:13 +08:00
|
|
|
DCHECK_NE(Tail, &SentinelSegment);
|
|
|
|
auto Base = &Tail->Data;
|
2018-12-06 11:28:57 +08:00
|
|
|
auto AlignedOffset = Base + (Offset * AlignedElementStorageSize);
|
2018-12-07 11:19:13 +08:00
|
|
|
DCHECK_LE(AlignedOffset + sizeof(T),
|
[XRay] Use preallocated memory for XRay profiling
Summary:
This change builds upon D54989, which removes memory allocation from the
critical path of the profiling implementation. This also changes the API
for the profile collection service, to take ownership of the memory and
associated data structures per-thread.
The consolidation of the memory allocation allows us to do two things:
- Limits the amount of memory used by the profiling implementation,
associating preallocated buffers instead of allocating memory
on-demand.
- Consolidate the memory initialisation and cleanup by relying on the
buffer queue's reference counting implementation.
We find a number of places which also display some problematic
behaviour, including:
- Off-by-factor bug in the allocator implementation.
- Unrolling semantics in cases of "memory exhausted" situations, when
managing the state of the function call trie.
We also add a few test cases which verify our understanding of the
behaviour of the system, with important edge-cases (especially for
memory-exhausted cases) in the segmented array and profile collector
unit tests.
Depends on D54989.
Reviewers: mboerger
Subscribers: dschuff, mgorny, dmgreen, jfb, llvm-commits
Differential Revision: https://reviews.llvm.org/D55249
llvm-svn: 348568
2018-12-07 14:23:06 +08:00
|
|
|
reinterpret_cast<unsigned char *>(Base) + SegmentSize);
|
2018-12-07 11:19:13 +08:00
|
|
|
|
|
|
|
// In-place construct at Position.
|
|
|
|
new (AlignedOffset) T{std::forward<Args>(args)...};
|
2018-12-06 11:28:57 +08:00
|
|
|
++Size;
|
2018-12-07 11:19:13 +08:00
|
|
|
return reinterpret_cast<T *>(AlignedOffset);
|
2018-12-06 11:28:57 +08:00
|
|
|
}
|
|
|
|
|
2018-12-07 11:19:13 +08:00
|
|
|
T *Append(const T &E) XRAY_NEVER_INSTRUMENT {
|
|
|
|
// FIXME: This is a duplication of AppenEmplace with the copy semantics
|
|
|
|
// explicitly used, as a work-around to GCC 4.8 not invoking the copy
|
|
|
|
// constructor with the placement new with braced-init syntax.
|
|
|
|
DCHECK((Size == 0 && Head == &SentinelSegment && Head == Tail) ||
|
|
|
|
(Size != 0 && Head != &SentinelSegment && Tail != &SentinelSegment));
|
|
|
|
if (UNLIKELY(Head == &SentinelSegment)) {
|
|
|
|
auto R = InitHeadAndTail();
|
|
|
|
if (R == nullptr)
|
2018-12-06 11:28:57 +08:00
|
|
|
return nullptr;
|
2018-12-07 11:19:13 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
DCHECK_NE(Head, &SentinelSegment);
|
|
|
|
DCHECK_NE(Tail, &SentinelSegment);
|
2018-12-06 11:28:57 +08:00
|
|
|
|
|
|
|
auto Offset = Size % ElementsPerSegment;
|
2018-12-07 11:19:13 +08:00
|
|
|
if (UNLIKELY(Size != 0 && Offset == 0))
|
|
|
|
if (AppendNewSegment() == nullptr)
|
2018-12-06 11:28:57 +08:00
|
|
|
return nullptr;
|
|
|
|
|
2018-07-18 10:08:39 +08:00
|
|
|
DCHECK_NE(Tail, &SentinelSegment);
|
2018-12-07 11:19:13 +08:00
|
|
|
auto Base = &Tail->Data;
|
2018-07-18 10:08:39 +08:00
|
|
|
auto AlignedOffset = Base + (Offset * AlignedElementStorageSize);
|
2018-12-07 11:19:13 +08:00
|
|
|
DCHECK_LE(AlignedOffset + sizeof(T),
|
|
|
|
reinterpret_cast<unsigned char *>(Tail) + SegmentSize);
|
2018-07-18 10:08:39 +08:00
|
|
|
|
[XRay][profiler] Part 1: XRay Allocator and Array Implementations
Summary:
This change is part of the larger XRay Profiling Mode effort.
Here we implement an arena allocator, for fixed sized buffers used in a
segmented array implementation. This change adds the segmented array
data structure, which relies on the allocator to provide and maintain
the storage for the segmented array.
Key features of the `Allocator` type:
* It uses cache-aligned blocks, intended to host the actual data. These
blocks are cache-line-size multiples of contiguous bytes.
* The `Allocator` has a maximum memory budget, set at construction
time. This allows us to cap the amount of data each specific
`Allocator` instance is responsible for.
* Upon destruction, the `Allocator` will clean up the storage it's
used, handing it back to the internal allocator used in
sanitizer_common.
Key features of the `Array` type:
* Each segmented array is always backed by an `Allocator`, which is
either user-provided or uses a global allocator.
* When an `Array` grows, it grows by appending a segment that's
fixed-sized. The size of each segment is computed by the number of
elements of type `T` that can fit into cache line multiples.
* An `Array` does not return memory to the `Allocator`, but it can keep
track of the current number of "live" objects it stores.
* When an `Array` is destroyed, it will not return memory to the
`Allocator`. Users should clean up the `Allocator` independently of
the `Array`.
* The `Array` type keeps a freelist of the chunks it's used before, so
that trimming and growing will re-use previously allocated chunks.
These basic data structures are used by the XRay Profiling Mode
implementation to implement efficient and cache-aware storage for data
that's typically read-and-write heavy for tracking latency information.
We're relying on the cache line characteristics of the architecture to
provide us good data isolation and cache friendliness, when we're
performing operations like searching for elements and/or updating data
hosted in these cache lines.
Reviewers: echristo, pelikan, kpw
Subscribers: mgorny, llvm-commits
Differential Revision: https://reviews.llvm.org/D45756
llvm-svn: 331141
2018-04-29 21:46:30 +08:00
|
|
|
// In-place construct at Position.
|
2018-12-07 11:19:13 +08:00
|
|
|
new (AlignedOffset) T(E);
|
[XRay][profiler] Part 1: XRay Allocator and Array Implementations
Summary:
This change is part of the larger XRay Profiling Mode effort.
Here we implement an arena allocator, for fixed sized buffers used in a
segmented array implementation. This change adds the segmented array
data structure, which relies on the allocator to provide and maintain
the storage for the segmented array.
Key features of the `Allocator` type:
* It uses cache-aligned blocks, intended to host the actual data. These
blocks are cache-line-size multiples of contiguous bytes.
* The `Allocator` has a maximum memory budget, set at construction
time. This allows us to cap the amount of data each specific
`Allocator` instance is responsible for.
* Upon destruction, the `Allocator` will clean up the storage it's
used, handing it back to the internal allocator used in
sanitizer_common.
Key features of the `Array` type:
* Each segmented array is always backed by an `Allocator`, which is
either user-provided or uses a global allocator.
* When an `Array` grows, it grows by appending a segment that's
fixed-sized. The size of each segment is computed by the number of
elements of type `T` that can fit into cache line multiples.
* An `Array` does not return memory to the `Allocator`, but it can keep
track of the current number of "live" objects it stores.
* When an `Array` is destroyed, it will not return memory to the
`Allocator`. Users should clean up the `Allocator` independently of
the `Array`.
* The `Array` type keeps a freelist of the chunks it's used before, so
that trimming and growing will re-use previously allocated chunks.
These basic data structures are used by the XRay Profiling Mode
implementation to implement efficient and cache-aware storage for data
that's typically read-and-write heavy for tracking latency information.
We're relying on the cache line characteristics of the architecture to
provide us good data isolation and cache friendliness, when we're
performing operations like searching for elements and/or updating data
hosted in these cache lines.
Reviewers: echristo, pelikan, kpw
Subscribers: mgorny, llvm-commits
Differential Revision: https://reviews.llvm.org/D45756
llvm-svn: 331141
2018-04-29 21:46:30 +08:00
|
|
|
++Size;
|
2018-12-07 11:19:13 +08:00
|
|
|
return reinterpret_cast<T *>(AlignedOffset);
|
[XRay][profiler] Part 1: XRay Allocator and Array Implementations
Summary:
This change is part of the larger XRay Profiling Mode effort.
Here we implement an arena allocator, for fixed sized buffers used in a
segmented array implementation. This change adds the segmented array
data structure, which relies on the allocator to provide and maintain
the storage for the segmented array.
Key features of the `Allocator` type:
* It uses cache-aligned blocks, intended to host the actual data. These
blocks are cache-line-size multiples of contiguous bytes.
* The `Allocator` has a maximum memory budget, set at construction
time. This allows us to cap the amount of data each specific
`Allocator` instance is responsible for.
* Upon destruction, the `Allocator` will clean up the storage it's
used, handing it back to the internal allocator used in
sanitizer_common.
Key features of the `Array` type:
* Each segmented array is always backed by an `Allocator`, which is
either user-provided or uses a global allocator.
* When an `Array` grows, it grows by appending a segment that's
fixed-sized. The size of each segment is computed by the number of
elements of type `T` that can fit into cache line multiples.
* An `Array` does not return memory to the `Allocator`, but it can keep
track of the current number of "live" objects it stores.
* When an `Array` is destroyed, it will not return memory to the
`Allocator`. Users should clean up the `Allocator` independently of
the `Array`.
* The `Array` type keeps a freelist of the chunks it's used before, so
that trimming and growing will re-use previously allocated chunks.
These basic data structures are used by the XRay Profiling Mode
implementation to implement efficient and cache-aware storage for data
that's typically read-and-write heavy for tracking latency information.
We're relying on the cache line characteristics of the architecture to
provide us good data isolation and cache friendliness, when we're
performing operations like searching for elements and/or updating data
hosted in these cache lines.
Reviewers: echristo, pelikan, kpw
Subscribers: mgorny, llvm-commits
Differential Revision: https://reviews.llvm.org/D45756
llvm-svn: 331141
2018-04-29 21:46:30 +08:00
|
|
|
}
|
|
|
|
|
2018-12-07 11:19:13 +08:00
|
|
|
T &operator[](uint64_t Offset) const XRAY_NEVER_INSTRUMENT {
|
[XRay][profiler] Part 1: XRay Allocator and Array Implementations
Summary:
This change is part of the larger XRay Profiling Mode effort.
Here we implement an arena allocator, for fixed sized buffers used in a
segmented array implementation. This change adds the segmented array
data structure, which relies on the allocator to provide and maintain
the storage for the segmented array.
Key features of the `Allocator` type:
* It uses cache-aligned blocks, intended to host the actual data. These
blocks are cache-line-size multiples of contiguous bytes.
* The `Allocator` has a maximum memory budget, set at construction
time. This allows us to cap the amount of data each specific
`Allocator` instance is responsible for.
* Upon destruction, the `Allocator` will clean up the storage it's
used, handing it back to the internal allocator used in
sanitizer_common.
Key features of the `Array` type:
* Each segmented array is always backed by an `Allocator`, which is
either user-provided or uses a global allocator.
* When an `Array` grows, it grows by appending a segment that's
fixed-sized. The size of each segment is computed by the number of
elements of type `T` that can fit into cache line multiples.
* An `Array` does not return memory to the `Allocator`, but it can keep
track of the current number of "live" objects it stores.
* When an `Array` is destroyed, it will not return memory to the
`Allocator`. Users should clean up the `Allocator` independently of
the `Array`.
* The `Array` type keeps a freelist of the chunks it's used before, so
that trimming and growing will re-use previously allocated chunks.
These basic data structures are used by the XRay Profiling Mode
implementation to implement efficient and cache-aware storage for data
that's typically read-and-write heavy for tracking latency information.
We're relying on the cache line characteristics of the architecture to
provide us good data isolation and cache friendliness, when we're
performing operations like searching for elements and/or updating data
hosted in these cache lines.
Reviewers: echristo, pelikan, kpw
Subscribers: mgorny, llvm-commits
Differential Revision: https://reviews.llvm.org/D45756
llvm-svn: 331141
2018-04-29 21:46:30 +08:00
|
|
|
DCHECK_LE(Offset, Size);
|
|
|
|
// We need to traverse the array enough times to find the element at Offset.
|
2018-07-18 10:08:39 +08:00
|
|
|
auto S = Head;
|
|
|
|
while (Offset >= ElementsPerSegment) {
|
|
|
|
S = S->Next;
|
|
|
|
Offset -= ElementsPerSegment;
|
|
|
|
DCHECK_NE(S, &SentinelSegment);
|
[XRay][profiler] Part 1: XRay Allocator and Array Implementations
Summary:
This change is part of the larger XRay Profiling Mode effort.
Here we implement an arena allocator, for fixed sized buffers used in a
segmented array implementation. This change adds the segmented array
data structure, which relies on the allocator to provide and maintain
the storage for the segmented array.
Key features of the `Allocator` type:
* It uses cache-aligned blocks, intended to host the actual data. These
blocks are cache-line-size multiples of contiguous bytes.
* The `Allocator` has a maximum memory budget, set at construction
time. This allows us to cap the amount of data each specific
`Allocator` instance is responsible for.
* Upon destruction, the `Allocator` will clean up the storage it's
used, handing it back to the internal allocator used in
sanitizer_common.
Key features of the `Array` type:
* Each segmented array is always backed by an `Allocator`, which is
either user-provided or uses a global allocator.
* When an `Array` grows, it grows by appending a segment that's
fixed-sized. The size of each segment is computed by the number of
elements of type `T` that can fit into cache line multiples.
* An `Array` does not return memory to the `Allocator`, but it can keep
track of the current number of "live" objects it stores.
* When an `Array` is destroyed, it will not return memory to the
`Allocator`. Users should clean up the `Allocator` independently of
the `Array`.
* The `Array` type keeps a freelist of the chunks it's used before, so
that trimming and growing will re-use previously allocated chunks.
These basic data structures are used by the XRay Profiling Mode
implementation to implement efficient and cache-aware storage for data
that's typically read-and-write heavy for tracking latency information.
We're relying on the cache line characteristics of the architecture to
provide us good data isolation and cache friendliness, when we're
performing operations like searching for elements and/or updating data
hosted in these cache lines.
Reviewers: echristo, pelikan, kpw
Subscribers: mgorny, llvm-commits
Differential Revision: https://reviews.llvm.org/D45756
llvm-svn: 331141
2018-04-29 21:46:30 +08:00
|
|
|
}
|
2018-12-07 11:19:13 +08:00
|
|
|
auto Base = &S->Data;
|
2018-07-18 10:08:39 +08:00
|
|
|
auto AlignedOffset = Base + (Offset * AlignedElementStorageSize);
|
|
|
|
auto Position = reinterpret_cast<T *>(AlignedOffset);
|
|
|
|
return *reinterpret_cast<T *>(Position);
|
[XRay][profiler] Part 1: XRay Allocator and Array Implementations
Summary:
This change is part of the larger XRay Profiling Mode effort.
Here we implement an arena allocator, for fixed sized buffers used in a
segmented array implementation. This change adds the segmented array
data structure, which relies on the allocator to provide and maintain
the storage for the segmented array.
Key features of the `Allocator` type:
* It uses cache-aligned blocks, intended to host the actual data. These
blocks are cache-line-size multiples of contiguous bytes.
* The `Allocator` has a maximum memory budget, set at construction
time. This allows us to cap the amount of data each specific
`Allocator` instance is responsible for.
* Upon destruction, the `Allocator` will clean up the storage it's
used, handing it back to the internal allocator used in
sanitizer_common.
Key features of the `Array` type:
* Each segmented array is always backed by an `Allocator`, which is
either user-provided or uses a global allocator.
* When an `Array` grows, it grows by appending a segment that's
fixed-sized. The size of each segment is computed by the number of
elements of type `T` that can fit into cache line multiples.
* An `Array` does not return memory to the `Allocator`, but it can keep
track of the current number of "live" objects it stores.
* When an `Array` is destroyed, it will not return memory to the
`Allocator`. Users should clean up the `Allocator` independently of
the `Array`.
* The `Array` type keeps a freelist of the chunks it's used before, so
that trimming and growing will re-use previously allocated chunks.
These basic data structures are used by the XRay Profiling Mode
implementation to implement efficient and cache-aware storage for data
that's typically read-and-write heavy for tracking latency information.
We're relying on the cache line characteristics of the architecture to
provide us good data isolation and cache friendliness, when we're
performing operations like searching for elements and/or updating data
hosted in these cache lines.
Reviewers: echristo, pelikan, kpw
Subscribers: mgorny, llvm-commits
Differential Revision: https://reviews.llvm.org/D45756
llvm-svn: 331141
2018-04-29 21:46:30 +08:00
|
|
|
}
|
|
|
|
|
2018-09-07 18:16:14 +08:00
|
|
|
T &front() const XRAY_NEVER_INSTRUMENT {
|
2018-07-18 10:08:39 +08:00
|
|
|
DCHECK_NE(Head, &SentinelSegment);
|
[XRay][profiler] Part 1: XRay Allocator and Array Implementations
Summary:
This change is part of the larger XRay Profiling Mode effort.
Here we implement an arena allocator, for fixed sized buffers used in a
segmented array implementation. This change adds the segmented array
data structure, which relies on the allocator to provide and maintain
the storage for the segmented array.
Key features of the `Allocator` type:
* It uses cache-aligned blocks, intended to host the actual data. These
blocks are cache-line-size multiples of contiguous bytes.
* The `Allocator` has a maximum memory budget, set at construction
time. This allows us to cap the amount of data each specific
`Allocator` instance is responsible for.
* Upon destruction, the `Allocator` will clean up the storage it's
used, handing it back to the internal allocator used in
sanitizer_common.
Key features of the `Array` type:
* Each segmented array is always backed by an `Allocator`, which is
either user-provided or uses a global allocator.
* When an `Array` grows, it grows by appending a segment that's
fixed-sized. The size of each segment is computed by the number of
elements of type `T` that can fit into cache line multiples.
* An `Array` does not return memory to the `Allocator`, but it can keep
track of the current number of "live" objects it stores.
* When an `Array` is destroyed, it will not return memory to the
`Allocator`. Users should clean up the `Allocator` independently of
the `Array`.
* The `Array` type keeps a freelist of the chunks it's used before, so
that trimming and growing will re-use previously allocated chunks.
These basic data structures are used by the XRay Profiling Mode
implementation to implement efficient and cache-aware storage for data
that's typically read-and-write heavy for tracking latency information.
We're relying on the cache line characteristics of the architecture to
provide us good data isolation and cache friendliness, when we're
performing operations like searching for elements and/or updating data
hosted in these cache lines.
Reviewers: echristo, pelikan, kpw
Subscribers: mgorny, llvm-commits
Differential Revision: https://reviews.llvm.org/D45756
llvm-svn: 331141
2018-04-29 21:46:30 +08:00
|
|
|
DCHECK_NE(Size, 0u);
|
2018-07-10 16:25:44 +08:00
|
|
|
return *begin();
|
[XRay][profiler] Part 1: XRay Allocator and Array Implementations
Summary:
This change is part of the larger XRay Profiling Mode effort.
Here we implement an arena allocator, for fixed sized buffers used in a
segmented array implementation. This change adds the segmented array
data structure, which relies on the allocator to provide and maintain
the storage for the segmented array.
Key features of the `Allocator` type:
* It uses cache-aligned blocks, intended to host the actual data. These
blocks are cache-line-size multiples of contiguous bytes.
* The `Allocator` has a maximum memory budget, set at construction
time. This allows us to cap the amount of data each specific
`Allocator` instance is responsible for.
* Upon destruction, the `Allocator` will clean up the storage it's
used, handing it back to the internal allocator used in
sanitizer_common.
Key features of the `Array` type:
* Each segmented array is always backed by an `Allocator`, which is
either user-provided or uses a global allocator.
* When an `Array` grows, it grows by appending a segment that's
fixed-sized. The size of each segment is computed by the number of
elements of type `T` that can fit into cache line multiples.
* An `Array` does not return memory to the `Allocator`, but it can keep
track of the current number of "live" objects it stores.
* When an `Array` is destroyed, it will not return memory to the
`Allocator`. Users should clean up the `Allocator` independently of
the `Array`.
* The `Array` type keeps a freelist of the chunks it's used before, so
that trimming and growing will re-use previously allocated chunks.
These basic data structures are used by the XRay Profiling Mode
implementation to implement efficient and cache-aware storage for data
that's typically read-and-write heavy for tracking latency information.
We're relying on the cache line characteristics of the architecture to
provide us good data isolation and cache friendliness, when we're
performing operations like searching for elements and/or updating data
hosted in these cache lines.
Reviewers: echristo, pelikan, kpw
Subscribers: mgorny, llvm-commits
Differential Revision: https://reviews.llvm.org/D45756
llvm-svn: 331141
2018-04-29 21:46:30 +08:00
|
|
|
}
|
|
|
|
|
2018-09-07 18:16:14 +08:00
|
|
|
T &back() const XRAY_NEVER_INSTRUMENT {
|
2018-07-18 10:08:39 +08:00
|
|
|
DCHECK_NE(Tail, &SentinelSegment);
|
2018-07-10 16:25:44 +08:00
|
|
|
DCHECK_NE(Size, 0u);
|
|
|
|
auto It = end();
|
|
|
|
--It;
|
|
|
|
return *It;
|
[XRay][profiler] Part 1: XRay Allocator and Array Implementations
Summary:
This change is part of the larger XRay Profiling Mode effort.
Here we implement an arena allocator, for fixed sized buffers used in a
segmented array implementation. This change adds the segmented array
data structure, which relies on the allocator to provide and maintain
the storage for the segmented array.
Key features of the `Allocator` type:
* It uses cache-aligned blocks, intended to host the actual data. These
blocks are cache-line-size multiples of contiguous bytes.
* The `Allocator` has a maximum memory budget, set at construction
time. This allows us to cap the amount of data each specific
`Allocator` instance is responsible for.
* Upon destruction, the `Allocator` will clean up the storage it's
used, handing it back to the internal allocator used in
sanitizer_common.
Key features of the `Array` type:
* Each segmented array is always backed by an `Allocator`, which is
either user-provided or uses a global allocator.
* When an `Array` grows, it grows by appending a segment that's
fixed-sized. The size of each segment is computed by the number of
elements of type `T` that can fit into cache line multiples.
* An `Array` does not return memory to the `Allocator`, but it can keep
track of the current number of "live" objects it stores.
* When an `Array` is destroyed, it will not return memory to the
`Allocator`. Users should clean up the `Allocator` independently of
the `Array`.
* The `Array` type keeps a freelist of the chunks it's used before, so
that trimming and growing will re-use previously allocated chunks.
These basic data structures are used by the XRay Profiling Mode
implementation to implement efficient and cache-aware storage for data
that's typically read-and-write heavy for tracking latency information.
We're relying on the cache line characteristics of the architecture to
provide us good data isolation and cache friendliness, when we're
performing operations like searching for elements and/or updating data
hosted in these cache lines.
Reviewers: echristo, pelikan, kpw
Subscribers: mgorny, llvm-commits
Differential Revision: https://reviews.llvm.org/D45756
llvm-svn: 331141
2018-04-29 21:46:30 +08:00
|
|
|
}
|
|
|
|
|
2018-09-07 18:16:14 +08:00
|
|
|
template <class Predicate>
|
|
|
|
T *find_element(Predicate P) const XRAY_NEVER_INSTRUMENT {
|
[XRay][profiler] Part 1: XRay Allocator and Array Implementations
Summary:
This change is part of the larger XRay Profiling Mode effort.
Here we implement an arena allocator, for fixed sized buffers used in a
segmented array implementation. This change adds the segmented array
data structure, which relies on the allocator to provide and maintain
the storage for the segmented array.
Key features of the `Allocator` type:
* It uses cache-aligned blocks, intended to host the actual data. These
blocks are cache-line-size multiples of contiguous bytes.
* The `Allocator` has a maximum memory budget, set at construction
time. This allows us to cap the amount of data each specific
`Allocator` instance is responsible for.
* Upon destruction, the `Allocator` will clean up the storage it's
used, handing it back to the internal allocator used in
sanitizer_common.
Key features of the `Array` type:
* Each segmented array is always backed by an `Allocator`, which is
either user-provided or uses a global allocator.
* When an `Array` grows, it grows by appending a segment that's
fixed-sized. The size of each segment is computed by the number of
elements of type `T` that can fit into cache line multiples.
* An `Array` does not return memory to the `Allocator`, but it can keep
track of the current number of "live" objects it stores.
* When an `Array` is destroyed, it will not return memory to the
`Allocator`. Users should clean up the `Allocator` independently of
the `Array`.
* The `Array` type keeps a freelist of the chunks it's used before, so
that trimming and growing will re-use previously allocated chunks.
These basic data structures are used by the XRay Profiling Mode
implementation to implement efficient and cache-aware storage for data
that's typically read-and-write heavy for tracking latency information.
We're relying on the cache line characteristics of the architecture to
provide us good data isolation and cache friendliness, when we're
performing operations like searching for elements and/or updating data
hosted in these cache lines.
Reviewers: echristo, pelikan, kpw
Subscribers: mgorny, llvm-commits
Differential Revision: https://reviews.llvm.org/D45756
llvm-svn: 331141
2018-04-29 21:46:30 +08:00
|
|
|
if (empty())
|
|
|
|
return nullptr;
|
|
|
|
|
|
|
|
auto E = end();
|
|
|
|
for (auto I = begin(); I != E; ++I)
|
|
|
|
if (P(*I))
|
|
|
|
return &(*I);
|
|
|
|
|
|
|
|
return nullptr;
|
|
|
|
}
|
|
|
|
|
|
|
|
/// Remove N Elements from the end. This leaves the blocks behind, and not
|
|
|
|
/// require allocation of new blocks for new elements added after trimming.
|
2018-12-07 11:19:13 +08:00
|
|
|
void trim(uint64_t Elements) XRAY_NEVER_INSTRUMENT {
|
2018-07-10 16:25:44 +08:00
|
|
|
auto OldSize = Size;
|
2018-12-07 11:19:13 +08:00
|
|
|
Elements = Elements > Size ? Size : Elements;
|
[XRay][profiler] Part 1: XRay Allocator and Array Implementations
Summary:
This change is part of the larger XRay Profiling Mode effort.
Here we implement an arena allocator, for fixed sized buffers used in a
segmented array implementation. This change adds the segmented array
data structure, which relies on the allocator to provide and maintain
the storage for the segmented array.
Key features of the `Allocator` type:
* It uses cache-aligned blocks, intended to host the actual data. These
blocks are cache-line-size multiples of contiguous bytes.
* The `Allocator` has a maximum memory budget, set at construction
time. This allows us to cap the amount of data each specific
`Allocator` instance is responsible for.
* Upon destruction, the `Allocator` will clean up the storage it's
used, handing it back to the internal allocator used in
sanitizer_common.
Key features of the `Array` type:
* Each segmented array is always backed by an `Allocator`, which is
either user-provided or uses a global allocator.
* When an `Array` grows, it grows by appending a segment that's
fixed-sized. The size of each segment is computed by the number of
elements of type `T` that can fit into cache line multiples.
* An `Array` does not return memory to the `Allocator`, but it can keep
track of the current number of "live" objects it stores.
* When an `Array` is destroyed, it will not return memory to the
`Allocator`. Users should clean up the `Allocator` independently of
the `Array`.
* The `Array` type keeps a freelist of the chunks it's used before, so
that trimming and growing will re-use previously allocated chunks.
These basic data structures are used by the XRay Profiling Mode
implementation to implement efficient and cache-aware storage for data
that's typically read-and-write heavy for tracking latency information.
We're relying on the cache line characteristics of the architecture to
provide us good data isolation and cache friendliness, when we're
performing operations like searching for elements and/or updating data
hosted in these cache lines.
Reviewers: echristo, pelikan, kpw
Subscribers: mgorny, llvm-commits
Differential Revision: https://reviews.llvm.org/D45756
llvm-svn: 331141
2018-04-29 21:46:30 +08:00
|
|
|
Size -= Elements;
|
2018-07-10 16:25:44 +08:00
|
|
|
|
2018-12-07 11:19:13 +08:00
|
|
|
// We compute the number of segments we're going to return from the tail by
|
|
|
|
// counting how many elements have been trimmed. Given the following:
|
|
|
|
//
|
|
|
|
// - Each segment has N valid positions, where N > 0
|
|
|
|
// - The previous size > current size
|
|
|
|
//
|
|
|
|
// To compute the number of segments to return, we need to perform the
|
|
|
|
// following calculations for the number of segments required given 'x'
|
|
|
|
// elements:
|
|
|
|
//
|
|
|
|
// f(x) = {
|
|
|
|
// x == 0 : 0
|
|
|
|
// , 0 < x <= N : 1
|
|
|
|
// , N < x <= max : x / N + (x % N ? 1 : 0)
|
|
|
|
// }
|
|
|
|
//
|
|
|
|
// We can simplify this down to:
|
|
|
|
//
|
|
|
|
// f(x) = {
|
|
|
|
// x == 0 : 0,
|
|
|
|
// , 0 < x <= max : x / N + (x < N || x % N ? 1 : 0)
|
|
|
|
// }
|
|
|
|
//
|
|
|
|
// And further down to:
|
|
|
|
//
|
|
|
|
// f(x) = x ? x / N + (x < N || x % N ? 1 : 0) : 0
|
|
|
|
//
|
|
|
|
// We can then perform the following calculation `s` which counts the number
|
|
|
|
// of segments we need to remove from the end of the data structure:
|
|
|
|
//
|
|
|
|
// s(p, c) = f(p) - f(c)
|
|
|
|
//
|
|
|
|
// If we treat p = previous size, and c = current size, and given the
|
|
|
|
// properties above, the possible range for s(...) is [0..max(typeof(p))/N]
|
|
|
|
// given that typeof(p) == typeof(c).
|
|
|
|
auto F = [](uint64_t X) {
|
|
|
|
return X ? (X / ElementsPerSegment) +
|
|
|
|
(X < ElementsPerSegment || X % ElementsPerSegment ? 1 : 0)
|
|
|
|
: 0;
|
|
|
|
};
|
|
|
|
auto PS = F(OldSize);
|
|
|
|
auto CS = F(Size);
|
|
|
|
DCHECK_GE(PS, CS);
|
|
|
|
auto SegmentsToTrim = PS - CS;
|
|
|
|
for (auto I = 0uL; I < SegmentsToTrim; ++I) {
|
|
|
|
// Here we place the current tail segment to the freelist. To do this
|
|
|
|
// appropriately, we need to perform a splice operation on two
|
|
|
|
// bidirectional linked-lists. In particular, we have the current state of
|
|
|
|
// the doubly-linked list of segments:
|
|
|
|
//
|
|
|
|
// @S@ <- s0 <-> s1 <-> ... <-> sT -> @S@
|
|
|
|
//
|
|
|
|
DCHECK_NE(Head, &SentinelSegment);
|
|
|
|
DCHECK_NE(Tail, &SentinelSegment);
|
2018-12-06 11:28:57 +08:00
|
|
|
DCHECK_EQ(Tail->Next, &SentinelSegment);
|
2018-12-07 11:19:13 +08:00
|
|
|
|
|
|
|
if (Freelist == &SentinelSegment) {
|
|
|
|
// Our two lists at this point are in this configuration:
|
|
|
|
//
|
|
|
|
// Freelist: (potentially) @S@
|
|
|
|
// Mainlist: @S@<-s0<->s1<->...<->sPT<->sT->@S@
|
|
|
|
// ^ Head ^ Tail
|
|
|
|
//
|
|
|
|
// The end state for us will be this configuration:
|
|
|
|
//
|
|
|
|
// Freelist: @S@<-sT->@S@
|
|
|
|
// Mainlist: @S@<-s0<->s1<->...<->sPT->@S@
|
|
|
|
// ^ Head ^ Tail
|
|
|
|
//
|
|
|
|
// The first step for us is to hold a reference to the tail of Mainlist,
|
|
|
|
// which in our notation is represented by sT. We call this our "free
|
|
|
|
// segment" which is the segment we are placing on the Freelist.
|
|
|
|
//
|
|
|
|
// sF = sT
|
|
|
|
//
|
|
|
|
// Then, we also hold a reference to the "pre-tail" element, which we
|
|
|
|
// call sPT:
|
|
|
|
//
|
|
|
|
// sPT = pred(sT)
|
|
|
|
//
|
|
|
|
// We want to splice sT into the beginning of the Freelist, which in
|
|
|
|
// an empty Freelist means placing a segment whose predecessor and
|
|
|
|
// successor is the sentinel segment.
|
|
|
|
//
|
|
|
|
// The splice operation then can be performed in the following
|
|
|
|
// algorithm:
|
|
|
|
//
|
|
|
|
// succ(sPT) = S
|
|
|
|
// pred(sT) = S
|
|
|
|
// succ(sT) = Freelist
|
|
|
|
// Freelist = sT
|
|
|
|
// Tail = sPT
|
|
|
|
//
|
|
|
|
auto SPT = Tail->Prev;
|
|
|
|
SPT->Next = &SentinelSegment;
|
|
|
|
Tail->Prev = &SentinelSegment;
|
|
|
|
Tail->Next = Freelist;
|
|
|
|
Freelist = Tail;
|
|
|
|
Tail = SPT;
|
|
|
|
|
|
|
|
// Our post-conditions here are:
|
|
|
|
DCHECK_EQ(Tail->Next, &SentinelSegment);
|
|
|
|
DCHECK_EQ(Freelist->Prev, &SentinelSegment);
|
|
|
|
} else {
|
|
|
|
// In the other case, where the Freelist is not empty, we perform the
|
|
|
|
// following transformation instead:
|
|
|
|
//
|
|
|
|
// This transforms the current state:
|
|
|
|
//
|
|
|
|
// Freelist: @S@<-f0->@S@
|
|
|
|
// ^ Freelist
|
|
|
|
// Mainlist: @S@<-s0<->s1<->...<->sPT<->sT->@S@
|
|
|
|
// ^ Head ^ Tail
|
|
|
|
//
|
|
|
|
// Into the following:
|
|
|
|
//
|
|
|
|
// Freelist: @S@<-sT<->f0->@S@
|
|
|
|
// ^ Freelist
|
|
|
|
// Mainlist: @S@<-s0<->s1<->...<->sPT->@S@
|
|
|
|
// ^ Head ^ Tail
|
|
|
|
//
|
|
|
|
// The algorithm is:
|
|
|
|
//
|
|
|
|
// sFH = Freelist
|
|
|
|
// sPT = pred(sT)
|
|
|
|
// pred(SFH) = sT
|
|
|
|
// succ(sT) = Freelist
|
|
|
|
// pred(sT) = S
|
|
|
|
// succ(sPT) = S
|
|
|
|
// Tail = sPT
|
|
|
|
// Freelist = sT
|
|
|
|
//
|
|
|
|
auto SFH = Freelist;
|
|
|
|
auto SPT = Tail->Prev;
|
|
|
|
auto ST = Tail;
|
|
|
|
SFH->Prev = ST;
|
|
|
|
ST->Next = Freelist;
|
|
|
|
ST->Prev = &SentinelSegment;
|
|
|
|
SPT->Next = &SentinelSegment;
|
|
|
|
Tail = SPT;
|
|
|
|
Freelist = ST;
|
|
|
|
|
|
|
|
// Our post-conditions here are:
|
|
|
|
DCHECK_EQ(Tail->Next, &SentinelSegment);
|
|
|
|
DCHECK_EQ(Freelist->Prev, &SentinelSegment);
|
|
|
|
DCHECK_EQ(Freelist->Next->Prev, Freelist);
|
|
|
|
}
|
2018-12-06 11:28:57 +08:00
|
|
|
}
|
2018-12-07 11:19:13 +08:00
|
|
|
|
|
|
|
// Now in case we've spliced all the segments in the end, we ensure that the
|
|
|
|
// main list is "empty", or both the head and tail pointing to the sentinel
|
|
|
|
// segment.
|
|
|
|
if (Tail == &SentinelSegment)
|
|
|
|
Head = Tail;
|
|
|
|
|
|
|
|
DCHECK(
|
|
|
|
(Size == 0 && Head == &SentinelSegment && Tail == &SentinelSegment) ||
|
|
|
|
(Size != 0 && Head != &SentinelSegment && Tail != &SentinelSegment));
|
|
|
|
DCHECK(
|
|
|
|
(Freelist != &SentinelSegment && Freelist->Prev == &SentinelSegment) ||
|
|
|
|
(Freelist == &SentinelSegment && Tail->Next == &SentinelSegment));
|
[XRay][profiler] Part 1: XRay Allocator and Array Implementations
Summary:
This change is part of the larger XRay Profiling Mode effort.
Here we implement an arena allocator, for fixed sized buffers used in a
segmented array implementation. This change adds the segmented array
data structure, which relies on the allocator to provide and maintain
the storage for the segmented array.
Key features of the `Allocator` type:
* It uses cache-aligned blocks, intended to host the actual data. These
blocks are cache-line-size multiples of contiguous bytes.
* The `Allocator` has a maximum memory budget, set at construction
time. This allows us to cap the amount of data each specific
`Allocator` instance is responsible for.
* Upon destruction, the `Allocator` will clean up the storage it's
used, handing it back to the internal allocator used in
sanitizer_common.
Key features of the `Array` type:
* Each segmented array is always backed by an `Allocator`, which is
either user-provided or uses a global allocator.
* When an `Array` grows, it grows by appending a segment that's
fixed-sized. The size of each segment is computed by the number of
elements of type `T` that can fit into cache line multiples.
* An `Array` does not return memory to the `Allocator`, but it can keep
track of the current number of "live" objects it stores.
* When an `Array` is destroyed, it will not return memory to the
`Allocator`. Users should clean up the `Allocator` independently of
the `Array`.
* The `Array` type keeps a freelist of the chunks it's used before, so
that trimming and growing will re-use previously allocated chunks.
These basic data structures are used by the XRay Profiling Mode
implementation to implement efficient and cache-aware storage for data
that's typically read-and-write heavy for tracking latency information.
We're relying on the cache line characteristics of the architecture to
provide us good data isolation and cache friendliness, when we're
performing operations like searching for elements and/or updating data
hosted in these cache lines.
Reviewers: echristo, pelikan, kpw
Subscribers: mgorny, llvm-commits
Differential Revision: https://reviews.llvm.org/D45756
llvm-svn: 331141
2018-04-29 21:46:30 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
// Provide iterators.
|
2018-09-07 18:16:14 +08:00
|
|
|
Iterator<T> begin() const XRAY_NEVER_INSTRUMENT {
|
|
|
|
return Iterator<T>(Head, 0, Size);
|
|
|
|
}
|
|
|
|
Iterator<T> end() const XRAY_NEVER_INSTRUMENT {
|
|
|
|
return Iterator<T>(Tail, Size, Size);
|
|
|
|
}
|
|
|
|
Iterator<const T> cbegin() const XRAY_NEVER_INSTRUMENT {
|
|
|
|
return Iterator<const T>(Head, 0, Size);
|
|
|
|
}
|
|
|
|
Iterator<const T> cend() const XRAY_NEVER_INSTRUMENT {
|
|
|
|
return Iterator<const T>(Tail, Size, Size);
|
|
|
|
}
|
[XRay][profiler] Part 1: XRay Allocator and Array Implementations
Summary:
This change is part of the larger XRay Profiling Mode effort.
Here we implement an arena allocator, for fixed sized buffers used in a
segmented array implementation. This change adds the segmented array
data structure, which relies on the allocator to provide and maintain
the storage for the segmented array.
Key features of the `Allocator` type:
* It uses cache-aligned blocks, intended to host the actual data. These
blocks are cache-line-size multiples of contiguous bytes.
* The `Allocator` has a maximum memory budget, set at construction
time. This allows us to cap the amount of data each specific
`Allocator` instance is responsible for.
* Upon destruction, the `Allocator` will clean up the storage it's
used, handing it back to the internal allocator used in
sanitizer_common.
Key features of the `Array` type:
* Each segmented array is always backed by an `Allocator`, which is
either user-provided or uses a global allocator.
* When an `Array` grows, it grows by appending a segment that's
fixed-sized. The size of each segment is computed by the number of
elements of type `T` that can fit into cache line multiples.
* An `Array` does not return memory to the `Allocator`, but it can keep
track of the current number of "live" objects it stores.
* When an `Array` is destroyed, it will not return memory to the
`Allocator`. Users should clean up the `Allocator` independently of
the `Array`.
* The `Array` type keeps a freelist of the chunks it's used before, so
that trimming and growing will re-use previously allocated chunks.
These basic data structures are used by the XRay Profiling Mode
implementation to implement efficient and cache-aware storage for data
that's typically read-and-write heavy for tracking latency information.
We're relying on the cache line characteristics of the architecture to
provide us good data isolation and cache friendliness, when we're
performing operations like searching for elements and/or updating data
hosted in these cache lines.
Reviewers: echristo, pelikan, kpw
Subscribers: mgorny, llvm-commits
Differential Revision: https://reviews.llvm.org/D45756
llvm-svn: 331141
2018-04-29 21:46:30 +08:00
|
|
|
};
|
|
|
|
|
|
|
|
// We need to have this storage definition out-of-line so that the compiler can
|
2018-07-18 10:08:39 +08:00
|
|
|
// ensure that storage for the SentinelSegment is defined and has a single
|
[XRay][profiler] Part 1: XRay Allocator and Array Implementations
Summary:
This change is part of the larger XRay Profiling Mode effort.
Here we implement an arena allocator, for fixed sized buffers used in a
segmented array implementation. This change adds the segmented array
data structure, which relies on the allocator to provide and maintain
the storage for the segmented array.
Key features of the `Allocator` type:
* It uses cache-aligned blocks, intended to host the actual data. These
blocks are cache-line-size multiples of contiguous bytes.
* The `Allocator` has a maximum memory budget, set at construction
time. This allows us to cap the amount of data each specific
`Allocator` instance is responsible for.
* Upon destruction, the `Allocator` will clean up the storage it's
used, handing it back to the internal allocator used in
sanitizer_common.
Key features of the `Array` type:
* Each segmented array is always backed by an `Allocator`, which is
either user-provided or uses a global allocator.
* When an `Array` grows, it grows by appending a segment that's
fixed-sized. The size of each segment is computed by the number of
elements of type `T` that can fit into cache line multiples.
* An `Array` does not return memory to the `Allocator`, but it can keep
track of the current number of "live" objects it stores.
* When an `Array` is destroyed, it will not return memory to the
`Allocator`. Users should clean up the `Allocator` independently of
the `Array`.
* The `Array` type keeps a freelist of the chunks it's used before, so
that trimming and growing will re-use previously allocated chunks.
These basic data structures are used by the XRay Profiling Mode
implementation to implement efficient and cache-aware storage for data
that's typically read-and-write heavy for tracking latency information.
We're relying on the cache line characteristics of the architecture to
provide us good data isolation and cache friendliness, when we're
performing operations like searching for elements and/or updating data
hosted in these cache lines.
Reviewers: echristo, pelikan, kpw
Subscribers: mgorny, llvm-commits
Differential Revision: https://reviews.llvm.org/D45756
llvm-svn: 331141
2018-04-29 21:46:30 +08:00
|
|
|
// address.
|
2018-07-18 10:08:39 +08:00
|
|
|
template <class T>
|
2018-12-07 11:19:13 +08:00
|
|
|
typename Array<T>::Segment Array<T>::SentinelSegment{
|
|
|
|
&Array<T>::SentinelSegment, &Array<T>::SentinelSegment, {'\0'}};
|
[XRay][profiler] Part 1: XRay Allocator and Array Implementations
Summary:
This change is part of the larger XRay Profiling Mode effort.
Here we implement an arena allocator, for fixed sized buffers used in a
segmented array implementation. This change adds the segmented array
data structure, which relies on the allocator to provide and maintain
the storage for the segmented array.
Key features of the `Allocator` type:
* It uses cache-aligned blocks, intended to host the actual data. These
blocks are cache-line-size multiples of contiguous bytes.
* The `Allocator` has a maximum memory budget, set at construction
time. This allows us to cap the amount of data each specific
`Allocator` instance is responsible for.
* Upon destruction, the `Allocator` will clean up the storage it's
used, handing it back to the internal allocator used in
sanitizer_common.
Key features of the `Array` type:
* Each segmented array is always backed by an `Allocator`, which is
either user-provided or uses a global allocator.
* When an `Array` grows, it grows by appending a segment that's
fixed-sized. The size of each segment is computed by the number of
elements of type `T` that can fit into cache line multiples.
* An `Array` does not return memory to the `Allocator`, but it can keep
track of the current number of "live" objects it stores.
* When an `Array` is destroyed, it will not return memory to the
`Allocator`. Users should clean up the `Allocator` independently of
the `Array`.
* The `Array` type keeps a freelist of the chunks it's used before, so
that trimming and growing will re-use previously allocated chunks.
These basic data structures are used by the XRay Profiling Mode
implementation to implement efficient and cache-aware storage for data
that's typically read-and-write heavy for tracking latency information.
We're relying on the cache line characteristics of the architecture to
provide us good data isolation and cache friendliness, when we're
performing operations like searching for elements and/or updating data
hosted in these cache lines.
Reviewers: echristo, pelikan, kpw
Subscribers: mgorny, llvm-commits
Differential Revision: https://reviews.llvm.org/D45756
llvm-svn: 331141
2018-04-29 21:46:30 +08:00
|
|
|
|
|
|
|
} // namespace __xray
|
|
|
|
|
|
|
|
#endif // XRAY_SEGMENTED_ARRAY_H
|