llvm-project/compiler-rt/lib/xray/xray_segmented_array.h

651 lines
21 KiB
C
[XRay][profiler] Part 1: XRay Allocator and Array Implementations Summary: This change is part of the larger XRay Profiling Mode effort. Here we implement an arena allocator, for fixed sized buffers used in a segmented array implementation. This change adds the segmented array data structure, which relies on the allocator to provide and maintain the storage for the segmented array. Key features of the `Allocator` type: * It uses cache-aligned blocks, intended to host the actual data. These blocks are cache-line-size multiples of contiguous bytes. * The `Allocator` has a maximum memory budget, set at construction time. This allows us to cap the amount of data each specific `Allocator` instance is responsible for. * Upon destruction, the `Allocator` will clean up the storage it's used, handing it back to the internal allocator used in sanitizer_common. Key features of the `Array` type: * Each segmented array is always backed by an `Allocator`, which is either user-provided or uses a global allocator. * When an `Array` grows, it grows by appending a segment that's fixed-sized. The size of each segment is computed by the number of elements of type `T` that can fit into cache line multiples. * An `Array` does not return memory to the `Allocator`, but it can keep track of the current number of "live" objects it stores. * When an `Array` is destroyed, it will not return memory to the `Allocator`. Users should clean up the `Allocator` independently of the `Array`. * The `Array` type keeps a freelist of the chunks it's used before, so that trimming and growing will re-use previously allocated chunks. These basic data structures are used by the XRay Profiling Mode implementation to implement efficient and cache-aware storage for data that's typically read-and-write heavy for tracking latency information. 
We're relying on the cache line characteristics of the architecture to provide us good data isolation and cache friendliness, when we're performing operations like searching for elements and/or updating data hosted in these cache lines. Reviewers: echristo, pelikan, kpw Subscribers: mgorny, llvm-commits Differential Revision: https://reviews.llvm.org/D45756 llvm-svn: 331141
2018-04-29 21:46:30 +08:00
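
The segment sizing described in the summary above can be illustrated with a small compile-time sketch. It mirrors the `SegmentSize` and `ElementsPerSegment` arithmetic that appears in this header, but `next_pow2`, `nearest_boundary`, and `kCacheLineSize = 64` below are stand-in definitions written for this example (assumed to round up to a power of two, round up to a multiple, and match a 64-byte cache line respectively). `SegmentLayout`, `Sample24`, and `Sample64` are hypothetical names that do not exist in XRay.

```cpp
#include <cstdint>

// Assumed cache line size; the real constant is defined elsewhere in XRay.
constexpr uint64_t kCacheLineSize = 64;

// Stand-in helper: smallest power of two that is >= N (for N >= 1).
constexpr uint64_t next_pow2(uint64_t N, uint64_t P = 1) {
  return P >= N ? P : next_pow2(N, P << 1);
}

// Stand-in helper: round N up to the nearest multiple of M.
constexpr uint64_t nearest_boundary(uint64_t N, uint64_t M) {
  return M * ((N + M - 1) / M);
}

// Hypothetical mirror of the sizing constants computed inside Array<T>.
template <class T> struct SegmentLayout {
  // Two pointers (Prev, Next) precede the element storage.
  static constexpr uint64_t ControlBlockSize = sizeof(void *) * 2;
  // Control block plus one padded element, rounded up to whole cache lines.
  static constexpr uint64_t SegmentSize = nearest_boundary(
      ControlBlockSize + next_pow2(sizeof(T)), kCacheLineSize);
  // How many padded elements fit after the control block.
  static constexpr uint64_t ElementsPerSegment =
      (SegmentSize - ControlBlockSize) / next_pow2(sizeof(T));
};

// Hypothetical element types used only to exercise the arithmetic.
struct Sample24 { char Bytes[24]; };
struct Sample64 { char Bytes[64]; };

// A 24-byte element is padded to 32 bytes and fits in one cache line.
static_assert(SegmentLayout<Sample24>::SegmentSize == 64, "one cache line");
static_assert(SegmentLayout<Sample24>::ElementsPerSegment == 1, "one element");

// A 64-byte element plus the control block spills into a second cache line.
static_assert(SegmentLayout<Sample64>::SegmentSize == 128, "two cache lines");
static_assert(SegmentLayout<Sample64>::ElementsPerSegment == 1, "one element");
```

Under these assumptions, small elements pack several to a segment while elements near the cache line size leave unused space after rounding, which is the trade-off of padding each element to a power of two.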
//===-- xray_segmented_array.h ---------------------------------*- C++ -*-===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//
//
// This file is a part of XRay, a dynamic runtime instrumentation system.
//
[XRay][compiler-rt] Segmented Array: Simplify and Optimise Summary: This is a follow-on to D49217 which simplifies and optimises the implementation of the segmented array. In this patch we co-locate the book-keeping for segments in the `__xray::Array<T>` with the data it's managing. We take the chance in this patch to actually rename `Chunk` to `Segment` to better align with the high-level description of the segmented array. With measurements using benchmarks landed in D48879, we've identified that calls to `pthread_getspecific` started dominating the cycles, which led us to revert the change made in D49217 to use C++ thread_local initialisation instead (it reduces the cost by a huge margin, since we save one PLT-based call to pthread functions in the hot path). In particular, this is in `__xray::getThreadLocalData()`. We also took the opportunity to remove the least-common-multiple based calculation and instead pack as much data into segments of the array. This greatly simplifies the API of the container which hides as much of the implementation details as possible. For instance, we calculate the number of elements we need for the each segment internally in the Array instead of making it part of the type. With the changes here, we're able to get a measurable improvement on the performance of profiling mode on top of what D48879 already provides. Depends on D48879. Reviewers: kpw, eizan Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D49363 llvm-svn: 337343
2018-07-18 10:08:39 +08:00
// Defines the implementation of a segmented array, with fixed-size segments
// backing the segments.
//
//===----------------------------------------------------------------------===//
#ifndef XRAY_SEGMENTED_ARRAY_H
#define XRAY_SEGMENTED_ARRAY_H
#include "sanitizer_common/sanitizer_allocator.h"
#include "xray_allocator.h"
#include "xray_utils.h"
#include <cassert>
#include <type_traits>
#include <utility>
namespace __xray {
/// The Array type provides an interface similar to std::vector<...> but does
/// not shrink in size. Once constructed, elements can be appended but cannot be
/// removed. The implementation is heavily dependent on the contract provided by
/// the Allocator type, in that all memory will be released when the Allocator
/// is destroyed. When an Array is destroyed, it will destroy elements in the
/// backing store but will not free the memory.
template <class T> class Array {
  struct Segment {
    Segment *Prev;
    Segment *Next;
    char Data[1];
  };
public:
  // Each segment of the array will be laid out with the following assumptions:
  //
  //   - Each segment will be on a cache-line address boundary (kCacheLineSize
  //     aligned).
  //
  //   - The elements will be accessed through an aligned pointer, dependent on
  //     the alignment of T.
  //
  //   - Each element is at least two-pointers worth from the beginning of the
  //     Segment, aligned properly, and the rest of the elements are accessed
  //     through appropriate alignment.
  //
  // We then compute the size of the segment to follow this logic:
  //
  //   - Compute the number of elements that can fit within
  //     kCacheLineSize-multiple segments, minus the size of two pointers.
  //
  //   - Request cacheline-multiple sized elements from the allocator.
  static constexpr uint64_t AlignedElementStorageSize =
      sizeof(typename std::aligned_storage<sizeof(T), alignof(T)>::type);
  static constexpr uint64_t SegmentControlBlockSize = sizeof(Segment *) * 2;
  static constexpr uint64_t SegmentSize = nearest_boundary(
      SegmentControlBlockSize + next_pow2(sizeof(T)), kCacheLineSize);
  using AllocatorType = Allocator<SegmentSize>;
  static constexpr uint64_t ElementsPerSegment =
      (SegmentSize - SegmentControlBlockSize) / next_pow2(sizeof(T));
  static_assert(ElementsPerSegment > 0,
                "Must have at least 1 element per segment.");
  static Segment SentinelSegment;
  using size_type = uint64_t;
private:
  // This Iterator models a BidirectionalIterator.
  template <class U> class Iterator {
    Segment *S = &SentinelSegment;
    uint64_t Offset = 0;
    uint64_t Size = 0;
  public:
    Iterator(Segment *IS, uint64_t Off, uint64_t S) XRAY_NEVER_INSTRUMENT
        : S(IS),
          Offset(Off),
          Size(S) {}
    Iterator(const Iterator &) NOEXCEPT XRAY_NEVER_INSTRUMENT = default;
    Iterator() NOEXCEPT XRAY_NEVER_INSTRUMENT = default;
    Iterator(Iterator &&) NOEXCEPT XRAY_NEVER_INSTRUMENT = default;
    Iterator &operator=(const Iterator &) XRAY_NEVER_INSTRUMENT = default;
    Iterator &operator=(Iterator &&) XRAY_NEVER_INSTRUMENT = default;
    ~Iterator() XRAY_NEVER_INSTRUMENT = default;
    Iterator &operator++() XRAY_NEVER_INSTRUMENT {
      if (++Offset % ElementsPerSegment || Offset == Size)
return *this;
// At this point, we know that Offset % ElementsPerSegment == 0, so we must
// advance the segment pointer.
DCHECK_EQ(Offset % ElementsPerSegment, 0);
DCHECK_NE(Offset, Size);
DCHECK_NE(S, &SentinelSegment);
DCHECK_NE(S->Next, &SentinelSegment);
S = S->Next;
DCHECK_NE(S, &SentinelSegment);
return *this;
}
Iterator &operator--() XRAY_NEVER_INSTRUMENT {
DCHECK_NE(S, &SentinelSegment);
DCHECK_GT(Offset, 0);
auto PreviousOffset = Offset--;
if (PreviousOffset != Size && PreviousOffset % ElementsPerSegment == 0) {
DCHECK_NE(S->Prev, &SentinelSegment);
S = S->Prev;
}
return *this;
}
Iterator operator++(int) XRAY_NEVER_INSTRUMENT {
Iterator Copy(*this);
++(*this);
return Copy;
}
Iterator operator--(int) XRAY_NEVER_INSTRUMENT {
Iterator Copy(*this);
--(*this);
return Copy;
}
template <class V, class W>
friend bool operator==(const Iterator<V> &L,
const Iterator<W> &R) XRAY_NEVER_INSTRUMENT {
return L.S == R.S && L.Offset == R.Offset;
}
template <class V, class W>
friend bool operator!=(const Iterator<V> &L,
const Iterator<W> &R) XRAY_NEVER_INSTRUMENT {
return !(L == R);
}
U &operator*() const XRAY_NEVER_INSTRUMENT {
DCHECK_NE(S, &SentinelSegment);
auto RelOff = Offset % ElementsPerSegment;
// We need to compute the character-aligned pointer, offset from the
// segment's Data location to get the element in the position of Offset.
auto Base = &S->Data;
auto AlignedOffset = Base + (RelOff * AlignedElementStorageSize);
return *reinterpret_cast<U *>(AlignedOffset);
}
U *operator->() const XRAY_NEVER_INSTRUMENT { return &(**this); }
};
AllocatorType *Alloc;
Segment *Head;
Segment *Tail;
// Here we keep track of segments in the freelist, to allow us to re-use
// segments when elements are trimmed off the end.
Segment *Freelist;
uint64_t Size;
// ===============================
// In the following implementation, we work through the algorithms and the
// list operations using the following notation:
//
// - pred(s) is the predecessor (previous node accessor) and succ(s) is
// the successor (next node accessor).
//
// - S is a sentinel segment, which has the following property:
//
// pred(S) == succ(S) == S
//
// - @ is a loop operator, which can imply pred(s) == s if it appears on
// the left of s, or succ(s) == s if it appears on the right of s.
//
// - sL <-> sR : means a bidirectional relation between sL and sR, which
// means:
//
// succ(sL) == sR && pred(sR) == sL
//
// - sL -> sR : implies a unidirectional relation between sL and sR,
// with the following properties:
//
// succ(sL) == sR
//
// - sL <- sR : implies a unidirectional relation between sR and sL,
// with the following properties:
//
// pred(sR) == sL
//
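// Worked example of the notation (illustrative only; the segment names a
// and b are hypothetical): a list holding two segments is written as
//
// @S@<-a<->b->@S@
//
// which expands to:
//
// pred(a) == S && succ(a) == b && pred(b) == a && succ(b) == S
//
// while S itself still satisfies pred(S) == succ(S) == S.
//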
// ===============================
Segment *NewSegment() XRAY_NEVER_INSTRUMENT {
// We need to handle the case in which enough elements have been trimmed to
// allow us to re-use segments we've allocated before. For this we look into
// the Freelist, to see whether we need to actually allocate new blocks or
// just re-use blocks we've already seen before.
if (Freelist != &SentinelSegment) {
// The current state of lists resembles something like this at this point:
//
// Freelist: @S@<-f0<->...<->fN->@S@
// ^ Freelist
//
// We want to perform a splice of `f0` from Freelist to a temporary list,
// which looks like:
//
// Templist: @S@<-f0->@S@
// ^ FreeSegment
//
// Our algorithm preconditions are:
DCHECK_EQ(Freelist->Prev, &SentinelSegment);
// Then the algorithm we implement is:
//
// SFS = Freelist
// Freelist = succ(Freelist)
// if (Freelist != S)
// pred(Freelist) = S
// succ(SFS) = S
// pred(SFS) = S
//
auto *FreeSegment = Freelist;
Freelist = Freelist->Next;
// Note that we need to handle the case where Freelist is now pointing to
// S, which we don't want to be overwriting.
// TODO: Determine whether the cost of the branch is higher than the cost
// of the blind assignment.
if (Freelist != &SentinelSegment)
Freelist->Prev = &SentinelSegment;
FreeSegment->Next = &SentinelSegment;
FreeSegment->Prev = &SentinelSegment;
// Our postconditions are:
DCHECK_EQ(Freelist->Prev, &SentinelSegment);
DCHECK_NE(FreeSegment, &SentinelSegment);
return FreeSegment;
}
auto SegmentBlock = Alloc->Allocate();
if (SegmentBlock.Data == nullptr)
return nullptr;
// Placement-new the Segment element at the beginning of the SegmentBlock.
new (SegmentBlock.Data) Segment{&SentinelSegment, &SentinelSegment, {0}};
auto SB = reinterpret_cast<Segment *>(SegmentBlock.Data);
return SB;
}
Segment *InitHeadAndTail() XRAY_NEVER_INSTRUMENT {
DCHECK_EQ(Head, &SentinelSegment);
DCHECK_EQ(Tail, &SentinelSegment);
auto S = NewSegment();
if (S == nullptr)
return nullptr;
DCHECK_EQ(S->Next, &SentinelSegment);
DCHECK_EQ(S->Prev, &SentinelSegment);
DCHECK_NE(S, &SentinelSegment);
Head = S;
Tail = S;
DCHECK_EQ(Head, Tail);
DCHECK_EQ(Tail->Next, &SentinelSegment);
DCHECK_EQ(Tail->Prev, &SentinelSegment);
return S;
}
Segment *AppendNewSegment() XRAY_NEVER_INSTRUMENT {
auto S = NewSegment();
if (S == nullptr)
return nullptr;
DCHECK_NE(Tail, &SentinelSegment);
DCHECK_EQ(Tail->Next, &SentinelSegment);
DCHECK_EQ(S->Prev, &SentinelSegment);
DCHECK_EQ(S->Next, &SentinelSegment);
S->Prev = Tail;
Tail->Next = S;
Tail = S;
DCHECK_EQ(S, S->Prev->Next);
DCHECK_EQ(Tail->Next, &SentinelSegment);
return S;
}
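// Illustration of AppendNewSegment() above in the list notation (a sketch
// for exposition; t and s are hypothetical names for the old tail and the
// new segment). Before the call:
//
// List: @S@<-...<->t->@S@
// ^ Tail
// New: @S@<-s->@S@
//
// After pred(s) = t, succ(t) = s and Tail = s:
//
// List: @S@<-...<->t<->s->@S@
// ^ Tail
//
// Head is left untouched, and succ(Tail) == S holds before and after.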
public:
explicit Array(AllocatorType &A) XRAY_NEVER_INSTRUMENT
: Alloc(&A),
Head(&SentinelSegment),
Tail(&SentinelSegment),
Freelist(&SentinelSegment),
Size(0) {}
Array() XRAY_NEVER_INSTRUMENT : Alloc(nullptr),
Head(&SentinelSegment),
Tail(&SentinelSegment),
Freelist(&SentinelSegment),
Size(0) {}
Array(const Array &) = delete;
Array &operator=(const Array &) = delete;
Array(Array &&O) XRAY_NEVER_INSTRUMENT : Alloc(O.Alloc),
Head(O.Head),
Tail(O.Tail),
Freelist(O.Freelist),
Size(O.Size) {
O.Alloc = nullptr;
O.Head = &SentinelSegment;
O.Tail = &SentinelSegment;
O.Size = 0;
O.Freelist = &SentinelSegment;
}
Array &operator=(Array &&O) XRAY_NEVER_INSTRUMENT {
Alloc = O.Alloc;
O.Alloc = nullptr;
Head = O.Head;
O.Head = &SentinelSegment;
Tail = O.Tail;
O.Tail = &SentinelSegment;
Freelist = O.Freelist;
O.Freelist = &SentinelSegment;
Size = O.Size;
O.Size = 0;
return *this;
}
~Array() XRAY_NEVER_INSTRUMENT {
for (auto &E : *this)
(&E)->~T();
}
bool empty() const XRAY_NEVER_INSTRUMENT { return Size == 0; }
AllocatorType &allocator() const XRAY_NEVER_INSTRUMENT {
DCHECK_NE(Alloc, nullptr);
return *Alloc;
}
uint64_t size() const XRAY_NEVER_INSTRUMENT { return Size; }
template <class... Args>
T *AppendEmplace(Args &&... args) XRAY_NEVER_INSTRUMENT {
DCHECK((Size == 0 && Head == &SentinelSegment && Head == Tail) ||
(Size != 0 && Head != &SentinelSegment && Tail != &SentinelSegment));
if (UNLIKELY(Head == &SentinelSegment)) {
auto R = InitHeadAndTail();
if (R == nullptr)
return nullptr;
}
DCHECK_NE(Head, &SentinelSegment);
DCHECK_NE(Tail, &SentinelSegment);
auto Offset = Size % ElementsPerSegment;
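// Every segment in use is full, so grow the array by one segment before
// constructing the new element; fail if no segment can be obtained.
if (UNLIKELY(Size != 0 && Offset == 0))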
if (AppendNewSegment() == nullptr)
return nullptr;
DCHECK_NE(Tail, &SentinelSegment);
auto Base = &Tail->Data;
auto AlignedOffset = Base + (Offset * AlignedElementStorageSize);
DCHECK_LE(AlignedOffset + sizeof(T),
reinterpret_cast<unsigned char *>(Base) + SegmentSize);
// In-place construct the new element at AlignedOffset.
new (AlignedOffset) T{std::forward<Args>(args)...};
++Size;
return reinterpret_cast<T *>(AlignedOffset);
}
T *Append(const T &E) XRAY_NEVER_INSTRUMENT {
// FIXME: This is a duplication of AppendEmplace with the copy semantics
// explicitly used, as a work-around to GCC 4.8 not invoking the copy
// constructor with the placement new with braced-init syntax.
DCHECK((Size == 0 && Head == &SentinelSegment && Head == Tail) ||
(Size != 0 && Head != &SentinelSegment && Tail != &SentinelSegment));
if (UNLIKELY(Head == &SentinelSegment)) {
auto R = InitHeadAndTail();
if (R == nullptr)
return nullptr;
}
DCHECK_NE(Head, &SentinelSegment);
DCHECK_NE(Tail, &SentinelSegment);
auto Offset = Size % ElementsPerSegment;
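// Every segment in use is full, so grow the array by one segment before
// constructing the new element; fail if no segment can be obtained.
if (UNLIKELY(Size != 0 && Offset == 0))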
if (AppendNewSegment() == nullptr)
return nullptr;
DCHECK_NE(Tail, &SentinelSegment);
auto Base = &Tail->Data;
auto AlignedOffset = Base + (Offset * AlignedElementStorageSize);
DCHECK_LE(AlignedOffset + sizeof(T),
reinterpret_cast<unsigned char *>(Tail) + SegmentSize);
// In-place copy-construct the new element at AlignedOffset.
new (AlignedOffset) T(E);
++Size;
return reinterpret_cast<T *>(AlignedOffset);
}
T &operator[](uint64_t Offset) const XRAY_NEVER_INSTRUMENT {
DCHECK_LE(Offset, Size);
// Walk the segment list from Head, skipping ElementsPerSegment elements per
// hop, until Offset falls within the current segment S.
auto S = Head;
while (Offset >= ElementsPerSegment) {
S = S->Next;
Offset -= ElementsPerSegment;
DCHECK_NE(S, &SentinelSegment);
}
auto Base = &S->Data;
auto AlignedOffset = Base + (Offset * AlignedElementStorageSize);
auto Position = reinterpret_cast<T *>(AlignedOffset);
return *Position;
}
T &front() const XRAY_NEVER_INSTRUMENT {
DCHECK_NE(Head, &SentinelSegment);
DCHECK_NE(Size, 0u);
return *begin();
}
T &back() const XRAY_NEVER_INSTRUMENT {
DCHECK_NE(Tail, &SentinelSegment);
DCHECK_NE(Size, 0u);
auto It = end();
--It;
return *It;
}
template <class Predicate>
T *find_element(Predicate P) const XRAY_NEVER_INSTRUMENT {
if (empty())
return nullptr;
auto E = end();
for (auto I = begin(); I != E; ++I)
if (P(*I))
return &(*I);
return nullptr;
}
/// Remove N elements from the end. This leaves the segments behind on the
/// freelist, so elements added after trimming do not require new allocations.
void trim(uint64_t Elements) XRAY_NEVER_INSTRUMENT {
auto OldSize = Size;
Elements = Elements > Size ? Size : Elements;
Size -= Elements;
// We compute the number of segments we're going to return from the tail by
// counting how many elements have been trimmed. Given the following:
//
// - Each segment has N valid positions, where N > 0
// - The previous size > current size
//
// To compute the number of segments to return, we need to perform the
// following calculations for the number of segments required given 'x'
// elements:
//
// f(x) = {
// x == 0 : 0
// , 0 < x <= N : 1
// , N < x <= max : x / N + (x % N ? 1 : 0)
// }
//
// We can simplify this down to:
//
// f(x) = {
// x == 0 : 0
// , 0 < x <= max : x / N + (x < N || x % N ? 1 : 0)
// }
//
// And further down to:
//
// f(x) = x ? x / N + (x < N || x % N ? 1 : 0) : 0
//
// We can then perform the following calculation `s` which counts the number
// of segments we need to remove from the end of the data structure:
//
// s(p, c) = f(p) - f(c)
//
// If we treat p = previous size, and c = current size, and given the
// properties above, the possible range for s(...) is [0..max(typeof(p))/N]
// given that typeof(p) == typeof(c).
auto F = [](uint64_t X) {
return X ? (X / ElementsPerSegment) +
(X < ElementsPerSegment || X % ElementsPerSegment ? 1 : 0)
: 0;
};
auto PS = F(OldSize);
auto CS = F(Size);
DCHECK_GE(PS, CS);
auto SegmentsToTrim = PS - CS;
for (auto I = 0uL; I < SegmentsToTrim; ++I) {
// Here we place the current tail segment to the freelist. To do this
// appropriately, we need to perform a splice operation on two
// bidirectional linked-lists. In particular, we have the current state of
// the doubly-linked list of segments:
//
// @S@ <- s0 <-> s1 <-> ... <-> sT -> @S@
//
DCHECK_NE(Head, &SentinelSegment);
DCHECK_NE(Tail, &SentinelSegment);
DCHECK_EQ(Tail->Next, &SentinelSegment);
if (Freelist == &SentinelSegment) {
// Our two lists at this point are in this configuration:
//
// Freelist: (potentially) @S@
// Mainlist: @S@<-s0<->s1<->...<->sPT<->sT->@S@
// ^ Head ^ Tail
//
// The end state for us will be this configuration:
//
// Freelist: @S@<-sT->@S@
// Mainlist: @S@<-s0<->s1<->...<->sPT->@S@
// ^ Head ^ Tail
//
// The first step for us is to hold a reference to the tail of Mainlist,
// which in our notation is represented by sT. We call this our "free
// segment" which is the segment we are placing on the Freelist.
//
// sF = sT
//
// Then, we also hold a reference to the "pre-tail" element, which we
// call sPT:
//
// sPT = pred(sT)
//
// We want to splice sT into the beginning of the Freelist, which in
// an empty Freelist means placing a segment whose predecessor and
// successor are both the sentinel segment.
//
// The splice operation then can be performed in the following
// algorithm:
//
// succ(sPT) = S
// pred(sT) = S
// succ(sT) = Freelist
// Freelist = sT
// Tail = sPT
//
auto SPT = Tail->Prev;
SPT->Next = &SentinelSegment;
Tail->Prev = &SentinelSegment;
Tail->Next = Freelist;
Freelist = Tail;
Tail = SPT;
// Our post-conditions here are:
DCHECK_EQ(Tail->Next, &SentinelSegment);
DCHECK_EQ(Freelist->Prev, &SentinelSegment);
} else {
// In the other case, where the Freelist is not empty, we perform the
// following transformation instead:
//
// This transforms the current state:
//
// Freelist: @S@<-f0->@S@
// ^ Freelist
// Mainlist: @S@<-s0<->s1<->...<->sPT<->sT->@S@
// ^ Head ^ Tail
//
// Into the following:
//
// Freelist: @S@<-sT<->f0->@S@
// ^ Freelist
// Mainlist: @S@<-s0<->s1<->...<->sPT->@S@
// ^ Head ^ Tail
//
// The algorithm is:
//
// sFH = Freelist
// sPT = pred(sT)
// pred(sFH) = sT
// succ(sT) = Freelist
// pred(sT) = S
// succ(sPT) = S
// Tail = sPT
// Freelist = sT
//
auto SFH = Freelist;
auto SPT = Tail->Prev;
auto ST = Tail;
SFH->Prev = ST;
ST->Next = Freelist;
ST->Prev = &SentinelSegment;
SPT->Next = &SentinelSegment;
Tail = SPT;
Freelist = ST;
// Our post-conditions here are:
DCHECK_EQ(Tail->Next, &SentinelSegment);
DCHECK_EQ(Freelist->Prev, &SentinelSegment);
DCHECK_EQ(Freelist->Next->Prev, Freelist);
}
}
// If we have spliced every segment off the main list, we ensure that the
// main list is "empty", i.e. both the head and tail point to the sentinel
// segment.
if (Tail == &SentinelSegment)
Head = Tail;
DCHECK(
(Size == 0 && Head == &SentinelSegment && Tail == &SentinelSegment) ||
(Size != 0 && Head != &SentinelSegment && Tail != &SentinelSegment));
DCHECK(
(Freelist != &SentinelSegment && Freelist->Prev == &SentinelSegment) ||
(Freelist == &SentinelSegment && Tail->Next == &SentinelSegment));
}
// Provide iterators.
Iterator<T> begin() const XRAY_NEVER_INSTRUMENT {
return Iterator<T>(Head, 0, Size);
}
Iterator<T> end() const XRAY_NEVER_INSTRUMENT {
return Iterator<T>(Tail, Size, Size);
}
Iterator<const T> cbegin() const XRAY_NEVER_INSTRUMENT {
return Iterator<const T>(Head, 0, Size);
}
Iterator<const T> cend() const XRAY_NEVER_INSTRUMENT {
return Iterator<const T>(Tail, Size, Size);
}
};
// We need to have this storage definition out-of-line so that the compiler can
// ensure that storage for the SentinelSegment is defined and has a single
// address.
template <class T>
typename Array<T>::Segment Array<T>::SentinelSegment{
&Array<T>::SentinelSegment, &Array<T>::SentinelSegment, {'\0'}};
} // namespace __xray
#endif // XRAY_SEGMENTED_ARRAY_H
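
The segment-count arithmetic described in the comments of `trim()` can be checked in isolation. The following is a minimal standalone sketch, not part of the header: `segmentsFor` and `segmentsToTrim` are hypothetical names, and the per-segment capacity `N` is passed as a parameter here, whereas the header uses the class constant `ElementsPerSegment`.

```cpp
#include <cassert>
#include <cstdint>

// f(x): segments needed to hold X elements when each segment holds N (> 0).
// Same expression as the lambda F in trim(), with N passed explicitly
// (an assumption made so the sketch stands alone).
constexpr uint64_t segmentsFor(uint64_t X, uint64_t N) {
  return X ? (X / N) + ((X < N || X % N) ? 1 : 0) : 0;
}

// s(p, c) = f(p) - f(c): whole segments freed when shrinking the array
// from Previous elements to Current elements.
inline uint64_t segmentsToTrim(uint64_t Previous, uint64_t Current,
                               uint64_t N) {
  assert(Previous >= Current && "trimming never grows the array");
  return segmentsFor(Previous, N) - segmentsFor(Current, N);
}

// Worked values for N = 4.
static_assert(segmentsFor(0, 4) == 0, "an empty array needs no segments");
static_assert(segmentsFor(1, 4) == 1, "a partial segment still counts");
static_assert(segmentsFor(4, 4) == 1, "exactly one full segment");
static_assert(segmentsFor(5, 4) == 2, "one full plus one partial segment");
```

With `N = 4`, shrinking from 9 elements to 4 frees two segments, while shrinking from 8 to 5 frees none, because the second segment is still partially occupied.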