llvm-project/compiler-rt/lib/xray/xray_segmented_array.h

379 lines
12 KiB
C
Raw Normal View History

[XRay][profiler] Part 1: XRay Allocator and Array Implementations Summary: This change is part of the larger XRay Profiling Mode effort. Here we implement an arena allocator, for fixed sized buffers used in a segmented array implementation. This change adds the segmented array data structure, which relies on the allocator to provide and maintain the storage for the segmented array. Key features of the `Allocator` type: * It uses cache-aligned blocks, intended to host the actual data. These blocks are cache-line-size multiples of contiguous bytes. * The `Allocator` has a maximum memory budget, set at construction time. This allows us to cap the amount of data each specific `Allocator` instance is responsible for. * Upon destruction, the `Allocator` will clean up the storage it's used, handing it back to the internal allocator used in sanitizer_common. Key features of the `Array` type: * Each segmented array is always backed by an `Allocator`, which is either user-provided or uses a global allocator. * When an `Array` grows, it grows by appending a segment that's fixed-sized. The size of each segment is computed by the number of elements of type `T` that can fit into cache line multiples. * An `Array` does not return memory to the `Allocator`, but it can keep track of the current number of "live" objects it stores. * When an `Array` is destroyed, it will not return memory to the `Allocator`. Users should clean up the `Allocator` independently of the `Array`. * The `Array` type keeps a freelist of the chunks it's used before, so that trimming and growing will re-use previously allocated chunks. These basic data structures are used by the XRay Profiling Mode implementation to implement efficient and cache-aware storage for data that's typically read-and-write heavy for tracking latency information. We're relying on the cache line characteristics of the architecture to provide us good data isolation and cache friendliness, when we're performing operations like searching for elements and/or updating data hosted in these cache lines. Reviewers: echristo, pelikan, kpw Subscribers: mgorny, llvm-commits Differential Revision: https://reviews.llvm.org/D45756 llvm-svn: 331141
2018-04-29 21:46:30 +08:00
//===-- xray_segmented_array.h ---------------------------------*- C++ -*-===//
//
// The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//===----------------------------------------------------------------------===//
//
// This file is a part of XRay, a dynamic runtime instrumentation system.
//
[XRay][compiler-rt] Segmented Array: Simplify and Optimise Summary: This is a follow-on to D49217 which simplifies and optimises the implementation of the segmented array. In this patch we co-locate the book-keeping for segments in the `__xray::Array<T>` with the data it's managing. We take the chance in this patch to actually rename `Chunk` to `Segment` to better align with the high-level description of the segmented array. With measurements using benchmarks landed in D48879, we've identified that calls to `pthread_getspecific` started dominating the cycles, which led us to revert the change made in D49217 to use C++ thread_local initialisation instead (it reduces the cost by a huge margin, since we save one PLT-based call to pthread functions in the hot path). In particular, this is in `__xray::getThreadLocalData()`. We also took the opportunity to remove the least-common-multiple based calculation and instead pack as much data into segments of the array. This greatly simplifies the API of the container which hides as much of the implementation details as possible. For instance, we calculate the number of elements we need for the each segment internally in the Array instead of making it part of the type. With the changes here, we're able to get a measurable improvement on the performance of profiling mode on top of what D48879 already provides. Depends on D48879. Reviewers: kpw, eizan Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D49363 llvm-svn: 337343
2018-07-18 10:08:39 +08:00
// Defines the implementation of a segmented array, with fixed-size segments
[XRay][profiler] Part 1: XRay Allocator and Array Implementations Summary: This change is part of the larger XRay Profiling Mode effort. Here we implement an arena allocator, for fixed sized buffers used in a segmented array implementation. This change adds the segmented array data structure, which relies on the allocator to provide and maintain the storage for the segmented array. Key features of the `Allocator` type: * It uses cache-aligned blocks, intended to host the actual data. These blocks are cache-line-size multiples of contiguous bytes. * The `Allocator` has a maximum memory budget, set at construction time. This allows us to cap the amount of data each specific `Allocator` instance is responsible for. * Upon destruction, the `Allocator` will clean up the storage it's used, handing it back to the internal allocator used in sanitizer_common. Key features of the `Array` type: * Each segmented array is always backed by an `Allocator`, which is either user-provided or uses a global allocator. * When an `Array` grows, it grows by appending a segment that's fixed-sized. The size of each segment is computed by the number of elements of type `T` that can fit into cache line multiples. * An `Array` does not return memory to the `Allocator`, but it can keep track of the current number of "live" objects it stores. * When an `Array` is destroyed, it will not return memory to the `Allocator`. Users should clean up the `Allocator` independently of the `Array`. * The `Array` type keeps a freelist of the chunks it's used before, so that trimming and growing will re-use previously allocated chunks. These basic data structures are used by the XRay Profiling Mode implementation to implement efficient and cache-aware storage for data that's typically read-and-write heavy for tracking latency information. We're relying on the cache line characteristics of the architecture to provide us good data isolation and cache friendliness, when we're performing operations like searching for elements and/or updating data hosted in these cache lines. Reviewers: echristo, pelikan, kpw Subscribers: mgorny, llvm-commits Differential Revision: https://reviews.llvm.org/D45756 llvm-svn: 331141
2018-04-29 21:46:30 +08:00
// backing the segments.
//
//===----------------------------------------------------------------------===//
#ifndef XRAY_SEGMENTED_ARRAY_H
#define XRAY_SEGMENTED_ARRAY_H
#include "sanitizer_common/sanitizer_allocator.h"
#include "xray_allocator.h"
#include "xray_utils.h"
#include <cassert>
[XRay][profiler] Part 1: XRay Allocator and Array Implementations Summary: This change is part of the larger XRay Profiling Mode effort. Here we implement an arena allocator, for fixed sized buffers used in a segmented array implementation. This change adds the segmented array data structure, which relies on the allocator to provide and maintain the storage for the segmented array. Key features of the `Allocator` type: * It uses cache-aligned blocks, intended to host the actual data. These blocks are cache-line-size multiples of contiguous bytes. * The `Allocator` has a maximum memory budget, set at construction time. This allows us to cap the amount of data each specific `Allocator` instance is responsible for. * Upon destruction, the `Allocator` will clean up the storage it's used, handing it back to the internal allocator used in sanitizer_common. Key features of the `Array` type: * Each segmented array is always backed by an `Allocator`, which is either user-provided or uses a global allocator. * When an `Array` grows, it grows by appending a segment that's fixed-sized. The size of each segment is computed by the number of elements of type `T` that can fit into cache line multiples. * An `Array` does not return memory to the `Allocator`, but it can keep track of the current number of "live" objects it stores. * When an `Array` is destroyed, it will not return memory to the `Allocator`. Users should clean up the `Allocator` independently of the `Array`. * The `Array` type keeps a freelist of the chunks it's used before, so that trimming and growing will re-use previously allocated chunks. These basic data structures are used by the XRay Profiling Mode implementation to implement efficient and cache-aware storage for data that's typically read-and-write heavy for tracking latency information. We're relying on the cache line characteristics of the architecture to provide us good data isolation and cache friendliness, when we're performing operations like searching for elements and/or updating data hosted in these cache lines. Reviewers: echristo, pelikan, kpw Subscribers: mgorny, llvm-commits Differential Revision: https://reviews.llvm.org/D45756 llvm-svn: 331141
2018-04-29 21:46:30 +08:00
#include <type_traits>
#include <utility>
namespace __xray {
/// The Array type provides an interface similar to std::vector<...> but does
/// not shrink in size. Once constructed, elements can be appended but cannot be
/// removed. The implementation is heavily dependent on the contract provided by
/// the Allocator type, in that all memory will be released when the Allocator
/// is destroyed. When an Array is destroyed, it will destroy elements in the
[XRay][compiler-rt] Segmented Array: Simplify and Optimise Summary: This is a follow-on to D49217 which simplifies and optimises the implementation of the segmented array. In this patch we co-locate the book-keeping for segments in the `__xray::Array<T>` with the data it's managing. We take the chance in this patch to actually rename `Chunk` to `Segment` to better align with the high-level description of the segmented array. With measurements using benchmarks landed in D48879, we've identified that calls to `pthread_getspecific` started dominating the cycles, which led us to revert the change made in D49217 to use C++ thread_local initialisation instead (it reduces the cost by a huge margin, since we save one PLT-based call to pthread functions in the hot path). In particular, this is in `__xray::getThreadLocalData()`. We also took the opportunity to remove the least-common-multiple based calculation and instead pack as much data into segments of the array. This greatly simplifies the API of the container which hides as much of the implementation details as possible. For instance, we calculate the number of elements we need for the each segment internally in the Array instead of making it part of the type. With the changes here, we're able to get a measurable improvement on the performance of profiling mode on top of what D48879 already provides. Depends on D48879. Reviewers: kpw, eizan Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D49363 llvm-svn: 337343
2018-07-18 10:08:39 +08:00
/// backing store but will not free the memory.
template <class T> class Array {
struct SegmentBase {
SegmentBase *Prev;
SegmentBase *Next;
};
[XRay][profiler] Part 1: XRay Allocator and Array Implementations Summary: This change is part of the larger XRay Profiling Mode effort. Here we implement an arena allocator, for fixed sized buffers used in a segmented array implementation. This change adds the segmented array data structure, which relies on the allocator to provide and maintain the storage for the segmented array. Key features of the `Allocator` type: * It uses cache-aligned blocks, intended to host the actual data. These blocks are cache-line-size multiples of contiguous bytes. * The `Allocator` has a maximum memory budget, set at construction time. This allows us to cap the amount of data each specific `Allocator` instance is responsible for. * Upon destruction, the `Allocator` will clean up the storage it's used, handing it back to the internal allocator used in sanitizer_common. Key features of the `Array` type: * Each segmented array is always backed by an `Allocator`, which is either user-provided or uses a global allocator. * When an `Array` grows, it grows by appending a segment that's fixed-sized. The size of each segment is computed by the number of elements of type `T` that can fit into cache line multiples. * An `Array` does not return memory to the `Allocator`, but it can keep track of the current number of "live" objects it stores. * When an `Array` is destroyed, it will not return memory to the `Allocator`. Users should clean up the `Allocator` independently of the `Array`. * The `Array` type keeps a freelist of the chunks it's used before, so that trimming and growing will re-use previously allocated chunks. These basic data structures are used by the XRay Profiling Mode implementation to implement efficient and cache-aware storage for data that's typically read-and-write heavy for tracking latency information. We're relying on the cache line characteristics of the architecture to provide us good data isolation and cache friendliness, when we're performing operations like searching for elements and/or updating data hosted in these cache lines. Reviewers: echristo, pelikan, kpw Subscribers: mgorny, llvm-commits Differential Revision: https://reviews.llvm.org/D45756 llvm-svn: 331141
2018-04-29 21:46:30 +08:00
[XRay][compiler-rt] Segmented Array: Simplify and Optimise Summary: This is a follow-on to D49217 which simplifies and optimises the implementation of the segmented array. In this patch we co-locate the book-keeping for segments in the `__xray::Array<T>` with the data it's managing. We take the chance in this patch to actually rename `Chunk` to `Segment` to better align with the high-level description of the segmented array. With measurements using benchmarks landed in D48879, we've identified that calls to `pthread_getspecific` started dominating the cycles, which led us to revert the change made in D49217 to use C++ thread_local initialisation instead (it reduces the cost by a huge margin, since we save one PLT-based call to pthread functions in the hot path). In particular, this is in `__xray::getThreadLocalData()`. We also took the opportunity to remove the least-common-multiple based calculation and instead pack as much data into segments of the array. This greatly simplifies the API of the container which hides as much of the implementation details as possible. For instance, we calculate the number of elements we need for the each segment internally in the Array instead of making it part of the type. With the changes here, we're able to get a measurable improvement on the performance of profiling mode on top of what D48879 already provides. Depends on D48879. Reviewers: kpw, eizan Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D49363 llvm-svn: 337343
2018-07-18 10:08:39 +08:00
// We want each segment of the array to be cache-line aligned, and elements of
// the array be offset from the beginning of the segment.
struct Segment : SegmentBase {
char Data[1];
[XRay][compiler-rt] Segmented Array: Simplify and Optimise Summary: This is a follow-on to D49217 which simplifies and optimises the implementation of the segmented array. In this patch we co-locate the book-keeping for segments in the `__xray::Array<T>` with the data it's managing. We take the chance in this patch to actually rename `Chunk` to `Segment` to better align with the high-level description of the segmented array. With measurements using benchmarks landed in D48879, we've identified that calls to `pthread_getspecific` started dominating the cycles, which led us to revert the change made in D49217 to use C++ thread_local initialisation instead (it reduces the cost by a huge margin, since we save one PLT-based call to pthread functions in the hot path). In particular, this is in `__xray::getThreadLocalData()`. We also took the opportunity to remove the least-common-multiple based calculation and instead pack as much data into segments of the array. This greatly simplifies the API of the container which hides as much of the implementation details as possible. For instance, we calculate the number of elements we need for the each segment internally in the Array instead of making it part of the type. With the changes here, we're able to get a measurable improvement on the performance of profiling mode on top of what D48879 already provides. Depends on D48879. Reviewers: kpw, eizan Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D49363 llvm-svn: 337343
2018-07-18 10:08:39 +08:00
};
public:
// Each segment of the array will be laid out with the following assumptions:
//
// - Each segment will be on a cache-line address boundary (kCacheLineSize
// aligned).
//
// - The elements will be accessed through an aligned pointer, dependent on
// the alignment of T.
//
// - Each element is at least two-pointers worth from the beginning of the
// Segment, aligned properly, and the rest of the elements are accessed
// through appropriate alignment.
//
// We then compute the size of the segment to follow this logic:
//
// - Compute the number of elements that can fit within
// kCacheLineSize-multiple segments, minus the size of two pointers.
//
// - Request cacheline-multiple sized elements from the allocator.
static constexpr size_t AlignedElementStorageSize =
sizeof(typename std::aligned_storage<sizeof(T), alignof(T)>::type);
static constexpr size_t SegmentSize =
nearest_boundary(sizeof(Segment) + next_pow2(sizeof(T)), kCacheLineSize);
using AllocatorType = Allocator<SegmentSize>;
static constexpr size_t ElementsPerSegment =
(SegmentSize - sizeof(Segment)) / next_pow2(sizeof(T));
static_assert(ElementsPerSegment > 0,
"Must have at least 1 element per segment.");
static SegmentBase SentinelSegment;
[XRay][profiler] Part 1: XRay Allocator and Array Implementations Summary: This change is part of the larger XRay Profiling Mode effort. Here we implement an arena allocator, for fixed sized buffers used in a segmented array implementation. This change adds the segmented array data structure, which relies on the allocator to provide and maintain the storage for the segmented array. Key features of the `Allocator` type: * It uses cache-aligned blocks, intended to host the actual data. These blocks are cache-line-size multiples of contiguous bytes. * The `Allocator` has a maximum memory budget, set at construction time. This allows us to cap the amount of data each specific `Allocator` instance is responsible for. * Upon destruction, the `Allocator` will clean up the storage it's used, handing it back to the internal allocator used in sanitizer_common. Key features of the `Array` type: * Each segmented array is always backed by an `Allocator`, which is either user-provided or uses a global allocator. * When an `Array` grows, it grows by appending a segment that's fixed-sized. The size of each segment is computed by the number of elements of type `T` that can fit into cache line multiples. * An `Array` does not return memory to the `Allocator`, but it can keep track of the current number of "live" objects it stores. * When an `Array` is destroyed, it will not return memory to the `Allocator`. Users should clean up the `Allocator` independently of the `Array`. * The `Array` type keeps a freelist of the chunks it's used before, so that trimming and growing will re-use previously allocated chunks. These basic data structures are used by the XRay Profiling Mode implementation to implement efficient and cache-aware storage for data that's typically read-and-write heavy for tracking latency information. We're relying on the cache line characteristics of the architecture to provide us good data isolation and cache friendliness, when we're performing operations like searching for elements and/or updating data hosted in these cache lines. Reviewers: echristo, pelikan, kpw Subscribers: mgorny, llvm-commits Differential Revision: https://reviews.llvm.org/D45756 llvm-svn: 331141
2018-04-29 21:46:30 +08:00
[XRay][compiler-rt] Segmented Array: Simplify and Optimise Summary: This is a follow-on to D49217 which simplifies and optimises the implementation of the segmented array. In this patch we co-locate the book-keeping for segments in the `__xray::Array<T>` with the data it's managing. We take the chance in this patch to actually rename `Chunk` to `Segment` to better align with the high-level description of the segmented array. With measurements using benchmarks landed in D48879, we've identified that calls to `pthread_getspecific` started dominating the cycles, which led us to revert the change made in D49217 to use C++ thread_local initialisation instead (it reduces the cost by a huge margin, since we save one PLT-based call to pthread functions in the hot path). In particular, this is in `__xray::getThreadLocalData()`. We also took the opportunity to remove the least-common-multiple based calculation and instead pack as much data into segments of the array. This greatly simplifies the API of the container which hides as much of the implementation details as possible. For instance, we calculate the number of elements we need for the each segment internally in the Array instead of making it part of the type. With the changes here, we're able to get a measurable improvement on the performance of profiling mode on top of what D48879 already provides. Depends on D48879. Reviewers: kpw, eizan Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D49363 llvm-svn: 337343
2018-07-18 10:08:39 +08:00
private:
AllocatorType *Alloc;
[XRay][compiler-rt] Segmented Array: Simplify and Optimise Summary: This is a follow-on to D49217 which simplifies and optimises the implementation of the segmented array. In this patch we co-locate the book-keeping for segments in the `__xray::Array<T>` with the data it's managing. We take the chance in this patch to actually rename `Chunk` to `Segment` to better align with the high-level description of the segmented array. With measurements using benchmarks landed in D48879, we've identified that calls to `pthread_getspecific` started dominating the cycles, which led us to revert the change made in D49217 to use C++ thread_local initialisation instead (it reduces the cost by a huge margin, since we save one PLT-based call to pthread functions in the hot path). In particular, this is in `__xray::getThreadLocalData()`. We also took the opportunity to remove the least-common-multiple based calculation and instead pack as much data into segments of the array. This greatly simplifies the API of the container which hides as much of the implementation details as possible. For instance, we calculate the number of elements we need for the each segment internally in the Array instead of making it part of the type. With the changes here, we're able to get a measurable improvement on the performance of profiling mode on top of what D48879 already provides. Depends on D48879. Reviewers: kpw, eizan Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D49363 llvm-svn: 337343
2018-07-18 10:08:39 +08:00
SegmentBase *Head = &SentinelSegment;
SegmentBase *Tail = &SentinelSegment;
[XRay][profiler] Part 1: XRay Allocator and Array Implementations Summary: This change is part of the larger XRay Profiling Mode effort. Here we implement an arena allocator, for fixed sized buffers used in a segmented array implementation. This change adds the segmented array data structure, which relies on the allocator to provide and maintain the storage for the segmented array. Key features of the `Allocator` type: * It uses cache-aligned blocks, intended to host the actual data. These blocks are cache-line-size multiples of contiguous bytes. * The `Allocator` has a maximum memory budget, set at construction time. This allows us to cap the amount of data each specific `Allocator` instance is responsible for. * Upon destruction, the `Allocator` will clean up the storage it's used, handing it back to the internal allocator used in sanitizer_common. Key features of the `Array` type: * Each segmented array is always backed by an `Allocator`, which is either user-provided or uses a global allocator. * When an `Array` grows, it grows by appending a segment that's fixed-sized. The size of each segment is computed by the number of elements of type `T` that can fit into cache line multiples. * An `Array` does not return memory to the `Allocator`, but it can keep track of the current number of "live" objects it stores. * When an `Array` is destroyed, it will not return memory to the `Allocator`. Users should clean up the `Allocator` independently of the `Array`. * The `Array` type keeps a freelist of the chunks it's used before, so that trimming and growing will re-use previously allocated chunks. These basic data structures are used by the XRay Profiling Mode implementation to implement efficient and cache-aware storage for data that's typically read-and-write heavy for tracking latency information. We're relying on the cache line characteristics of the architecture to provide us good data isolation and cache friendliness, when we're performing operations like searching for elements and/or updating data hosted in these cache lines. Reviewers: echristo, pelikan, kpw Subscribers: mgorny, llvm-commits Differential Revision: https://reviews.llvm.org/D45756 llvm-svn: 331141
2018-04-29 21:46:30 +08:00
size_t Size = 0;
[XRay][compiler-rt] Segmented Array: Simplify and Optimise Summary: This is a follow-on to D49217 which simplifies and optimises the implementation of the segmented array. In this patch we co-locate the book-keeping for segments in the `__xray::Array<T>` with the data it's managing. We take the chance in this patch to actually rename `Chunk` to `Segment` to better align with the high-level description of the segmented array. With measurements using benchmarks landed in D48879, we've identified that calls to `pthread_getspecific` started dominating the cycles, which led us to revert the change made in D49217 to use C++ thread_local initialisation instead (it reduces the cost by a huge margin, since we save one PLT-based call to pthread functions in the hot path). In particular, this is in `__xray::getThreadLocalData()`. We also took the opportunity to remove the least-common-multiple based calculation and instead pack as much data into segments of the array. This greatly simplifies the API of the container which hides as much of the implementation details as possible. For instance, we calculate the number of elements we need for the each segment internally in the Array instead of making it part of the type. With the changes here, we're able to get a measurable improvement on the performance of profiling mode on top of what D48879 already provides. Depends on D48879. Reviewers: kpw, eizan Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D49363 llvm-svn: 337343
2018-07-18 10:08:39 +08:00
// Here we keep track of segments in the freelist, to allow us to re-use
// segments when elements are trimmed off the end.
SegmentBase *Freelist = &SentinelSegment;
[XRay][profiler] Part 1: XRay Allocator and Array Implementations Summary: This change is part of the larger XRay Profiling Mode effort. Here we implement an arena allocator, for fixed sized buffers used in a segmented array implementation. This change adds the segmented array data structure, which relies on the allocator to provide and maintain the storage for the segmented array. Key features of the `Allocator` type: * It uses cache-aligned blocks, intended to host the actual data. These blocks are cache-line-size multiples of contiguous bytes. * The `Allocator` has a maximum memory budget, set at construction time. This allows us to cap the amount of data each specific `Allocator` instance is responsible for. * Upon destruction, the `Allocator` will clean up the storage it's used, handing it back to the internal allocator used in sanitizer_common. Key features of the `Array` type: * Each segmented array is always backed by an `Allocator`, which is either user-provided or uses a global allocator. * When an `Array` grows, it grows by appending a segment that's fixed-sized. The size of each segment is computed by the number of elements of type `T` that can fit into cache line multiples. * An `Array` does not return memory to the `Allocator`, but it can keep track of the current number of "live" objects it stores. * When an `Array` is destroyed, it will not return memory to the `Allocator`. Users should clean up the `Allocator` independently of the `Array`. * The `Array` type keeps a freelist of the chunks it's used before, so that trimming and growing will re-use previously allocated chunks. These basic data structures are used by the XRay Profiling Mode implementation to implement efficient and cache-aware storage for data that's typically read-and-write heavy for tracking latency information. We're relying on the cache line characteristics of the architecture to provide us good data isolation and cache friendliness, when we're performing operations like searching for elements and/or updating data hosted in these cache lines. Reviewers: echristo, pelikan, kpw Subscribers: mgorny, llvm-commits Differential Revision: https://reviews.llvm.org/D45756 llvm-svn: 331141
2018-04-29 21:46:30 +08:00
[XRay][compiler-rt] Segmented Array: Simplify and Optimise Summary: This is a follow-on to D49217 which simplifies and optimises the implementation of the segmented array. In this patch we co-locate the book-keeping for segments in the `__xray::Array<T>` with the data it's managing. We take the chance in this patch to actually rename `Chunk` to `Segment` to better align with the high-level description of the segmented array. With measurements using benchmarks landed in D48879, we've identified that calls to `pthread_getspecific` started dominating the cycles, which led us to revert the change made in D49217 to use C++ thread_local initialisation instead (it reduces the cost by a huge margin, since we save one PLT-based call to pthread functions in the hot path). In particular, this is in `__xray::getThreadLocalData()`. We also took the opportunity to remove the least-common-multiple based calculation and instead pack as much data into segments of the array. This greatly simplifies the API of the container which hides as much of the implementation details as possible. For instance, we calculate the number of elements we need for the each segment internally in the Array instead of making it part of the type. With the changes here, we're able to get a measurable improvement on the performance of profiling mode on top of what D48879 already provides. Depends on D48879. Reviewers: kpw, eizan Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D49363 llvm-svn: 337343
2018-07-18 10:08:39 +08:00
Segment *NewSegment() {
[XRay][profiler] Part 1: XRay Allocator and Array Implementations Summary: This change is part of the larger XRay Profiling Mode effort. Here we implement an arena allocator, for fixed sized buffers used in a segmented array implementation. This change adds the segmented array data structure, which relies on the allocator to provide and maintain the storage for the segmented array. Key features of the `Allocator` type: * It uses cache-aligned blocks, intended to host the actual data. These blocks are cache-line-size multiples of contiguous bytes. * The `Allocator` has a maximum memory budget, set at construction time. This allows us to cap the amount of data each specific `Allocator` instance is responsible for. * Upon destruction, the `Allocator` will clean up the storage it's used, handing it back to the internal allocator used in sanitizer_common. Key features of the `Array` type: * Each segmented array is always backed by an `Allocator`, which is either user-provided or uses a global allocator. * When an `Array` grows, it grows by appending a segment that's fixed-sized. The size of each segment is computed by the number of elements of type `T` that can fit into cache line multiples. * An `Array` does not return memory to the `Allocator`, but it can keep track of the current number of "live" objects it stores. * When an `Array` is destroyed, it will not return memory to the `Allocator`. Users should clean up the `Allocator` independently of the `Array`. * The `Array` type keeps a freelist of the chunks it's used before, so that trimming and growing will re-use previously allocated chunks. These basic data structures are used by the XRay Profiling Mode implementation to implement efficient and cache-aware storage for data that's typically read-and-write heavy for tracking latency information. We're relying on the cache line characteristics of the architecture to provide us good data isolation and cache friendliness, when we're performing operations like searching for elements and/or updating data hosted in these cache lines. Reviewers: echristo, pelikan, kpw Subscribers: mgorny, llvm-commits Differential Revision: https://reviews.llvm.org/D45756 llvm-svn: 331141
2018-04-29 21:46:30 +08:00
// We need to handle the case in which enough elements have been trimmed to
[XRay][compiler-rt] Segmented Array: Simplify and Optimise Summary: This is a follow-on to D49217 which simplifies and optimises the implementation of the segmented array. In this patch we co-locate the book-keeping for segments in the `__xray::Array<T>` with the data it's managing. We take the chance in this patch to actually rename `Chunk` to `Segment` to better align with the high-level description of the segmented array. With measurements using benchmarks landed in D48879, we've identified that calls to `pthread_getspecific` started dominating the cycles, which led us to revert the change made in D49217 to use C++ thread_local initialisation instead (it reduces the cost by a huge margin, since we save one PLT-based call to pthread functions in the hot path). In particular, this is in `__xray::getThreadLocalData()`. We also took the opportunity to remove the least-common-multiple based calculation and instead pack as much data into segments of the array. This greatly simplifies the API of the container which hides as much of the implementation details as possible. For instance, we calculate the number of elements we need for the each segment internally in the Array instead of making it part of the type. With the changes here, we're able to get a measurable improvement on the performance of profiling mode on top of what D48879 already provides. Depends on D48879. Reviewers: kpw, eizan Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D49363 llvm-svn: 337343
2018-07-18 10:08:39 +08:00
// allow us to re-use segments we've allocated before. For this we look into
[XRay][profiler] Part 1: XRay Allocator and Array Implementations Summary: This change is part of the larger XRay Profiling Mode effort. Here we implement an arena allocator, for fixed sized buffers used in a segmented array implementation. This change adds the segmented array data structure, which relies on the allocator to provide and maintain the storage for the segmented array. Key features of the `Allocator` type: * It uses cache-aligned blocks, intended to host the actual data. These blocks are cache-line-size multiples of contiguous bytes. * The `Allocator` has a maximum memory budget, set at construction time. This allows us to cap the amount of data each specific `Allocator` instance is responsible for. * Upon destruction, the `Allocator` will clean up the storage it's used, handing it back to the internal allocator used in sanitizer_common. Key features of the `Array` type: * Each segmented array is always backed by an `Allocator`, which is either user-provided or uses a global allocator. * When an `Array` grows, it grows by appending a segment that's fixed-sized. The size of each segment is computed by the number of elements of type `T` that can fit into cache line multiples. * An `Array` does not return memory to the `Allocator`, but it can keep track of the current number of "live" objects it stores. * When an `Array` is destroyed, it will not return memory to the `Allocator`. Users should clean up the `Allocator` independently of the `Array`. * The `Array` type keeps a freelist of the chunks it's used before, so that trimming and growing will re-use previously allocated chunks. These basic data structures are used by the XRay Profiling Mode implementation to implement efficient and cache-aware storage for data that's typically read-and-write heavy for tracking latency information. We're relying on the cache line characteristics of the architecture to provide us good data isolation and cache friendliness, when we're performing operations like searching for elements and/or updating data hosted in these cache lines. Reviewers: echristo, pelikan, kpw Subscribers: mgorny, llvm-commits Differential Revision: https://reviews.llvm.org/D45756 llvm-svn: 331141
2018-04-29 21:46:30 +08:00
// the Freelist, to see whether we need to actually allocate new blocks or
// just re-use blocks we've already seen before.
[XRay][compiler-rt] Segmented Array: Simplify and Optimise Summary: This is a follow-on to D49217 which simplifies and optimises the implementation of the segmented array. In this patch we co-locate the book-keeping for segments in the `__xray::Array<T>` with the data it's managing. We take the chance in this patch to actually rename `Chunk` to `Segment` to better align with the high-level description of the segmented array. With measurements using benchmarks landed in D48879, we've identified that calls to `pthread_getspecific` started dominating the cycles, which led us to revert the change made in D49217 to use C++ thread_local initialisation instead (it reduces the cost by a huge margin, since we save one PLT-based call to pthread functions in the hot path). In particular, this is in `__xray::getThreadLocalData()`. We also took the opportunity to remove the least-common-multiple based calculation and instead pack as much data into segments of the array. This greatly simplifies the API of the container which hides as much of the implementation details as possible. For instance, we calculate the number of elements we need for the each segment internally in the Array instead of making it part of the type. With the changes here, we're able to get a measurable improvement on the performance of profiling mode on top of what D48879 already provides. Depends on D48879. Reviewers: kpw, eizan Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D49363 llvm-svn: 337343
2018-07-18 10:08:39 +08:00
if (Freelist != &SentinelSegment) {
auto *FreeSegment = Freelist;
Freelist = FreeSegment->Next;
FreeSegment->Next = &SentinelSegment;
Freelist->Prev = &SentinelSegment;
return static_cast<Segment *>(FreeSegment);
[XRay][profiler] Part 1: XRay Allocator and Array Implementations Summary: This change is part of the larger XRay Profiling Mode effort. Here we implement an arena allocator, for fixed sized buffers used in a segmented array implementation. This change adds the segmented array data structure, which relies on the allocator to provide and maintain the storage for the segmented array. Key features of the `Allocator` type: * It uses cache-aligned blocks, intended to host the actual data. These blocks are cache-line-size multiples of contiguous bytes. * The `Allocator` has a maximum memory budget, set at construction time. This allows us to cap the amount of data each specific `Allocator` instance is responsible for. * Upon destruction, the `Allocator` will clean up the storage it's used, handing it back to the internal allocator used in sanitizer_common. Key features of the `Array` type: * Each segmented array is always backed by an `Allocator`, which is either user-provided or uses a global allocator. * When an `Array` grows, it grows by appending a segment that's fixed-sized. The size of each segment is computed by the number of elements of type `T` that can fit into cache line multiples. * An `Array` does not return memory to the `Allocator`, but it can keep track of the current number of "live" objects it stores. * When an `Array` is destroyed, it will not return memory to the `Allocator`. Users should clean up the `Allocator` independently of the `Array`. * The `Array` type keeps a freelist of the chunks it's used before, so that trimming and growing will re-use previously allocated chunks. These basic data structures are used by the XRay Profiling Mode implementation to implement efficient and cache-aware storage for data that's typically read-and-write heavy for tracking latency information. We're relying on the cache line characteristics of the architecture to provide us good data isolation and cache friendliness, when we're performing operations like searching for elements and/or updating data hosted in these cache lines. Reviewers: echristo, pelikan, kpw Subscribers: mgorny, llvm-commits Differential Revision: https://reviews.llvm.org/D45756 llvm-svn: 331141
2018-04-29 21:46:30 +08:00
}
[XRay][compiler-rt] Segmented Array: Simplify and Optimise Summary: This is a follow-on to D49217 which simplifies and optimises the implementation of the segmented array. In this patch we co-locate the book-keeping for segments in the `__xray::Array<T>` with the data it's managing. We take the chance in this patch to actually rename `Chunk` to `Segment` to better align with the high-level description of the segmented array. With measurements using benchmarks landed in D48879, we've identified that calls to `pthread_getspecific` started dominating the cycles, which led us to revert the change made in D49217 to use C++ thread_local initialisation instead (it reduces the cost by a huge margin, since we save one PLT-based call to pthread functions in the hot path). In particular, this is in `__xray::getThreadLocalData()`. We also took the opportunity to remove the least-common-multiple based calculation and instead pack as much data into segments of the array. This greatly simplifies the API of the container which hides as much of the implementation details as possible. For instance, we calculate the number of elements we need for the each segment internally in the Array instead of making it part of the type. With the changes here, we're able to get a measurable improvement on the performance of profiling mode on top of what D48879 already provides. Depends on D48879. Reviewers: kpw, eizan Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D49363 llvm-svn: 337343
2018-07-18 10:08:39 +08:00
auto SegmentBlock = Alloc->Allocate();
if (SegmentBlock.Data == nullptr)
[XRay][profiler] Part 1: XRay Allocator and Array Implementations Summary: This change is part of the larger XRay Profiling Mode effort. Here we implement an arena allocator, for fixed sized buffers used in a segmented array implementation. This change adds the segmented array data structure, which relies on the allocator to provide and maintain the storage for the segmented array. Key features of the `Allocator` type: * It uses cache-aligned blocks, intended to host the actual data. These blocks are cache-line-size multiples of contiguous bytes. * The `Allocator` has a maximum memory budget, set at construction time. This allows us to cap the amount of data each specific `Allocator` instance is responsible for. * Upon destruction, the `Allocator` will clean up the storage it's used, handing it back to the internal allocator used in sanitizer_common. Key features of the `Array` type: * Each segmented array is always backed by an `Allocator`, which is either user-provided or uses a global allocator. * When an `Array` grows, it grows by appending a segment that's fixed-sized. The size of each segment is computed by the number of elements of type `T` that can fit into cache line multiples. * An `Array` does not return memory to the `Allocator`, but it can keep track of the current number of "live" objects it stores. * When an `Array` is destroyed, it will not return memory to the `Allocator`. Users should clean up the `Allocator` independently of the `Array`. * The `Array` type keeps a freelist of the chunks it's used before, so that trimming and growing will re-use previously allocated chunks. These basic data structures are used by the XRay Profiling Mode implementation to implement efficient and cache-aware storage for data that's typically read-and-write heavy for tracking latency information. We're relying on the cache line characteristics of the architecture to provide us good data isolation and cache friendliness, when we're performing operations like searching for elements and/or updating data hosted in these cache lines. Reviewers: echristo, pelikan, kpw Subscribers: mgorny, llvm-commits Differential Revision: https://reviews.llvm.org/D45756 llvm-svn: 331141
2018-04-29 21:46:30 +08:00
return nullptr;
[XRay][compiler-rt] Segmented Array: Simplify and Optimise Summary: This is a follow-on to D49217 which simplifies and optimises the implementation of the segmented array. In this patch we co-locate the book-keeping for segments in the `__xray::Array<T>` with the data it's managing. We take the chance in this patch to actually rename `Chunk` to `Segment` to better align with the high-level description of the segmented array. With measurements using benchmarks landed in D48879, we've identified that calls to `pthread_getspecific` started dominating the cycles, which led us to revert the change made in D49217 to use C++ thread_local initialisation instead (it reduces the cost by a huge margin, since we save one PLT-based call to pthread functions in the hot path). In particular, this is in `__xray::getThreadLocalData()`. We also took the opportunity to remove the least-common-multiple based calculation and instead pack as much data into segments of the array. This greatly simplifies the API of the container which hides as much of the implementation details as possible. For instance, we calculate the number of elements we need for the each segment internally in the Array instead of making it part of the type. With the changes here, we're able to get a measurable improvement on the performance of profiling mode on top of what D48879 already provides. Depends on D48879. Reviewers: kpw, eizan Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D49363 llvm-svn: 337343
2018-07-18 10:08:39 +08:00
// Placement-new the Segment element at the beginning of the SegmentBlock.
auto S = reinterpret_cast<Segment *>(SegmentBlock.Data);
new (S) SegmentBase{&SentinelSegment, &SentinelSegment};
return S;
[XRay][profiler] Part 1: XRay Allocator and Array Implementations Summary: This change is part of the larger XRay Profiling Mode effort. Here we implement an arena allocator, for fixed sized buffers used in a segmented array implementation. This change adds the segmented array data structure, which relies on the allocator to provide and maintain the storage for the segmented array. Key features of the `Allocator` type: * It uses cache-aligned blocks, intended to host the actual data. These blocks are cache-line-size multiples of contiguous bytes. * The `Allocator` has a maximum memory budget, set at construction time. This allows us to cap the amount of data each specific `Allocator` instance is responsible for. * Upon destruction, the `Allocator` will clean up the storage it's used, handing it back to the internal allocator used in sanitizer_common. Key features of the `Array` type: * Each segmented array is always backed by an `Allocator`, which is either user-provided or uses a global allocator. * When an `Array` grows, it grows by appending a segment that's fixed-sized. The size of each segment is computed by the number of elements of type `T` that can fit into cache line multiples. * An `Array` does not return memory to the `Allocator`, but it can keep track of the current number of "live" objects it stores. * When an `Array` is destroyed, it will not return memory to the `Allocator`. Users should clean up the `Allocator` independently of the `Array`. * The `Array` type keeps a freelist of the chunks it's used before, so that trimming and growing will re-use previously allocated chunks. These basic data structures are used by the XRay Profiling Mode implementation to implement efficient and cache-aware storage for data that's typically read-and-write heavy for tracking latency information. We're relying on the cache line characteristics of the architecture to provide us good data isolation and cache friendliness, when we're performing operations like searching for elements and/or updating data hosted in these cache lines. Reviewers: echristo, pelikan, kpw Subscribers: mgorny, llvm-commits Differential Revision: https://reviews.llvm.org/D45756 llvm-svn: 331141
2018-04-29 21:46:30 +08:00
}
[XRay][compiler-rt] Segmented Array: Simplify and Optimise Summary: This is a follow-on to D49217 which simplifies and optimises the implementation of the segmented array. In this patch we co-locate the book-keeping for segments in the `__xray::Array<T>` with the data it's managing. We take the chance in this patch to actually rename `Chunk` to `Segment` to better align with the high-level description of the segmented array. With measurements using benchmarks landed in D48879, we've identified that calls to `pthread_getspecific` started dominating the cycles, which led us to revert the change made in D49217 to use C++ thread_local initialisation instead (it reduces the cost by a huge margin, since we save one PLT-based call to pthread functions in the hot path). In particular, this is in `__xray::getThreadLocalData()`. We also took the opportunity to remove the least-common-multiple based calculation and instead pack as much data into segments of the array. This greatly simplifies the API of the container which hides as much of the implementation details as possible. For instance, we calculate the number of elements we need for the each segment internally in the Array instead of making it part of the type. With the changes here, we're able to get a measurable improvement on the performance of profiling mode on top of what D48879 already provides. Depends on D48879. Reviewers: kpw, eizan Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D49363 llvm-svn: 337343
2018-07-18 10:08:39 +08:00
Segment *InitHeadAndTail() {
DCHECK_EQ(Head, &SentinelSegment);
DCHECK_EQ(Tail, &SentinelSegment);
auto Segment = NewSegment();
if (Segment == nullptr)
[XRay][profiler] Part 1: XRay Allocator and Array Implementations Summary: This change is part of the larger XRay Profiling Mode effort. Here we implement an arena allocator, for fixed sized buffers used in a segmented array implementation. This change adds the segmented array data structure, which relies on the allocator to provide and maintain the storage for the segmented array. Key features of the `Allocator` type: * It uses cache-aligned blocks, intended to host the actual data. These blocks are cache-line-size multiples of contiguous bytes. * The `Allocator` has a maximum memory budget, set at construction time. This allows us to cap the amount of data each specific `Allocator` instance is responsible for. * Upon destruction, the `Allocator` will clean up the storage it's used, handing it back to the internal allocator used in sanitizer_common. Key features of the `Array` type: * Each segmented array is always backed by an `Allocator`, which is either user-provided or uses a global allocator. * When an `Array` grows, it grows by appending a segment that's fixed-sized. The size of each segment is computed by the number of elements of type `T` that can fit into cache line multiples. * An `Array` does not return memory to the `Allocator`, but it can keep track of the current number of "live" objects it stores. * When an `Array` is destroyed, it will not return memory to the `Allocator`. Users should clean up the `Allocator` independently of the `Array`. * The `Array` type keeps a freelist of the chunks it's used before, so that trimming and growing will re-use previously allocated chunks. These basic data structures are used by the XRay Profiling Mode implementation to implement efficient and cache-aware storage for data that's typically read-and-write heavy for tracking latency information. We're relying on the cache line characteristics of the architecture to provide us good data isolation and cache friendliness, when we're performing operations like searching for elements and/or updating data hosted in these cache lines. Reviewers: echristo, pelikan, kpw Subscribers: mgorny, llvm-commits Differential Revision: https://reviews.llvm.org/D45756 llvm-svn: 331141
2018-04-29 21:46:30 +08:00
return nullptr;
[XRay][compiler-rt] Segmented Array: Simplify and Optimise Summary: This is a follow-on to D49217 which simplifies and optimises the implementation of the segmented array. In this patch we co-locate the book-keeping for segments in the `__xray::Array<T>` with the data it's managing. We take the chance in this patch to actually rename `Chunk` to `Segment` to better align with the high-level description of the segmented array. With measurements using benchmarks landed in D48879, we've identified that calls to `pthread_getspecific` started dominating the cycles, which led us to revert the change made in D49217 to use C++ thread_local initialisation instead (it reduces the cost by a huge margin, since we save one PLT-based call to pthread functions in the hot path). In particular, this is in `__xray::getThreadLocalData()`. We also took the opportunity to remove the least-common-multiple based calculation and instead pack as much data into segments of the array. This greatly simplifies the API of the container which hides as much of the implementation details as possible. For instance, we calculate the number of elements we need for the each segment internally in the Array instead of making it part of the type. With the changes here, we're able to get a measurable improvement on the performance of profiling mode on top of what D48879 already provides. Depends on D48879. Reviewers: kpw, eizan Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D49363 llvm-svn: 337343
2018-07-18 10:08:39 +08:00
DCHECK_EQ(Segment->Next, &SentinelSegment);
DCHECK_EQ(Segment->Prev, &SentinelSegment);
Head = Tail = static_cast<SegmentBase *>(Segment);
return Segment;
[XRay][profiler] Part 1: XRay Allocator and Array Implementations Summary: This change is part of the larger XRay Profiling Mode effort. Here we implement an arena allocator, for fixed sized buffers used in a segmented array implementation. This change adds the segmented array data structure, which relies on the allocator to provide and maintain the storage for the segmented array. Key features of the `Allocator` type: * It uses cache-aligned blocks, intended to host the actual data. These blocks are cache-line-size multiples of contiguous bytes. * The `Allocator` has a maximum memory budget, set at construction time. This allows us to cap the amount of data each specific `Allocator` instance is responsible for. * Upon destruction, the `Allocator` will clean up the storage it's used, handing it back to the internal allocator used in sanitizer_common. Key features of the `Array` type: * Each segmented array is always backed by an `Allocator`, which is either user-provided or uses a global allocator. * When an `Array` grows, it grows by appending a segment that's fixed-sized. The size of each segment is computed by the number of elements of type `T` that can fit into cache line multiples. * An `Array` does not return memory to the `Allocator`, but it can keep track of the current number of "live" objects it stores. * When an `Array` is destroyed, it will not return memory to the `Allocator`. Users should clean up the `Allocator` independently of the `Array`. * The `Array` type keeps a freelist of the chunks it's used before, so that trimming and growing will re-use previously allocated chunks. These basic data structures are used by the XRay Profiling Mode implementation to implement efficient and cache-aware storage for data that's typically read-and-write heavy for tracking latency information. We're relying on the cache line characteristics of the architecture to provide us good data isolation and cache friendliness, when we're performing operations like searching for elements and/or updating data hosted in these cache lines. Reviewers: echristo, pelikan, kpw Subscribers: mgorny, llvm-commits Differential Revision: https://reviews.llvm.org/D45756 llvm-svn: 331141
2018-04-29 21:46:30 +08:00
}
[XRay][compiler-rt] Segmented Array: Simplify and Optimise Summary: This is a follow-on to D49217 which simplifies and optimises the implementation of the segmented array. In this patch we co-locate the book-keeping for segments in the `__xray::Array<T>` with the data it's managing. We take the chance in this patch to actually rename `Chunk` to `Segment` to better align with the high-level description of the segmented array. With measurements using benchmarks landed in D48879, we've identified that calls to `pthread_getspecific` started dominating the cycles, which led us to revert the change made in D49217 to use C++ thread_local initialisation instead (it reduces the cost by a huge margin, since we save one PLT-based call to pthread functions in the hot path). In particular, this is in `__xray::getThreadLocalData()`. We also took the opportunity to remove the least-common-multiple based calculation and instead pack as much data into segments of the array. This greatly simplifies the API of the container which hides as much of the implementation details as possible. For instance, we calculate the number of elements we need for the each segment internally in the Array instead of making it part of the type. With the changes here, we're able to get a measurable improvement on the performance of profiling mode on top of what D48879 already provides. Depends on D48879. Reviewers: kpw, eizan Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D49363 llvm-svn: 337343
2018-07-18 10:08:39 +08:00
Segment *AppendNewSegment() {
auto S = NewSegment();
if (S == nullptr)
[XRay][profiler] Part 1: XRay Allocator and Array Implementations Summary: This change is part of the larger XRay Profiling Mode effort. Here we implement an arena allocator, for fixed sized buffers used in a segmented array implementation. This change adds the segmented array data structure, which relies on the allocator to provide and maintain the storage for the segmented array. Key features of the `Allocator` type: * It uses cache-aligned blocks, intended to host the actual data. These blocks are cache-line-size multiples of contiguous bytes. * The `Allocator` has a maximum memory budget, set at construction time. This allows us to cap the amount of data each specific `Allocator` instance is responsible for. * Upon destruction, the `Allocator` will clean up the storage it's used, handing it back to the internal allocator used in sanitizer_common. Key features of the `Array` type: * Each segmented array is always backed by an `Allocator`, which is either user-provided or uses a global allocator. * When an `Array` grows, it grows by appending a segment that's fixed-sized. The size of each segment is computed by the number of elements of type `T` that can fit into cache line multiples. * An `Array` does not return memory to the `Allocator`, but it can keep track of the current number of "live" objects it stores. * When an `Array` is destroyed, it will not return memory to the `Allocator`. Users should clean up the `Allocator` independently of the `Array`. * The `Array` type keeps a freelist of the chunks it's used before, so that trimming and growing will re-use previously allocated chunks. These basic data structures are used by the XRay Profiling Mode implementation to implement efficient and cache-aware storage for data that's typically read-and-write heavy for tracking latency information. We're relying on the cache line characteristics of the architecture to provide us good data isolation and cache friendliness, when we're performing operations like searching for elements and/or updating data hosted in these cache lines. Reviewers: echristo, pelikan, kpw Subscribers: mgorny, llvm-commits Differential Revision: https://reviews.llvm.org/D45756 llvm-svn: 331141
2018-04-29 21:46:30 +08:00
return nullptr;
[XRay][compiler-rt] Segmented Array: Simplify and Optimise Summary: This is a follow-on to D49217 which simplifies and optimises the implementation of the segmented array. In this patch we co-locate the book-keeping for segments in the `__xray::Array<T>` with the data it's managing. We take the chance in this patch to actually rename `Chunk` to `Segment` to better align with the high-level description of the segmented array. With measurements using benchmarks landed in D48879, we've identified that calls to `pthread_getspecific` started dominating the cycles, which led us to revert the change made in D49217 to use C++ thread_local initialisation instead (it reduces the cost by a huge margin, since we save one PLT-based call to pthread functions in the hot path). In particular, this is in `__xray::getThreadLocalData()`. We also took the opportunity to remove the least-common-multiple based calculation and instead pack as much data into segments of the array. This greatly simplifies the API of the container which hides as much of the implementation details as possible. For instance, we calculate the number of elements we need for the each segment internally in the Array instead of making it part of the type. With the changes here, we're able to get a measurable improvement on the performance of profiling mode on top of what D48879 already provides. Depends on D48879. Reviewers: kpw, eizan Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D49363 llvm-svn: 337343
2018-07-18 10:08:39 +08:00
DCHECK_NE(Tail, &SentinelSegment);
DCHECK_EQ(Tail->Next, &SentinelSegment);
DCHECK_EQ(S->Prev, &SentinelSegment);
DCHECK_EQ(S->Next, &SentinelSegment);
Tail->Next = S;
S->Prev = Tail;
Tail = S;
return static_cast<Segment *>(Tail);
[XRay][profiler] Part 1: XRay Allocator and Array Implementations Summary: This change is part of the larger XRay Profiling Mode effort. Here we implement an arena allocator, for fixed sized buffers used in a segmented array implementation. This change adds the segmented array data structure, which relies on the allocator to provide and maintain the storage for the segmented array. Key features of the `Allocator` type: * It uses cache-aligned blocks, intended to host the actual data. These blocks are cache-line-size multiples of contiguous bytes. * The `Allocator` has a maximum memory budget, set at construction time. This allows us to cap the amount of data each specific `Allocator` instance is responsible for. * Upon destruction, the `Allocator` will clean up the storage it's used, handing it back to the internal allocator used in sanitizer_common. Key features of the `Array` type: * Each segmented array is always backed by an `Allocator`, which is either user-provided or uses a global allocator. * When an `Array` grows, it grows by appending a segment that's fixed-sized. The size of each segment is computed by the number of elements of type `T` that can fit into cache line multiples. * An `Array` does not return memory to the `Allocator`, but it can keep track of the current number of "live" objects it stores. * When an `Array` is destroyed, it will not return memory to the `Allocator`. Users should clean up the `Allocator` independently of the `Array`. * The `Array` type keeps a freelist of the chunks it's used before, so that trimming and growing will re-use previously allocated chunks. These basic data structures are used by the XRay Profiling Mode implementation to implement efficient and cache-aware storage for data that's typically read-and-write heavy for tracking latency information. We're relying on the cache line characteristics of the architecture to provide us good data isolation and cache friendliness, when we're performing operations like searching for elements and/or updating data hosted in these cache lines. Reviewers: echristo, pelikan, kpw Subscribers: mgorny, llvm-commits Differential Revision: https://reviews.llvm.org/D45756 llvm-svn: 331141
2018-04-29 21:46:30 +08:00
}
// This Iterator models a BidirectionalIterator.
template <class U> class Iterator {
[XRay][compiler-rt] Segmented Array: Simplify and Optimise Summary: This is a follow-on to D49217 which simplifies and optimises the implementation of the segmented array. In this patch we co-locate the book-keeping for segments in the `__xray::Array<T>` with the data it's managing. We take the chance in this patch to actually rename `Chunk` to `Segment` to better align with the high-level description of the segmented array. With measurements using benchmarks landed in D48879, we've identified that calls to `pthread_getspecific` started dominating the cycles, which led us to revert the change made in D49217 to use C++ thread_local initialisation instead (it reduces the cost by a huge margin, since we save one PLT-based call to pthread functions in the hot path). In particular, this is in `__xray::getThreadLocalData()`. We also took the opportunity to remove the least-common-multiple based calculation and instead pack as much data into segments of the array. This greatly simplifies the API of the container which hides as much of the implementation details as possible. For instance, we calculate the number of elements we need for the each segment internally in the Array instead of making it part of the type. With the changes here, we're able to get a measurable improvement on the performance of profiling mode on top of what D48879 already provides. Depends on D48879. Reviewers: kpw, eizan Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D49363 llvm-svn: 337343
2018-07-18 10:08:39 +08:00
SegmentBase *S = &SentinelSegment;
[XRay][profiler] Part 1: XRay Allocator and Array Implementations Summary: This change is part of the larger XRay Profiling Mode effort. Here we implement an arena allocator, for fixed sized buffers used in a segmented array implementation. This change adds the segmented array data structure, which relies on the allocator to provide and maintain the storage for the segmented array. Key features of the `Allocator` type: * It uses cache-aligned blocks, intended to host the actual data. These blocks are cache-line-size multiples of contiguous bytes. * The `Allocator` has a maximum memory budget, set at construction time. This allows us to cap the amount of data each specific `Allocator` instance is responsible for. * Upon destruction, the `Allocator` will clean up the storage it's used, handing it back to the internal allocator used in sanitizer_common. Key features of the `Array` type: * Each segmented array is always backed by an `Allocator`, which is either user-provided or uses a global allocator. * When an `Array` grows, it grows by appending a segment that's fixed-sized. The size of each segment is computed by the number of elements of type `T` that can fit into cache line multiples. * An `Array` does not return memory to the `Allocator`, but it can keep track of the current number of "live" objects it stores. * When an `Array` is destroyed, it will not return memory to the `Allocator`. Users should clean up the `Allocator` independently of the `Array`. * The `Array` type keeps a freelist of the chunks it's used before, so that trimming and growing will re-use previously allocated chunks. These basic data structures are used by the XRay Profiling Mode implementation to implement efficient and cache-aware storage for data that's typically read-and-write heavy for tracking latency information. We're relying on the cache line characteristics of the architecture to provide us good data isolation and cache friendliness, when we're performing operations like searching for elements and/or updating data hosted in these cache lines. Reviewers: echristo, pelikan, kpw Subscribers: mgorny, llvm-commits Differential Revision: https://reviews.llvm.org/D45756 llvm-svn: 331141
2018-04-29 21:46:30 +08:00
size_t Offset = 0;
size_t Size = 0;
[XRay][profiler] Part 1: XRay Allocator and Array Implementations Summary: This change is part of the larger XRay Profiling Mode effort. Here we implement an arena allocator, for fixed sized buffers used in a segmented array implementation. This change adds the segmented array data structure, which relies on the allocator to provide and maintain the storage for the segmented array. Key features of the `Allocator` type: * It uses cache-aligned blocks, intended to host the actual data. These blocks are cache-line-size multiples of contiguous bytes. * The `Allocator` has a maximum memory budget, set at construction time. This allows us to cap the amount of data each specific `Allocator` instance is responsible for. * Upon destruction, the `Allocator` will clean up the storage it's used, handing it back to the internal allocator used in sanitizer_common. Key features of the `Array` type: * Each segmented array is always backed by an `Allocator`, which is either user-provided or uses a global allocator. * When an `Array` grows, it grows by appending a segment that's fixed-sized. The size of each segment is computed by the number of elements of type `T` that can fit into cache line multiples. * An `Array` does not return memory to the `Allocator`, but it can keep track of the current number of "live" objects it stores. * When an `Array` is destroyed, it will not return memory to the `Allocator`. Users should clean up the `Allocator` independently of the `Array`. * The `Array` type keeps a freelist of the chunks it's used before, so that trimming and growing will re-use previously allocated chunks. These basic data structures are used by the XRay Profiling Mode implementation to implement efficient and cache-aware storage for data that's typically read-and-write heavy for tracking latency information. We're relying on the cache line characteristics of the architecture to provide us good data isolation and cache friendliness, when we're performing operations like searching for elements and/or updating data hosted in these cache lines. Reviewers: echristo, pelikan, kpw Subscribers: mgorny, llvm-commits Differential Revision: https://reviews.llvm.org/D45756 llvm-svn: 331141
2018-04-29 21:46:30 +08:00
public:
[XRay][compiler-rt] Segmented Array: Simplify and Optimise Summary: This is a follow-on to D49217 which simplifies and optimises the implementation of the segmented array. In this patch we co-locate the book-keeping for segments in the `__xray::Array<T>` with the data it's managing. We take the chance in this patch to actually rename `Chunk` to `Segment` to better align with the high-level description of the segmented array. With measurements using benchmarks landed in D48879, we've identified that calls to `pthread_getspecific` started dominating the cycles, which led us to revert the change made in D49217 to use C++ thread_local initialisation instead (it reduces the cost by a huge margin, since we save one PLT-based call to pthread functions in the hot path). In particular, this is in `__xray::getThreadLocalData()`. We also took the opportunity to remove the least-common-multiple based calculation and instead pack as much data into segments of the array. This greatly simplifies the API of the container which hides as much of the implementation details as possible. For instance, we calculate the number of elements we need for the each segment internally in the Array instead of making it part of the type. With the changes here, we're able to get a measurable improvement on the performance of profiling mode on top of what D48879 already provides. Depends on D48879. Reviewers: kpw, eizan Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D49363 llvm-svn: 337343
2018-07-18 10:08:39 +08:00
Iterator(SegmentBase *IS, size_t Off, size_t S)
: S(IS), Offset(Off), Size(S) {}
Iterator(const Iterator &) noexcept = default;
Iterator() noexcept = default;
Iterator(Iterator &&) noexcept = default;
Iterator &operator=(const Iterator &) = default;
Iterator &operator=(Iterator &&) = default;
~Iterator() = default;
[XRay][profiler] Part 1: XRay Allocator and Array Implementations Summary: This change is part of the larger XRay Profiling Mode effort. Here we implement an arena allocator, for fixed sized buffers used in a segmented array implementation. This change adds the segmented array data structure, which relies on the allocator to provide and maintain the storage for the segmented array. Key features of the `Allocator` type: * It uses cache-aligned blocks, intended to host the actual data. These blocks are cache-line-size multiples of contiguous bytes. * The `Allocator` has a maximum memory budget, set at construction time. This allows us to cap the amount of data each specific `Allocator` instance is responsible for. * Upon destruction, the `Allocator` will clean up the storage it's used, handing it back to the internal allocator used in sanitizer_common. Key features of the `Array` type: * Each segmented array is always backed by an `Allocator`, which is either user-provided or uses a global allocator. * When an `Array` grows, it grows by appending a segment that's fixed-sized. The size of each segment is computed by the number of elements of type `T` that can fit into cache line multiples. * An `Array` does not return memory to the `Allocator`, but it can keep track of the current number of "live" objects it stores. * When an `Array` is destroyed, it will not return memory to the `Allocator`. Users should clean up the `Allocator` independently of the `Array`. * The `Array` type keeps a freelist of the chunks it's used before, so that trimming and growing will re-use previously allocated chunks. These basic data structures are used by the XRay Profiling Mode implementation to implement efficient and cache-aware storage for data that's typically read-and-write heavy for tracking latency information. We're relying on the cache line characteristics of the architecture to provide us good data isolation and cache friendliness, when we're performing operations like searching for elements and/or updating data hosted in these cache lines. Reviewers: echristo, pelikan, kpw Subscribers: mgorny, llvm-commits Differential Revision: https://reviews.llvm.org/D45756 llvm-svn: 331141
2018-04-29 21:46:30 +08:00
Iterator &operator++() {
[XRay][compiler-rt] Segmented Array: Simplify and Optimise Summary: This is a follow-on to D49217 which simplifies and optimises the implementation of the segmented array. In this patch we co-locate the book-keeping for segments in the `__xray::Array<T>` with the data it's managing. We take the chance in this patch to actually rename `Chunk` to `Segment` to better align with the high-level description of the segmented array. With measurements using benchmarks landed in D48879, we've identified that calls to `pthread_getspecific` started dominating the cycles, which led us to revert the change made in D49217 to use C++ thread_local initialisation instead (it reduces the cost by a huge margin, since we save one PLT-based call to pthread functions in the hot path). In particular, this is in `__xray::getThreadLocalData()`. We also took the opportunity to remove the least-common-multiple based calculation and instead pack as much data into segments of the array. This greatly simplifies the API of the container which hides as much of the implementation details as possible. For instance, we calculate the number of elements we need for the each segment internally in the Array instead of making it part of the type. With the changes here, we're able to get a measurable improvement on the performance of profiling mode on top of what D48879 already provides. Depends on D48879. Reviewers: kpw, eizan Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D49363 llvm-svn: 337343
2018-07-18 10:08:39 +08:00
if (++Offset % ElementsPerSegment || Offset == Size)
[XRay][profiler] Part 1: XRay Allocator and Array Implementations Summary: This change is part of the larger XRay Profiling Mode effort. Here we implement an arena allocator, for fixed sized buffers used in a segmented array implementation. This change adds the segmented array data structure, which relies on the allocator to provide and maintain the storage for the segmented array. Key features of the `Allocator` type: * It uses cache-aligned blocks, intended to host the actual data. These blocks are cache-line-size multiples of contiguous bytes. * The `Allocator` has a maximum memory budget, set at construction time. This allows us to cap the amount of data each specific `Allocator` instance is responsible for. * Upon destruction, the `Allocator` will clean up the storage it's used, handing it back to the internal allocator used in sanitizer_common. Key features of the `Array` type: * Each segmented array is always backed by an `Allocator`, which is either user-provided or uses a global allocator. * When an `Array` grows, it grows by appending a segment that's fixed-sized. The size of each segment is computed by the number of elements of type `T` that can fit into cache line multiples. * An `Array` does not return memory to the `Allocator`, but it can keep track of the current number of "live" objects it stores. * When an `Array` is destroyed, it will not return memory to the `Allocator`. Users should clean up the `Allocator` independently of the `Array`. * The `Array` type keeps a freelist of the chunks it's used before, so that trimming and growing will re-use previously allocated chunks. These basic data structures are used by the XRay Profiling Mode implementation to implement efficient and cache-aware storage for data that's typically read-and-write heavy for tracking latency information. We're relying on the cache line characteristics of the architecture to provide us good data isolation and cache friendliness, when we're performing operations like searching for elements and/or updating data hosted in these cache lines. Reviewers: echristo, pelikan, kpw Subscribers: mgorny, llvm-commits Differential Revision: https://reviews.llvm.org/D45756 llvm-svn: 331141
2018-04-29 21:46:30 +08:00
return *this;
// At this point, we know that Offset % N == 0, so we must advance the
[XRay][compiler-rt] Segmented Array: Simplify and Optimise Summary: This is a follow-on to D49217 which simplifies and optimises the implementation of the segmented array. In this patch we co-locate the book-keeping for segments in the `__xray::Array<T>` with the data it's managing. We take the chance in this patch to actually rename `Chunk` to `Segment` to better align with the high-level description of the segmented array. With measurements using benchmarks landed in D48879, we've identified that calls to `pthread_getspecific` started dominating the cycles, which led us to revert the change made in D49217 to use C++ thread_local initialisation instead (it reduces the cost by a huge margin, since we save one PLT-based call to pthread functions in the hot path). In particular, this is in `__xray::getThreadLocalData()`. We also took the opportunity to remove the least-common-multiple based calculation and instead pack as much data into segments of the array. This greatly simplifies the API of the container which hides as much of the implementation details as possible. For instance, we calculate the number of elements we need for the each segment internally in the Array instead of making it part of the type. With the changes here, we're able to get a measurable improvement on the performance of profiling mode on top of what D48879 already provides. Depends on D48879. Reviewers: kpw, eizan Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D49363 llvm-svn: 337343
2018-07-18 10:08:39 +08:00
// segment pointer.
DCHECK_EQ(Offset % ElementsPerSegment, 0);
DCHECK_NE(Offset, Size);
[XRay][compiler-rt] Segmented Array: Simplify and Optimise Summary: This is a follow-on to D49217 which simplifies and optimises the implementation of the segmented array. In this patch we co-locate the book-keeping for segments in the `__xray::Array<T>` with the data it's managing. We take the chance in this patch to actually rename `Chunk` to `Segment` to better align with the high-level description of the segmented array. With measurements using benchmarks landed in D48879, we've identified that calls to `pthread_getspecific` started dominating the cycles, which led us to revert the change made in D49217 to use C++ thread_local initialisation instead (it reduces the cost by a huge margin, since we save one PLT-based call to pthread functions in the hot path). In particular, this is in `__xray::getThreadLocalData()`. We also took the opportunity to remove the least-common-multiple based calculation and instead pack as much data into segments of the array. This greatly simplifies the API of the container which hides as much of the implementation details as possible. For instance, we calculate the number of elements we need for the each segment internally in the Array instead of making it part of the type. With the changes here, we're able to get a measurable improvement on the performance of profiling mode on top of what D48879 already provides. Depends on D48879. Reviewers: kpw, eizan Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D49363 llvm-svn: 337343
2018-07-18 10:08:39 +08:00
DCHECK_NE(S, &SentinelSegment);
DCHECK_NE(S->Next, &SentinelSegment);
S = S->Next;
DCHECK_NE(S, &SentinelSegment);
[XRay][profiler] Part 1: XRay Allocator and Array Implementations Summary: This change is part of the larger XRay Profiling Mode effort. Here we implement an arena allocator, for fixed sized buffers used in a segmented array implementation. This change adds the segmented array data structure, which relies on the allocator to provide and maintain the storage for the segmented array. Key features of the `Allocator` type: * It uses cache-aligned blocks, intended to host the actual data. These blocks are cache-line-size multiples of contiguous bytes. * The `Allocator` has a maximum memory budget, set at construction time. This allows us to cap the amount of data each specific `Allocator` instance is responsible for. * Upon destruction, the `Allocator` will clean up the storage it's used, handing it back to the internal allocator used in sanitizer_common. Key features of the `Array` type: * Each segmented array is always backed by an `Allocator`, which is either user-provided or uses a global allocator. * When an `Array` grows, it grows by appending a segment that's fixed-sized. The size of each segment is computed by the number of elements of type `T` that can fit into cache line multiples. * An `Array` does not return memory to the `Allocator`, but it can keep track of the current number of "live" objects it stores. * When an `Array` is destroyed, it will not return memory to the `Allocator`. Users should clean up the `Allocator` independently of the `Array`. * The `Array` type keeps a freelist of the chunks it's used before, so that trimming and growing will re-use previously allocated chunks. These basic data structures are used by the XRay Profiling Mode implementation to implement efficient and cache-aware storage for data that's typically read-and-write heavy for tracking latency information. We're relying on the cache line characteristics of the architecture to provide us good data isolation and cache friendliness, when we're performing operations like searching for elements and/or updating data hosted in these cache lines. Reviewers: echristo, pelikan, kpw Subscribers: mgorny, llvm-commits Differential Revision: https://reviews.llvm.org/D45756 llvm-svn: 331141
2018-04-29 21:46:30 +08:00
return *this;
}
Iterator &operator--() {
[XRay][compiler-rt] Segmented Array: Simplify and Optimise Summary: This is a follow-on to D49217 which simplifies and optimises the implementation of the segmented array. In this patch we co-locate the book-keeping for segments in the `__xray::Array<T>` with the data it's managing. We take the chance in this patch to actually rename `Chunk` to `Segment` to better align with the high-level description of the segmented array. With measurements using benchmarks landed in D48879, we've identified that calls to `pthread_getspecific` started dominating the cycles, which led us to revert the change made in D49217 to use C++ thread_local initialisation instead (it reduces the cost by a huge margin, since we save one PLT-based call to pthread functions in the hot path). In particular, this is in `__xray::getThreadLocalData()`. We also took the opportunity to remove the least-common-multiple based calculation and instead pack as much data into segments of the array. This greatly simplifies the API of the container which hides as much of the implementation details as possible. For instance, we calculate the number of elements we need for the each segment internally in the Array instead of making it part of the type. With the changes here, we're able to get a measurable improvement on the performance of profiling mode on top of what D48879 already provides. Depends on D48879. Reviewers: kpw, eizan Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D49363 llvm-svn: 337343
2018-07-18 10:08:39 +08:00
DCHECK_NE(S, &SentinelSegment);
[XRay][profiler] Part 1: XRay Allocator and Array Implementations Summary: This change is part of the larger XRay Profiling Mode effort. Here we implement an arena allocator, for fixed sized buffers used in a segmented array implementation. This change adds the segmented array data structure, which relies on the allocator to provide and maintain the storage for the segmented array. Key features of the `Allocator` type: * It uses cache-aligned blocks, intended to host the actual data. These blocks are cache-line-size multiples of contiguous bytes. * The `Allocator` has a maximum memory budget, set at construction time. This allows us to cap the amount of data each specific `Allocator` instance is responsible for. * Upon destruction, the `Allocator` will clean up the storage it's used, handing it back to the internal allocator used in sanitizer_common. Key features of the `Array` type: * Each segmented array is always backed by an `Allocator`, which is either user-provided or uses a global allocator. * When an `Array` grows, it grows by appending a segment that's fixed-sized. The size of each segment is computed by the number of elements of type `T` that can fit into cache line multiples. * An `Array` does not return memory to the `Allocator`, but it can keep track of the current number of "live" objects it stores. * When an `Array` is destroyed, it will not return memory to the `Allocator`. Users should clean up the `Allocator` independently of the `Array`. * The `Array` type keeps a freelist of the chunks it's used before, so that trimming and growing will re-use previously allocated chunks. These basic data structures are used by the XRay Profiling Mode implementation to implement efficient and cache-aware storage for data that's typically read-and-write heavy for tracking latency information. We're relying on the cache line characteristics of the architecture to provide us good data isolation and cache friendliness, when we're performing operations like searching for elements and/or updating data hosted in these cache lines. Reviewers: echristo, pelikan, kpw Subscribers: mgorny, llvm-commits Differential Revision: https://reviews.llvm.org/D45756 llvm-svn: 331141
2018-04-29 21:46:30 +08:00
DCHECK_GT(Offset, 0);
auto PreviousOffset = Offset--;
[XRay][compiler-rt] Segmented Array: Simplify and Optimise Summary: This is a follow-on to D49217 which simplifies and optimises the implementation of the segmented array. In this patch we co-locate the book-keeping for segments in the `__xray::Array<T>` with the data it's managing. We take the chance in this patch to actually rename `Chunk` to `Segment` to better align with the high-level description of the segmented array. With measurements using benchmarks landed in D48879, we've identified that calls to `pthread_getspecific` started dominating the cycles, which led us to revert the change made in D49217 to use C++ thread_local initialisation instead (it reduces the cost by a huge margin, since we save one PLT-based call to pthread functions in the hot path). In particular, this is in `__xray::getThreadLocalData()`. We also took the opportunity to remove the least-common-multiple based calculation and instead pack as much data into segments of the array. This greatly simplifies the API of the container which hides as much of the implementation details as possible. For instance, we calculate the number of elements we need for the each segment internally in the Array instead of making it part of the type. With the changes here, we're able to get a measurable improvement on the performance of profiling mode on top of what D48879 already provides. Depends on D48879. Reviewers: kpw, eizan Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D49363 llvm-svn: 337343
2018-07-18 10:08:39 +08:00
if (PreviousOffset != Size && PreviousOffset % ElementsPerSegment == 0) {
DCHECK_NE(S->Prev, &SentinelSegment);
S = S->Prev;
}
[XRay][profiler] Part 1: XRay Allocator and Array Implementations Summary: This change is part of the larger XRay Profiling Mode effort. Here we implement an arena allocator, for fixed sized buffers used in a segmented array implementation. This change adds the segmented array data structure, which relies on the allocator to provide and maintain the storage for the segmented array. Key features of the `Allocator` type: * It uses cache-aligned blocks, intended to host the actual data. These blocks are cache-line-size multiples of contiguous bytes. * The `Allocator` has a maximum memory budget, set at construction time. This allows us to cap the amount of data each specific `Allocator` instance is responsible for. * Upon destruction, the `Allocator` will clean up the storage it's used, handing it back to the internal allocator used in sanitizer_common. Key features of the `Array` type: * Each segmented array is always backed by an `Allocator`, which is either user-provided or uses a global allocator. * When an `Array` grows, it grows by appending a segment that's fixed-sized. The size of each segment is computed by the number of elements of type `T` that can fit into cache line multiples. * An `Array` does not return memory to the `Allocator`, but it can keep track of the current number of "live" objects it stores. * When an `Array` is destroyed, it will not return memory to the `Allocator`. Users should clean up the `Allocator` independently of the `Array`. * The `Array` type keeps a freelist of the chunks it's used before, so that trimming and growing will re-use previously allocated chunks. These basic data structures are used by the XRay Profiling Mode implementation to implement efficient and cache-aware storage for data that's typically read-and-write heavy for tracking latency information. We're relying on the cache line characteristics of the architecture to provide us good data isolation and cache friendliness, when we're performing operations like searching for elements and/or updating data hosted in these cache lines. Reviewers: echristo, pelikan, kpw Subscribers: mgorny, llvm-commits Differential Revision: https://reviews.llvm.org/D45756 llvm-svn: 331141
2018-04-29 21:46:30 +08:00
return *this;
}
Iterator operator++(int) {
Iterator Copy(*this);
++(*this);
return Copy;
}
Iterator operator--(int) {
Iterator Copy(*this);
--(*this);
return Copy;
}
template <class V, class W>
friend bool operator==(const Iterator<V> &L, const Iterator<W> &R) {
[XRay][compiler-rt] Segmented Array: Simplify and Optimise Summary: This is a follow-on to D49217 which simplifies and optimises the implementation of the segmented array. In this patch we co-locate the book-keeping for segments in the `__xray::Array<T>` with the data it's managing. We take the chance in this patch to actually rename `Chunk` to `Segment` to better align with the high-level description of the segmented array. With measurements using benchmarks landed in D48879, we've identified that calls to `pthread_getspecific` started dominating the cycles, which led us to revert the change made in D49217 to use C++ thread_local initialisation instead (it reduces the cost by a huge margin, since we save one PLT-based call to pthread functions in the hot path). In particular, this is in `__xray::getThreadLocalData()`. We also took the opportunity to remove the least-common-multiple based calculation and instead pack as much data into segments of the array. This greatly simplifies the API of the container which hides as much of the implementation details as possible. For instance, we calculate the number of elements we need for the each segment internally in the Array instead of making it part of the type. With the changes here, we're able to get a measurable improvement on the performance of profiling mode on top of what D48879 already provides. Depends on D48879. Reviewers: kpw, eizan Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D49363 llvm-svn: 337343
2018-07-18 10:08:39 +08:00
return L.S == R.S && L.Offset == R.Offset;
[XRay][profiler] Part 1: XRay Allocator and Array Implementations Summary: This change is part of the larger XRay Profiling Mode effort. Here we implement an arena allocator, for fixed sized buffers used in a segmented array implementation. This change adds the segmented array data structure, which relies on the allocator to provide and maintain the storage for the segmented array. Key features of the `Allocator` type: * It uses cache-aligned blocks, intended to host the actual data. These blocks are cache-line-size multiples of contiguous bytes. * The `Allocator` has a maximum memory budget, set at construction time. This allows us to cap the amount of data each specific `Allocator` instance is responsible for. * Upon destruction, the `Allocator` will clean up the storage it's used, handing it back to the internal allocator used in sanitizer_common. Key features of the `Array` type: * Each segmented array is always backed by an `Allocator`, which is either user-provided or uses a global allocator. * When an `Array` grows, it grows by appending a segment that's fixed-sized. The size of each segment is computed by the number of elements of type `T` that can fit into cache line multiples. * An `Array` does not return memory to the `Allocator`, but it can keep track of the current number of "live" objects it stores. * When an `Array` is destroyed, it will not return memory to the `Allocator`. Users should clean up the `Allocator` independently of the `Array`. * The `Array` type keeps a freelist of the chunks it's used before, so that trimming and growing will re-use previously allocated chunks. These basic data structures are used by the XRay Profiling Mode implementation to implement efficient and cache-aware storage for data that's typically read-and-write heavy for tracking latency information. We're relying on the cache line characteristics of the architecture to provide us good data isolation and cache friendliness, when we're performing operations like searching for elements and/or updating data hosted in these cache lines. Reviewers: echristo, pelikan, kpw Subscribers: mgorny, llvm-commits Differential Revision: https://reviews.llvm.org/D45756 llvm-svn: 331141
2018-04-29 21:46:30 +08:00
}
template <class V, class W>
friend bool operator!=(const Iterator<V> &L, const Iterator<W> &R) {
return !(L == R);
}
U &operator*() const {
[XRay][compiler-rt] Segmented Array: Simplify and Optimise Summary: This is a follow-on to D49217 which simplifies and optimises the implementation of the segmented array. In this patch we co-locate the book-keeping for segments in the `__xray::Array<T>` with the data it's managing. We take the chance in this patch to actually rename `Chunk` to `Segment` to better align with the high-level description of the segmented array. With measurements using benchmarks landed in D48879, we've identified that calls to `pthread_getspecific` started dominating the cycles, which led us to revert the change made in D49217 to use C++ thread_local initialisation instead (it reduces the cost by a huge margin, since we save one PLT-based call to pthread functions in the hot path). In particular, this is in `__xray::getThreadLocalData()`. We also took the opportunity to remove the least-common-multiple based calculation and instead pack as much data into segments of the array. This greatly simplifies the API of the container which hides as much of the implementation details as possible. For instance, we calculate the number of elements we need for the each segment internally in the Array instead of making it part of the type. With the changes here, we're able to get a measurable improvement on the performance of profiling mode on top of what D48879 already provides. Depends on D48879. Reviewers: kpw, eizan Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D49363 llvm-svn: 337343
2018-07-18 10:08:39 +08:00
DCHECK_NE(S, &SentinelSegment);
auto RelOff = Offset % ElementsPerSegment;
// We need to compute the character-aligned pointer, offset from the
// segment's Data location to get the element in the position of Offset.
auto Base = static_cast<Segment *>(S)->Data;
auto AlignedOffset = Base + (RelOff * AlignedElementStorageSize);
return *reinterpret_cast<U *>(AlignedOffset);
[XRay][profiler] Part 1: XRay Allocator and Array Implementations Summary: This change is part of the larger XRay Profiling Mode effort. Here we implement an arena allocator, for fixed sized buffers used in a segmented array implementation. This change adds the segmented array data structure, which relies on the allocator to provide and maintain the storage for the segmented array. Key features of the `Allocator` type: * It uses cache-aligned blocks, intended to host the actual data. These blocks are cache-line-size multiples of contiguous bytes. * The `Allocator` has a maximum memory budget, set at construction time. This allows us to cap the amount of data each specific `Allocator` instance is responsible for. * Upon destruction, the `Allocator` will clean up the storage it's used, handing it back to the internal allocator used in sanitizer_common. Key features of the `Array` type: * Each segmented array is always backed by an `Allocator`, which is either user-provided or uses a global allocator. * When an `Array` grows, it grows by appending a segment that's fixed-sized. The size of each segment is computed by the number of elements of type `T` that can fit into cache line multiples. * An `Array` does not return memory to the `Allocator`, but it can keep track of the current number of "live" objects it stores. * When an `Array` is destroyed, it will not return memory to the `Allocator`. Users should clean up the `Allocator` independently of the `Array`. * The `Array` type keeps a freelist of the chunks it's used before, so that trimming and growing will re-use previously allocated chunks. These basic data structures are used by the XRay Profiling Mode implementation to implement efficient and cache-aware storage for data that's typically read-and-write heavy for tracking latency information. We're relying on the cache line characteristics of the architecture to provide us good data isolation and cache friendliness, when we're performing operations like searching for elements and/or updating data hosted in these cache lines. Reviewers: echristo, pelikan, kpw Subscribers: mgorny, llvm-commits Differential Revision: https://reviews.llvm.org/D45756 llvm-svn: 331141
2018-04-29 21:46:30 +08:00
}
[XRay][compiler-rt] Segmented Array: Simplify and Optimise Summary: This is a follow-on to D49217 which simplifies and optimises the implementation of the segmented array. In this patch we co-locate the book-keeping for segments in the `__xray::Array<T>` with the data it's managing. We take the chance in this patch to actually rename `Chunk` to `Segment` to better align with the high-level description of the segmented array. With measurements using benchmarks landed in D48879, we've identified that calls to `pthread_getspecific` started dominating the cycles, which led us to revert the change made in D49217 to use C++ thread_local initialisation instead (it reduces the cost by a huge margin, since we save one PLT-based call to pthread functions in the hot path). In particular, this is in `__xray::getThreadLocalData()`. We also took the opportunity to remove the least-common-multiple based calculation and instead pack as much data into segments of the array. This greatly simplifies the API of the container which hides as much of the implementation details as possible. For instance, we calculate the number of elements we need for the each segment internally in the Array instead of making it part of the type. With the changes here, we're able to get a measurable improvement on the performance of profiling mode on top of what D48879 already provides. Depends on D48879. Reviewers: kpw, eizan Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D49363 llvm-svn: 337343
2018-07-18 10:08:39 +08:00
U *operator->() const { return &(**this); }
[XRay][profiler] Part 1: XRay Allocator and Array Implementations Summary: This change is part of the larger XRay Profiling Mode effort. Here we implement an arena allocator, for fixed sized buffers used in a segmented array implementation. This change adds the segmented array data structure, which relies on the allocator to provide and maintain the storage for the segmented array. Key features of the `Allocator` type: * It uses cache-aligned blocks, intended to host the actual data. These blocks are cache-line-size multiples of contiguous bytes. * The `Allocator` has a maximum memory budget, set at construction time. This allows us to cap the amount of data each specific `Allocator` instance is responsible for. * Upon destruction, the `Allocator` will clean up the storage it's used, handing it back to the internal allocator used in sanitizer_common. Key features of the `Array` type: * Each segmented array is always backed by an `Allocator`, which is either user-provided or uses a global allocator. * When an `Array` grows, it grows by appending a segment that's fixed-sized. The size of each segment is computed by the number of elements of type `T` that can fit into cache line multiples. * An `Array` does not return memory to the `Allocator`, but it can keep track of the current number of "live" objects it stores. * When an `Array` is destroyed, it will not return memory to the `Allocator`. Users should clean up the `Allocator` independently of the `Array`. * The `Array` type keeps a freelist of the chunks it's used before, so that trimming and growing will re-use previously allocated chunks. These basic data structures are used by the XRay Profiling Mode implementation to implement efficient and cache-aware storage for data that's typically read-and-write heavy for tracking latency information. We're relying on the cache line characteristics of the architecture to provide us good data isolation and cache friendliness, when we're performing operations like searching for elements and/or updating data hosted in these cache lines. Reviewers: echristo, pelikan, kpw Subscribers: mgorny, llvm-commits Differential Revision: https://reviews.llvm.org/D45756 llvm-svn: 331141
2018-04-29 21:46:30 +08:00
};
public:
[XRay][compiler-rt] Segmented Array: Simplify and Optimise Summary: This is a follow-on to D49217 which simplifies and optimises the implementation of the segmented array. In this patch we co-locate the book-keeping for segments in the `__xray::Array<T>` with the data it's managing. We take the chance in this patch to actually rename `Chunk` to `Segment` to better align with the high-level description of the segmented array. With measurements using benchmarks landed in D48879, we've identified that calls to `pthread_getspecific` started dominating the cycles, which led us to revert the change made in D49217 to use C++ thread_local initialisation instead (it reduces the cost by a huge margin, since we save one PLT-based call to pthread functions in the hot path). In particular, this is in `__xray::getThreadLocalData()`. We also took the opportunity to remove the least-common-multiple based calculation and instead pack as much data into segments of the array. This greatly simplifies the API of the container which hides as much of the implementation details as possible. For instance, we calculate the number of elements we need for the each segment internally in the Array instead of making it part of the type. With the changes here, we're able to get a measurable improvement on the performance of profiling mode on top of what D48879 already provides. Depends on D48879. Reviewers: kpw, eizan Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D49363 llvm-svn: 337343
2018-07-18 10:08:39 +08:00
explicit Array(AllocatorType &A) : Alloc(&A) {}
[XRay][profiler] Part 1: XRay Allocator and Array Implementations Summary: This change is part of the larger XRay Profiling Mode effort. Here we implement an arena allocator, for fixed sized buffers used in a segmented array implementation. This change adds the segmented array data structure, which relies on the allocator to provide and maintain the storage for the segmented array. Key features of the `Allocator` type: * It uses cache-aligned blocks, intended to host the actual data. These blocks are cache-line-size multiples of contiguous bytes. * The `Allocator` has a maximum memory budget, set at construction time. This allows us to cap the amount of data each specific `Allocator` instance is responsible for. * Upon destruction, the `Allocator` will clean up the storage it's used, handing it back to the internal allocator used in sanitizer_common. Key features of the `Array` type: * Each segmented array is always backed by an `Allocator`, which is either user-provided or uses a global allocator. * When an `Array` grows, it grows by appending a segment that's fixed-sized. The size of each segment is computed by the number of elements of type `T` that can fit into cache line multiples. * An `Array` does not return memory to the `Allocator`, but it can keep track of the current number of "live" objects it stores. * When an `Array` is destroyed, it will not return memory to the `Allocator`. Users should clean up the `Allocator` independently of the `Array`. * The `Array` type keeps a freelist of the chunks it's used before, so that trimming and growing will re-use previously allocated chunks. These basic data structures are used by the XRay Profiling Mode implementation to implement efficient and cache-aware storage for data that's typically read-and-write heavy for tracking latency information. We're relying on the cache line characteristics of the architecture to provide us good data isolation and cache friendliness, when we're performing operations like searching for elements and/or updating data hosted in these cache lines. Reviewers: echristo, pelikan, kpw Subscribers: mgorny, llvm-commits Differential Revision: https://reviews.llvm.org/D45756 llvm-svn: 331141
2018-04-29 21:46:30 +08:00
Array(const Array &) = delete;
Array(Array &&O) NOEXCEPT : Alloc(O.Alloc),
[XRay][profiler] Part 1: XRay Allocator and Array Implementations Summary: This change is part of the larger XRay Profiling Mode effort. Here we implement an arena allocator, for fixed sized buffers used in a segmented array implementation. This change adds the segmented array data structure, which relies on the allocator to provide and maintain the storage for the segmented array. Key features of the `Allocator` type: * It uses cache-aligned blocks, intended to host the actual data. These blocks are cache-line-size multiples of contiguous bytes. * The `Allocator` has a maximum memory budget, set at construction time. This allows us to cap the amount of data each specific `Allocator` instance is responsible for. * Upon destruction, the `Allocator` will clean up the storage it's used, handing it back to the internal allocator used in sanitizer_common. Key features of the `Array` type: * Each segmented array is always backed by an `Allocator`, which is either user-provided or uses a global allocator. * When an `Array` grows, it grows by appending a segment that's fixed-sized. The size of each segment is computed by the number of elements of type `T` that can fit into cache line multiples. * An `Array` does not return memory to the `Allocator`, but it can keep track of the current number of "live" objects it stores. * When an `Array` is destroyed, it will not return memory to the `Allocator`. Users should clean up the `Allocator` independently of the `Array`. * The `Array` type keeps a freelist of the chunks it's used before, so that trimming and growing will re-use previously allocated chunks. These basic data structures are used by the XRay Profiling Mode implementation to implement efficient and cache-aware storage for data that's typically read-and-write heavy for tracking latency information. We're relying on the cache line characteristics of the architecture to provide us good data isolation and cache friendliness, when we're performing operations like searching for elements and/or updating data hosted in these cache lines. Reviewers: echristo, pelikan, kpw Subscribers: mgorny, llvm-commits Differential Revision: https://reviews.llvm.org/D45756 llvm-svn: 331141
2018-04-29 21:46:30 +08:00
Head(O.Head),
Tail(O.Tail),
Size(O.Size) {
[XRay][compiler-rt] Segmented Array: Simplify and Optimise Summary: This is a follow-on to D49217 which simplifies and optimises the implementation of the segmented array. In this patch we co-locate the book-keeping for segments in the `__xray::Array<T>` with the data it's managing. We take the chance in this patch to actually rename `Chunk` to `Segment` to better align with the high-level description of the segmented array. With measurements using benchmarks landed in D48879, we've identified that calls to `pthread_getspecific` started dominating the cycles, which led us to revert the change made in D49217 to use C++ thread_local initialisation instead (it reduces the cost by a huge margin, since we save one PLT-based call to pthread functions in the hot path). In particular, this is in `__xray::getThreadLocalData()`. We also took the opportunity to remove the least-common-multiple based calculation and instead pack as much data into segments of the array. This greatly simplifies the API of the container which hides as much of the implementation details as possible. For instance, we calculate the number of elements we need for the each segment internally in the Array instead of making it part of the type. With the changes here, we're able to get a measurable improvement on the performance of profiling mode on top of what D48879 already provides. Depends on D48879. Reviewers: kpw, eizan Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D49363 llvm-svn: 337343
2018-07-18 10:08:39 +08:00
O.Head = &SentinelSegment;
O.Tail = &SentinelSegment;
[XRay][profiler] Part 1: XRay Allocator and Array Implementations Summary: This change is part of the larger XRay Profiling Mode effort. Here we implement an arena allocator, for fixed sized buffers used in a segmented array implementation. This change adds the segmented array data structure, which relies on the allocator to provide and maintain the storage for the segmented array. Key features of the `Allocator` type: * It uses cache-aligned blocks, intended to host the actual data. These blocks are cache-line-size multiples of contiguous bytes. * The `Allocator` has a maximum memory budget, set at construction time. This allows us to cap the amount of data each specific `Allocator` instance is responsible for. * Upon destruction, the `Allocator` will clean up the storage it's used, handing it back to the internal allocator used in sanitizer_common. Key features of the `Array` type: * Each segmented array is always backed by an `Allocator`, which is either user-provided or uses a global allocator. * When an `Array` grows, it grows by appending a segment that's fixed-sized. The size of each segment is computed by the number of elements of type `T` that can fit into cache line multiples. * An `Array` does not return memory to the `Allocator`, but it can keep track of the current number of "live" objects it stores. * When an `Array` is destroyed, it will not return memory to the `Allocator`. Users should clean up the `Allocator` independently of the `Array`. * The `Array` type keeps a freelist of the chunks it's used before, so that trimming and growing will re-use previously allocated chunks. These basic data structures are used by the XRay Profiling Mode implementation to implement efficient and cache-aware storage for data that's typically read-and-write heavy for tracking latency information. We're relying on the cache line characteristics of the architecture to provide us good data isolation and cache friendliness, when we're performing operations like searching for elements and/or updating data hosted in these cache lines. Reviewers: echristo, pelikan, kpw Subscribers: mgorny, llvm-commits Differential Revision: https://reviews.llvm.org/D45756 llvm-svn: 331141
2018-04-29 21:46:30 +08:00
O.Size = 0;
}
bool empty() const { return Size == 0; }
AllocatorType &allocator() const {
DCHECK_NE(Alloc, nullptr);
return *Alloc;
[XRay][profiler] Part 1: XRay Allocator and Array Implementations Summary: This change is part of the larger XRay Profiling Mode effort. Here we implement an arena allocator, for fixed sized buffers used in a segmented array implementation. This change adds the segmented array data structure, which relies on the allocator to provide and maintain the storage for the segmented array. Key features of the `Allocator` type: * It uses cache-aligned blocks, intended to host the actual data. These blocks are cache-line-size multiples of contiguous bytes. * The `Allocator` has a maximum memory budget, set at construction time. This allows us to cap the amount of data each specific `Allocator` instance is responsible for. * Upon destruction, the `Allocator` will clean up the storage it's used, handing it back to the internal allocator used in sanitizer_common. Key features of the `Array` type: * Each segmented array is always backed by an `Allocator`, which is either user-provided or uses a global allocator. * When an `Array` grows, it grows by appending a segment that's fixed-sized. The size of each segment is computed by the number of elements of type `T` that can fit into cache line multiples. * An `Array` does not return memory to the `Allocator`, but it can keep track of the current number of "live" objects it stores. * When an `Array` is destroyed, it will not return memory to the `Allocator`. Users should clean up the `Allocator` independently of the `Array`. * The `Array` type keeps a freelist of the chunks it's used before, so that trimming and growing will re-use previously allocated chunks. These basic data structures are used by the XRay Profiling Mode implementation to implement efficient and cache-aware storage for data that's typically read-and-write heavy for tracking latency information. We're relying on the cache line characteristics of the architecture to provide us good data isolation and cache friendliness, when we're performing operations like searching for elements and/or updating data hosted in these cache lines. Reviewers: echristo, pelikan, kpw Subscribers: mgorny, llvm-commits Differential Revision: https://reviews.llvm.org/D45756 llvm-svn: 331141
2018-04-29 21:46:30 +08:00
}
size_t size() const { return Size; }
T *Append(const T &E) {
[XRay][compiler-rt] Segmented Array: Simplify and Optimise Summary: This is a follow-on to D49217 which simplifies and optimises the implementation of the segmented array. In this patch we co-locate the book-keeping for segments in the `__xray::Array<T>` with the data it's managing. We take the chance in this patch to actually rename `Chunk` to `Segment` to better align with the high-level description of the segmented array. With measurements using benchmarks landed in D48879, we've identified that calls to `pthread_getspecific` started dominating the cycles, which led us to revert the change made in D49217 to use C++ thread_local initialisation instead (it reduces the cost by a huge margin, since we save one PLT-based call to pthread functions in the hot path). In particular, this is in `__xray::getThreadLocalData()`. We also took the opportunity to remove the least-common-multiple based calculation and instead pack as much data into segments of the array. This greatly simplifies the API of the container which hides as much of the implementation details as possible. For instance, we calculate the number of elements we need for the each segment internally in the Array instead of making it part of the type. With the changes here, we're able to get a measurable improvement on the performance of profiling mode on top of what D48879 already provides. Depends on D48879. Reviewers: kpw, eizan Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D49363 llvm-svn: 337343
2018-07-18 10:08:39 +08:00
if (UNLIKELY(Head == &SentinelSegment))
[XRay][profiler] Part 1: XRay Allocator and Array Implementations Summary: This change is part of the larger XRay Profiling Mode effort. Here we implement an arena allocator, for fixed sized buffers used in a segmented array implementation. This change adds the segmented array data structure, which relies on the allocator to provide and maintain the storage for the segmented array. Key features of the `Allocator` type: * It uses cache-aligned blocks, intended to host the actual data. These blocks are cache-line-size multiples of contiguous bytes. * The `Allocator` has a maximum memory budget, set at construction time. This allows us to cap the amount of data each specific `Allocator` instance is responsible for. * Upon destruction, the `Allocator` will clean up the storage it's used, handing it back to the internal allocator used in sanitizer_common. Key features of the `Array` type: * Each segmented array is always backed by an `Allocator`, which is either user-provided or uses a global allocator. * When an `Array` grows, it grows by appending a segment that's fixed-sized. The size of each segment is computed by the number of elements of type `T` that can fit into cache line multiples. * An `Array` does not return memory to the `Allocator`, but it can keep track of the current number of "live" objects it stores. * When an `Array` is destroyed, it will not return memory to the `Allocator`. Users should clean up the `Allocator` independently of the `Array`. * The `Array` type keeps a freelist of the chunks it's used before, so that trimming and growing will re-use previously allocated chunks. These basic data structures are used by the XRay Profiling Mode implementation to implement efficient and cache-aware storage for data that's typically read-and-write heavy for tracking latency information. We're relying on the cache line characteristics of the architecture to provide us good data isolation and cache friendliness, when we're performing operations like searching for elements and/or updating data hosted in these cache lines. Reviewers: echristo, pelikan, kpw Subscribers: mgorny, llvm-commits Differential Revision: https://reviews.llvm.org/D45756 llvm-svn: 331141
2018-04-29 21:46:30 +08:00
if (InitHeadAndTail() == nullptr)
return nullptr;
[XRay][compiler-rt] Segmented Array: Simplify and Optimise Summary: This is a follow-on to D49217 which simplifies and optimises the implementation of the segmented array. In this patch we co-locate the book-keeping for segments in the `__xray::Array<T>` with the data it's managing. We take the chance in this patch to actually rename `Chunk` to `Segment` to better align with the high-level description of the segmented array. With measurements using benchmarks landed in D48879, we've identified that calls to `pthread_getspecific` started dominating the cycles, which led us to revert the change made in D49217 to use C++ thread_local initialisation instead (it reduces the cost by a huge margin, since we save one PLT-based call to pthread functions in the hot path). In particular, this is in `__xray::getThreadLocalData()`. We also took the opportunity to remove the least-common-multiple based calculation and instead pack as much data into segments of the array. This greatly simplifies the API of the container which hides as much of the implementation details as possible. For instance, we calculate the number of elements we need for the each segment internally in the Array instead of making it part of the type. With the changes here, we're able to get a measurable improvement on the performance of profiling mode on top of what D48879 already provides. Depends on D48879. Reviewers: kpw, eizan Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D49363 llvm-svn: 337343
2018-07-18 10:08:39 +08:00
auto Offset = Size % ElementsPerSegment;
[XRay][profiler] Part 1: XRay Allocator and Array Implementations Summary: This change is part of the larger XRay Profiling Mode effort. Here we implement an arena allocator, for fixed sized buffers used in a segmented array implementation. This change adds the segmented array data structure, which relies on the allocator to provide and maintain the storage for the segmented array. Key features of the `Allocator` type: * It uses cache-aligned blocks, intended to host the actual data. These blocks are cache-line-size multiples of contiguous bytes. * The `Allocator` has a maximum memory budget, set at construction time. This allows us to cap the amount of data each specific `Allocator` instance is responsible for. * Upon destruction, the `Allocator` will clean up the storage it's used, handing it back to the internal allocator used in sanitizer_common. Key features of the `Array` type: * Each segmented array is always backed by an `Allocator`, which is either user-provided or uses a global allocator. * When an `Array` grows, it grows by appending a segment that's fixed-sized. The size of each segment is computed by the number of elements of type `T` that can fit into cache line multiples. * An `Array` does not return memory to the `Allocator`, but it can keep track of the current number of "live" objects it stores. * When an `Array` is destroyed, it will not return memory to the `Allocator`. Users should clean up the `Allocator` independently of the `Array`. * The `Array` type keeps a freelist of the chunks it's used before, so that trimming and growing will re-use previously allocated chunks. These basic data structures are used by the XRay Profiling Mode implementation to implement efficient and cache-aware storage for data that's typically read-and-write heavy for tracking latency information. We're relying on the cache line characteristics of the architecture to provide us good data isolation and cache friendliness, when we're performing operations like searching for elements and/or updating data hosted in these cache lines. Reviewers: echristo, pelikan, kpw Subscribers: mgorny, llvm-commits Differential Revision: https://reviews.llvm.org/D45756 llvm-svn: 331141
2018-04-29 21:46:30 +08:00
if (UNLIKELY(Size != 0 && Offset == 0))
[XRay][compiler-rt] Segmented Array: Simplify and Optimise Summary: This is a follow-on to D49217 which simplifies and optimises the implementation of the segmented array. In this patch we co-locate the book-keeping for segments in the `__xray::Array<T>` with the data it's managing. We take the chance in this patch to actually rename `Chunk` to `Segment` to better align with the high-level description of the segmented array. With measurements using benchmarks landed in D48879, we've identified that calls to `pthread_getspecific` started dominating the cycles, which led us to revert the change made in D49217 to use C++ thread_local initialisation instead (it reduces the cost by a huge margin, since we save one PLT-based call to pthread functions in the hot path). In particular, this is in `__xray::getThreadLocalData()`. We also took the opportunity to remove the least-common-multiple based calculation and instead pack as much data into segments of the array. This greatly simplifies the API of the container which hides as much of the implementation details as possible. For instance, we calculate the number of elements we need for the each segment internally in the Array instead of making it part of the type. With the changes here, we're able to get a measurable improvement on the performance of profiling mode on top of what D48879 already provides. Depends on D48879. Reviewers: kpw, eizan Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D49363 llvm-svn: 337343
2018-07-18 10:08:39 +08:00
if (AppendNewSegment() == nullptr)
[XRay][profiler] Part 1: XRay Allocator and Array Implementations Summary: This change is part of the larger XRay Profiling Mode effort. Here we implement an arena allocator, for fixed sized buffers used in a segmented array implementation. This change adds the segmented array data structure, which relies on the allocator to provide and maintain the storage for the segmented array. Key features of the `Allocator` type: * It uses cache-aligned blocks, intended to host the actual data. These blocks are cache-line-size multiples of contiguous bytes. * The `Allocator` has a maximum memory budget, set at construction time. This allows us to cap the amount of data each specific `Allocator` instance is responsible for. * Upon destruction, the `Allocator` will clean up the storage it's used, handing it back to the internal allocator used in sanitizer_common. Key features of the `Array` type: * Each segmented array is always backed by an `Allocator`, which is either user-provided or uses a global allocator. * When an `Array` grows, it grows by appending a segment that's fixed-sized. The size of each segment is computed by the number of elements of type `T` that can fit into cache line multiples. * An `Array` does not return memory to the `Allocator`, but it can keep track of the current number of "live" objects it stores. * When an `Array` is destroyed, it will not return memory to the `Allocator`. Users should clean up the `Allocator` independently of the `Array`. * The `Array` type keeps a freelist of the chunks it's used before, so that trimming and growing will re-use previously allocated chunks. These basic data structures are used by the XRay Profiling Mode implementation to implement efficient and cache-aware storage for data that's typically read-and-write heavy for tracking latency information. We're relying on the cache line characteristics of the architecture to provide us good data isolation and cache friendliness, when we're performing operations like searching for elements and/or updating data hosted in these cache lines. Reviewers: echristo, pelikan, kpw Subscribers: mgorny, llvm-commits Differential Revision: https://reviews.llvm.org/D45756 llvm-svn: 331141
2018-04-29 21:46:30 +08:00
return nullptr;
[XRay][compiler-rt] Segmented Array: Simplify and Optimise Summary: This is a follow-on to D49217 which simplifies and optimises the implementation of the segmented array. In this patch we co-locate the book-keeping for segments in the `__xray::Array<T>` with the data it's managing. We take the chance in this patch to actually rename `Chunk` to `Segment` to better align with the high-level description of the segmented array. With measurements using benchmarks landed in D48879, we've identified that calls to `pthread_getspecific` started dominating the cycles, which led us to revert the change made in D49217 to use C++ thread_local initialisation instead (it reduces the cost by a huge margin, since we save one PLT-based call to pthread functions in the hot path). In particular, this is in `__xray::getThreadLocalData()`. We also took the opportunity to remove the least-common-multiple based calculation and instead pack as much data into segments of the array. This greatly simplifies the API of the container which hides as much of the implementation details as possible. For instance, we calculate the number of elements we need for the each segment internally in the Array instead of making it part of the type. With the changes here, we're able to get a measurable improvement on the performance of profiling mode on top of what D48879 already provides. Depends on D48879. Reviewers: kpw, eizan Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D49363 llvm-svn: 337343
2018-07-18 10:08:39 +08:00
auto Base = static_cast<Segment *>(Tail)->Data;
auto AlignedOffset = Base + (Offset * AlignedElementStorageSize);
auto Position = reinterpret_cast<T *>(AlignedOffset);
[XRay][profiler] Part 1: XRay Allocator and Array Implementations Summary: This change is part of the larger XRay Profiling Mode effort. Here we implement an arena allocator, for fixed sized buffers used in a segmented array implementation. This change adds the segmented array data structure, which relies on the allocator to provide and maintain the storage for the segmented array. Key features of the `Allocator` type: * It uses cache-aligned blocks, intended to host the actual data. These blocks are cache-line-size multiples of contiguous bytes. * The `Allocator` has a maximum memory budget, set at construction time. This allows us to cap the amount of data each specific `Allocator` instance is responsible for. * Upon destruction, the `Allocator` will clean up the storage it's used, handing it back to the internal allocator used in sanitizer_common. Key features of the `Array` type: * Each segmented array is always backed by an `Allocator`, which is either user-provided or uses a global allocator. * When an `Array` grows, it grows by appending a segment that's fixed-sized. The size of each segment is computed by the number of elements of type `T` that can fit into cache line multiples. * An `Array` does not return memory to the `Allocator`, but it can keep track of the current number of "live" objects it stores. * When an `Array` is destroyed, it will not return memory to the `Allocator`. Users should clean up the `Allocator` independently of the `Array`. * The `Array` type keeps a freelist of the chunks it's used before, so that trimming and growing will re-use previously allocated chunks. These basic data structures are used by the XRay Profiling Mode implementation to implement efficient and cache-aware storage for data that's typically read-and-write heavy for tracking latency information. We're relying on the cache line characteristics of the architecture to provide us good data isolation and cache friendliness, when we're performing operations like searching for elements and/or updating data hosted in these cache lines. Reviewers: echristo, pelikan, kpw Subscribers: mgorny, llvm-commits Differential Revision: https://reviews.llvm.org/D45756 llvm-svn: 331141
2018-04-29 21:46:30 +08:00
*Position = E;
++Size;
return Position;
}
template <class... Args> T *AppendEmplace(Args &&... args) {
[XRay][compiler-rt] Segmented Array: Simplify and Optimise Summary: This is a follow-on to D49217 which simplifies and optimises the implementation of the segmented array. In this patch we co-locate the book-keeping for segments in the `__xray::Array<T>` with the data it's managing. We take the chance in this patch to actually rename `Chunk` to `Segment` to better align with the high-level description of the segmented array. With measurements using benchmarks landed in D48879, we've identified that calls to `pthread_getspecific` started dominating the cycles, which led us to revert the change made in D49217 to use C++ thread_local initialisation instead (it reduces the cost by a huge margin, since we save one PLT-based call to pthread functions in the hot path). In particular, this is in `__xray::getThreadLocalData()`. We also took the opportunity to remove the least-common-multiple based calculation and instead pack as much data into segments of the array. This greatly simplifies the API of the container which hides as much of the implementation details as possible. For instance, we calculate the number of elements we need for the each segment internally in the Array instead of making it part of the type. With the changes here, we're able to get a measurable improvement on the performance of profiling mode on top of what D48879 already provides. Depends on D48879. Reviewers: kpw, eizan Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D49363 llvm-svn: 337343
2018-07-18 10:08:39 +08:00
if (UNLIKELY(Head == &SentinelSegment))
[XRay][profiler] Part 1: XRay Allocator and Array Implementations Summary: This change is part of the larger XRay Profiling Mode effort. Here we implement an arena allocator, for fixed sized buffers used in a segmented array implementation. This change adds the segmented array data structure, which relies on the allocator to provide and maintain the storage for the segmented array. Key features of the `Allocator` type: * It uses cache-aligned blocks, intended to host the actual data. These blocks are cache-line-size multiples of contiguous bytes. * The `Allocator` has a maximum memory budget, set at construction time. This allows us to cap the amount of data each specific `Allocator` instance is responsible for. * Upon destruction, the `Allocator` will clean up the storage it's used, handing it back to the internal allocator used in sanitizer_common. Key features of the `Array` type: * Each segmented array is always backed by an `Allocator`, which is either user-provided or uses a global allocator. * When an `Array` grows, it grows by appending a segment that's fixed-sized. The size of each segment is computed by the number of elements of type `T` that can fit into cache line multiples. * An `Array` does not return memory to the `Allocator`, but it can keep track of the current number of "live" objects it stores. * When an `Array` is destroyed, it will not return memory to the `Allocator`. Users should clean up the `Allocator` independently of the `Array`. * The `Array` type keeps a freelist of the chunks it's used before, so that trimming and growing will re-use previously allocated chunks. These basic data structures are used by the XRay Profiling Mode implementation to implement efficient and cache-aware storage for data that's typically read-and-write heavy for tracking latency information. We're relying on the cache line characteristics of the architecture to provide us good data isolation and cache friendliness, when we're performing operations like searching for elements and/or updating data hosted in these cache lines. Reviewers: echristo, pelikan, kpw Subscribers: mgorny, llvm-commits Differential Revision: https://reviews.llvm.org/D45756 llvm-svn: 331141
2018-04-29 21:46:30 +08:00
if (InitHeadAndTail() == nullptr)
return nullptr;
[XRay][compiler-rt] Segmented Array: Simplify and Optimise Summary: This is a follow-on to D49217 which simplifies and optimises the implementation of the segmented array. In this patch we co-locate the book-keeping for segments in the `__xray::Array<T>` with the data it's managing. We take the chance in this patch to actually rename `Chunk` to `Segment` to better align with the high-level description of the segmented array. With measurements using benchmarks landed in D48879, we've identified that calls to `pthread_getspecific` started dominating the cycles, which led us to revert the change made in D49217 to use C++ thread_local initialisation instead (it reduces the cost by a huge margin, since we save one PLT-based call to pthread functions in the hot path). In particular, this is in `__xray::getThreadLocalData()`. We also took the opportunity to remove the least-common-multiple based calculation and instead pack as much data into segments of the array. This greatly simplifies the API of the container which hides as much of the implementation details as possible. For instance, we calculate the number of elements we need for the each segment internally in the Array instead of making it part of the type. With the changes here, we're able to get a measurable improvement on the performance of profiling mode on top of what D48879 already provides. Depends on D48879. Reviewers: kpw, eizan Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D49363 llvm-svn: 337343
2018-07-18 10:08:39 +08:00
auto Offset = Size % ElementsPerSegment;
auto *LatestSegment = Tail;
if (UNLIKELY(Size != 0 && Offset == 0)) {
[XRay][compiler-rt] Segmented Array: Simplify and Optimise Summary: This is a follow-on to D49217 which simplifies and optimises the implementation of the segmented array. In this patch we co-locate the book-keeping for segments in the `__xray::Array<T>` with the data it's managing. We take the chance in this patch to actually rename `Chunk` to `Segment` to better align with the high-level description of the segmented array. With measurements using benchmarks landed in D48879, we've identified that calls to `pthread_getspecific` started dominating the cycles, which led us to revert the change made in D49217 to use C++ thread_local initialisation instead (it reduces the cost by a huge margin, since we save one PLT-based call to pthread functions in the hot path). In particular, this is in `__xray::getThreadLocalData()`. We also took the opportunity to remove the least-common-multiple based calculation and instead pack as much data into segments of the array. This greatly simplifies the API of the container which hides as much of the implementation details as possible. For instance, we calculate the number of elements we need for the each segment internally in the Array instead of making it part of the type. With the changes here, we're able to get a measurable improvement on the performance of profiling mode on top of what D48879 already provides. Depends on D48879. Reviewers: kpw, eizan Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D49363 llvm-svn: 337343
2018-07-18 10:08:39 +08:00
LatestSegment = AppendNewSegment();
if (LatestSegment == nullptr)
[XRay][profiler] Part 1: XRay Allocator and Array Implementations Summary: This change is part of the larger XRay Profiling Mode effort. Here we implement an arena allocator, for fixed sized buffers used in a segmented array implementation. This change adds the segmented array data structure, which relies on the allocator to provide and maintain the storage for the segmented array. Key features of the `Allocator` type: * It uses cache-aligned blocks, intended to host the actual data. These blocks are cache-line-size multiples of contiguous bytes. * The `Allocator` has a maximum memory budget, set at construction time. This allows us to cap the amount of data each specific `Allocator` instance is responsible for. * Upon destruction, the `Allocator` will clean up the storage it's used, handing it back to the internal allocator used in sanitizer_common. Key features of the `Array` type: * Each segmented array is always backed by an `Allocator`, which is either user-provided or uses a global allocator. * When an `Array` grows, it grows by appending a segment that's fixed-sized. The size of each segment is computed by the number of elements of type `T` that can fit into cache line multiples. * An `Array` does not return memory to the `Allocator`, but it can keep track of the current number of "live" objects it stores. * When an `Array` is destroyed, it will not return memory to the `Allocator`. Users should clean up the `Allocator` independently of the `Array`. * The `Array` type keeps a freelist of the chunks it's used before, so that trimming and growing will re-use previously allocated chunks. These basic data structures are used by the XRay Profiling Mode implementation to implement efficient and cache-aware storage for data that's typically read-and-write heavy for tracking latency information. We're relying on the cache line characteristics of the architecture to provide us good data isolation and cache friendliness, when we're performing operations like searching for elements and/or updating data hosted in these cache lines. Reviewers: echristo, pelikan, kpw Subscribers: mgorny, llvm-commits Differential Revision: https://reviews.llvm.org/D45756 llvm-svn: 331141
2018-04-29 21:46:30 +08:00
return nullptr;
}
[XRay][profiler] Part 1: XRay Allocator and Array Implementations Summary: This change is part of the larger XRay Profiling Mode effort. Here we implement an arena allocator, for fixed sized buffers used in a segmented array implementation. This change adds the segmented array data structure, which relies on the allocator to provide and maintain the storage for the segmented array. Key features of the `Allocator` type: * It uses cache-aligned blocks, intended to host the actual data. These blocks are cache-line-size multiples of contiguous bytes. * The `Allocator` has a maximum memory budget, set at construction time. This allows us to cap the amount of data each specific `Allocator` instance is responsible for. * Upon destruction, the `Allocator` will clean up the storage it's used, handing it back to the internal allocator used in sanitizer_common. Key features of the `Array` type: * Each segmented array is always backed by an `Allocator`, which is either user-provided or uses a global allocator. * When an `Array` grows, it grows by appending a segment that's fixed-sized. The size of each segment is computed by the number of elements of type `T` that can fit into cache line multiples. * An `Array` does not return memory to the `Allocator`, but it can keep track of the current number of "live" objects it stores. * When an `Array` is destroyed, it will not return memory to the `Allocator`. Users should clean up the `Allocator` independently of the `Array`. * The `Array` type keeps a freelist of the chunks it's used before, so that trimming and growing will re-use previously allocated chunks. These basic data structures are used by the XRay Profiling Mode implementation to implement efficient and cache-aware storage for data that's typically read-and-write heavy for tracking latency information. We're relying on the cache line characteristics of the architecture to provide us good data isolation and cache friendliness, when we're performing operations like searching for elements and/or updating data hosted in these cache lines. Reviewers: echristo, pelikan, kpw Subscribers: mgorny, llvm-commits Differential Revision: https://reviews.llvm.org/D45756 llvm-svn: 331141
2018-04-29 21:46:30 +08:00
[XRay][compiler-rt] Segmented Array: Simplify and Optimise Summary: This is a follow-on to D49217 which simplifies and optimises the implementation of the segmented array. In this patch we co-locate the book-keeping for segments in the `__xray::Array<T>` with the data it's managing. We take the chance in this patch to actually rename `Chunk` to `Segment` to better align with the high-level description of the segmented array. With measurements using benchmarks landed in D48879, we've identified that calls to `pthread_getspecific` started dominating the cycles, which led us to revert the change made in D49217 to use C++ thread_local initialisation instead (it reduces the cost by a huge margin, since we save one PLT-based call to pthread functions in the hot path). In particular, this is in `__xray::getThreadLocalData()`. We also took the opportunity to remove the least-common-multiple based calculation and instead pack as much data into segments of the array. This greatly simplifies the API of the container which hides as much of the implementation details as possible. For instance, we calculate the number of elements we need for the each segment internally in the Array instead of making it part of the type. With the changes here, we're able to get a measurable improvement on the performance of profiling mode on top of what D48879 already provides. Depends on D48879. Reviewers: kpw, eizan Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D49363 llvm-svn: 337343
2018-07-18 10:08:39 +08:00
DCHECK_NE(Tail, &SentinelSegment);
auto Base = static_cast<Segment *>(LatestSegment)->Data;
auto AlignedOffset = Base + (Offset * AlignedElementStorageSize);
auto Position = reinterpret_cast<T *>(AlignedOffset);
[XRay][profiler] Part 1: XRay Allocator and Array Implementations Summary: This change is part of the larger XRay Profiling Mode effort. Here we implement an arena allocator, for fixed sized buffers used in a segmented array implementation. This change adds the segmented array data structure, which relies on the allocator to provide and maintain the storage for the segmented array. Key features of the `Allocator` type: * It uses cache-aligned blocks, intended to host the actual data. These blocks are cache-line-size multiples of contiguous bytes. * The `Allocator` has a maximum memory budget, set at construction time. This allows us to cap the amount of data each specific `Allocator` instance is responsible for. * Upon destruction, the `Allocator` will clean up the storage it's used, handing it back to the internal allocator used in sanitizer_common. Key features of the `Array` type: * Each segmented array is always backed by an `Allocator`, which is either user-provided or uses a global allocator. * When an `Array` grows, it grows by appending a segment that's fixed-sized. The size of each segment is computed by the number of elements of type `T` that can fit into cache line multiples. * An `Array` does not return memory to the `Allocator`, but it can keep track of the current number of "live" objects it stores. * When an `Array` is destroyed, it will not return memory to the `Allocator`. Users should clean up the `Allocator` independently of the `Array`. * The `Array` type keeps a freelist of the chunks it's used before, so that trimming and growing will re-use previously allocated chunks. These basic data structures are used by the XRay Profiling Mode implementation to implement efficient and cache-aware storage for data that's typically read-and-write heavy for tracking latency information. We're relying on the cache line characteristics of the architecture to provide us good data isolation and cache friendliness, when we're performing operations like searching for elements and/or updating data hosted in these cache lines. Reviewers: echristo, pelikan, kpw Subscribers: mgorny, llvm-commits Differential Revision: https://reviews.llvm.org/D45756 llvm-svn: 331141
2018-04-29 21:46:30 +08:00
// In-place construct at Position.
new (Position) T{std::forward<Args>(args)...};
[XRay][profiler] Part 1: XRay Allocator and Array Implementations Summary: This change is part of the larger XRay Profiling Mode effort. Here we implement an arena allocator, for fixed sized buffers used in a segmented array implementation. This change adds the segmented array data structure, which relies on the allocator to provide and maintain the storage for the segmented array. Key features of the `Allocator` type: * It uses cache-aligned blocks, intended to host the actual data. These blocks are cache-line-size multiples of contiguous bytes. * The `Allocator` has a maximum memory budget, set at construction time. This allows us to cap the amount of data each specific `Allocator` instance is responsible for. * Upon destruction, the `Allocator` will clean up the storage it's used, handing it back to the internal allocator used in sanitizer_common. Key features of the `Array` type: * Each segmented array is always backed by an `Allocator`, which is either user-provided or uses a global allocator. * When an `Array` grows, it grows by appending a segment that's fixed-sized. The size of each segment is computed by the number of elements of type `T` that can fit into cache line multiples. * An `Array` does not return memory to the `Allocator`, but it can keep track of the current number of "live" objects it stores. * When an `Array` is destroyed, it will not return memory to the `Allocator`. Users should clean up the `Allocator` independently of the `Array`. * The `Array` type keeps a freelist of the chunks it's used before, so that trimming and growing will re-use previously allocated chunks. These basic data structures are used by the XRay Profiling Mode implementation to implement efficient and cache-aware storage for data that's typically read-and-write heavy for tracking latency information. We're relying on the cache line characteristics of the architecture to provide us good data isolation and cache friendliness, when we're performing operations like searching for elements and/or updating data hosted in these cache lines. Reviewers: echristo, pelikan, kpw Subscribers: mgorny, llvm-commits Differential Revision: https://reviews.llvm.org/D45756 llvm-svn: 331141
2018-04-29 21:46:30 +08:00
++Size;
return reinterpret_cast<T *>(Position);
[XRay][profiler] Part 1: XRay Allocator and Array Implementations Summary: This change is part of the larger XRay Profiling Mode effort. Here we implement an arena allocator, for fixed sized buffers used in a segmented array implementation. This change adds the segmented array data structure, which relies on the allocator to provide and maintain the storage for the segmented array. Key features of the `Allocator` type: * It uses cache-aligned blocks, intended to host the actual data. These blocks are cache-line-size multiples of contiguous bytes. * The `Allocator` has a maximum memory budget, set at construction time. This allows us to cap the amount of data each specific `Allocator` instance is responsible for. * Upon destruction, the `Allocator` will clean up the storage it's used, handing it back to the internal allocator used in sanitizer_common. Key features of the `Array` type: * Each segmented array is always backed by an `Allocator`, which is either user-provided or uses a global allocator. * When an `Array` grows, it grows by appending a segment that's fixed-sized. The size of each segment is computed by the number of elements of type `T` that can fit into cache line multiples. * An `Array` does not return memory to the `Allocator`, but it can keep track of the current number of "live" objects it stores. * When an `Array` is destroyed, it will not return memory to the `Allocator`. Users should clean up the `Allocator` independently of the `Array`. * The `Array` type keeps a freelist of the chunks it's used before, so that trimming and growing will re-use previously allocated chunks. These basic data structures are used by the XRay Profiling Mode implementation to implement efficient and cache-aware storage for data that's typically read-and-write heavy for tracking latency information. We're relying on the cache line characteristics of the architecture to provide us good data isolation and cache friendliness, when we're performing operations like searching for elements and/or updating data hosted in these cache lines. Reviewers: echristo, pelikan, kpw Subscribers: mgorny, llvm-commits Differential Revision: https://reviews.llvm.org/D45756 llvm-svn: 331141
2018-04-29 21:46:30 +08:00
}
T &operator[](size_t Offset) const {
DCHECK_LE(Offset, Size);
// We need to traverse the array enough times to find the element at Offset.
[XRay][compiler-rt] Segmented Array: Simplify and Optimise Summary: This is a follow-on to D49217 which simplifies and optimises the implementation of the segmented array. In this patch we co-locate the book-keeping for segments in the `__xray::Array<T>` with the data it's managing. We take the chance in this patch to actually rename `Chunk` to `Segment` to better align with the high-level description of the segmented array. With measurements using benchmarks landed in D48879, we've identified that calls to `pthread_getspecific` started dominating the cycles, which led us to revert the change made in D49217 to use C++ thread_local initialisation instead (it reduces the cost by a huge margin, since we save one PLT-based call to pthread functions in the hot path). In particular, this is in `__xray::getThreadLocalData()`. We also took the opportunity to remove the least-common-multiple based calculation and instead pack as much data into segments of the array. This greatly simplifies the API of the container which hides as much of the implementation details as possible. For instance, we calculate the number of elements we need for the each segment internally in the Array instead of making it part of the type. With the changes here, we're able to get a measurable improvement on the performance of profiling mode on top of what D48879 already provides. Depends on D48879. Reviewers: kpw, eizan Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D49363 llvm-svn: 337343
2018-07-18 10:08:39 +08:00
auto S = Head;
while (Offset >= ElementsPerSegment) {
S = S->Next;
Offset -= ElementsPerSegment;
DCHECK_NE(S, &SentinelSegment);
[XRay][profiler] Part 1: XRay Allocator and Array Implementations Summary: This change is part of the larger XRay Profiling Mode effort. Here we implement an arena allocator, for fixed sized buffers used in a segmented array implementation. This change adds the segmented array data structure, which relies on the allocator to provide and maintain the storage for the segmented array. Key features of the `Allocator` type: * It uses cache-aligned blocks, intended to host the actual data. These blocks are cache-line-size multiples of contiguous bytes. * The `Allocator` has a maximum memory budget, set at construction time. This allows us to cap the amount of data each specific `Allocator` instance is responsible for. * Upon destruction, the `Allocator` will clean up the storage it's used, handing it back to the internal allocator used in sanitizer_common. Key features of the `Array` type: * Each segmented array is always backed by an `Allocator`, which is either user-provided or uses a global allocator. * When an `Array` grows, it grows by appending a segment that's fixed-sized. The size of each segment is computed by the number of elements of type `T` that can fit into cache line multiples. * An `Array` does not return memory to the `Allocator`, but it can keep track of the current number of "live" objects it stores. * When an `Array` is destroyed, it will not return memory to the `Allocator`. Users should clean up the `Allocator` independently of the `Array`. * The `Array` type keeps a freelist of the chunks it's used before, so that trimming and growing will re-use previously allocated chunks. These basic data structures are used by the XRay Profiling Mode implementation to implement efficient and cache-aware storage for data that's typically read-and-write heavy for tracking latency information. We're relying on the cache line characteristics of the architecture to provide us good data isolation and cache friendliness, when we're performing operations like searching for elements and/or updating data hosted in these cache lines. Reviewers: echristo, pelikan, kpw Subscribers: mgorny, llvm-commits Differential Revision: https://reviews.llvm.org/D45756 llvm-svn: 331141
2018-04-29 21:46:30 +08:00
}
[XRay][compiler-rt] Segmented Array: Simplify and Optimise Summary: This is a follow-on to D49217 which simplifies and optimises the implementation of the segmented array. In this patch we co-locate the book-keeping for segments in the `__xray::Array<T>` with the data it's managing. We take the chance in this patch to actually rename `Chunk` to `Segment` to better align with the high-level description of the segmented array. With measurements using benchmarks landed in D48879, we've identified that calls to `pthread_getspecific` started dominating the cycles, which led us to revert the change made in D49217 to use C++ thread_local initialisation instead (it reduces the cost by a huge margin, since we save one PLT-based call to pthread functions in the hot path). In particular, this is in `__xray::getThreadLocalData()`. We also took the opportunity to remove the least-common-multiple based calculation and instead pack as much data into segments of the array. This greatly simplifies the API of the container which hides as much of the implementation details as possible. For instance, we calculate the number of elements we need for the each segment internally in the Array instead of making it part of the type. With the changes here, we're able to get a measurable improvement on the performance of profiling mode on top of what D48879 already provides. Depends on D48879. Reviewers: kpw, eizan Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D49363 llvm-svn: 337343
2018-07-18 10:08:39 +08:00
auto Base = static_cast<Segment *>(S)->Data;
auto AlignedOffset = Base + (Offset * AlignedElementStorageSize);
auto Position = reinterpret_cast<T *>(AlignedOffset);
return *reinterpret_cast<T *>(Position);
[XRay][profiler] Part 1: XRay Allocator and Array Implementations Summary: This change is part of the larger XRay Profiling Mode effort. Here we implement an arena allocator, for fixed sized buffers used in a segmented array implementation. This change adds the segmented array data structure, which relies on the allocator to provide and maintain the storage for the segmented array. Key features of the `Allocator` type: * It uses cache-aligned blocks, intended to host the actual data. These blocks are cache-line-size multiples of contiguous bytes. * The `Allocator` has a maximum memory budget, set at construction time. This allows us to cap the amount of data each specific `Allocator` instance is responsible for. * Upon destruction, the `Allocator` will clean up the storage it's used, handing it back to the internal allocator used in sanitizer_common. Key features of the `Array` type: * Each segmented array is always backed by an `Allocator`, which is either user-provided or uses a global allocator. * When an `Array` grows, it grows by appending a segment that's fixed-sized. The size of each segment is computed by the number of elements of type `T` that can fit into cache line multiples. * An `Array` does not return memory to the `Allocator`, but it can keep track of the current number of "live" objects it stores. * When an `Array` is destroyed, it will not return memory to the `Allocator`. Users should clean up the `Allocator` independently of the `Array`. * The `Array` type keeps a freelist of the chunks it's used before, so that trimming and growing will re-use previously allocated chunks. These basic data structures are used by the XRay Profiling Mode implementation to implement efficient and cache-aware storage for data that's typically read-and-write heavy for tracking latency information. We're relying on the cache line characteristics of the architecture to provide us good data isolation and cache friendliness, when we're performing operations like searching for elements and/or updating data hosted in these cache lines. Reviewers: echristo, pelikan, kpw Subscribers: mgorny, llvm-commits Differential Revision: https://reviews.llvm.org/D45756 llvm-svn: 331141
2018-04-29 21:46:30 +08:00
}
T &front() const {
[XRay][compiler-rt] Segmented Array: Simplify and Optimise Summary: This is a follow-on to D49217 which simplifies and optimises the implementation of the segmented array. In this patch we co-locate the book-keeping for segments in the `__xray::Array<T>` with the data it's managing. We take the chance in this patch to actually rename `Chunk` to `Segment` to better align with the high-level description of the segmented array. With measurements using benchmarks landed in D48879, we've identified that calls to `pthread_getspecific` started dominating the cycles, which led us to revert the change made in D49217 to use C++ thread_local initialisation instead (it reduces the cost by a huge margin, since we save one PLT-based call to pthread functions in the hot path). In particular, this is in `__xray::getThreadLocalData()`. We also took the opportunity to remove the least-common-multiple based calculation and instead pack as much data into segments of the array. This greatly simplifies the API of the container which hides as much of the implementation details as possible. For instance, we calculate the number of elements we need for the each segment internally in the Array instead of making it part of the type. With the changes here, we're able to get a measurable improvement on the performance of profiling mode on top of what D48879 already provides. Depends on D48879. Reviewers: kpw, eizan Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D49363 llvm-svn: 337343
2018-07-18 10:08:39 +08:00
DCHECK_NE(Head, &SentinelSegment);
[XRay][profiler] Part 1: XRay Allocator and Array Implementations Summary: This change is part of the larger XRay Profiling Mode effort. Here we implement an arena allocator, for fixed sized buffers used in a segmented array implementation. This change adds the segmented array data structure, which relies on the allocator to provide and maintain the storage for the segmented array. Key features of the `Allocator` type: * It uses cache-aligned blocks, intended to host the actual data. These blocks are cache-line-size multiples of contiguous bytes. * The `Allocator` has a maximum memory budget, set at construction time. This allows us to cap the amount of data each specific `Allocator` instance is responsible for. * Upon destruction, the `Allocator` will clean up the storage it's used, handing it back to the internal allocator used in sanitizer_common. Key features of the `Array` type: * Each segmented array is always backed by an `Allocator`, which is either user-provided or uses a global allocator. * When an `Array` grows, it grows by appending a segment that's fixed-sized. The size of each segment is computed by the number of elements of type `T` that can fit into cache line multiples. * An `Array` does not return memory to the `Allocator`, but it can keep track of the current number of "live" objects it stores. * When an `Array` is destroyed, it will not return memory to the `Allocator`. Users should clean up the `Allocator` independently of the `Array`. * The `Array` type keeps a freelist of the chunks it's used before, so that trimming and growing will re-use previously allocated chunks. These basic data structures are used by the XRay Profiling Mode implementation to implement efficient and cache-aware storage for data that's typically read-and-write heavy for tracking latency information. We're relying on the cache line characteristics of the architecture to provide us good data isolation and cache friendliness, when we're performing operations like searching for elements and/or updating data hosted in these cache lines. Reviewers: echristo, pelikan, kpw Subscribers: mgorny, llvm-commits Differential Revision: https://reviews.llvm.org/D45756 llvm-svn: 331141
2018-04-29 21:46:30 +08:00
DCHECK_NE(Size, 0u);
return *begin();
[XRay][profiler] Part 1: XRay Allocator and Array Implementations Summary: This change is part of the larger XRay Profiling Mode effort. Here we implement an arena allocator, for fixed sized buffers used in a segmented array implementation. This change adds the segmented array data structure, which relies on the allocator to provide and maintain the storage for the segmented array. Key features of the `Allocator` type: * It uses cache-aligned blocks, intended to host the actual data. These blocks are cache-line-size multiples of contiguous bytes. * The `Allocator` has a maximum memory budget, set at construction time. This allows us to cap the amount of data each specific `Allocator` instance is responsible for. * Upon destruction, the `Allocator` will clean up the storage it's used, handing it back to the internal allocator used in sanitizer_common. Key features of the `Array` type: * Each segmented array is always backed by an `Allocator`, which is either user-provided or uses a global allocator. * When an `Array` grows, it grows by appending a segment that's fixed-sized. The size of each segment is computed by the number of elements of type `T` that can fit into cache line multiples. * An `Array` does not return memory to the `Allocator`, but it can keep track of the current number of "live" objects it stores. * When an `Array` is destroyed, it will not return memory to the `Allocator`. Users should clean up the `Allocator` independently of the `Array`. * The `Array` type keeps a freelist of the chunks it's used before, so that trimming and growing will re-use previously allocated chunks. These basic data structures are used by the XRay Profiling Mode implementation to implement efficient and cache-aware storage for data that's typically read-and-write heavy for tracking latency information. We're relying on the cache line characteristics of the architecture to provide us good data isolation and cache friendliness, when we're performing operations like searching for elements and/or updating data hosted in these cache lines. Reviewers: echristo, pelikan, kpw Subscribers: mgorny, llvm-commits Differential Revision: https://reviews.llvm.org/D45756 llvm-svn: 331141
2018-04-29 21:46:30 +08:00
}
T &back() const {
[XRay][compiler-rt] Segmented Array: Simplify and Optimise Summary: This is a follow-on to D49217 which simplifies and optimises the implementation of the segmented array. In this patch we co-locate the book-keeping for segments in the `__xray::Array<T>` with the data it's managing. We take the chance in this patch to actually rename `Chunk` to `Segment` to better align with the high-level description of the segmented array. With measurements using benchmarks landed in D48879, we've identified that calls to `pthread_getspecific` started dominating the cycles, which led us to revert the change made in D49217 to use C++ thread_local initialisation instead (it reduces the cost by a huge margin, since we save one PLT-based call to pthread functions in the hot path). In particular, this is in `__xray::getThreadLocalData()`. We also took the opportunity to remove the least-common-multiple based calculation and instead pack as much data into segments of the array. This greatly simplifies the API of the container which hides as much of the implementation details as possible. For instance, we calculate the number of elements we need for the each segment internally in the Array instead of making it part of the type. With the changes here, we're able to get a measurable improvement on the performance of profiling mode on top of what D48879 already provides. Depends on D48879. Reviewers: kpw, eizan Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D49363 llvm-svn: 337343
2018-07-18 10:08:39 +08:00
DCHECK_NE(Tail, &SentinelSegment);
DCHECK_NE(Size, 0u);
auto It = end();
--It;
return *It;
[XRay][profiler] Part 1: XRay Allocator and Array Implementations Summary: This change is part of the larger XRay Profiling Mode effort. Here we implement an arena allocator, for fixed sized buffers used in a segmented array implementation. This change adds the segmented array data structure, which relies on the allocator to provide and maintain the storage for the segmented array. Key features of the `Allocator` type: * It uses cache-aligned blocks, intended to host the actual data. These blocks are cache-line-size multiples of contiguous bytes. * The `Allocator` has a maximum memory budget, set at construction time. This allows us to cap the amount of data each specific `Allocator` instance is responsible for. * Upon destruction, the `Allocator` will clean up the storage it's used, handing it back to the internal allocator used in sanitizer_common. Key features of the `Array` type: * Each segmented array is always backed by an `Allocator`, which is either user-provided or uses a global allocator. * When an `Array` grows, it grows by appending a segment that's fixed-sized. The size of each segment is computed by the number of elements of type `T` that can fit into cache line multiples. * An `Array` does not return memory to the `Allocator`, but it can keep track of the current number of "live" objects it stores. * When an `Array` is destroyed, it will not return memory to the `Allocator`. Users should clean up the `Allocator` independently of the `Array`. * The `Array` type keeps a freelist of the chunks it's used before, so that trimming and growing will re-use previously allocated chunks. These basic data structures are used by the XRay Profiling Mode implementation to implement efficient and cache-aware storage for data that's typically read-and-write heavy for tracking latency information. We're relying on the cache line characteristics of the architecture to provide us good data isolation and cache friendliness, when we're performing operations like searching for elements and/or updating data hosted in these cache lines. Reviewers: echristo, pelikan, kpw Subscribers: mgorny, llvm-commits Differential Revision: https://reviews.llvm.org/D45756 llvm-svn: 331141
2018-04-29 21:46:30 +08:00
}
template <class Predicate> T *find_element(Predicate P) const {
if (empty())
return nullptr;
auto E = end();
for (auto I = begin(); I != E; ++I)
if (P(*I))
return &(*I);
return nullptr;
}
/// Remove N Elements from the end. This leaves the blocks behind, and not
/// require allocation of new blocks for new elements added after trimming.
void trim(size_t Elements) {
if (Elements == 0)
return;
[XRay][profiler] Part 1: XRay Allocator and Array Implementations Summary: This change is part of the larger XRay Profiling Mode effort. Here we implement an arena allocator, for fixed sized buffers used in a segmented array implementation. This change adds the segmented array data structure, which relies on the allocator to provide and maintain the storage for the segmented array. Key features of the `Allocator` type: * It uses cache-aligned blocks, intended to host the actual data. These blocks are cache-line-size multiples of contiguous bytes. * The `Allocator` has a maximum memory budget, set at construction time. This allows us to cap the amount of data each specific `Allocator` instance is responsible for. * Upon destruction, the `Allocator` will clean up the storage it's used, handing it back to the internal allocator used in sanitizer_common. Key features of the `Array` type: * Each segmented array is always backed by an `Allocator`, which is either user-provided or uses a global allocator. * When an `Array` grows, it grows by appending a segment that's fixed-sized. The size of each segment is computed by the number of elements of type `T` that can fit into cache line multiples. * An `Array` does not return memory to the `Allocator`, but it can keep track of the current number of "live" objects it stores. * When an `Array` is destroyed, it will not return memory to the `Allocator`. Users should clean up the `Allocator` independently of the `Array`. * The `Array` type keeps a freelist of the chunks it's used before, so that trimming and growing will re-use previously allocated chunks. These basic data structures are used by the XRay Profiling Mode implementation to implement efficient and cache-aware storage for data that's typically read-and-write heavy for tracking latency information. We're relying on the cache line characteristics of the architecture to provide us good data isolation and cache friendliness, when we're performing operations like searching for elements and/or updating data hosted in these cache lines. Reviewers: echristo, pelikan, kpw Subscribers: mgorny, llvm-commits Differential Revision: https://reviews.llvm.org/D45756 llvm-svn: 331141
2018-04-29 21:46:30 +08:00
DCHECK_LE(Elements, Size);
DCHECK_GT(Size, 0);
auto OldSize = Size;
[XRay][profiler] Part 1: XRay Allocator and Array Implementations Summary: This change is part of the larger XRay Profiling Mode effort. Here we implement an arena allocator, for fixed sized buffers used in a segmented array implementation. This change adds the segmented array data structure, which relies on the allocator to provide and maintain the storage for the segmented array. Key features of the `Allocator` type: * It uses cache-aligned blocks, intended to host the actual data. These blocks are cache-line-size multiples of contiguous bytes. * The `Allocator` has a maximum memory budget, set at construction time. This allows us to cap the amount of data each specific `Allocator` instance is responsible for. * Upon destruction, the `Allocator` will clean up the storage it's used, handing it back to the internal allocator used in sanitizer_common. Key features of the `Array` type: * Each segmented array is always backed by an `Allocator`, which is either user-provided or uses a global allocator. * When an `Array` grows, it grows by appending a segment that's fixed-sized. The size of each segment is computed by the number of elements of type `T` that can fit into cache line multiples. * An `Array` does not return memory to the `Allocator`, but it can keep track of the current number of "live" objects it stores. * When an `Array` is destroyed, it will not return memory to the `Allocator`. Users should clean up the `Allocator` independently of the `Array`. * The `Array` type keeps a freelist of the chunks it's used before, so that trimming and growing will re-use previously allocated chunks. These basic data structures are used by the XRay Profiling Mode implementation to implement efficient and cache-aware storage for data that's typically read-and-write heavy for tracking latency information. We're relying on the cache line characteristics of the architecture to provide us good data isolation and cache friendliness, when we're performing operations like searching for elements and/or updating data hosted in these cache lines. Reviewers: echristo, pelikan, kpw Subscribers: mgorny, llvm-commits Differential Revision: https://reviews.llvm.org/D45756 llvm-svn: 331141
2018-04-29 21:46:30 +08:00
Size -= Elements;
[XRay][compiler-rt] Segmented Array: Simplify and Optimise Summary: This is a follow-on to D49217 which simplifies and optimises the implementation of the segmented array. In this patch we co-locate the book-keeping for segments in the `__xray::Array<T>` with the data it's managing. We take the chance in this patch to actually rename `Chunk` to `Segment` to better align with the high-level description of the segmented array. With measurements using benchmarks landed in D48879, we've identified that calls to `pthread_getspecific` started dominating the cycles, which led us to revert the change made in D49217 to use C++ thread_local initialisation instead (it reduces the cost by a huge margin, since we save one PLT-based call to pthread functions in the hot path). In particular, this is in `__xray::getThreadLocalData()`. We also took the opportunity to remove the least-common-multiple based calculation and instead pack as much data into segments of the array. This greatly simplifies the API of the container which hides as much of the implementation details as possible. For instance, we calculate the number of elements we need for the each segment internally in the Array instead of making it part of the type. With the changes here, we're able to get a measurable improvement on the performance of profiling mode on top of what D48879 already provides. Depends on D48879. Reviewers: kpw, eizan Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D49363 llvm-svn: 337343
2018-07-18 10:08:39 +08:00
DCHECK_NE(Head, &SentinelSegment);
DCHECK_NE(Tail, &SentinelSegment);
[XRay][compiler-rt] Segmented Array: Simplify and Optimise Summary: This is a follow-on to D49217 which simplifies and optimises the implementation of the segmented array. In this patch we co-locate the book-keeping for segments in the `__xray::Array<T>` with the data it's managing. We take the chance in this patch to actually rename `Chunk` to `Segment` to better align with the high-level description of the segmented array. With measurements using benchmarks landed in D48879, we've identified that calls to `pthread_getspecific` started dominating the cycles, which led us to revert the change made in D49217 to use C++ thread_local initialisation instead (it reduces the cost by a huge margin, since we save one PLT-based call to pthread functions in the hot path). In particular, this is in `__xray::getThreadLocalData()`. We also took the opportunity to remove the least-common-multiple based calculation and instead pack as much data into segments of the array. This greatly simplifies the API of the container which hides as much of the implementation details as possible. For instance, we calculate the number of elements we need for the each segment internally in the Array instead of making it part of the type. With the changes here, we're able to get a measurable improvement on the performance of profiling mode on top of what D48879 already provides. Depends on D48879. Reviewers: kpw, eizan Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D49363 llvm-svn: 337343
2018-07-18 10:08:39 +08:00
for (auto SegmentsToTrim = (nearest_boundary(OldSize, ElementsPerSegment) -
nearest_boundary(Size, ElementsPerSegment)) /
ElementsPerSegment;
SegmentsToTrim > 0; --SegmentsToTrim) {
DCHECK_NE(Head, &SentinelSegment);
DCHECK_NE(Tail, &SentinelSegment);
[XRay][profiler] Part 1: XRay Allocator and Array Implementations Summary: This change is part of the larger XRay Profiling Mode effort. Here we implement an arena allocator, for fixed sized buffers used in a segmented array implementation. This change adds the segmented array data structure, which relies on the allocator to provide and maintain the storage for the segmented array. Key features of the `Allocator` type: * It uses cache-aligned blocks, intended to host the actual data. These blocks are cache-line-size multiples of contiguous bytes. * The `Allocator` has a maximum memory budget, set at construction time. This allows us to cap the amount of data each specific `Allocator` instance is responsible for. * Upon destruction, the `Allocator` will clean up the storage it's used, handing it back to the internal allocator used in sanitizer_common. Key features of the `Array` type: * Each segmented array is always backed by an `Allocator`, which is either user-provided or uses a global allocator. * When an `Array` grows, it grows by appending a segment that's fixed-sized. The size of each segment is computed by the number of elements of type `T` that can fit into cache line multiples. * An `Array` does not return memory to the `Allocator`, but it can keep track of the current number of "live" objects it stores. * When an `Array` is destroyed, it will not return memory to the `Allocator`. Users should clean up the `Allocator` independently of the `Array`. * The `Array` type keeps a freelist of the chunks it's used before, so that trimming and growing will re-use previously allocated chunks. These basic data structures are used by the XRay Profiling Mode implementation to implement efficient and cache-aware storage for data that's typically read-and-write heavy for tracking latency information. We're relying on the cache line characteristics of the architecture to provide us good data isolation and cache friendliness, when we're performing operations like searching for elements and/or updating data hosted in these cache lines. Reviewers: echristo, pelikan, kpw Subscribers: mgorny, llvm-commits Differential Revision: https://reviews.llvm.org/D45756 llvm-svn: 331141
2018-04-29 21:46:30 +08:00
// Put the tail into the Freelist.
[XRay][compiler-rt] Segmented Array: Simplify and Optimise Summary: This is a follow-on to D49217 which simplifies and optimises the implementation of the segmented array. In this patch we co-locate the book-keeping for segments in the `__xray::Array<T>` with the data it's managing. We take the chance in this patch to actually rename `Chunk` to `Segment` to better align with the high-level description of the segmented array. With measurements using benchmarks landed in D48879, we've identified that calls to `pthread_getspecific` started dominating the cycles, which led us to revert the change made in D49217 to use C++ thread_local initialisation instead (it reduces the cost by a huge margin, since we save one PLT-based call to pthread functions in the hot path). In particular, this is in `__xray::getThreadLocalData()`. We also took the opportunity to remove the least-common-multiple based calculation and instead pack as much data into segments of the array. This greatly simplifies the API of the container which hides as much of the implementation details as possible. For instance, we calculate the number of elements we need for the each segment internally in the Array instead of making it part of the type. With the changes here, we're able to get a measurable improvement on the performance of profiling mode on top of what D48879 already provides. Depends on D48879. Reviewers: kpw, eizan Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D49363 llvm-svn: 337343
2018-07-18 10:08:39 +08:00
auto *FreeSegment = Tail;
[XRay][profiler] Part 1: XRay Allocator and Array Implementations Summary: This change is part of the larger XRay Profiling Mode effort. Here we implement an arena allocator, for fixed sized buffers used in a segmented array implementation. This change adds the segmented array data structure, which relies on the allocator to provide and maintain the storage for the segmented array. Key features of the `Allocator` type: * It uses cache-aligned blocks, intended to host the actual data. These blocks are cache-line-size multiples of contiguous bytes. * The `Allocator` has a maximum memory budget, set at construction time. This allows us to cap the amount of data each specific `Allocator` instance is responsible for. * Upon destruction, the `Allocator` will clean up the storage it's used, handing it back to the internal allocator used in sanitizer_common. Key features of the `Array` type: * Each segmented array is always backed by an `Allocator`, which is either user-provided or uses a global allocator. * When an `Array` grows, it grows by appending a segment that's fixed-sized. The size of each segment is computed by the number of elements of type `T` that can fit into cache line multiples. * An `Array` does not return memory to the `Allocator`, but it can keep track of the current number of "live" objects it stores. * When an `Array` is destroyed, it will not return memory to the `Allocator`. Users should clean up the `Allocator` independently of the `Array`. * The `Array` type keeps a freelist of the chunks it's used before, so that trimming and growing will re-use previously allocated chunks. These basic data structures are used by the XRay Profiling Mode implementation to implement efficient and cache-aware storage for data that's typically read-and-write heavy for tracking latency information. We're relying on the cache line characteristics of the architecture to provide us good data isolation and cache friendliness, when we're performing operations like searching for elements and/or updating data hosted in these cache lines. Reviewers: echristo, pelikan, kpw Subscribers: mgorny, llvm-commits Differential Revision: https://reviews.llvm.org/D45756 llvm-svn: 331141
2018-04-29 21:46:30 +08:00
Tail = Tail->Prev;
[XRay][compiler-rt] Segmented Array: Simplify and Optimise Summary: This is a follow-on to D49217 which simplifies and optimises the implementation of the segmented array. In this patch we co-locate the book-keeping for segments in the `__xray::Array<T>` with the data it's managing. We take the chance in this patch to actually rename `Chunk` to `Segment` to better align with the high-level description of the segmented array. With measurements using benchmarks landed in D48879, we've identified that calls to `pthread_getspecific` started dominating the cycles, which led us to revert the change made in D49217 to use C++ thread_local initialisation instead (it reduces the cost by a huge margin, since we save one PLT-based call to pthread functions in the hot path). In particular, this is in `__xray::getThreadLocalData()`. We also took the opportunity to remove the least-common-multiple based calculation and instead pack as much data into segments of the array. This greatly simplifies the API of the container which hides as much of the implementation details as possible. For instance, we calculate the number of elements we need for the each segment internally in the Array instead of making it part of the type. With the changes here, we're able to get a measurable improvement on the performance of profiling mode on top of what D48879 already provides. Depends on D48879. Reviewers: kpw, eizan Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D49363 llvm-svn: 337343
2018-07-18 10:08:39 +08:00
if (Tail == &SentinelSegment)
[XRay][profiler] Part 1: XRay Allocator and Array Implementations Summary: This change is part of the larger XRay Profiling Mode effort. Here we implement an arena allocator, for fixed sized buffers used in a segmented array implementation. This change adds the segmented array data structure, which relies on the allocator to provide and maintain the storage for the segmented array. Key features of the `Allocator` type: * It uses cache-aligned blocks, intended to host the actual data. These blocks are cache-line-size multiples of contiguous bytes. * The `Allocator` has a maximum memory budget, set at construction time. This allows us to cap the amount of data each specific `Allocator` instance is responsible for. * Upon destruction, the `Allocator` will clean up the storage it's used, handing it back to the internal allocator used in sanitizer_common. Key features of the `Array` type: * Each segmented array is always backed by an `Allocator`, which is either user-provided or uses a global allocator. * When an `Array` grows, it grows by appending a segment that's fixed-sized. The size of each segment is computed by the number of elements of type `T` that can fit into cache line multiples. * An `Array` does not return memory to the `Allocator`, but it can keep track of the current number of "live" objects it stores. * When an `Array` is destroyed, it will not return memory to the `Allocator`. Users should clean up the `Allocator` independently of the `Array`. * The `Array` type keeps a freelist of the chunks it's used before, so that trimming and growing will re-use previously allocated chunks. These basic data structures are used by the XRay Profiling Mode implementation to implement efficient and cache-aware storage for data that's typically read-and-write heavy for tracking latency information. We're relying on the cache line characteristics of the architecture to provide us good data isolation and cache friendliness, when we're performing operations like searching for elements and/or updating data hosted in these cache lines. Reviewers: echristo, pelikan, kpw Subscribers: mgorny, llvm-commits Differential Revision: https://reviews.llvm.org/D45756 llvm-svn: 331141
2018-04-29 21:46:30 +08:00
Head = Tail;
else
[XRay][compiler-rt] Segmented Array: Simplify and Optimise Summary: This is a follow-on to D49217 which simplifies and optimises the implementation of the segmented array. In this patch we co-locate the book-keeping for segments in the `__xray::Array<T>` with the data it's managing. We take the chance in this patch to actually rename `Chunk` to `Segment` to better align with the high-level description of the segmented array. With measurements using benchmarks landed in D48879, we've identified that calls to `pthread_getspecific` started dominating the cycles, which led us to revert the change made in D49217 to use C++ thread_local initialisation instead (it reduces the cost by a huge margin, since we save one PLT-based call to pthread functions in the hot path). In particular, this is in `__xray::getThreadLocalData()`. We also took the opportunity to remove the least-common-multiple based calculation and instead pack as much data into segments of the array. This greatly simplifies the API of the container which hides as much of the implementation details as possible. For instance, we calculate the number of elements we need for the each segment internally in the Array instead of making it part of the type. With the changes here, we're able to get a measurable improvement on the performance of profiling mode on top of what D48879 already provides. Depends on D48879. Reviewers: kpw, eizan Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D49363 llvm-svn: 337343
2018-07-18 10:08:39 +08:00
Tail->Next = &SentinelSegment;
DCHECK_EQ(Tail->Next, &SentinelSegment);
FreeSegment->Next = Freelist;
FreeSegment->Prev = &SentinelSegment;
if (Freelist != &SentinelSegment)
Freelist->Prev = FreeSegment;
Freelist = FreeSegment;
[XRay][profiler] Part 1: XRay Allocator and Array Implementations Summary: This change is part of the larger XRay Profiling Mode effort. Here we implement an arena allocator, for fixed sized buffers used in a segmented array implementation. This change adds the segmented array data structure, which relies on the allocator to provide and maintain the storage for the segmented array. Key features of the `Allocator` type: * It uses cache-aligned blocks, intended to host the actual data. These blocks are cache-line-size multiples of contiguous bytes. * The `Allocator` has a maximum memory budget, set at construction time. This allows us to cap the amount of data each specific `Allocator` instance is responsible for. * Upon destruction, the `Allocator` will clean up the storage it's used, handing it back to the internal allocator used in sanitizer_common. Key features of the `Array` type: * Each segmented array is always backed by an `Allocator`, which is either user-provided or uses a global allocator. * When an `Array` grows, it grows by appending a segment that's fixed-sized. The size of each segment is computed by the number of elements of type `T` that can fit into cache line multiples. * An `Array` does not return memory to the `Allocator`, but it can keep track of the current number of "live" objects it stores. * When an `Array` is destroyed, it will not return memory to the `Allocator`. Users should clean up the `Allocator` independently of the `Array`. * The `Array` type keeps a freelist of the chunks it's used before, so that trimming and growing will re-use previously allocated chunks. These basic data structures are used by the XRay Profiling Mode implementation to implement efficient and cache-aware storage for data that's typically read-and-write heavy for tracking latency information. We're relying on the cache line characteristics of the architecture to provide us good data isolation and cache friendliness, when we're performing operations like searching for elements and/or updating data hosted in these cache lines. Reviewers: echristo, pelikan, kpw Subscribers: mgorny, llvm-commits Differential Revision: https://reviews.llvm.org/D45756 llvm-svn: 331141
2018-04-29 21:46:30 +08:00
}
}
// Provide iterators.
Iterator<T> begin() const { return Iterator<T>(Head, 0, Size); }
Iterator<T> end() const { return Iterator<T>(Tail, Size, Size); }
Iterator<const T> cbegin() const { return Iterator<const T>(Head, 0, Size); }
Iterator<const T> cend() const { return Iterator<const T>(Tail, Size, Size); }
[XRay][profiler] Part 1: XRay Allocator and Array Implementations Summary: This change is part of the larger XRay Profiling Mode effort. Here we implement an arena allocator, for fixed sized buffers used in a segmented array implementation. This change adds the segmented array data structure, which relies on the allocator to provide and maintain the storage for the segmented array. Key features of the `Allocator` type: * It uses cache-aligned blocks, intended to host the actual data. These blocks are cache-line-size multiples of contiguous bytes. * The `Allocator` has a maximum memory budget, set at construction time. This allows us to cap the amount of data each specific `Allocator` instance is responsible for. * Upon destruction, the `Allocator` will clean up the storage it's used, handing it back to the internal allocator used in sanitizer_common. Key features of the `Array` type: * Each segmented array is always backed by an `Allocator`, which is either user-provided or uses a global allocator. * When an `Array` grows, it grows by appending a segment that's fixed-sized. The size of each segment is computed by the number of elements of type `T` that can fit into cache line multiples. * An `Array` does not return memory to the `Allocator`, but it can keep track of the current number of "live" objects it stores. * When an `Array` is destroyed, it will not return memory to the `Allocator`. Users should clean up the `Allocator` independently of the `Array`. * The `Array` type keeps a freelist of the chunks it's used before, so that trimming and growing will re-use previously allocated chunks. These basic data structures are used by the XRay Profiling Mode implementation to implement efficient and cache-aware storage for data that's typically read-and-write heavy for tracking latency information. We're relying on the cache line characteristics of the architecture to provide us good data isolation and cache friendliness, when we're performing operations like searching for elements and/or updating data hosted in these cache lines. Reviewers: echristo, pelikan, kpw Subscribers: mgorny, llvm-commits Differential Revision: https://reviews.llvm.org/D45756 llvm-svn: 331141
2018-04-29 21:46:30 +08:00
};
// We need to have this storage definition out-of-line so that the compiler can
[XRay][compiler-rt] Segmented Array: Simplify and Optimise Summary: This is a follow-on to D49217 which simplifies and optimises the implementation of the segmented array. In this patch we co-locate the book-keeping for segments in the `__xray::Array<T>` with the data it's managing. We take the chance in this patch to actually rename `Chunk` to `Segment` to better align with the high-level description of the segmented array. With measurements using benchmarks landed in D48879, we've identified that calls to `pthread_getspecific` started dominating the cycles, which led us to revert the change made in D49217 to use C++ thread_local initialisation instead (it reduces the cost by a huge margin, since we save one PLT-based call to pthread functions in the hot path). In particular, this is in `__xray::getThreadLocalData()`. We also took the opportunity to remove the least-common-multiple based calculation and instead pack as much data into segments of the array. This greatly simplifies the API of the container which hides as much of the implementation details as possible. For instance, we calculate the number of elements we need for the each segment internally in the Array instead of making it part of the type. With the changes here, we're able to get a measurable improvement on the performance of profiling mode on top of what D48879 already provides. Depends on D48879. Reviewers: kpw, eizan Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D49363 llvm-svn: 337343
2018-07-18 10:08:39 +08:00
// ensure that storage for the SentinelSegment is defined and has a single
[XRay][profiler] Part 1: XRay Allocator and Array Implementations Summary: This change is part of the larger XRay Profiling Mode effort. Here we implement an arena allocator, for fixed sized buffers used in a segmented array implementation. This change adds the segmented array data structure, which relies on the allocator to provide and maintain the storage for the segmented array. Key features of the `Allocator` type: * It uses cache-aligned blocks, intended to host the actual data. These blocks are cache-line-size multiples of contiguous bytes. * The `Allocator` has a maximum memory budget, set at construction time. This allows us to cap the amount of data each specific `Allocator` instance is responsible for. * Upon destruction, the `Allocator` will clean up the storage it's used, handing it back to the internal allocator used in sanitizer_common. Key features of the `Array` type: * Each segmented array is always backed by an `Allocator`, which is either user-provided or uses a global allocator. * When an `Array` grows, it grows by appending a segment that's fixed-sized. The size of each segment is computed by the number of elements of type `T` that can fit into cache line multiples. * An `Array` does not return memory to the `Allocator`, but it can keep track of the current number of "live" objects it stores. * When an `Array` is destroyed, it will not return memory to the `Allocator`. Users should clean up the `Allocator` independently of the `Array`. * The `Array` type keeps a freelist of the chunks it's used before, so that trimming and growing will re-use previously allocated chunks. These basic data structures are used by the XRay Profiling Mode implementation to implement efficient and cache-aware storage for data that's typically read-and-write heavy for tracking latency information. We're relying on the cache line characteristics of the architecture to provide us good data isolation and cache friendliness, when we're performing operations like searching for elements and/or updating data hosted in these cache lines. Reviewers: echristo, pelikan, kpw Subscribers: mgorny, llvm-commits Differential Revision: https://reviews.llvm.org/D45756 llvm-svn: 331141
2018-04-29 21:46:30 +08:00
// address.
[XRay][compiler-rt] Segmented Array: Simplify and Optimise Summary: This is a follow-on to D49217 which simplifies and optimises the implementation of the segmented array. In this patch we co-locate the book-keeping for segments in the `__xray::Array<T>` with the data it's managing. We take the chance in this patch to actually rename `Chunk` to `Segment` to better align with the high-level description of the segmented array. With measurements using benchmarks landed in D48879, we've identified that calls to `pthread_getspecific` started dominating the cycles, which led us to revert the change made in D49217 to use C++ thread_local initialisation instead (it reduces the cost by a huge margin, since we save one PLT-based call to pthread functions in the hot path). In particular, this is in `__xray::getThreadLocalData()`. We also took the opportunity to remove the least-common-multiple based calculation and instead pack as much data into segments of the array. This greatly simplifies the API of the container which hides as much of the implementation details as possible. For instance, we calculate the number of elements we need for the each segment internally in the Array instead of making it part of the type. With the changes here, we're able to get a measurable improvement on the performance of profiling mode on top of what D48879 already provides. Depends on D48879. Reviewers: kpw, eizan Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D49363 llvm-svn: 337343
2018-07-18 10:08:39 +08:00
template <class T>
typename Array<T>::SegmentBase Array<T>::SentinelSegment{
&Array<T>::SentinelSegment, &Array<T>::SentinelSegment};
[XRay][profiler] Part 1: XRay Allocator and Array Implementations Summary: This change is part of the larger XRay Profiling Mode effort. Here we implement an arena allocator, for fixed sized buffers used in a segmented array implementation. This change adds the segmented array data structure, which relies on the allocator to provide and maintain the storage for the segmented array. Key features of the `Allocator` type: * It uses cache-aligned blocks, intended to host the actual data. These blocks are cache-line-size multiples of contiguous bytes. * The `Allocator` has a maximum memory budget, set at construction time. This allows us to cap the amount of data each specific `Allocator` instance is responsible for. * Upon destruction, the `Allocator` will clean up the storage it's used, handing it back to the internal allocator used in sanitizer_common. Key features of the `Array` type: * Each segmented array is always backed by an `Allocator`, which is either user-provided or uses a global allocator. * When an `Array` grows, it grows by appending a segment that's fixed-sized. The size of each segment is computed by the number of elements of type `T` that can fit into cache line multiples. * An `Array` does not return memory to the `Allocator`, but it can keep track of the current number of "live" objects it stores. * When an `Array` is destroyed, it will not return memory to the `Allocator`. Users should clean up the `Allocator` independently of the `Array`. * The `Array` type keeps a freelist of the chunks it's used before, so that trimming and growing will re-use previously allocated chunks. These basic data structures are used by the XRay Profiling Mode implementation to implement efficient and cache-aware storage for data that's typically read-and-write heavy for tracking latency information. We're relying on the cache line characteristics of the architecture to provide us good data isolation and cache friendliness, when we're performing operations like searching for elements and/or updating data hosted in these cache lines. Reviewers: echristo, pelikan, kpw Subscribers: mgorny, llvm-commits Differential Revision: https://reviews.llvm.org/D45756 llvm-svn: 331141
2018-04-29 21:46:30 +08:00
} // namespace __xray
#endif // XRAY_SEGMENTED_ARRAY_H