[libc++][format] Improve format buffer.

Allow bulk output operations on the buffer instead of adding one
code unit at a time. This has a huge performance benefit at the cost of
a larger binary. This doesn't implement @vitaut's earlier suggestion to
avoid buffering for std::string when writing a string. That can be done
in a follow-up patch.
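
As a rough illustration of the difference between the two approaches,
here is a minimal sketch. It is not the libc++ code: `sketch_buffer`,
its 8 code unit capacity, and the `std::string` sink are assumptions
made for the example only.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical, simplified stand-in for the format buffer; not libc++ code.
class sketch_buffer {
public:
  explicit sketch_buffer(std::string& sink) : sink_(sink) {}

  // Old style: one code unit per call.
  void push_back(char c) {
    storage_[size_++] = c;
    if (size_ == storage_.size())
      flush();
  }

  // New style: copy a whole range at once.
  void copy(std::string_view str) {
    // Flush first when the write would not fit behind the pending data.
    if (size_ + str.size() >= storage_.size())
      flush();
    // Input larger than the buffer is written in capacity-sized chunks.
    while (!str.empty()) {
      std::size_t chunk = std::min(str.size(), storage_.size() - size_);
      std::copy_n(str.data(), chunk, storage_.data() + size_);
      size_ += chunk;
      str.remove_prefix(chunk);
      if (size_ == storage_.size())
        flush();
    }
  }

  // Hands the pending code units to the sink and empties the buffer.
  void flush() {
    assert(size_ <= storage_.size());
    sink_.append(storage_.data(), size_);
    size_ = 0;
  }

private:
  std::array<char, 8> storage_{}; // tiny on purpose, to exercise chunking
  std::size_t size_ = 0;
  std::string& sink_;
};
```

A caller that already has a range of code units calls `copy` once
instead of calling `push_back` in a loop, so the capacity check runs per
range rather than per code unit.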

There are some minor complications for the non-buffered format_to_n.
When writing one character at a time it's easy to detect when the limit
n is reached; a bulk write can cross that limit in a single operation.
This is solved by adding a small overhead for format_to_n: when the next
write would overflow, the data is stored in the internal buffer and at
most n code units are copied from there to the output. The overhead
isn't measured, but it's expected to only be an issue for small values
of n; for larger values the general improvements will outweigh the new
overhead.
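
The truncation rule can be sketched as follows. This is a simplified
illustration, not the libc++ implementation: `write_limited` and its
parameters are hypothetical names. It only shows the bookkeeping, namely
that at most `max_size` code units reach the output while the total
number of produced code units keeps being counted.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical helper, not libc++ code: appends `chunk` to `out` without
// ever writing more than `max_size` code units in total. `written` counts
// all code units produced so far, including the ones that were discarded.
inline char* write_limited(char* out, std::size_t max_size, std::size_t& written, std::string_view chunk) {
  assert(out != nullptr);
  if (written < max_size) {
    // Only the part that still fits below the limit is copied.
    std::size_t n = std::min(chunk.size(), max_size - written);
    out = std::copy_n(chunk.data(), n, out);
  }
  written += chunk.size();
  return out;
}
```

Once the limit is reached, later chunks only advance the counter, which
matches the way format_to_n reports the size the full output would have
had.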

```
   text	   data	    bss	    dec	    hex	filename
 349081	   6096	    440	 355617	  56d21	format.libcxx.out-baseline
 344442	   6088	    440	 350970	  55afa	formatted_size.libcxx.out-baseline
4567980	  57272	    424	4625676	 46950c	formatter_float.libcxx.out-baseline
 718800	  12472	    488	 731760	  b2a70	formatter_int.libcxx.out-baseline
 376341	   6096	    552	 382989	  5d80d	format_to.libcxx.out-beaseline

 370169	   6096	    440	 376705	  5bf81	format.libcxx.out
 365530	   6088	    440	 372058	  5ad5a	formatted_size.libcxx.out
4575116	  57272	    424	4632812	 46b0ec	formatter_float.libcxx.out
 725936	  12472	    488	 738896	  b4650	formatter_int.libcxx.out
 397429	   6096	    552	 404077	  62a6d	format_to.libcxx.out
```

For very small strings the new method is slightly slower; from 4
characters onward there's already a small gain.

```
Comparing ./format.libcxx.out-baseline to ./format.libcxx.out
Benchmark                                           Time             CPU      Time Old      Time New       CPU Old       CPU New
--------------------------------------------------------------------------------------------------------------------------------
BM_format_string<char>/1                         +0.0268         +0.0268            43            44            43            44
BM_format_string<char>/2                         +0.0133         +0.0133            22            22            22            22
BM_format_string<char>/4                         -0.0248         -0.0248            12            11            12            11
BM_format_string<char>/8                         -0.0831         -0.0831             6             6             6             6
BM_format_string<char>/16                        -0.2976         -0.2976             4             3             4             3
BM_format_string<char>/32                        -0.4369         -0.4369             3             2             3             2
BM_format_string<char>/64                        -0.6375         -0.6375             3             1             3             1
BM_format_string<char>/128                       -0.7685         -0.7685             2             1             2             1

```

The int benchmark shows a gain for the simple formatting (roughly 20-35%
faster in BM_Basic), but shines for the complex formatting, where
alignment and zero padding are more than 90% faster:
```
Comparing ./formatter_int.libcxx.out-baseline to ./formatter_int.libcxx.out
Benchmark                                                               Time             CPU      Time Old      Time New       CPU Old       CPU New
----------------------------------------------------------------------------------------------------------------------------------------------------
BM_Basic<uint32_t>                                                   -0.2307         -0.2307            60            46            60            46
BM_Basic<int32_t>                                                    -0.1985         -0.1985            61            49            61            49
BM_Basic<uint64_t>                                                   -0.3478         -0.3479            81            53            81            53
BM_Basic<int64_t>                                                    -0.3475         -0.3475            81            53            81            53
BM_BasicLow<__uint128_t>                                             -0.3388         -0.3388            86            57            86            57
BM_BasicLow<__int128_t>                                              -0.3431         -0.3431            86            57            86            57
BM_Basic<__uint128_t>                                                -0.2822         -0.2822           236           170           236           170
BM_Basic<__int128_t>                                                 -0.3107         -0.3107           219           151           219           151
Integral_LocFalse_BaseBin_AlignNone_Int64                            -0.5781         -0.5781           178            75           178            75
Integral_LocFalse_BaseBin_AlignmentLeft_Int64                        -0.9231         -0.9231          1156            89          1156            89
Integral_LocFalse_BaseBin_AlignmentCenter_Int64                      -0.9179         -0.9179          1107            91          1107            91
Integral_LocFalse_BaseBin_AlignmentRight_Int64                       -0.9238         -0.9238          1147            87          1147            87
Integral_LocFalse_BaseBin_ZeroPadding_Int64                          -0.9170         -0.9170          1137            94          1137            94
Integral_LocFalse_BaseBin_AlignNone_Uint64                           -0.5923         -0.5923           175            71           175            71
Integral_LocFalse_BaseBin_AlignmentLeft_Uint64                       -0.9251         -0.9251          1154            86          1154            86
Integral_LocFalse_BaseBin_AlignmentCenter_Uint64                     -0.9204         -0.9204          1105            88          1105            88
Integral_LocFalse_BaseBin_AlignmentRight_Uint64                      -0.9242         -0.9242          1125            85          1125            85
Integral_LocFalse_BaseBin_ZeroPadding_Uint64                         -0.9232         -0.9232          1139            88          1139            88
Integral_LocFalse_BaseOct_AlignNone_Int64                            -0.3241         -0.3241           100            67           100            67
Integral_LocFalse_BaseOct_AlignmentLeft_Int64                        -0.9322         -0.9322          1166            79          1166            79
Integral_LocFalse_BaseOct_AlignmentCenter_Int64                      -0.9251         -0.9251          1108            83          1108            83
Integral_LocFalse_BaseOct_AlignmentRight_Int64                       -0.9303         -0.9303          1136            79          1136            79
Integral_LocFalse_BaseOct_ZeroPadding_Int64                          -0.9264         -0.9264          1156            85          1156            85
Integral_LocFalse_BaseOct_AlignNone_Uint64                           -0.3116         -0.3116            96            66            96            66
Integral_LocFalse_BaseOct_AlignmentLeft_Uint64                       -0.9310         -0.9310          1168            81          1168            81
Integral_LocFalse_BaseOct_AlignmentCenter_Uint64                     -0.9281         -0.9281          1128            81          1128            81
Integral_LocFalse_BaseOct_AlignmentRight_Uint64                      -0.9299         -0.9299          1148            80          1148            80
Integral_LocFalse_BaseOct_ZeroPadding_Uint64                         -0.9288         -0.9288          1153            82          1153            82
Integral_LocFalse_BaseDec_AlignNone_Int64                            -0.3342         -0.3342            95            63            95            63
Integral_LocFalse_BaseDec_AlignmentLeft_Int64                        -0.9360         -0.9360          1157            74          1157            74
Integral_LocFalse_BaseDec_AlignmentCenter_Int64                      -0.9303         -0.9303          1128            79          1128            79
Integral_LocFalse_BaseDec_AlignmentRight_Int64                       -0.9369         -0.9369          1164            73          1164            73
Integral_LocFalse_BaseDec_ZeroPadding_Int64                          -0.9323         -0.9323          1157            78          1157            78
Integral_LocFalse_BaseDec_AlignNone_Uint64                           -0.3198         -0.3198            93            63            93            63
Integral_LocFalse_BaseDec_AlignmentLeft_Uint64                       -0.9351         -0.9351          1158            75          1158            75
Integral_LocFalse_BaseDec_AlignmentCenter_Uint64                     -0.9298         -0.9298          1128            79          1128            79
Integral_LocFalse_BaseDec_AlignmentRight_Uint64                      -0.9361         -0.9361          1157            74          1157            74
Integral_LocFalse_BaseDec_ZeroPadding_Uint64                         -0.9333         -0.9333          1151            77          1151            77
Integral_LocFalse_BaseHex_AlignNone_Int64                            -0.3020         -0.3020            89            62            89            62
Integral_LocFalse_BaseHex_AlignmentLeft_Int64                        -0.9357         -0.9357          1174            75          1174            75
Integral_LocFalse_BaseHex_AlignmentCenter_Int64                      -0.9319         -0.9319          1129            77          1129            77
Integral_LocFalse_BaseHex_AlignmentRight_Int64                       -0.9350         -0.9350          1161            75          1161            75
Integral_LocFalse_BaseHex_ZeroPadding_Int64                          -0.9293         -0.9293          1150            81          1150            81
Integral_LocFalse_BaseHex_AlignNone_Uint64                           -0.3056         -0.3057            86            59            86            59
Integral_LocFalse_BaseHex_AlignmentLeft_Uint64                       -0.9378         -0.9378          1174            73          1174            73
Integral_LocFalse_BaseHex_AlignmentCenter_Uint64                     -0.9341         -0.9341          1129            74          1130            74
Integral_LocFalse_BaseHex_AlignmentRight_Uint64                      -0.9361         -0.9361          1157            74          1157            74
Integral_LocFalse_BaseHex_ZeroPadding_Uint64                         -0.9315         -0.9315          1147            79          1147            79
Integral_LocFalse_BaseHexUpper_AlignNone_Int64                       -0.0019         -0.0019            91            90            91            90
Integral_LocFalse_BaseHexUpper_AlignmentLeft_Int64                   -0.9099         -0.9099          1162           105          1162           105
Integral_LocFalse_BaseHexUpper_AlignmentCenter_Int64                 -0.9041         -0.9041          1121           108          1121           108
Integral_LocFalse_BaseHexUpper_AlignmentRight_Int64                  -0.9086         -0.9086          1162           106          1162           106
Integral_LocFalse_BaseHexUpper_ZeroPadding_Int64                     -0.9057         -0.9057          1164           110          1164           110
Integral_LocFalse_BaseHexUpper_AlignNone_Uint64                      +0.0110         +0.0110            86            87            86            87
Integral_LocFalse_BaseHexUpper_AlignmentLeft_Uint64                  -0.9136         -0.9136          1161           100          1161           100
Integral_LocFalse_BaseHexUpper_AlignmentCenter_Uint64                -0.9078         -0.9078          1133           104          1133           104
Integral_LocFalse_BaseHexUpper_AlignmentRight_Uint64                 -0.9132         -0.9132          1177           102          1177           102
Integral_LocFalse_BaseHexUpper_ZeroPadding_Uint64                    -0.9091         -0.9091          1160           105          1160           105
```
Other benchmarks give similar results.

Reviewed By: #libc, ldionne

Differential Revision: https://reviews.llvm.org/D129964
Mark de Wever 2022-07-16 17:03:27 +02:00
parent 69c09d11f8
commit f7c0df002a
6 changed files with 341 additions and 66 deletions


@ -11,8 +11,10 @@
#define _LIBCPP___FORMAT_BUFFER_H
#include <__algorithm/copy_n.h>
#include <__algorithm/fill_n.h>
#include <__algorithm/max.h>
#include <__algorithm/min.h>
#include <__algorithm/transform.h>
#include <__algorithm/unwrap_iter.h>
#include <__config>
#include <__format/enable_insertable.h>
@ -26,6 +28,7 @@
#include <__utility/move.h>
#include <concepts>
#include <cstddef>
#include <string_view>
#include <type_traits>
#if !defined(_LIBCPP_HAS_NO_PRAGMA_SYSTEM_HEADER)
@ -69,8 +72,6 @@ public:
return back_insert_iterator{*this};
}
// TODO FMT It would be nice to have an overload taking a
// basic_string_view<_CharT> and append it directly.
_LIBCPP_HIDE_FROM_ABI void push_back(_CharT __c) {
__ptr_[__size_++] = __c;
@ -80,6 +81,95 @@ public:
flush();
}
/// Copies the input __str to the buffer.
///
/// Since some of the input is generated by std::to_chars, there needs to be a
/// conversion when _CharT is wchar_t.
template <__formatter::__char_type _InCharT>
_LIBCPP_HIDE_FROM_ABI void __copy(basic_string_view<_InCharT> __str) {
// When the underlying iterator is a simple iterator the __capacity_ is
// infinite. For a string or container back_inserter it isn't. This means
// adding a large string to the buffer can cause some overhead. In that
// case a better approach could be:
// - flush the buffer
// - container.append(__str.begin(), __str.end());
// The same holds true for the fill.
// For transform it might be slightly harder, however the use case for
// transform is slightly less common; it converts hexadecimal values to
// upper case. For integral these strings are short.
// TODO FMT Look at the improvements above.
size_t __n = __str.size();
__flush_on_overflow(__n);
if (__n <= __capacity_) {
_VSTD::copy_n(__str.data(), __n, _VSTD::addressof(__ptr_[__size_]));
__size_ += __n;
return;
}
// The output doesn't fit in the internal buffer.
// Copy the data in "__capacity_" sized chunks.
_LIBCPP_ASSERT(__size_ == 0, "the buffer should be flushed by __flush_on_overflow");
const _InCharT* __first = __str.data();
do {
size_t __chunk = _VSTD::min(__n, __capacity_);
_VSTD::copy_n(__first, __chunk, _VSTD::addressof(__ptr_[__size_]));
__size_ = __chunk;
__first += __chunk;
__n -= __chunk;
flush();
} while (__n);
}
/// A std::transform wrapper.
///
/// Like @ref __copy it may need to do type conversion.
template <__formatter::__char_type _InCharT, class _UnaryOperation>
_LIBCPP_HIDE_FROM_ABI void __transform(const _InCharT* __first, const _InCharT* __last, _UnaryOperation __operation) {
_LIBCPP_ASSERT(__first <= __last, "not a valid range");
size_t __n = static_cast<size_t>(__last - __first);
__flush_on_overflow(__n);
if (__n <= __capacity_) {
_VSTD::transform(__first, __last, _VSTD::addressof(__ptr_[__size_]), _VSTD::move(__operation));
__size_ += __n;
return;
}
// The output doesn't fit in the internal buffer.
// Transform the data in "__capacity_" sized chunks.
_LIBCPP_ASSERT(__size_ == 0, "the buffer should be flushed by __flush_on_overflow");
do {
size_t __chunk = _VSTD::min(__n, __capacity_);
_VSTD::transform(__first, __first + __chunk, _VSTD::addressof(__ptr_[__size_]), __operation);
__size_ = __chunk;
__first += __chunk;
__n -= __chunk;
flush();
} while (__n);
}
/// A \c fill_n wrapper.
_LIBCPP_HIDE_FROM_ABI void __fill(size_t __n, _CharT __value) {
__flush_on_overflow(__n);
if (__n <= __capacity_) {
_VSTD::fill_n(_VSTD::addressof(__ptr_[__size_]), __n, __value);
__size_ += __n;
return;
}
// The output doesn't fit in the internal buffer.
// Fill the buffer in "__capacity_" sized chunks.
_LIBCPP_ASSERT(__size_ == 0, "the buffer should be flushed by __flush_on_overflow");
do {
size_t __chunk = _VSTD::min(__n, __capacity_);
_VSTD::fill_n(_VSTD::addressof(__ptr_[__size_]), __chunk, __value);
__size_ = __chunk;
__n -= __chunk;
flush();
} while (__n);
}
_LIBCPP_HIDE_FROM_ABI void flush() {
__flush_(__ptr_, __size_, __obj_);
__size_ = 0;
@ -91,6 +181,44 @@ private:
size_t __size_{0};
void (*__flush_)(_CharT*, size_t, void*);
void* __obj_;
/// Flushes the buffer when the output operation would overflow the buffer.
///
/// A simple approach for the overflow detection would be something along the
/// lines:
/// \code
/// // The internal buffer is large enough.
/// if (__n <= __capacity_) {
/// // Flush when we really would overflow.
/// if (__size_ + __n >= __capacity_)
/// flush();
/// ...
/// }
/// \endcode
///
/// This approach works for all cases but one:
/// A __format_to_n_buffer_base where \ref __enable_direct_output is true.
/// In that case the \ref __capacity_ of the buffer changes during the first
/// \ref flush. During that operation the output buffer switches from its
/// __writer_ to its __storage_. The \ref __capacity_ of the former depends
/// on the value of n, that of the latter is a fixed size. For example:
/// - a format_to_n call with a 10'000 char buffer,
/// - the buffer is filled with 9'500 chars,
/// - adding 1'000 elements would overflow the buffer so the buffer gets
/// changed and the \ref __capacity_ decreases from 10'000 to
/// __buffer_size (256 at the time of writing).
///
/// This means that the \ref flush for this class may need to copy a part of
/// the internal buffer to the proper output. In this example there will be
/// 500 characters that need this copy operation.
///
/// Note it would be more efficient to write 500 chars directly and then swap
/// the buffers. This would make the code more complex and \ref format_to_n is
/// not the most common use case. Therefore the optimization isn't done.
_LIBCPP_HIDE_FROM_ABI void __flush_on_overflow(size_t __n) {
if (__size_ + __n >= __capacity_)
flush();
}
};
/// A storage using an internal buffer.
@ -280,12 +408,12 @@ struct _LIBCPP_TEMPLATE_VIS __format_to_n_buffer_base {
using _Size = iter_difference_t<_OutIt>;
public:
_LIBCPP_HIDE_FROM_ABI explicit __format_to_n_buffer_base(_OutIt __out_it, _Size __n)
: __writer_(_VSTD::move(__out_it)), __n_(_VSTD::max(_Size(0), __n)) {}
_LIBCPP_HIDE_FROM_ABI explicit __format_to_n_buffer_base(_OutIt __out_it, _Size __max_size)
: __writer_(_VSTD::move(__out_it)), __max_size_(_VSTD::max(_Size(0), __max_size)) {}
_LIBCPP_HIDE_FROM_ABI void flush(_CharT* __ptr, size_t __size) {
if (_Size(__size_) <= __n_)
__writer_.flush(__ptr, _VSTD::min(_Size(__size), __n_ - __size_));
if (_Size(__size_) <= __max_size_)
__writer_.flush(__ptr, _VSTD::min(_Size(__size), __max_size_ - __size_));
__size_ += __size;
}
@ -294,7 +422,7 @@ protected:
__output_buffer<_CharT> __output_{__storage_.begin(), __storage_.__buffer_size, this};
typename __writer_selector<_OutIt, _CharT>::type __writer_;
_Size __n_;
_Size __max_size_;
_Size __size_{0};
};
@ -310,24 +438,35 @@ class _LIBCPP_TEMPLATE_VIS __format_to_n_buffer_base<_OutIt, _CharT, true> {
using _Size = iter_difference_t<_OutIt>;
public:
_LIBCPP_HIDE_FROM_ABI explicit __format_to_n_buffer_base(_OutIt __out_it, _Size __n)
: __output_(_VSTD::__unwrap_iter(__out_it), __n, this), __writer_(_VSTD::move(__out_it)) {
if (__n <= 0) [[unlikely]]
_LIBCPP_HIDE_FROM_ABI explicit __format_to_n_buffer_base(_OutIt __out_it, _Size __max_size)
: __output_(_VSTD::__unwrap_iter(__out_it), __max_size, this),
__writer_(_VSTD::move(__out_it)),
__max_size_(__max_size) {
if (__max_size <= 0) [[unlikely]]
__output_.reset(__storage_.begin(), __storage_.__buffer_size);
}
_LIBCPP_HIDE_FROM_ABI void flush(_CharT* __ptr, size_t __size) {
// A flush to the direct writer happens in two occasions:
// A flush to the direct writer happens in the following occasions:
// - The format function has written the maximum number of allowed code
// units. At this point it's no longer valid to write to this writer. So
// switch to the internal storage. This internal storage doesn't need to
// be written anywhere so the flush for that storage writes no output.
// - Like above, but the next "mass write" operation would overflow the
// buffer. In that case the buffer is pre-emptively switched. The still
// valid code units will be written separately.
// - The format_to_n function is finished. In this case there's no need to
// switch the buffer, but for simplicity the buffers are still switched.
// When the __n <= 0 the constructor already switched the buffers.
// When the __max_size <= 0 the constructor already switched the buffers.
if (__size_ == 0 && __ptr != __storage_.begin()) {
__writer_.flush(__ptr, __size);
__output_.reset(__storage_.begin(), __storage_.__buffer_size);
} else if (__size_ < __max_size_) {
// Copies a part of the internal buffer to the output up to n characters.
// See __output_buffer<_CharT>::__flush_on_overflow for more information.
_Size __s = _VSTD::min(_Size(__size), __max_size_ - __size_);
std::copy_n(__ptr, __s, __writer_.out());
__writer_.flush(__ptr, __s);
}
__size_ += __size;
@ -338,6 +477,7 @@ protected:
__output_buffer<_CharT> __output_;
__writer_direct<_OutIt, _CharT> __writer_;
_Size __max_size_;
_Size __size_{0};
};
@ -350,7 +490,8 @@ struct _LIBCPP_TEMPLATE_VIS __format_to_n_buffer final
using _Size = iter_difference_t<_OutIt>;
public:
_LIBCPP_HIDE_FROM_ABI explicit __format_to_n_buffer(_OutIt __out_it, _Size __n) : _Base(_VSTD::move(__out_it), __n) {}
_LIBCPP_HIDE_FROM_ABI explicit __format_to_n_buffer(_OutIt __out_it, _Size __max_size)
: _Base(_VSTD::move(__out_it), __max_size) {}
_LIBCPP_HIDE_FROM_ABI auto make_output_iterator() { return this->__output_.make_output_iterator(); }
_LIBCPP_HIDE_FROM_ABI format_to_n_result<_OutIt> result() && {


@ -10,9 +10,7 @@
#ifndef _LIBCPP___FORMAT_FORMATTER_FLOATING_POINT_H
#define _LIBCPP___FORMAT_FORMATTER_FLOATING_POINT_H
#include <__algorithm/copy.h>
#include <__algorithm/copy_n.h>
#include <__algorithm/fill_n.h>
#include <__algorithm/find.h>
#include <__algorithm/min.h>
#include <__algorithm/rotate.h>
@ -528,13 +526,13 @@ _LIBCPP_HIDE_FROM_ABI _OutIt __format_locale_specific_form(
// sign and (zero padding or alignment)
if (__zero_padding && __first != __buffer.begin())
*__out_it++ = *__buffer.begin();
__out_it = _VSTD::fill_n(_VSTD::move(__out_it), __padding.__before_, __specs.__fill_);
__out_it = __formatter::__fill(_VSTD::move(__out_it), __padding.__before_, __specs.__fill_);
if (!__zero_padding && __first != __buffer.begin())
*__out_it++ = *__buffer.begin();
// integral part
if (__grouping.empty()) {
__out_it = _VSTD::copy_n(__first, __digits, _VSTD::move(__out_it));
__out_it = __formatter::__copy(__first, __digits, _VSTD::move(__out_it));
} else {
auto __r = __grouping.rbegin();
auto __e = __grouping.rend() - 1;
@ -546,7 +544,7 @@ _LIBCPP_HIDE_FROM_ABI _OutIt __format_locale_specific_form(
// This loop achieves that process by testing the termination condition
// midway in the loop.
while (true) {
__out_it = _VSTD::copy_n(__first, *__r, _VSTD::move(__out_it));
__out_it = __formatter::__copy(__first, *__r, _VSTD::move(__out_it));
__first += *__r;
if (__r == __e)
@ -560,16 +558,16 @@ _LIBCPP_HIDE_FROM_ABI _OutIt __format_locale_specific_form(
// fractional part
if (__result.__radix_point != __result.__last) {
*__out_it++ = __np.decimal_point();
__out_it = _VSTD::copy(__result.__radix_point + 1, __result.__exponent, _VSTD::move(__out_it));
__out_it = _VSTD::fill_n(_VSTD::move(__out_it), __buffer.__num_trailing_zeros(), _CharT('0'));
__out_it = __formatter::__copy(__result.__radix_point + 1, __result.__exponent, _VSTD::move(__out_it));
__out_it = __formatter::__fill(_VSTD::move(__out_it), __buffer.__num_trailing_zeros(), _CharT('0'));
}
// exponent
if (__result.__exponent != __result.__last)
__out_it = _VSTD::copy(__result.__exponent, __result.__last, _VSTD::move(__out_it));
__out_it = __formatter::__copy(__result.__exponent, __result.__last, _VSTD::move(__out_it));
// alignment
return _VSTD::fill_n(_VSTD::move(__out_it), __padding.__after_, __specs.__fill_);
return __formatter::__fill(_VSTD::move(__out_it), __padding.__after_, __specs.__fill_);
}
# endif // _LIBCPP_HAS_NO_LOCALIZATION
@ -651,14 +649,15 @@ __format_floating_point(_Tp __value, auto& __ctx, __format_spec::__parsed_specif
if (__size + __num_trailing_zeros >= __specs.__width_) {
if (__num_trailing_zeros && __result.__exponent != __result.__last)
// Insert trailing zeros before exponent character.
return _VSTD::copy(
return __formatter::__copy(
__result.__exponent,
__result.__last,
_VSTD::fill_n(
_VSTD::copy(__buffer.begin(), __result.__exponent, __ctx.out()), __num_trailing_zeros, _CharT('0')));
__formatter::__fill(__formatter::__copy(__buffer.begin(), __result.__exponent, __ctx.out()),
__num_trailing_zeros,
_CharT('0')));
return _VSTD::fill_n(
_VSTD::copy(__buffer.begin(), __result.__last, __ctx.out()), __num_trailing_zeros, _CharT('0'));
return __formatter::__fill(
__formatter::__copy(__buffer.begin(), __result.__last, __ctx.out()), __num_trailing_zeros, _CharT('0'));
}
auto __out_it = __ctx.out();


@ -243,7 +243,7 @@ _LIBCPP_HIDE_FROM_ABI auto __format_integer(
// The zero padding is done like:
// - Write [sign][prefix]
// - Write data right aligned with '0' as fill character.
__out_it = _VSTD::copy(__begin, __first, _VSTD::move(__out_it));
__out_it = __formatter::__copy(__begin, __first, _VSTD::move(__out_it));
__specs.__alignment_ = __format_spec::__alignment::__right;
__specs.__fill_ = _CharT('0');
int32_t __size = __first - __begin;


@ -14,10 +14,13 @@
#include <__algorithm/copy_n.h>
#include <__algorithm/fill_n.h>
#include <__algorithm/transform.h>
#include <__concepts/same_as.h>
#include <__config>
#include <__format/buffer.h>
#include <__format/formatter.h>
#include <__format/parser_std_format_spec.h>
#include <__format/unicode.h>
#include <__iterator/back_insert_iterator.h>
#include <__utility/move.h>
#include <__utility/unreachable.h>
#include <cstddef>
@ -86,6 +89,63 @@ __padding_size(size_t __size, size_t __width, __format_spec::__alignment __align
__libcpp_unreachable();
}
/// Copy wrapper.
///
/// This uses a "mass output function" of __format::__output_buffer when possible.
template <__formatter::__char_type _CharT, __formatter::__char_type _OutCharT = _CharT>
_LIBCPP_HIDE_FROM_ABI auto __copy(basic_string_view<_CharT> __str, output_iterator<const _OutCharT&> auto __out_it)
-> decltype(__out_it) {
if constexpr (_VSTD::same_as<decltype(__out_it), _VSTD::back_insert_iterator<__format::__output_buffer<_OutCharT>>>) {
__out_it.__get_container()->__copy(__str);
return __out_it;
} else {
return std::copy_n(__str.data(), __str.size(), _VSTD::move(__out_it));
}
}
template <__formatter::__char_type _CharT, __formatter::__char_type _OutCharT = _CharT>
_LIBCPP_HIDE_FROM_ABI auto
__copy(const _CharT* __first, const _CharT* __last, output_iterator<const _OutCharT&> auto __out_it)
-> decltype(__out_it) {
return __formatter::__copy(basic_string_view{__first, __last}, _VSTD::move(__out_it));
}
template <__formatter::__char_type _CharT, __formatter::__char_type _OutCharT = _CharT>
_LIBCPP_HIDE_FROM_ABI auto __copy(const _CharT* __first, size_t __n, output_iterator<const _OutCharT&> auto __out_it)
-> decltype(__out_it) {
return __formatter::__copy(basic_string_view{__first, __n}, _VSTD::move(__out_it));
}
/// Transform wrapper.
///
/// This uses a "mass output function" of __format::__output_buffer when possible.
template <__formatter::__char_type _CharT, __formatter::__char_type _OutCharT = _CharT, class _UnaryOperation>
_LIBCPP_HIDE_FROM_ABI auto
__transform(const _CharT* __first,
const _CharT* __last,
output_iterator<const _OutCharT&> auto __out_it,
_UnaryOperation __operation) -> decltype(__out_it) {
if constexpr (_VSTD::same_as<decltype(__out_it), _VSTD::back_insert_iterator<__format::__output_buffer<_OutCharT>>>) {
__out_it.__get_container()->__transform(__first, __last, _VSTD::move(__operation));
return __out_it;
} else {
return std::transform(__first, __last, _VSTD::move(__out_it), __operation);
}
}
/// Fill wrapper.
///
/// This uses a "mass output function" of __format::__output_buffer when possible.
template <__formatter::__char_type _CharT, output_iterator<const _CharT&> _OutIt>
_LIBCPP_HIDE_FROM_ABI _OutIt __fill(_OutIt __out_it, size_t __n, _CharT __value) {
if constexpr (_VSTD::same_as<decltype(__out_it), _VSTD::back_insert_iterator<__format::__output_buffer<_CharT>>>) {
__out_it.__get_container()->__fill(__n, __value);
return __out_it;
} else {
return std::fill_n(_VSTD::move(__out_it), __n, __value);
}
}
template <class _OutIt, class _CharT>
_LIBCPP_HIDE_FROM_ABI _OutIt __write_using_decimal_separators(_OutIt __out_it, const char* __begin, const char* __first,
const char* __last, string&& __grouping, _CharT __sep,
@ -97,22 +157,22 @@ _LIBCPP_HIDE_FROM_ABI _OutIt __write_using_decimal_separators(_OutIt __out_it, c
__padding_size_result __padding = {0, 0};
if (__specs.__alignment_ == __format_spec::__alignment::__zero_padding) {
// Write [sign][prefix].
__out_it = _VSTD::copy(__begin, __first, _VSTD::move(__out_it));
__out_it = __formatter::__copy(__begin, __first, _VSTD::move(__out_it));
if (__specs.__width_ > __size) {
// Write zero padding.
__padding.__before_ = __specs.__width_ - __size;
__out_it = _VSTD::fill_n(_VSTD::move(__out_it), __specs.__width_ - __size, _CharT('0'));
__out_it = __formatter::__fill(_VSTD::move(__out_it), __specs.__width_ - __size, _CharT('0'));
}
} else {
if (__specs.__width_ > __size) {
// Determine padding and write padding.
__padding = __padding_size(__size, __specs.__width_, __specs.__alignment_);
__out_it = _VSTD::fill_n(_VSTD::move(__out_it), __padding.__before_, __specs.__fill_);
__out_it = __formatter::__fill(_VSTD::move(__out_it), __padding.__before_, __specs.__fill_);
}
// Write [sign][prefix].
__out_it = _VSTD::copy(__begin, __first, _VSTD::move(__out_it));
__out_it = __formatter::__copy(__begin, __first, _VSTD::move(__out_it));
}
auto __r = __grouping.rbegin();
@ -133,10 +193,10 @@ _LIBCPP_HIDE_FROM_ABI _OutIt __write_using_decimal_separators(_OutIt __out_it, c
while (true) {
if (__specs.__std_.__type_ == __format_spec::__type::__hexadecimal_upper_case) {
__last = __first + *__r;
__out_it = _VSTD::transform(__first, __last, _VSTD::move(__out_it), __hex_to_upper);
__out_it = __formatter::__transform(__first, __last, _VSTD::move(__out_it), __hex_to_upper);
__first = __last;
} else {
__out_it = _VSTD::copy_n(__first, *__r, _VSTD::move(__out_it));
__out_it = __formatter::__copy(__first, *__r, _VSTD::move(__out_it));
__first += *__r;
}
@ -147,7 +207,7 @@ _LIBCPP_HIDE_FROM_ABI _OutIt __write_using_decimal_separators(_OutIt __out_it, c
*__out_it++ = __sep;
}
return _VSTD::fill_n(_VSTD::move(__out_it), __padding.__after_, __specs.__fill_);
return __formatter::__fill(_VSTD::move(__out_it), __padding.__after_, __specs.__fill_);
}
/// Writes the input to the output with the required padding.
@ -155,12 +215,10 @@ _LIBCPP_HIDE_FROM_ABI _OutIt __write_using_decimal_separators(_OutIt __out_it, c
/// Since the output column width is specified the function can be used for
/// ASCII and Unicode output.
///
/// \pre [\a __first, \a __last) is a valid range.
/// \pre \a __size <= \a __width. Using this function when this pre-condition
/// doesn't hold incurs an unwanted overhead.
///
/// \param __first Pointer to the first element to write.
/// \param __last Pointer beyond the last element to write.
/// \param __str The string to write.
/// \param __out_it The output iterator to write to.
/// \param __specs The parsed formatting specifications.
/// \param __size The (estimated) output column width. When the elements
@ -174,31 +232,42 @@ _LIBCPP_HIDE_FROM_ABI _OutIt __write_using_decimal_separators(_OutIt __out_it, c
/// conversion, which means the [\a __first, \a __last) always contains elements
/// of the type \c char.
template <class _CharT, class _ParserCharT>
_LIBCPP_HIDE_FROM_ABI auto __write(
const _CharT* __first,
const _CharT* __last,
output_iterator<const _CharT&> auto __out_it,
__format_spec::__parsed_specifications<_ParserCharT> __specs,
ptrdiff_t __size) -> decltype(__out_it) {
_LIBCPP_ASSERT(__first <= __last, "Not a valid range");
_LIBCPP_HIDE_FROM_ABI auto
__write(basic_string_view<_CharT> __str,
output_iterator<const _CharT&> auto __out_it,
__format_spec::__parsed_specifications<_ParserCharT> __specs,
ptrdiff_t __size) -> decltype(__out_it) {
if (__size >= __specs.__width_)
return _VSTD::copy(__first, __last, _VSTD::move(__out_it));
return __formatter::__copy(__str, _VSTD::move(__out_it));
__padding_size_result __padding = __formatter::__padding_size(__size, __specs.__width_, __specs.__std_.__alignment_);
__out_it = _VSTD::fill_n(_VSTD::move(__out_it), __padding.__before_, __specs.__fill_);
__out_it = _VSTD::copy(__first, __last, _VSTD::move(__out_it));
return _VSTD::fill_n(_VSTD::move(__out_it), __padding.__after_, __specs.__fill_);
__out_it = __formatter::__fill(_VSTD::move(__out_it), __padding.__before_, __specs.__fill_);
__out_it = __formatter::__copy(__str, _VSTD::move(__out_it));
return __formatter::__fill(_VSTD::move(__out_it), __padding.__after_, __specs.__fill_);
}
template <class _CharT, class _ParserCharT>
_LIBCPP_HIDE_FROM_ABI auto
__write(const _CharT* __first,
const _CharT* __last,
output_iterator<const _CharT&> auto __out_it,
__format_spec::__parsed_specifications<_ParserCharT> __specs,
ptrdiff_t __size) -> decltype(__out_it) {
_LIBCPP_ASSERT(__first <= __last, "Not a valid range");
return __formatter::__write(basic_string_view{__first, __last}, _VSTD::move(__out_it), __specs, __size);
}
/// \overload
///
/// Calls the function above where \a __size = \a __last - \a __first.
template <class _CharT, class _ParserCharT>
_LIBCPP_HIDE_FROM_ABI auto __write(const _CharT* __first, const _CharT* __last,
output_iterator<const _CharT&> auto __out_it,
__format_spec::__parsed_specifications<_ParserCharT> __specs) -> decltype(__out_it) {
return __write(__first, __last, _VSTD::move(__out_it), __specs, __last - __first);
_LIBCPP_HIDE_FROM_ABI auto
__write(const _CharT* __first,
const _CharT* __last,
output_iterator<const _CharT&> auto __out_it,
__format_spec::__parsed_specifications<_ParserCharT> __specs) -> decltype(__out_it) {
_LIBCPP_ASSERT(__first <= __last, "Not a valid range");
return __formatter::__write(__first, __last, _VSTD::move(__out_it), __specs, __last - __first);
}
template <class _CharT, class _ParserCharT, class _UnaryOperation>
@ -210,12 +279,12 @@ _LIBCPP_HIDE_FROM_ABI auto __write_transformed(const _CharT* __first, const _Cha
ptrdiff_t __size = __last - __first;
if (__size >= __specs.__width_)
return _VSTD::transform(__first, __last, _VSTD::move(__out_it), __op);
return __formatter::__transform(__first, __last, _VSTD::move(__out_it), __op);
__padding_size_result __padding = __padding_size(__size, __specs.__width_, __specs.__alignment_);
__out_it = _VSTD::fill_n(_VSTD::move(__out_it), __padding.__before_, __specs.__fill_);
__out_it = _VSTD::transform(__first, __last, _VSTD::move(__out_it), __op);
return _VSTD::fill_n(_VSTD::move(__out_it), __padding.__after_, __specs.__fill_);
__out_it = __formatter::__fill(_VSTD::move(__out_it), __padding.__before_, __specs.__fill_);
__out_it = __formatter::__transform(__first, __last, _VSTD::move(__out_it), __op);
return __formatter::__fill(_VSTD::move(__out_it), __padding.__after_, __specs.__fill_);
}
/// Writes additional zeros for the precision before the exponent.
@ -240,11 +309,11 @@ _LIBCPP_HIDE_FROM_ABI auto __write_using_trailing_zeros(
__padding_size_result __padding =
__padding_size(__size + __num_trailing_zeros, __specs.__width_, __specs.__alignment_);
__out_it = _VSTD::fill_n(_VSTD::move(__out_it), __padding.__before_, __specs.__fill_);
__out_it = _VSTD::copy(__first, __exponent, _VSTD::move(__out_it));
__out_it = _VSTD::fill_n(_VSTD::move(__out_it), __num_trailing_zeros, _CharT('0'));
__out_it = _VSTD::copy(__exponent, __last, _VSTD::move(__out_it));
return _VSTD::fill_n(_VSTD::move(__out_it), __padding.__after_, __specs.__fill_);
__out_it = __formatter::__fill(_VSTD::move(__out_it), __padding.__before_, __specs.__fill_);
__out_it = __formatter::__copy(__first, __exponent, _VSTD::move(__out_it));
__out_it = __formatter::__fill(_VSTD::move(__out_it), __num_trailing_zeros, _CharT('0'));
__out_it = __formatter::__copy(__exponent, __last, _VSTD::move(__out_it));
return __formatter::__fill(_VSTD::move(__out_it), __padding.__after_, __specs.__fill_);
}
/// Writes a string using format's width estimation algorithm.
@ -262,7 +331,7 @@ _LIBCPP_HIDE_FROM_ABI auto __write_string_no_precision(
// No padding -> copy the string
if (!__specs.__has_width())
return _VSTD::copy(__str.begin(), __str.end(), _VSTD::move(__out_it));
return __formatter::__copy(__str, _VSTD::move(__out_it));
// Note when the estimated width is larger than size there's no padding. So
// there's no reason to get the real size when the estimate is larger than or
@ -270,8 +339,7 @@ _LIBCPP_HIDE_FROM_ABI auto __write_string_no_precision(
size_t __size =
__format_spec::__estimate_column_width(__str, __specs.__width_, __format_spec::__column_width_rounding::__up)
.__width_;
return __formatter::__write(__str.begin(), __str.end(), _VSTD::move(__out_it), __specs, __size);
return __formatter::__write(__str, _VSTD::move(__out_it), __specs, __size);
}
template <class _CharT>
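
The padded-write pattern used throughout the hunks above (fill before, copy the payload, fill after) can be illustrated outside libc++. The following is only a minimal sketch under stated assumptions: `Align`, `Padding`, `padding_size`, and `write_padded` are hypothetical stand-ins and not the library's internal `__padding_size`, `__fill`, or `__copy`; it writes into a `std::string` rather than the format buffer; and the center rule assumed here puts the smaller half of the padding in front. The point is that each step is one bulk operation instead of a per-code-unit loop.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical alignment enum; the patch uses __specs.__alignment_ instead.
enum class Align { left, center, right };

// Hypothetical counterpart of __padding_size_result.
struct Padding {
  std::size_t before;
  std::size_t after;
};

// Splits the unused columns into padding before and after the payload.
// Precondition (mirrors the \pre in the patch): size <= width.
inline Padding padding_size(std::size_t size, std::size_t width, Align align) {
  assert(size <= width);
  const std::size_t fill = width - size;
  switch (align) {
  case Align::left:
    return {0, fill};
  case Align::center:
    // Assumption: the smaller half goes in front.
    return {fill / 2, fill - fill / 2};
  case Align::right:
    return {fill, 0};
  }
  return {0, 0};
}

// Writes str padded to width using three bulk appends.
inline std::string write_padded(std::string_view str, std::size_t width, Align align, char fill_char) {
  std::string out;
  // No padding needed: a single bulk copy.
  if (str.size() >= width) {
    out.append(str);
    return out;
  }
  const Padding padding = padding_size(str.size(), width, align);
  out.reserve(width);
  out.append(padding.before, fill_char); // bulk fill
  out.append(str);                       // bulk copy
  out.append(padding.after, fill_char);  // bulk fill
  return out;
}
```

Calling `write_padded("abc", 7, Align::center, '*')` under these assumptions produces the payload with two fill characters on each side, the same layout the `{:*^{}}` test case in this patch checks at a much larger size.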

@ -88,6 +88,8 @@ void test_unsigned_integral_type() {
test_termination_condition(
STR("340282366920938463463374607431768211455"), STR("}"), A(std::numeric_limits<__uint128_t>::max()));
#endif
// Test __formatter::__transform (libc++ specific).
test_termination_condition(STR("FF"), STR("X}"), A(255));
}
template <class CharT>

@ -2557,6 +2557,68 @@ void format_test_pointer(TestFunction check, ExceptionTest check_exception) {
format_test_pointer<const void*, CharT>(check, check_exception);
}
/// Tests special buffer functions with a "large" input.
///
/// This test is specific to libc++; however, the code should behave the same
/// on all implementations.
/// In \c __format::__output_buffer there are some special functions to optimize
/// outputting multiple characters, \c __copy, \c __transform, \c __fill. This
/// test validates whether the functions behave properly when the output size
/// doesn't fit in its internal buffer.
template <class CharT, class TestFunction>
void format_test_buffer_optimizations(TestFunction check) {
#ifdef _LIBCPP_VERSION
// Used to validate our test sets are the proper size.
// To test the chunked operations it needs to be larger than the internal
// buffer. Picked a nice looking number.
constexpr int minimum = 3 * std::__format::__internal_storage<CharT>::__buffer_size;
#else
constexpr int minimum = 1;
#endif
// Copy
std::basic_string<CharT> str = STR(
"The quick brown fox jumps over the lazy dog."
"The quick brown fox jumps over the lazy dog."
"The quick brown fox jumps over the lazy dog."
"The quick brown fox jumps over the lazy dog."
"The quick brown fox jumps over the lazy dog."
"The quick brown fox jumps over the lazy dog."
"The quick brown fox jumps over the lazy dog."
"The quick brown fox jumps over the lazy dog."
"The quick brown fox jumps over the lazy dog."
"The quick brown fox jumps over the lazy dog."
"The quick brown fox jumps over the lazy dog."
"The quick brown fox jumps over the lazy dog."
"The quick brown fox jumps over the lazy dog."
"The quick brown fox jumps over the lazy dog."
"The quick brown fox jumps over the lazy dog."
"The quick brown fox jumps over the lazy dog."
"The quick brown fox jumps over the lazy dog."
"The quick brown fox jumps over the lazy dog."
"The quick brown fox jumps over the lazy dog."
"The quick brown fox jumps over the lazy dog."
"The quick brown fox jumps over the lazy dog."
"The quick brown fox jumps over the lazy dog."
"The quick brown fox jumps over the lazy dog."
"The quick brown fox jumps over the lazy dog."
"The quick brown fox jumps over the lazy dog."
"The quick brown fox jumps over the lazy dog."
"The quick brown fox jumps over the lazy dog."
"The quick brown fox jumps over the lazy dog."
"The quick brown fox jumps over the lazy dog."
"The quick brown fox jumps over the lazy dog.");
assert(str.size() > minimum);
check.template operator()<"{}">(std::basic_string_view<CharT>{str}, str);
// Fill
std::basic_string<CharT> fill(minimum, CharT('*'));
check.template operator()<"{:*<{}}">(std::basic_string_view<CharT>{str + fill}, str, str.size() + minimum);
check.template operator()<"{:*^{}}">(
std::basic_string_view<CharT>{fill + str + fill}, str, minimum + str.size() + minimum);
check.template operator()<"{:*>{}}">(std::basic_string_view<CharT>{fill + str}, str, minimum + str.size());
}
template <class CharT, class TestFunction, class ExceptionTest>
void format_tests(TestFunction check, ExceptionTest check_exception) {
// *** Test escaping ***
@ -2671,6 +2733,9 @@ void format_tests(TestFunction check, ExceptionTest check_exception) {
// *** Test handle formatter argument ***
format_test_handle<CharT>(check, check_exception);
// *** Test the internal buffer optimizations ***
format_test_buffer_optimizations<CharT>(check);
}
#ifndef TEST_HAS_NO_WIDE_CHARACTERS