llvm-project/llvm/unittests/Support
Kirill Bobyrev 5f26a642e6 [llvm] Make YAML serialization up to 2.5 times faster
This patch significantly improves the performance of the YAML serializer by
optimizing the `YAML::isNumeric` function. This function is called on
most strings and is highly inefficient for two reasons:

* It uses `Regex`, which is parsed and compiled each time this
  function is called
* It uses multiple passes which are not necessary

This patch introduces a stateful ad hoc YAML number parser which does not
rely on `Regex`. It also fixes a YAML number format inconsistency: the current
implementation supports the C-style octal number format (`01234567`), which
was present in the YAML 1.0 specification (http://yaml.org/spec/1.0/),
[Section 2.4. Tags, Example 2.19], but was deprecated and is no longer
present in the latest YAML 1.2 specification
(http://yaml.org/spec/1.2/spec.html), see [Section 10.3.2. Tag
Resolution]. Since the rest of the implementation does not
support other deprecated YAML 1.0 numeric features, such as sexagesimal
numbers and commas as delimiters, this format is treated as an inconsistency
and is no longer supported. This patch also adds unit tests to ensure the
validity of the proposed implementation.
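
To illustrate the idea of a regex-free number check, here is a minimal
sketch. It is not the code from this patch: `looksNumeric` and `skipDigits`
are hypothetical names, it uses `std::string_view` (C++17) instead of
`StringRef`, and the accepted grammar is only an assumption loosely based
on YAML 1.2 Section 10.3.2.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper: drops a leading run of ASCII digits.
static std::string_view skipDigits(std::string_view S) {
  std::size_t N = S.find_first_not_of("0123456789");
  return N == std::string_view::npos ? std::string_view() : S.substr(N);
}

// Illustrative check without any regex; not the implementation from the patch.
bool looksNumeric(std::string_view S) {
  if (S.empty() || S == "+" || S == "-")
    return false;
  if (S == ".nan" || S == ".NaN" || S == ".NAN")
    return true;

  // Infinity and decimal numbers may carry a sign.
  std::string_view Tail =
      (S.front() == '-' || S.front() == '+') ? S.substr(1) : S;
  if (Tail == ".inf" || Tail == ".Inf" || Tail == ".INF")
    return true;

  // Assumed YAML 1.2 style octal/hex: explicit prefix, no sign allowed.
  if (S.substr(0, 2) == "0o")
    return S.size() > 2 &&
           S.find_first_not_of("01234567", 2) == std::string_view::npos;
  if (S.substr(0, 2) == "0x")
    return S.size() > 2 &&
           S.find_first_not_of("0123456789abcdefABCDEF", 2) ==
               std::string_view::npos;

  // Decimal: [-+]? (\. [0-9]+ | [0-9]+ (\. [0-9]*)?) ([eE] [-+]? [0-9]+)?
  S = Tail;
  std::string_view Rest = skipDigits(S);
  bool SawDigits = Rest.size() != S.size();
  S = Rest;
  if (!S.empty() && S.front() == '.') {
    S.remove_prefix(1);
    Rest = skipDigits(S);
    SawDigits = SawDigits || Rest.size() != S.size();
    S = Rest;
  }
  if (!SawDigits)
    return false; // Rejects ".", "e5", "abc", ...
  if (S.empty())
    return true;

  // Optional exponent: needs at least one digit after [eE] and the sign.
  if (S.front() != 'e' && S.front() != 'E')
    return false;
  S.remove_prefix(1);
  if (!S.empty() && (S.front() == '+' || S.front() == '-'))
    S.remove_prefix(1);
  return !S.empty() && skipDigits(S).empty();
}
```

The sketch walks the string once from left to right and never constructs a
pattern object, which is the property that removes the per-call regex
compilation cost described above.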

This performance bottleneck was identified while profiling Clangd's
global-symbol-builder tool with my colleague @ilya-biryukov. A
substantial part of the runtime was spent in the single-threaded Reduce
phase, which concludes with YAML serialization of the collected
symbols. Regex matching accounted for approximately 45% of the
whole runtime (which includes the sharded Map phase); it is now reduced to
18% (which is spent in `clang::clangd::CanonicalIncludes` and can
also be optimized, because all regexes used there are in fact either suffix
matches or exact matches).

`llvm-yaml-numeric-parser-fuzzer` was used to ensure the validity of the
proposed regex replacement. Fuzzing for ~60 hours using 10 threads did
not expose any bugs.

Benchmarking the `global-symbol-builder` tool (using `hyperfine --warmup 2
--min-runs 5 'command 1' 'command 2'`) on a reasonable
amount of code (26 source files matched by
`clang-tools-extra/clangd/*.cpp` with all transitive includes) confirmed
our understanding of the bottleneck: the patch speeds up
the command by a factor of 1.6:

| Command | Mean [s] | Min…Max [s] |
| :--- | ---: | ---: |
| this patch (D50839) | 84.7 ± 0.6 | 83.3…84.7 |
| master (rL339849) | 133.1 ± 0.8 | 132.4…134.6 |

Using smaller samples (e.g. by collecting symbols from
`clang-tools-extra/clangd/AST.cpp` only) yields an even better performance
improvement: the command is 2.05x faster. This is expected, because the Map
phase takes less time compared to Reduce, and it suggests that the patch
would significantly improve the performance of standalone YAML serialization.

| Command | Mean [ms] | Min…Max [ms] |
| :--- | ---: | ---: |
| this patch (D50839) | 3702.2 ± 48.7 | 3635.1…3752.3 |
| master (rL339849) | 7607.6 ± 109.5 | 7533.3…7796.4 |

Reviewed by: zturner, ilya-biryukov

Differential revision: https://reviews.llvm.org/D50839

llvm-svn: 340154
2018-08-20 07:00:36 +00:00
DynamicLibrary [Unittests] Change linker flags of dynamic library tests 2018-06-11 09:15:37 +00:00
ARMAttributeParser.cpp Remove redundant includes from unittests. 2017-12-13 21:31:05 +00:00
AlignOfTest.cpp
AllocatorTest.cpp Report fatal error in the case of out of memory 2018-02-20 05:41:26 +00:00
ArrayRecyclerTest.cpp
BinaryStreamTest.cpp Remove redundant includes from unittests. 2017-12-13 21:31:05 +00:00
BlockFrequencyTest.cpp
BranchProbabilityTest.cpp
CMakeLists.txt [DebugCounters] Keep track of total counts 2018-07-23 21:49:36 +00:00
CachePruningTest.cpp Unbreak the build. Combining chrono with Optional is annoying. 2017-12-22 21:18:50 +00:00
Casting.cpp
CheckedArithmeticTest.cpp Add checkMulAdd helper function to CheckedArithmetic 2018-06-13 18:32:02 +00:00
Chrono.cpp Support formatv of TimePoint with strftime-style formats. 2017-10-24 08:30:19 +00:00
CommandLineTest.cpp Do not enforce absolute path argv0 in windows 2018-06-13 14:29:26 +00:00
CompressionTest.cpp Use the same constants as zlib to represent compression level. 2018-08-04 00:13:13 +00:00
ConvertUTFTest.cpp Remove redundant includes from unittests. 2017-12-13 21:31:05 +00:00
CrashRecoveryTest.cpp s/LLVM_ON_WIN32/_WIN32/, llvm 2018-04-29 00:45:03 +00:00
DJBTest.cpp Resubmit r325107 (case folding DJB hash) 2018-02-21 22:36:31 +00:00
DataExtractorTest.cpp Re-sort #include lines for unittests. This uses a slightly modified 2017-06-06 11:06:56 +00:00
DebugCounterTest.cpp [DebugCounters] Keep track of total counts 2018-07-23 21:49:36 +00:00
DebugTest.cpp
EndianStreamTest.cpp Support: Simplify endian stream interface. NFCI. 2018-05-18 19:46:24 +00:00
EndianTest.cpp
ErrnoTest.cpp [Support] Clear errno before calling the function in RetryAfterSignal. 2018-07-07 02:46:12 +00:00
ErrorOrTest.cpp Fix incorrect usage of std::is_assignable. 2018-02-02 22:29:54 +00:00
ErrorTest.cpp [Support] Add a basic C API for llvm::Error. 2018-08-15 18:42:11 +00:00
FileOutputBufferTest.cpp [SupportTests] Silence -Wsign-compare warnings 2018-06-28 21:03:24 +00:00
FormatVariadicTest.cpp [Support] Require llvm::Error passed to formatv() to be wrapped in fmt_consume() 2018-07-12 07:11:28 +00:00
GlobPatternTest.cpp [Support/GlobPattern] - Do not crash when pattern has characters with int value < 0. 2017-07-31 09:26:50 +00:00
Host.cpp Refactor ExecuteAndWait to take StringRefs. 2018-06-12 17:43:52 +00:00
JSONTest.cpp [Support] Harded JSON against invalid UTF-8. 2018-07-10 11:51:26 +00:00
LEB128Test.cpp Change encodeU/SLEB128 to pad to certain number of bytes 2017-09-15 20:34:47 +00:00
LineIteratorTest.cpp
LockFileManagerTest.cpp [FileSystem] Split up the OpenFlags enumeration. 2018-06-07 19:58:58 +00:00
MD5Test.cpp Remove \brief commands from doxygen comments. 2018-05-01 15:54:18 +00:00
ManagedStatic.cpp Report fatal error in the case of out of memory 2018-02-20 05:41:26 +00:00
MathExtrasTest.cpp MathExtras UnitTest: Assert that isPowerOf2(0) is false. NFC. 2017-07-03 18:42:47 +00:00
MemoryBufferTest.cpp [Support] Pacify -Wsign-compare in unit test. 2018-03-08 21:54:30 +00:00
MemoryTest.cpp Untabify. 2017-10-18 13:31:28 +00:00
NativeFormatTests.cpp
ParallelTest.cpp Remove \brief commands from doxygen comments. 2018-05-01 15:54:18 +00:00
Path.cpp [Support] NFC: Allow modifying access/modification times independently in sys::fs::setLastModificationAndAccessTime. 2018-08-13 23:03:45 +00:00
ProcessTest.cpp s/LLVM_ON_WIN32/_WIN32/, llvm 2018-04-29 00:45:03 +00:00
ProgramTest.cpp Refactor ExecuteAndWait to take StringRefs. 2018-06-12 17:43:52 +00:00
RegexTest.cpp Fix llvm-special-case-list-fuzzer regexp exception 2017-10-27 19:15:13 +00:00
ReplaceFileTest.cpp [FileSystem] Split up the OpenFlags enumeration. 2018-06-07 19:58:58 +00:00
ReverseIterationTest.cpp [unittest/ReverseIteration] Unbreak when compiling with GCC. 2017-09-05 21:27:23 +00:00
ScaledNumberTest.cpp
SourceMgrTest.cpp [Support] Make line-number cache robust against access patterns. 2018-04-07 00:44:02 +00:00
SpecialCaseListTest.cpp Extend SpecialCaseList to allow users to blame matches on entries in the file. 2017-11-07 21:16:46 +00:00
StringPool.cpp
SwapByteOrderTest.cpp Re-sort #include lines for unittests. This uses a slightly modified 2017-06-06 11:06:56 +00:00
TarWriterTest.cpp Fix build bot after r319750 "[Support/TarWriter] - Don't allow TarWriter to add the same file more than once." 2017-12-05 10:35:11 +00:00
TargetParserTest.cpp [ARM/AArch64] Support FP16 +fp16fml instructions 2018-08-17 11:29:49 +00:00
TaskQueueTest.cpp Build TaskQueueTest in threads=on builds, fixes regression from r335608. 2018-06-27 11:52:30 +00:00
ThreadLocalTest.cpp
ThreadPool.cpp Revert "Enable ThreadPool to queue tasks that return values." 2018-06-13 21:24:19 +00:00
Threading.cpp
TimerTest.cpp s/LLVM_ON_WIN32/_WIN32/, llvm 2018-04-29 00:45:03 +00:00
TrailingObjectsTest.cpp
TrigramIndexTest.cpp Re-sort #include lines for unittests. This uses a slightly modified 2017-06-06 11:06:56 +00:00
TypeNameTest.cpp
TypeTraitsTest.cpp Remove extra semicolon (fixes -Wpedantic warning). NFCI. 2018-08-13 10:05:34 +00:00
UnicodeTest.cpp
VersionTupleTest.cpp Move VersionTuple from clang/Basic to llvm/Support 2018-06-11 10:28:04 +00:00
YAMLIOTest.cpp [llvm] Make YAML serialization up to 2.5 times faster 2018-08-20 07:00:36 +00:00
YAMLParserTest.cpp [YAMLParser] Don't crash on null keys in KeyValueNodes. 2017-11-23 20:57:20 +00:00
formatted_raw_ostream_test.cpp Re-sort #include lines for unittests. This uses a slightly modified 2017-06-06 11:06:56 +00:00
raw_ostream_test.cpp Support: Add llvm::center_justify. 2017-07-13 16:11:08 +00:00
raw_pwrite_stream_test.cpp [FileSystem] Split up the OpenFlags enumeration. 2018-06-07 19:58:58 +00:00
raw_sha1_ostream_test.cpp Re-sort #include lines for unittests. This uses a slightly modified 2017-06-06 11:06:56 +00:00
xxhashTest.cpp