In some places the parser guards against dereferencing `End`, while in
others it relies on the presence of a trailing `'\0'` to elide checks.
Add the remaining guards needed to ensure the parser never attempts to
dereference `End`, making it safe to not require a null-terminated input
buffer.
Update the parser fuzzer harness so that it tests with buffers that are
guaranteed to be non-null-terminated, null-terminated, and 1-terminated,
additionally ensuring the result of the parse is the same in each case.
Some of the regression tests were written by inspection, and some are
cases caught by the fuzzer which required additional fixes in the
parser.
Differential Revision: https://reviews.llvm.org/D84050
The `Range` of an alias/anchor token includes the leading `&` or `*`,
but it is skipped while parsing the name. The check for an empty name
fails to account for the skipped leading character and so the error is
never hit.
Fix the off-by-one and add a couple regression tests.
Reviewed By: dexonsmith
Differential Revision: https://reviews.llvm.org/D91462
This reverts commit 1b589f4d4d and relands the D89463
with the fix: update `MappingTraits<FileFilter>::validate()` in ClangTidyOptions.cpp to
match the new signature (change the return type to "std::string" from "StringRef").
Original commit message:
This:
Changes the return type of MappingTraits<T>>::validate to std::string
instead of StringRef. It allows to create more complex error messages.
It introduces std::vector<std::pair<StringRef, bool>> getEntries():
a new virtual method of Section, which is the base class for all sections.
It returns names of special section specific keys (e.g. "Entries") and flags that says if them exist in a YAML.
The code in validate() uses this list of entries descriptions to generalize validation.
This approach was discussed in the D89039 thread.
Differential revision: https://reviews.llvm.org/D89463
This:
1) Changes the return type of `MappingTraits<T>>::validate` to `std::string`
instead of `StringRef`. It allows to create more complex error messages.
2) It introduces std::vector<std::pair<StringRef, bool>> getEntries():
a new virtual method of Section, which is the base class for all sections.
It returns names of special section specific keys (e.g. "Entries") and flags that
says if them exist in a YAML. The code in validate() uses this list of entries
descriptions to generalize validation.
This approach was discussed in the D89039 thread.
Differential revision: https://reviews.llvm.org/D89463
Summary:
New line duplication logic introduced in https://reviews.llvm.org/D63482
has two issues: (1) there is no logic that removes duplicate newlines
when clang-apply-replacment reads YAML and (2) in general such logic
should be applied to all strings and should happen on string
serialization level instead in YAML parser.
This diff changes multiline strings quotation from single quote `'` to
double `"`. It solves problems with internal newlines because now they are
escaped. Also double quotation solves the problem with leading whitespace after
newline. In case of single quotation YAML parsers should remove leading
whitespace according to specification. In case of double quotation these
leading are internal space and they are preserved. There is no way to
instruct YAML parsers to preserve leading whitespaces after newline so
double quotation is the only viable option that solves all problems at
once.
Test Plan: check-all
Reviewers: gribozavr, mgehre, yvvan
Subscribers: xazax.hun, hiraditya, cfe-commits, llvm-commits
Tags: #clang-tools-extra, #clang, #llvm
Differential Revision: https://reviews.llvm.org/D80301
Lots of headers pass around MemoryBuffer objects, but very few open
them. Let those that do include FileSystem.h.
Saves ~250 includes of Chrono.h & FileSystem.h:
$ diff -u thedeps-before.txt thedeps-after.txt | grep '^[-+] ' | sort | uniq -c | sort -nr
254 - ../llvm/include/llvm/Support/FileSystem.h
253 - ../llvm/include/llvm/Support/Chrono.h
237 - ../llvm/include/llvm/Support/NativeFormatting.h
237 - ../llvm/include/llvm/Support/FormatProviders.h
192 - ../llvm/include/llvm/ADT/StringSwitch.h
190 - ../llvm/include/llvm/Support/FormatVariadicDetails.h
...
This requires duplicating the file_t typedef, which is unfortunate. I
sunk the choice of mapping mode down into the cpp file using variable
template specializations instead of class members in headers.
Summary: This patch fixes a number of bugs found in the YAML parser
through fuzzing. In general, this makes the parser more robust against
malformed inputs.
The fixes are mostly improved null checking and returning errors in
more cases. In some cases, asserts were changed to regular errors,
this provides the same robustness but also protects release builds
from the triggering conditions. This also improves the fuzzability of
the YAML parser since asserts can act as a roadblock to further
fuzzing once they're hit.
Each fix has a corresponding test case:
- TestAnchorMapError - Added proper null pointer handling in
`Stream::printError` if N is null and `KeyValueNode::getValue` if
getKey returns null, `Input::createHNodes` `dyn_casts` changed to
`dyn_cast_or_null` so the null pointer checks are actually able to
fail
- TestFlowSequenceTokenErrors - Added case in
`Document::parseBlockNode` for FlowMappingEnd, FlowSequenceEnd, or
FlowEntry tokens outside of mappings or sequences
- TestDirectiveMappingNoValue - Changed assert to regular error
return in `Scanner::scanValue`
- TestUnescapeInfiniteLoop - Fixed infinite loop in
`ScalarNode::unescapeDoubleQuoted` by returning an error for
unrecognized escape codes
- TestScannerUnexpectedCharacter - Changed asserts to regular error
returns in `Scanner::consume`
- TestUnknownDirective - For both of the inputs the stream doesn't
fail and correctly returns TK_Error, but there is no valid root
node for the document. There's no reasonable way to make the
scanner fail for unknown directives without breaking the YAML spec
(see spec-07-01.test). I think the assert is unnecessary given
that an error is still generated for this case.
The `SimpleKeys.clear()` line fixes a bug found by AddressSanitizer
triggered by multiple test cases - when TokenQueue is cleared
SimpleKeys is still holding dangling pointers into it, so SimpleKeys
should be cleared as well.
Patch by Thomas Finch!
Reviewers: chandlerc, Bigcheese, hintonda
Reviewed By: Bigcheese, hintonda
Subscribers: hintonda, kristina, beanz, dexonsmith, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D61608
I am using it in https://reviews.llvm.org/D69399.
This change changes how obj2yaml dumps arrays of `llvm::yaml::Hex8/llvm::yaml::Hex16/llvm::yaml::Hex32`
from:
```
PayloadBytes:
- 0x01
- 0x02
...
```
To
```
PayloadBytes: [ 0x01, 0x02, ... ]
```
The latter way is shorter and looks better for arrays.
Differential revision: https://reviews.llvm.org/D69558
Now that we've moved to C++14, we no longer need the llvm::make_unique
implementation from STLExtras.h. This patch is a mechanical replacement
of (hopefully) all the llvm::make_unique instances across the monorepo.
llvm-svn: 369013
llvm::yaml::Output::paddedKey unconditionally outputs spaces, which
are superfluous if the value to be dumped is a sequence or map.
Change `bool NeedsNewLine` to `StringRef Padding` so that it can be
overridden to `\n` if the value is a sequence or map.
An empty map/sequence is special. It is printed as `{}` or `[]` without
a newline, while a non-empty map/sequence follows a newline. To handle
this distinction, add another variable `PaddingBeforeContainer` and does
the special handling in endMapping/endSequence.
Reviewed By: grimar, jhenderson
Differential Revision: https://reviews.llvm.org/D64566
llvm-svn: 365869
Summary:
A bug/typo in Output::scalarString caused us to round-trip a StringRef
through a const char *. This meant that any strings with embedded nuls
were unintentionally cut short at the first such character. (It also
could have caused accidental buffer overruns, but it seems that all
StringRefs coming into this functions were formed from null-terminated
strings.)
This patch fixes the bug and adds an appropriate test.
Reviewers: sammccall, jhenderson
Subscribers: kristina, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60505
llvm-svn: 358176
Summary:
Now that endian types support enumerations (D59141), the existing yaml
support for them is somewhat insufficient. The current solution was to
define the ScalarTraits class for these types, which always forwards to
the ScalarTraits of the underlying type. However, the enum types will
usually have ScalarEnumerationTraits of ScalarBitsetTraits.
In this patch I add the two extra Traits types to the endian types. In
order to properly SFINAE-ize them, I've also added an extra "Enable"
template argument to the Traits template classes.
Reviewers: zturner, sammccall
Subscribers: kristina, Bigcheese, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59289
llvm-svn: 356269
Summary:
The way c++ template argument deduction works, both arguments are used
to deduce the template type in the three-argument overload of
mapOptional. This is a problem if the types are slightly different, even
if they are implicitly convertible. This is fairly easy to trigger with
integral types, as the default type of most integral constants is int,
which then requires casting the constant to the type of the other
argument.
This patch fixes that by using a separate template type for the default
value, which is then cast to the type of the first argument. To avoid
this conversion triggerring conversions marged as explicit, we use
static_assert to check that the types are implicitly convertible.
Reviewers: zturner, sammccall
Subscribers: kristina, jdoerfert, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59142
llvm-svn: 356157
to reflect the new license.
We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.
Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.
llvm-svn: 351636
Add support for "polymorphic" types to YAMLIO.
PolymorphicTraits can dynamically switch between other traits (Scalar, Map, or
Sequence). When inputting, the PolymorphicTraits type is told which type to
become, and when outputting the PolymorphicTraits type is asked which type it
currently is.
Also add support for TaggedScalarTraits to allow dynamically differentiating
between multiple scalar types using YAML tags.
Serialize empty maps as "{}" and empty sequences as "[]", so that types
are preserved when round-tripping PolymorphicTraits. This change has
equivalent semantics, but may break e.g. tests which compare output
verbatim.
Differential Revision: https://reviews.llvm.org/D48144
llvm-svn: 346884
If you have the string /usr/bin, prior to this patch it would not
be quoted by our YAML serializer. But a string like C:\src would
be, due to the presence of a backslash. This makes the quoting
rules of basically every single file path different depending on
the path syntax (posix vs. Windows).
While technically not required by the YAML specification to quote
forward slashes, when the behavior of paths is inconsistent it
makes it difficult to portably write FileCheck lines that will
work with either kind of path.
Differential Revision: https://reviews.llvm.org/D53169
llvm-svn: 344359
This reverts commit b86c16ad8c97dadc1f529da72a5bb74e9eaed344.
This is being reverted because I forgot to write a useful
commit message, so I'm going to resubmit it with an actual
commit message.
llvm-svn: 344358
This patch significantly improves performance of the YAML serializer by
optimizing `YAML::isNumeric` function. This function is called on the
most strings and is highly inefficient for two reasons:
* It uses `Regex`, which is parsed and compiled each time this
function is called
* It uses multiple passes which are not necessary
This patch introduces stateful ad hoc YAML number parser which does not
rely on `Regex`. It also fixes YAML number format inconsistency: current
implementation supports C-stile octal number format (`01234567`) which
was present in YAML 1.0 specialization (http://yaml.org/spec/1.0/),
[Section 2.4. Tags, Example 2.19] but was deprecated and is no longer
present in latest YAML 1.2 specification
(http://yaml.org/spec/1.2/spec.html), see [Section 10.3.2. Tag
Resolution]. Since the rest of the rest of the implementation does not
support other deprecated YAML 1.0 numeric features such as sexagecimal
numbers, commas as delimiters it is treated as inconsistency and not
longer supported. This patch also adds unit tests to ensure the validity
of proposed implementation.
This performance bottleneck was identified while profiling Clangd's
global-symbol-builder tool with my colleague @ilya-biryukov. The
substantial part of the runtime was spent during a single-thread Reduce
phase, which concludes with YAML serialization of collected symbol
collection. Regex matching was accountable for approximately 45% of the
whole runtime (which involves sharded Map phase), now it is reduced to
18% (which is spent in `clang::clangd::CanonicalIncludes` and can be
also optimized because all used regexes are in fact either suffix
matches or exact matches).
`llvm-yaml-numeric-parser-fuzzer` was used to ensure the validity of the
proposed regex replacement. Fuzzing for ~60 hours using 10 threads did
not expose any bugs.
Benchmarking `global-symbol-builder` (using `hyperfine --warmup 2
--min-runs 5 'command 1' 'command 2'`) tool by processing a reasonable
amount of code (26 source files matched by
`clang-tools-extra/clangd/*.cpp` with all transitive includes) confirmed
our understanding of the performance bottleneck nature as it speeds up
the command by the factor of 1.6x:
| Command | Mean [s] | Min…Max [s] |
| this patch (D50839) | 84.7 ± 0.6 | 83.3…84.7 |
| master (rL339849) | 133.1 ± 0.8 | 132.4…134.6 |
Using smaller samples (e.g. by collecting symbols from
`clang-tools-extra/clangd/AST.cpp` only) yields even better performance
improvement, which is expected because Map phase takes less time
compared to Reduce and is 2.05x faster and therefore would significantly
improve the performance of standalone YAML serializations.
| Command | Mean [ms] | Min…Max [ms] |
| this patch (D50839) | 3702.2 ± 48.7 | 3635.1…3752.3 |
| master (rL339849) | 7607.6 ± 109.5 | 7533.3…7796.4 |
Reviewed by: zturner, ilya-biryukov
Differential revision: https://reviews.llvm.org/D50839
llvm-svn: 340154
Summary:
Otherwise, the YAML parser breaks when trying to read them back in
'key: multiline_string_value' cases.
This patch fixes a problem when serializing structs which contain multi-line strings.
E.g., if we try to serialize the following struct
```
{ "key1": "first line\nsecond line",
"key2": "another string" }`
```
Before this patch, we got the YAML output that failed to parse:
```
key1: first line
second line
key2: another string
```
After the patch, we get:
```
key1: 'first line
second line'
key2: another string
```
Reviewers: sammccall
Reviewed By: sammccall
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D47468
llvm-svn: 333527
The existing YAML Output::scalarString code path includes a partial and
incorrect implementation of YAML escaping logic. In particular, the logic put
in place in rL321283 escapes non-printable bytes only if they are not part of a
multibyte UTF8 sequence; implicitly this means that all multibyte UTF8
sequences -- printable and non -- are passed through verbatim.
The simplest solution to this is to direct the Output::scalarString method to
use the standalone yaml::escape function, and this _almost_ works, except that
the existing code in that function _over_ escapes: any multibyte UTF8 sequence
is escaped, even printable ones. While this is permitted for YAML, it is also
more aggressive (and hard to read for non-English locales) than necessary,
and the entire point of rL321283 was to back off such aggressive over-escaping.
So in this change, I have both redirected Output::scalarString to use
yaml::escape _and_ modified yaml::escape to optionally restrict its escaping to
non-printables. This preserves behaviour of any existing clients while giving
them a path to more moderate escaping should they desire.
Reviewers: JDevlieghere, thegameg, MatzeB, vladimir.plyashkun
Reviewed By: thegameg
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D44863
llvm-svn: 328661
Summary:
Discovered when clangd loads YAML symbols, some symbol documentations
start with indicators (e.g. "-"), but YAML prints them as plain scalars
(no quotes), which make the YAML parser fail to parse.
For these kind of strings, we need quotes.
Reviewers: sammccall
Reviewed By: sammccall
Subscribers: ilya-biryukov, ioeric, llvm-commits, cfe-commits
Differential Revision: https://reviews.llvm.org/D42362
llvm-svn: 323097
LLVM IR function names which disable mangling start with '\01'
(https://www.llvm.org/docs/LangRef.html#identifiers).
When an identifier like "\01@abc@" gets dumped to MIR, it is quoted, but
only with single quotes.
http://www.yaml.org/spec/1.2/spec.html#id2770814:
"The allowed character range explicitly excludes the C0 control block
allowed), the surrogate block #xD800-#xDFFF, #xFFFE, and #xFFFF."
http://www.yaml.org/spec/1.2/spec.html#id2776092:
"All non-printable characters must be escaped.
[...]
Note that escape sequences are only interpreted in double-quoted scalars."
This patch adds support for printing escaped non-printable characters
between double quotes if needed.
Should also fix PR31743.
Differential Revision: https://reviews.llvm.org/D41290
llvm-svn: 320996
Summary:
The current yaml::Input constructor takes a StringRef of data as its
first parameter, discarding any filename information that may have been
present when a YAML file was opened. Add an alterate yaml::Input
constructor that takes a MemoryBufferRef, which can have a filename
associated with it. This leads to clearer diagnostic messages.
Sponsored By: DARPA, AFRL
Reviewed By: arphaman
Differential Revision: https://reviews.llvm.org/D35398
Patch by: Jonathan Anderson (trombonehero)
llvm-svn: 308172
This is a short-term fix for PR33650 aimed to get the modules build bots green again.
Remove all the places where we use the LLVM_YAML_IS_(FLOW_)?SEQUENCE_VECTOR
macros to try to locally specialize a global template for a global type. That's
not how C++ works.
Instead, we now centrally define how to format vectors of fundamental types and
of string (std::string and StringRef). We use flow formatting for the former
cases, since that's the obvious right thing to do; in the latter case, it's
less clear what the right choice is, but flow formatting is really bad for some
cases (due to very long strings), so we pick block formatting. (Many of the
cases that were using flow formatting for strings are improved by this change.)
Other than the flow -> block formatting change for some vectors of strings,
this should result in no functionality change.
Differential Revision: https://reviews.llvm.org/D34907
Corresponding updates to clang, clang-tools-extra, and lld to follow.
llvm-svn: 306878
clang-format (https://reviews.llvm.org/D33932) to keep primary headers
at the top and handle new utility headers like 'gmock' consistently with
other utility headers.
No other change was made. I did no manual edits, all of this is
clang-format.
This should allow other changes to have more clear and focused diffs,
and is especially motivated by moving some headers into more focused
libraries.
llvm-svn: 304786
Otherwise, yamlize in YAMLTraits.h might be wrongly defined.
This makes some AMDGPU tests fail when LLVM_LINK_LLVM_DYLIB is set.
Differential Revision: https://reviews.llvm.org/D30508
llvm-svn: 299415
Some scanner errors were not checked and reported by the parser.
Fix PR30934. Recommit r288014 after fixing unittest.
Patch by: Serge Guelton <serge.guelton@telecom-bretagne.eu>
Differential Revision: https://reviews.llvm.org/D26419
llvm-svn: 288071
Some scanner errors were not checked and reported by the parser.
Fix PR30934
Patch by: Serge Guelton <serge.guelton@telecom-bretagne.eu>
Differential Revision: https://reviews.llvm.org/D26419
llvm-svn: 288014
mapping a yaml field to an object in code has always been
a stateless operation. You could still pass state by using the
`setContext` function of the YAMLIO object, but this represented
global state for the entire yaml input. In order to have
context-sensitive state, it is necessary to pass this state in
at the granularity of an individual mapping.
This patch adds support for this type of context-sensitive state.
You simply pass an additional argument of type T to the
`mapRequired` or `mapOptional` functions, and provided you have
specialized a `MappingContextTraits<U, T>` class with the
appropriate mapping function, you can pass this context into
the mapping function.
Reviewed By: chandlerc
Differential Revision: https://reviews.llvm.org/D24162
llvm-svn: 280977
This allows mapping of any endian-aware type whose underlying
type (e.g. uint32_t) provides a ScalarTraits specialization.
Reviewed by: majnemer
Differential Revision: http://reviews.llvm.org/D21057
llvm-svn: 272049
Removed some unused headers, replaced some headers with forward class declarations.
Found using simple scripts like this one:
clear && ack --cpp -l '#include "llvm/ADT/IndexedMap.h"' | xargs grep -L 'IndexedMap[<]' | xargs grep -n --color=auto 'IndexedMap'
Patch by Eugene Kosov <claprix@yandex.ru>
Differential Revision: http://reviews.llvm.org/D19219
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 266595
This commit ensures that a value that's passed into YAML's IO mapOptional method
is going to be assigned a value returned by the default constructor for that
value's type when the appropriate key is not present in the YAML mapping.
Reviewers: Duncan P. N. Exon Smith
Differential Revision: http://reviews.llvm.org/D10492
llvm-svn: 239972
Summary:
We would wrap flow mappings and sequences when they go over a hardcoded 70
characters limit. Make the wrapping column configurable (and default to 70
co the change should be NFC for current users). Passing 0 allows to completely
suppress the wrapping which makes it easier to handle in tools like FileCheck.
Reviewers: bogner
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D10109
llvm-svn: 238584
This commit gives the users of the YAML Traits I/O library
the ability to serialize scalars using the YAML literal block
scalar notation by allowing them to implement a specialization
of the `BlockScalarTraits` struct for their custom types.
Reviewers: Duncan P. N. Exon Smith
Differential Revision: http://reviews.llvm.org/D9613
llvm-svn: 237404
This patch adds an optional 'flow' field to the MappingTrait
class so that yaml IO will be able to output flow mappings.
Reviewers: Justin Bogner
Differential Revision: http://reviews.llvm.org/D9450
llvm-svn: 236456
This patch fixes a bug where the YAML Output class emitted
a sequence of flow sequences without the '-' characters.
Before:
seq:
[ a, b ]
[ c, d ]
After:
seq:
- [ a, b ]
- [ c, d ]
Reviewers: Justin Bogner
Differential Revision: http://reviews.llvm.org/D9206
llvm-svn: 236329