This patch significantly improves performance of the YAML serializer by
optimizing `YAML::isNumeric` function. This function is called on the
most strings and is highly inefficient for two reasons:
* It uses `Regex`, which is parsed and compiled each time this
function is called
* It uses multiple passes which are not necessary
This patch introduces stateful ad hoc YAML number parser which does not
rely on `Regex`. It also fixes YAML number format inconsistency: current
implementation supports C-stile octal number format (`01234567`) which
was present in YAML 1.0 specialization (http://yaml.org/spec/1.0/),
[Section 2.4. Tags, Example 2.19] but was deprecated and is no longer
present in latest YAML 1.2 specification
(http://yaml.org/spec/1.2/spec.html), see [Section 10.3.2. Tag
Resolution]. Since the rest of the rest of the implementation does not
support other deprecated YAML 1.0 numeric features such as sexagecimal
numbers, commas as delimiters it is treated as inconsistency and not
longer supported. This patch also adds unit tests to ensure the validity
of proposed implementation.
This performance bottleneck was identified while profiling Clangd's
global-symbol-builder tool with my colleague @ilya-biryukov. The
substantial part of the runtime was spent during a single-thread Reduce
phase, which concludes with YAML serialization of collected symbol
collection. Regex matching was accountable for approximately 45% of the
whole runtime (which involves sharded Map phase), now it is reduced to
18% (which is spent in `clang::clangd::CanonicalIncludes` and can be
also optimized because all used regexes are in fact either suffix
matches or exact matches).
`llvm-yaml-numeric-parser-fuzzer` was used to ensure the validity of the
proposed regex replacement. Fuzzing for ~60 hours using 10 threads did
not expose any bugs.
Benchmarking `global-symbol-builder` (using `hyperfine --warmup 2
--min-runs 5 'command 1' 'command 2'`) tool by processing a reasonable
amount of code (26 source files matched by
`clang-tools-extra/clangd/*.cpp` with all transitive includes) confirmed
our understanding of the performance bottleneck nature as it speeds up
the command by the factor of 1.6x:
| Command | Mean [s] | Min…Max [s] |
| this patch (D50839) | 84.7 ± 0.6 | 83.3…84.7 |
| master (rL339849) | 133.1 ± 0.8 | 132.4…134.6 |
Using smaller samples (e.g. by collecting symbols from
`clang-tools-extra/clangd/AST.cpp` only) yields even better performance
improvement, which is expected because Map phase takes less time
compared to Reduce and is 2.05x faster and therefore would significantly
improve the performance of standalone YAML serializations.
| Command | Mean [ms] | Min…Max [ms] |
| this patch (D50839) | 3702.2 ± 48.7 | 3635.1…3752.3 |
| master (rL339849) | 7607.6 ± 109.5 | 7533.3…7796.4 |
Reviewed by: zturner, ilya-biryukov
Differential revision: https://reviews.llvm.org/D50839
llvm-svn: 340154
If the arch is P8, we will select XFLOAD to load the floating point, and then, expand it to vsx and non-vsx X-form instruction post RA. This patch is trying to convert the X-form to D-form if it meets the requirement that one operand of the x-form inst is the special Zero register, and another operand fed by add inst. i.e.
y = add imm, reg
LFDX. 0, y
-->
LFD imm(reg)
Reviewers: Nemanjai
Differential Revision: https://reviews.llvm.org/D49007
llvm-svn: 340149
Added DIFlags in LLVMDIBuilderCreateBasicType to add optional DWARF
attributes, such as DW_AT_endianity.
Patch by Chirag Patel.
Differential Revision: https://reviews.llvm.org/D50832
llvm-svn: 340146
Only adds support to the existing 'large element' scalar/vector to 'small element' vector bitcasts.
The next step would be to support cases where the large elements aren't all sign bits, and determine the small element equivalent based on the demanded elements.
llvm-svn: 340143
This is a partial retry of rL340137 (reverted at rL340138 because of gcc host compiler crashing)
with 1 change:
Remove the changes to make microsoft builtins also use the LLVM intrinsics.
This exposes the LLVM funnel shift intrinsics as more familiar bit rotation functions in clang
(when both halves of a funnel shift are the same value, it's a rotate).
We're free to name these as we want because we're not copying gcc, but if there's some other
existing art (eg, the microsoft ops) that we want to replicate, we can change the names.
The funnel shift intrinsics were added here:
https://reviews.llvm.org/D49242
With improved codegen in:
https://reviews.llvm.org/rL337966https://reviews.llvm.org/rL339359
And basic IR optimization added in:
https://reviews.llvm.org/rL338218https://reviews.llvm.org/rL340022
...so these are expected to produce asm output that's equal or better to the multi-instruction
alternatives using primitive C/IR ops.
In the motivating loop example from PR37387:
https://bugs.llvm.org/show_bug.cgi?id=37387#c7
...we get the expected 'rolq' x86 instructions if we substitute the rotate builtin into the source.
Differential Revision: https://reviews.llvm.org/D50924
llvm-svn: 340141
This patch fixes definitions of vld and vst NEON intrinsics so
that we only define them if half-precision arithmetic is
supported on the target platform, as prescribed in ACLE 2.0.
Differential Revision: https://reviews.llvm.org/D49075
llvm-svn: 340140
This is a retry of rL340135 (reverted at rL340136 because of gcc host compiler crashing)
with 2 changes:
1. Move the code into a helper to reduce code duplication (and hopefully work-around the crash).
2. The original commit had a formatting bug in the docs (missing an underscore).
Original commit message:
This exposes the LLVM funnel shift intrinsics as more familiar bit rotation functions in clang
(when both halves of a funnel shift are the same value, it's a rotate).
We're free to name these as we want because we're not copying gcc, but if there's some other
existing art (eg, the microsoft ops that are modified in this patch) that we want to replicate,
we can change the names.
The funnel shift intrinsics were added here:
https://reviews.llvm.org/D49242
With improved codegen in:
https://reviews.llvm.org/rL337966https://reviews.llvm.org/rL339359
And basic IR optimization added in:
https://reviews.llvm.org/rL338218https://reviews.llvm.org/rL340022
...so these are expected to produce asm output that's equal or better to the multi-instruction
alternatives using primitive C/IR ops.
In the motivating loop example from PR37387:
https://bugs.llvm.org/show_bug.cgi?id=37387#c7
...we get the expected 'rolq' x86 instructions if we substitute the rotate builtin into the source.
Differential Revision: https://reviews.llvm.org/D50924
llvm-svn: 340137
This exposes the LLVM funnel shift intrinsics as more familiar bit rotation functions in clang
(when both halves of a funnel shift are the same value, it's a rotate).
We're free to name these as we want because we're not copying gcc, but if there's some other
existing art (eg, the microsoft ops that are modified in this patch) that we want to replicate,
we can change the names.
The funnel shift intrinsics were added here:
D49242
With improved codegen in:
rL337966
rL339359
And basic IR optimization added in:
rL338218
rL340022
...so these are expected to produce asm output that's equal or better to the multi-instruction
alternatives using primitive C/IR ops.
In the motivating loop example from PR37387:
https://bugs.llvm.org/show_bug.cgi?id=37387#c7
...we get the expected 'rolq' x86 instructions if we substitute the rotate builtin into the source.
Differential Revision: https://reviews.llvm.org/D50924
llvm-svn: 340135
We were basically assuming only one operand of the compare could be an ADD node and using that to swap operands. But we can have a normal add followed by a saturing add.
This rewrites the canonicalization to just be based on the condition code.
llvm-svn: 340134
The code already support 128 and 256 and even knows to split 256 for AVX1. So we really just needed to stop looking for specific VTs and subtarget features and just look for legal VTs with i8/i16 elements.
While there, add some curly braces around outer if statement bodies that contain only another if. It makes all the closing curly braces look more regular.
llvm-svn: 340128
A while back I submitted a patch to resolve backreferences
lazily, thinking this that it was not always possible to know
in advance what type you were looking at until you had completed
a full pass over the input, and therefore it would be impossible
to resolve backreferences eagerly.
This was mistaken though, and turned out to be an unrelated
problem. In fact, the reverse is true. You *must* resolve
backreferences eagerly. This is because certain types of nested
mangled symbols do not share a backreference context with their
parent symbol, and as such, if you try to resolve them lazily
their backreference context will have been lost by the time you
finish demangling the entire input. On the other hand, resolving
them eagerly appears to always work, and enables us to port
many more tests over.
llvm-svn: 340126
Summary:
I believe this restores the behavior we had before r339147.
Fixes PR38622.
Reviewers: RKSimon, chandlerc, spatel
Reviewed By: chandlerc
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D50936
llvm-svn: 340120
An emitted symbol has had its contents written and its memory protections
applied, but it is not automatically ready to execute.
Prior to ORC supporting concurrent compilation, the term "finalized" could be
interpreted two different (but effectively equivalent) ways: (1) The finalized
symbol's contents have been written and its memory protections applied, and (2)
the symbol is ready to run. Now that ORC supports concurrent compilation, sense
(1) no longer implies sense (2). We have already introduced a new term, 'ready',
to capture sense (2), so rename sense (1) to 'emitted' to avoid any lingering
confusion.
llvm-svn: 340115
ARCMigrator is using code from RetainCountChecker, which is a layering
violation (and it also does it badly, by using a different header, and
then relying on implementation being present in a header file).
This change splits up RetainSummaryManager into a separate library in
lib/Analysis, which can be used independently of a checker.
Differential Revision: https://reviews.llvm.org/D50934
llvm-svn: 340114
Remove code for writing auxiliary symbols of type function definition
and begin function. These types of symbols are associated with
pre-CodeView debug info and we never emit them.
llvm-svn: 340113
Summary:
Port GNU Objcopy -G/--keep-global-symbol(s).
This is slightly different than the already-implemented --globalize-symbol, which marks a symbol as global when copying. When --keep-global-symbol (alias -G) is used, *only* those symbols marked will stay global, and all other globals are demoted to local. (Also note that it doesn't *promote* a symbol to global). Additionally, there is a pluralized version of the flag --keep-global-symbols, which effectively applies --keep-global-symbol for every non-comment in a file.
Reviewers: jakehehrlich, jhenderson, alexshap
Reviewed By: jhenderson
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D50589
llvm-svn: 340105