Currently context strings contain a lot of duplicated function names, which significantly increases the profile size. This change splits the context into a series of {name, offset, discriminator} tuples so that function names used in the context can be replaced by an index into the name table, significantly reducing the size consumed by the context.
A follow-up improvement made in the compiler and profiling tools is to avoid reconstructing full context strings, which is time- and memory-consuming. Instead, a context vector of `StringRef` is adopted to represent the full context in all scenarios. As a result, the previously prevalent profile map, which was implemented as a `StringRef`-keyed map, is now an unordered map keyed by `SampleContext`. `SampleContext` is reshaped to use an `ArrayRef` to represent a full context for CS profiles. For non-CS profiles, it falls back to a `StringRef` representing a contextless function name. Both the `ArrayRef` and `StringRef` objects are underpinned by real array and string objects stored in producer buffers. For the compiler, they are maintained by the sample reader. For llvm-profgen, they are maintained in `ProfiledBinary` and `ProfileGenerator`. Full context strings are generated only for debugging and printing.
When it comes to the profile format, nothing changes for the text format, though internally the CS context is implemented as a vector. The extbinary format is changed only for CS profiles, with an additional `SecCSNameTable` section which logically stores all full contexts as a `vector<int>`, where each element is an offset pointing into `SecNameTable`. All other occurrences of contexts are redirected to use offsets into `SecCSNameTable`.
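As an illustration only (the types below are hypothetical stand-ins, not the actual `SampleContextFrame`/`SampleContext` definitions), the tuple-based context might look roughly like this, with samples keyed by the context vector instead of a flattened string:
```
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical frame: the function name is an index into a shared name table
// instead of an embedded string.
struct ContextFrame {
  uint32_t NameIndex;     // index into the name table
  uint32_t LineOffset;    // callsite line offset
  uint32_t Discriminator; // callsite discriminator
  bool operator==(const ContextFrame &O) const {
    return NameIndex == O.NameIndex && LineOffset == O.LineOffset &&
           Discriminator == O.Discriminator;
  }
};

// A full calling context is a vector of frames; samples are keyed by the
// context itself rather than by a reconstructed context string.
using Context = std::vector<ContextFrame>;

struct ContextHash {
  std::size_t operator()(const Context &C) const {
    std::size_t H = C.size();
    for (const ContextFrame &F : C)
      H = H * 131 + F.NameIndex * 31 + F.LineOffset * 7 + F.Discriminator;
    return H;
  }
};

// Shared name table plus a profile map keyed by context.
std::vector<std::string> NameTable;
std::unordered_map<Context, uint64_t, ContextHash> ProfileMap;
```
In this sketch a full context string would only be materialized for debugging or printing, by resolving each `NameIndex` against `NameTable`.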
Testing
This is a no-diff change in terms of code quality and profile content (for the text profile).
For our large internal service (ads), profile generation time is cut in half, and the generated string-based extbinary profile is 20x smaller.
The compile time of ads is reduced by 25%.
Differential Revision: https://reviews.llvm.org/D107299
Summary: This patch adds support for the llvm-readobj
--needed-libs option under XCOFF.
For XCOFF, the needed libraries can be found in the Import
File ID Name Table of the Loader Section.
Currently, I am using binary inputs in the tests since yaml2obj
does not yet support writing the Loader Section and the
import file table.
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D106643
This change adds a switch to allow the sample loader to use the global pre-inliner's decisions instead. The pre-inliner in llvm-profgen makes inline decisions globally based on the whole-program profile, with function byte size as the cost proxy.
Since the pre-inliner also adjusts/merges context profiles based on its inline decisions, honoring those decisions in the sample loader leads to better post-inline profile quality, especially for ThinLTO, where cross-module profile merging isn't possible without the pre-inliner.
A minor fix in the profile reader is also included. When the pre-inliner is used, we now also turn off the default merging and trimming logic unless it is explicitly requested.
Differential Revision: https://reviews.llvm.org/D108677
The --set-section-flags option was being ignored when adding a new
section. Take it into account if present.
Fixes https://llvm.org/PR51244
Reviewed By: jhenderson, MaskRay
Differential Revision: https://reviews.llvm.org/D106942
Moved View.h and View.cpp from /tools/llvm-mca/Views/ to /lib/MCA/ and
/include/llvm/MCA/. This is so that targets can define their own Views within
the /lib/Target/ directory (so that the View can use backend functionality).
To enable these Views within mca, targets will need to add them to the vector of
Views returned by their target's CustomBehaviour::getViews() methods.
Differential Revision: https://reviews.llvm.org/D108520
This is a follow-up diff for BinarySizeContextTracker to track a zero size for fully optimized inlinees. When an inlinee is fully optimized away, we won't be able to get its size by symbolizing instructions, hence we would treat the corresponding context size as unknown. However, by traversing the inlined probe forest, we know what the original inlinees were regardless of optimization. If a context shows up in the inlined probes but not during symbolization, we know that it was fully optimized away, hence its size is zero instead of unknown. This should provide more accurate size cost estimation for the pre-inliner to make better inline decisions in llvm-profgen.
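A rough sketch of that logic, with hypothetical container names standing in for the actual probe and tracker data structures:
```
#include <cstdint>
#include <map>
#include <set>
#include <string>

// Hypothetical context key: a flattened inline context, e.g. "main @ foo @ bar".
using ContextKey = std::string;

// Sizes recorded while symbolizing instructions; contexts that were fully
// optimized away never show up here.
std::map<ContextKey, uint64_t> ContextSizes;

// Contexts known from the inlined probe forest, i.e. all original inlinees.
std::set<ContextKey> ProbeContexts;

// Any context present in the probe forest but absent from symbolization was
// fully optimized away, so record an explicit zero instead of "unknown".
void trackZeroSizeContexts() {
  for (const ContextKey &Ctx : ProbeContexts)
    if (!ContextSizes.count(Ctx))
      ContextSizes[Ctx] = 0;
}
```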
Differential Revision: https://reviews.llvm.org/D108350
No demangling may be a better default in the future.
Add `--demangle` for migration convenience.
Reviewed By: Enna1
Differential Revision: https://reviews.llvm.org/D108100
The implementation uses the int_asan_check_memaccess intrinsic to instrument the code. The intrinsic is replaced by a call to a function which performs the access check. The generated function names encode the input register name as a number using the Reg - X86::NoRegister formula.
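For illustration, a heavily hedged sketch of the naming idea; only the Reg - X86::NoRegister encoding comes from the description above, while the prefix, enum values, and signature are made up:
```
#include <string>

namespace X86 {
// Placeholder values; the real ones come from the generated X86 register enum.
enum Reg { NoRegister = 0, RDI, RSI, RDX };
}

// Build a callee name that encodes the register holding the address to check.
// The "__hypothetical_check_" prefix is illustrative, not the real symbol name.
std::string getCheckFnName(unsigned Reg, bool IsWrite) {
  return std::string("__hypothetical_check_") + (IsWrite ? "store" : "load") +
         "_rn" + std::to_string(Reg - X86::NoRegister);
}
```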
Reviewed By: vitalybuka
Differential Revision: https://reviews.llvm.org/D107850
This removes the data layout, target triple, source filename, and module
identifier when possible.
Reviewed By: swamulism
Differential Revision: https://reviews.llvm.org/D108568
Commit 9f2967bcfe introduced support for
branch coverage including export to the LCOV format.
This commit corrects the LCOV field name for branches from BFH to BRH.
The mistake seems to have slipped in as a typo, because the correct field
name BRH is used in the comment section at the beginning of the file.
Differential Revision: https://reviews.llvm.org/D108358
A couple of passes that are parameterized in the new PM used different
pass names (on the command line) while using the same pass class
name. This patch updates the PassRegistry to model pass parameters
more properly using PASS_WITH_PARAMS.
The reason for the change is to ensure that we have a 1-1 mapping
between class name and pass name (when disregarding the params).
With a 1-1 mapping it is more obvious which pass name to use in
options such as -debug-only, -print-after etc.
The opt -passes syntax is changed for the following passes:
early-cse-memssa => early-cse<memssa>
post-inline-ee-instrument => ee-instrument<post-inline>
loop-extract-single => loop-extract<single>
lower-matrix-intrinsics-minimal => lower-matrix-intrinsics<minimal>
This patch does not update pass names in docs/Passes.rst. It is not
quite clear what the status is for that document (e.g. when it comes
to listing pass parameters). Of the passes mentioned above, only
loop-extract-single is mentioned in Passes.rst today.
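To illustrate the naming scheme only (this is a schematic parser, not the actual PassBuilder/PASS_WITH_PARAMS machinery): the bare pass name identifies a single pass class, and any parameters are carried in the angle-bracket suffix.
```
#include <iostream>
#include <string>
#include <utility>

// Split a pass specification such as "early-cse<memssa>" into the bare pass
// name and its parameter string (empty when no parameters are given).
std::pair<std::string, std::string> parsePassSpec(const std::string &Spec) {
  auto Open = Spec.find('<');
  if (Open == std::string::npos || Spec.back() != '>')
    return {Spec, ""};
  return {Spec.substr(0, Open), Spec.substr(Open + 1, Spec.size() - Open - 2)};
}

int main() {
  for (const std::string Spec :
       {"early-cse<memssa>", "ee-instrument<post-inline>", "loop-extract"}) {
    auto [Name, Params] = parsePassSpec(Spec);
    // The bare name is what -debug-only / -print-after style options refer to.
    std::cout << Name << " params=[" << Params << "]\n";
  }
}
```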
Differential Revision: https://reviews.llvm.org/D108362
Add XCOFFDumper relocation reading support
This patch is part of the partitioning of D103696.
Reviewed By: daltenty, Helflym
Differential Revision: https://reviews.llvm.org/D104646
Refactored implementation of AddressSanitizerPass and
HWAddressSanitizerPass to use pass options similar to passes like
MemorySanitizerPass. This makes sure that there is a single mapping
from class name to pass name (needed by D108298), and options like
-debug-only and -print-after make a bit more sense (even though it is
the unparameterized pass name that should be used in those
options).
A result of the above is that some pass names are removed in favor
of the parameterized versions:
- "khwasan" is now "hwasan<kernel;recover>"
- "kasan" is now "asan<kernel>"
- "kmsan" is now "msan<kernel>"
Differential Revision: https://reviews.llvm.org/D105007
Currently, `printHelp` behaves differently for options that:
* do not define `HelpText` (such options _are not printed_), and
* define their `HelpText` as `HelpText<"">` (such options _are printed_).
In practice, both approaches lead to no help text, and `printHelp` should
treat them consistently. This patch addresses that by making
`printHelp` check the length of the help text to be printed.
All affected tests have been updated accordingly. The option definitions
for llvm-cvtres have been updated with a short description or "Not
implemented" for options that are ignored by the tool.
Differential Revision: https://reviews.llvm.org/D107557
This change enables llvm-profgen to use accurate context-sensitive post-optimization function byte size as a cost proxy to drive global pre-inline decisions.
To do this, BinarySizeContextTracker is introduced to track function byte size under different inline contexts during disassembling. In the pre-inliner, we can now query the context byte size under the switch `context-cost-for-preinliner`. The tracker uses a reverse trie to keep the size of functions under different contexts (callee as parent, caller as child), and it can give the best/longest possible matching context size for a given input context.
The new size cost is off by default. There are a few TODOs that need to be addressed: 1) avoid the dangling string from `Offset2LocStackMap`, which will be addressed in the split-context work; 2) use the inlinee's entry probe to make sure we have a correct zero size for an inlinee that is completely optimized away after inlining. Some tuning is also needed.
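A rough sketch of such a reverse trie under stated assumptions (the node layout and function names are illustrative, not the actual BinarySizeContextTracker API); contexts are stored leaf-first so that the best/longest matching context for a query is found by walking from the callee outwards:
```
#include <cstdint>
#include <map>
#include <memory>
#include <optional>
#include <string>
#include <vector>

// One node per frame; children are keyed by the caller's frame name.
struct SizeTrieNode {
  std::optional<uint64_t> Size; // size recorded for the context ending here
  std::map<std::string, std::unique_ptr<SizeTrieNode>> Callers;
};

SizeTrieNode Root;

// Contexts are passed leaf-first, e.g. {"bar", "foo", "main"} for the call
// chain main -> foo -> bar, so the callee sits nearest the root of the trie.
void addContextSize(const std::vector<std::string> &LeafFirst, uint64_t Size) {
  SizeTrieNode *Node = &Root;
  for (const std::string &Frame : LeafFirst) {
    std::unique_ptr<SizeTrieNode> &Child = Node->Callers[Frame];
    if (!Child)
      Child = std::make_unique<SizeTrieNode>();
    Node = Child.get();
  }
  Node->Size = Node->Size.value_or(0) + Size;
}

// Walk as deep as the query context allows and return the size of the
// longest matching context seen along the way.
std::optional<uint64_t>
queryContextSize(const std::vector<std::string> &LeafFirst) {
  SizeTrieNode *Node = &Root;
  std::optional<uint64_t> Best;
  for (const std::string &Frame : LeafFirst) {
    auto It = Node->Callers.find(Frame);
    if (It == Node->Callers.end())
      break;
    Node = It->second.get();
    if (Node->Size)
      Best = Node->Size;
  }
  return Best;
}
```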
Differential Revision: https://reviews.llvm.org/D108180
This patch implements the Flow Sensitive Sample FDO (FSAFDO) profile
loader. We have two profile loaders for the FS profile,
one before RegAlloc and one before BlockPlacement.
To enable it, when -fprofile-sample-use=<profile> is specified,
add "-enable-fs-discriminator=true \
-disable-ra-fsprofile-loader=false \
-disable-layout-fsprofile-loader=false"
to turn on the FS profile loaders.
Differential Revision: https://reviews.llvm.org/D107878
This change adds support to ORCv2 and the Orc runtime library for static
initializers, C++ static destructors, and exception handler registration for
ELF-based platforms, at present Linux and FreeBSD on x86_64. It is based on the
MachO platform and runtime support introduced in bb5f97e3ad.
Patch by Peter Housel. Thanks very much Peter!
Reviewed By: lhames
Differential Revision: https://reviews.llvm.org/D108081
When the option `--symbolize` is enabled, llvm-xray convert will demangle function
names by default. This patch adds an llvm-xray convert option `no-demangle` to
control whether function names are demangled when symbolizing function ids from
the input log.
Reviewed By: MaskRay, smeenai
Differential Revision: https://reviews.llvm.org/D108019
Change to use a unique pointer for the profiled binary to unblock ASan.
At the same time, I realized we can decouple things by moving the profiled binary loading out of PerfReader, so I made some other related refactors.
Reviewed By: hoy
Differential Revision: https://reviews.llvm.org/D108254
The current implementation of printAttributes makes it fiddly to extend
attribute support for new targets.
By refactoring the code so that all target-specific variables are
initialized in a switch/case statement, it becomes simpler to extend
attribute support for new targets.
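A schematic sketch of the shape of this refactor (the types, fields, and cases below are illustrative examples, not the real dumper code):
```
#include <string>

enum class TargetArch { ARM, RISCV /* new targets slot in here */ };

// Everything target-specific is gathered in one place.
struct AttrDumpConfig {
  std::string SectionName; // name of the build-attributes section
};

// With all target-specific variables initialized in one switch, supporting a
// new target means adding a single case rather than touching scattered code.
AttrDumpConfig getAttrDumpConfig(TargetArch Arch) {
  AttrDumpConfig Config;
  switch (Arch) {
  case TargetArch::ARM:
    Config.SectionName = ".ARM.attributes";
    break;
  case TargetArch::RISCV:
    Config.SectionName = ".riscv.attributes";
    break;
  }
  return Config;
}
```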
Reviewed By: jhenderson, MaskRay
Differential Revision: https://reviews.llvm.org/D107968
As we decided to support only one binary at a time, this patch cleans up the related code dealing with multiple binaries. We can use `llvm-profdata` to merge profiles from multiple binaries.
Reviewed By: hoy, wenlei
Differential Revision: https://reviews.llvm.org/D108002
Similar to D94907 (llvm-nm -D).
The output will match GNU objdump 2.37.
Older versions don't use ` (version)` for undefined symbols.
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D108097
The new ELF notes are added by clang-offload-wrapper, and llvm-readobj has to display them properly.
Differential Revision: https://reviews.llvm.org/D99552
A DiffConsumer object may be reused, but we'd like to reset it before
the next use.
No functionality change intended.
Differential Revision: https://reviews.llvm.org/D107985
Previously, debug-info-for-profiling and pseudo-probe-for-profiling were mutually exclusive because they compete for the DWARF discriminator for callsites on the IR. This change allows the two switches to be used together. The side effect is that callsite discriminators will be taken by pseudo probes, while discriminators for other instructions are still available for AutoFDO use. This is less than ideal; however, it still gives us a chance to smoothly transition from AutoFDO to CSSPGO, by collecting both profiles from a CSSPGO binary.
Reviewed By: wenlei, wmi
Differential Revision: https://reviews.llvm.org/D107876
Currently, llvm-objcopy sorts section headers according to the offsets
of the sections in the input file. That can corrupt section references
in the dynamic symbol table because it is a loadable section and as such
is not updated by the tool. Even though the section references are not
required for loading the binary correctly, they are still handy for a
user who analyzes the file.
While the patch removes the global reordering of section headers, it lays out
the sections in the same way as before, i.e. according to their original
offsets. All that helps the output file to resemble the input better.
Note that the patch removes sorting SHT_GROUP sections to the start of
the list, which was introduced in D62620 in order to ensure that they
come before the group members, along with the corresponding test. The
original issue was caused by the sorting of section headers, so dropping
the sorting also resolves the issue.
Differential Revision: https://reviews.llvm.org/D107653
Currently we use a centralized string map (StringMap<FunctionSamples> ProfileMap) to store the profile while populating the samples, which might cause a memory usage bottleneck. In an extreme case I saw, there were thousands of samples whose context stack depth is >= 100, and the memory consumption was greater than 100GB.
Since the context here is used for inlining, we can assume we won't have that many inlinees kept inlined into the same root function, so this change caps the context stack depth and merges the samples to reduce peak memory. This is done after recursion compression.
The default value is -1, meaning no depth limit; in the future we can tune it to a smaller one.
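A small sketch of the capping-and-merging idea under simplified assumptions (a context is a plain vector of frame names, a sample is just a count, and I assume the cap keeps the frames nearest the leaf); the real change operates on context profiles after recursion compression:
```
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

using Context = std::vector<std::string>; // root caller first, leaf last

// Keep at most MaxDepth frames, preferring the frames closest to the leaf;
// samples whose capped contexts collide are merged by summing their counts.
std::map<Context, uint64_t>
capAndMerge(const std::map<Context, uint64_t> &Samples, int MaxDepth) {
  if (MaxDepth < 0) // -1 means no depth limit
    return Samples;
  std::map<Context, uint64_t> Merged;
  for (const auto &[Ctx, Count] : Samples) {
    Context Capped = Ctx;
    if (Capped.size() > static_cast<std::size_t>(MaxDepth))
      Capped.erase(Capped.begin(), Capped.end() - MaxDepth);
    Merged[Capped] += Count;
  }
  return Merged;
}
```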
Reviewed By: hoy, wenlei
Differential Revision: https://reviews.llvm.org/D107800
Some files still contained the old University of Illinois Open Source
Licence header. This patch replaces that with the Apache 2 with LLVM
Exception licence.
Differential Revision: https://reviews.llvm.org/D107528
The patch removes mutable accessor methods for sections and segments.
As of now, the const variants of them are not used because all callers have
mutable access to an instance of Object. On the other hand, they do not
actually modify the sets, so it looks better to keep only const ones.
Differential Revision: https://reviews.llvm.org/D107652
This is related to PR51392.
Before this patch, the timeline view was rounding doubles to the first decimal,
using logic similar to this:
```
double AverageTime = (double)Input / CumulativeExecutions;
double Result = floor((AverageTime * 10) + 0.5) / 10;
```
Here, Input and CumulativeExecutions are both unsigned integers.
The last operation is what effectively performs the rounding of AverageTime.
PR51392 has been raised because - under specific -m32 configurations of GCC -
one of the timeline tests reports slightly different values (due to a different
rounding choice).
This patch tries to minimise the propagation of floating-point error by
hoisting the multiply by 10, so that it is performed on the unsigned integer.
```
double AverageTime = (double)(Input * 10) / CumulativeExecutions;
double Result = floor(AverageTime + 0.5) / 10;
```
So we are trading a floating point multiply for an integer multiply (which can be
expanded using a simple MUL or an `ADD + LEA` sequence). This decrease in the
number of floating point operations executed should also help reduce the error in
the computation.
Strictly speaking, that computation will always be potentially subject to error
(depending on what values are passed as input). However, this patch should
improve the situation and make bugs like PR51392 less frequent.
Fix an edge case missed by https://reviews.llvm.org/D78921. For example,
the Repro debug entry (generated with the /Brepro linker flag) does not
have a debug-directory payload. Do not attempt to patch Debug entries
without a payload.
Differential Revision: https://reviews.llvm.org/D107324
A performance issue showed up in profile generation, and it turned out that the loop at line 525 was the bottleneck.
Moving the code outside of the loop scope fixes the issue. The run time is improved from 30+ minutes to ~30s.
Reviewed By: hoy, wenlei
Differential Revision: https://reviews.llvm.org/D107529