Summary: This patch is trying to add support for llvm-readobj
--needed-libs option under XCOFF.
For XCOFF, the needed libraries can be found from the Import
File ID Name Table of the Loader Section.
Currently, I am using binary inputs in the test since yaml2obj
does not yet support for writing the Loader Section and the
import file table.
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D106643
The change adds a switch to allow sample loader to use global pre-inliner's decision instead. The pre-inliner in llvm-profgen makes inline decision globally based on whole program profile and function byte size as cost proxy.
Since pre-inliner also adjusts/merges context profile based on its inline decision, honoring its inline decision in sample loader would lead to better post-inline profile quality especially for thinlto where cross module profile merging isn't possible without pre-inliner.
Minor fix in profile reader is also included. When pre-inliner is use, we now also turn off the default merging and trimming logic unless it's explicitly asked.
Differential Revision: https://reviews.llvm.org/D108677
The --set-section-flags option was being ignored when adding a new
section. Take it into account if present.
Fixes https://llvm.org/PR51244
Reviewed By: jhenderson, MaskRay
Differential Revision: https://reviews.llvm.org/D106942
Moved View.h and View.cpp from /tools/llvm-mca/Views/ to /lib/MCA/ and
/include/llvm/MCA/. This is so that targets can define their own Views within
the /lib/Target/ directory (so that the View can use backend functionality).
To enable these Views within mca, targets will need to add them to the vector of
Views returned by their target's CustomBehaviour::getViews() methods.
Differential Revision: https://reviews.llvm.org/D108520
This is a follow up diff for BinarySizeContextTracker to track zero size for fully optimized inlinee. When an inlinee is fully optimized away, we won't be able to get its size through symbolizing instructions, hence we will treat the corresponding context size as unknown. However by traversing the inlined probe forest, we know what're original inlinees regardless of optimization. If a context show up in inlined probes, but not during symbolization, we know that it's fully optimized away hence its size is zero instead of unknown. It should provide more accurate size cost estimation for pre-inliner to make better inline decisions in llvm-profgen.
Differential Revision: https://reviews.llvm.org/D108350
No demangling may be a better default in the future.
Add `--demangle` for migration convenience.
Reviewed By: Enna1
Differential Revision: https://reviews.llvm.org/D108100
The implementation uses the int_asan_check_memaccess intrinsic to instrument the code. The intrinsic is replaced by a call to a function which performs the access check. The generated function names encode the input register name as a number using Reg - X86::NoRegister formula.
Reviewed By: vitalybuka
Differential Revision: https://reviews.llvm.org/D107850
This removes the data layout, target triple, source filename, and module
identifier when possible.
Reviewed By: swamulism
Differential Revision: https://reviews.llvm.org/D108568
Commit 9f2967bcfe introduced support for
branch coverage including export to the LCOV format.
This commit corrects the LCOV field name for branches from BFH to BRH.
The mistake seems to have slipped in as typo because the correct field
name BRH is used in the comment section at the beginning of the file.
Differential Revision: https://reviews.llvm.org/D108358
A couple of passes that are parameterized in new-PM used different
pass names (in cmd line interface) while using the same pass class
name. This patch updates the PassRegistry to model pass parameters
more properly using PASS_WITH_PARAMS.
Reason for the change is to ensure that we have a 1-1 mapping
between class name and pass name (when disregarding the params).
With a 1-1 mapping it is more obvious which pass name to use in
options such as -debug-only, -print-after etc.
The opt -passes syntax is changed for the following passes:
early-cse-memssa => early-cse<memssa>
post-inline-ee-instrument => ee-instrument<post-inline>
loop-extract-single => loop-extract<single>
lower-matrix-intrinsics-minimal => lower-matrix-intrinsics<minimal>
This patch is not updating pass names in docs/Passes.rst. Not quite
sure what the status is for that document (e.g. when it comes to
listing pass paramters). It is only loop-extract-single that is
mentioned in Passes.rst today, out of the passes mentioned above.
Differential Revision: https://reviews.llvm.org/D108362
Support XCOFFDumper relocation reading support
This patch is part of D103696 partition
Reviewed By: daltenty, Helflym
Differential Revision: https://reviews.llvm.org/D104646
Refactored implementation of AddressSanitizerPass and
HWAddressSanitizerPass to use pass options similar to passes like
MemorySanitizerPass. This makes sure that there is a single mapping
from class name to pass name (needed by D108298), and options like
-debug-only and -print-after makes a bit more sense when (despite
that it is the unparameterized pass name that should be used in those
options).
A result of the above is that some pass names are removed in favor
of the parameterized versions:
- "khwasan" is now "hwasan<kernel;recover>"
- "kasan" is now "asan<kernel>"
- "kmsan" is now "msan<kernel>"
Differential Revision: https://reviews.llvm.org/D105007
Currently, `printHelp` behaves differently for options that:
* do not define `HelpText` (such options _are not printed_), and
* define its `HelpText` as `HelpText<"">` (such options _are printed_).
In practice, both approaches lead to no help text and `printHelp` should
treat them consistently. This patch addresses that by making
`printHelpt` check the length of the help text to be printed.
All affected tests have been updated accordingly. The option definitions
for llvm-cvtres have been updated with a short description or "Not
implemented" for options that are ignored by the tool.
Differential Revision: https://reviews.llvm.org/D107557
This change enables llvm-profgen to use accurate context-sensitive post-optimization function byte size as a cost proxy to drive global preinline decisions.
To do this, BinarySizeContextTracker is introduced to track function byte size under different inline context during disassembling. In preinliner, we can not query context byte size under switch `context-cost-for-preinliner`. The tracker uses a reverse trie to keep size of functions under different context (callee as parent, caller as child), and it can give best/longest possible matching context size for given input context.
The new size cost is off by default. There're a few TODOs that needs to addressed: 1) avoid dangling string from `Offset2LocStackMap`, which will be addressed in split context work; 2) using inlinee's entry probe to make sure we have correct zero size for inlinee that's completely optimized away after inlining. Some tuning is also needed.
Differential Revision: https://reviews.llvm.org/D108180
This patch implements Flow Sensitive Sample FDO (FSAFDO) profile
loader. We have two profile loaders for FS profile,
one before RegAlloc and one before BlockPlacement.
To enable it, when -fprofile-sample-use=<profile> is specified,
add "-enable-fs-discriminator=true \
-disable-ra-fsprofile-loader=false \
-disable-layout-fsprofile-loader=false"
to turn on the FS profile loaders.
Differential Revision: https://reviews.llvm.org/D107878
This change adds support to ORCv2 and the Orc runtime library for static
initializers, C++ static destructors, and exception handler registration for
ELF-based platforms, at present Linux and FreeBSD on x86_64. It is based on the
MachO platform and runtime support introduced in bb5f97e3ad.
Patch by Peter Housel. Thanks very much Peter!
Reviewed By: lhames
Differential Revision: https://reviews.llvm.org/D108081
When option `--symbolize` is true, llvm-xray convert will demangle function
name on default. This patch adds a llvm-xray convert option `no-demangle` to
determine whether to demangle function name when symbolizing function ids from
the input log.
Reviewed By: MaskRay, smeenai
Differential Revision: https://reviews.llvm.org/D108019
Change to use unique pointer of profiled binary to unblock asan.
At same time, I realized we can decouple to move the profiled binary loading out of PerfReader, so I made some other related refactors.
Reviewed By: hoy
Differential Revision: https://reviews.llvm.org/D108254
The current implementation of printAttributes makes it fiddly to extend
attribute support for new targets.
By refactoring the code so all target specific variables are
initialized in a switch/case statement, it becomes simpler to extend
attribute support for new targets.
Reviewed By: jhenderson, MaskRay
Differential Revision: https://reviews.llvm.org/D107968
As we decided to support only one binary each time, this patch cleans up the related code dealing with multiple binaries. We can use `llvm-profdata` to merge profile from multiple binaries.
Reviewed By: hoy, wenlei
Differential Revision: https://reviews.llvm.org/D108002
Similar to D94907 (llvm-nm -D).
The output will match GNU objdump 2.37.
Older versions don't use ` (version)` for undefined symbols.
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D108097
The new ELF notes are added in clang-offload-wrapper, and llvm-readobj has to visualize them properly.
Differential Revision: https://reviews.llvm.org/D99552
A DiffConsumer object may be reused, but we'd like to reset it before
the next use.
No functionality change intended.
Differential Revision: https://reviews.llvm.org/D107985
Previoulsy debug-info-for-profiling and pseudo-probe-for-profiling are mutual exclusive because they compete the dwarf discrimnator for callsites on the IR. This changes allows to use the two switches together. The side effect is that callsite discriminators will be taken by pseudo probe, while discriminators for other instructions are still available for AutoFDO use. This is less than ideal, however, it still allows us a chance to smoothly transition from AutoFDO to CSSPGO, by collecting both profiles from a CSSPGO binary.
Reviewed By: wenlei, wmi
Differential Revision: https://reviews.llvm.org/D107876
As for now, llvm-objcopy sorts section headers according to the offsets
of the sections in the input file. That can corrupt section references
in the dynamic symbol table because it is a loadable section and as such
is not updated by the tool. Even though the section references are not
required for loading the binary correctly, they are still handy for a
user who analyzes the file.
While the patch removes global reordering of section headers, it layouts
the sections in the same way as before, i.e. according to their original
offsets. All that helps the output file to resemble the input better.
Note that the patch removes sorting SHT_GROUP sections to the start of
the list, which was introduced in D62620 in order to ensure that they
come before the group members, along with the corresponding test. The
original issue was caused by the sorting of section headers, so dropping
the sorting also resolves the issue.
Differential Revision: https://reviews.llvm.org/D107653
Currently we use a centralized string map(StringMap<FunctionSamples> ProfileMap) to store the profile while populating the sample, which might cause the memory usage bottleneck. I saw in an extreme case, there are thousands of samples whose context stack depth is >= 100. The memory consumption can be greater than 100GB.
As here the context is used for inlining, we can assume we won't have so many of inlinees keeping inlined at the same root function, so this change tried to cap the context stack and merge the samples for peak memory reduction and this is done after recursion compression.
The default value is -1 meaning no depth limit, in the future we can tune to a smaller one.
Reviewed By: hoy, wenlei
Differential Revision: https://reviews.llvm.org/D107800
Some files still contained the old University of Illinois Open Source
Licence header. This patch replaces that with the Apache 2 with LLVM
Exception licence.
Differential Revision: https://reviews.llvm.org/D107528
The patch removes mutable accessor methods for sections and segments.
As for now, const variants of them are not used because all callers have
mutable access to an instance of Object. On the other hand, they do not
actually modify the sets, so it looks better to keep only const ones.
Differential Revision: https://reviews.llvm.org/D107652
This is related to PR51392.
Before this patch, the timeline view was rounding doubles to the first decimal,
using a logic similar to this:
```
double AverageTime = (double)Input / CumulativeExecutions;
double Result = floor((AverageTime * 10) + 0.5) / 10
```
Here, Input and CumulativeExecutions are both unsigned integers.
The last operation is what effectively performs the rounding of AverageTime.
PR51392 has been raised because - under specific -m32 configurations of GCC -
one of the timeline tests reports slighlty different values (due to a different
rounding choice).
This patch tries to minimise the propagation of floating-point error by
hoisting the multiply by 10, so that it is performed on the unsigned.
```
double AverageTime = (double)(Input * 10) / CumulativeExecutions;
floor(AverageTime + 0.5) / 10
```
So we are trading a floating point multiply for a integer multiply (which can be
expanded using a simple MUL or using an `ADD + LEA` sequence). This decrease in
floating point operations executed should also help with decreasing the error in
the computation..
Strictly speaking, that computation will always be potentially subject to error
(depending on what values are passed in input). However, this patch should
improve the situation and make bug like PR51392 less frequent.
Fix an edge case missed by https://reviews.llvm.org/D78921. For e.g.,
the Repro debug entry (generated with the /Brepro linker flag) does not
have a debug-directory payload. Do not attempt to patch Debug entries
without a payload.
Differential Revision: https://reviews.llvm.org/D107324
One performance issue happened in profile generation and it turned out the line 525 loop is the bottleneck.
Moving the code outside of loop scope can fix this issue. The run time is improved from 30+mins to ~30s.
Reviewed By: hoy, wenlei
Differential Revision: https://reviews.llvm.org/D107529
Some tools may want to use the LLVM "diff" code. Move the code into a
library for easy use.
No functionality change intende.
Differential Revision: https://reviews.llvm.org/D107392
Some tools may want to use the LLVM "diff" code. Move the code into a
library for easy use.
No functionality change intende.
Differential Revision: https://reviews.llvm.org/D107392
This option is always interpreted strictly as a hexadecimal string,
even if it has no prefix that indicates the number format, hence
the existing call to StringRef::getAsInteger(16, ...).
StringRef::getAsInteger(0, ...) consumes a leading "0x" prefix is
present, but when the radix is specified, the radix shouldn't
be included.
Both MS rc.exe and GNU windres accept the language with that
prefix.
Also allow specifying the codepage to llvm-windres with a different
radix, as GNU windres allows that (but MS rc.exe doesn't).
This fixes https://llvm.org/PR51295.
Differential Revision: https://reviews.llvm.org/D107263
Migrate pseudo probe decoding logic in llvm-profgen to MC, so other LLVM-base program could reuse existing codes. Redesign object layout of encoded and decoded pseudo probes.
Reviewed By: hoy
Differential Revision: https://reviews.llvm.org/D106861
This change tried to integrate a new count based aggregated type of perf script. The only difference of the format is that an aggregated count is added at the head of the original sample which means the same samples are repeated to the given count times. This is used to reduce the perf script size.
e.g.
```
2
4005dc
400634
400684
7f68c5788793
0x4005c8/0x4005dc/P/-/-/0 ....
```
Implemented by a dedicated PerfReader `AggregatedHybridPerfReader`.
Differential Revision: https://reviews.llvm.org/D107192
This change supports to run without parsing MMap binary loading events instead it always assumes binary is loaded at the preferred address. This is used when we have assured no binary load address changes or we have pre-processed the addresses resolution. Warn if there's interior mmap event but without leading mmap events.
Reviewed By: hoy
Differential Revision: https://reviews.llvm.org/D107097
As detailed on https://pvs-studio.com/en/blog/posts/cpp/0771/ and raised on D62583, the SecNo++ increment is not guaranteed to occur before the second use of SecNo in the same addSection() call.
This patch pulls out the increment (just for clarity) and replaces the second use of SecNo with a constant zero value (we're using stable_sort so the value isn't critical).
Differential Revision: https://reviews.llvm.org/D107273
item of StringTable.
Summary: For the string table in XCOFF, the first 4 bytes
contains the length of the string table, so we should
print the string entries from fifth bytes. This patch
also adds tests for llvm-readobj dumping the string
table.
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D105522
In order to support different types of perf scripts, this change tried to refactor `PerfReader` by adding the base class `PerfReaderBase` and current HybridPerfReader is derived from it for CS profile generation. Common functions like, passMM2PEvents, extract_lbrs, extract_callstack, etc. can be reused.
Next step is to add LBR only reader(for non-CS profile) and aggregated perf scripts reader(do a pre-aggregation of scripts).
Reviewed By: hoy, wenlei
Differential Revision: https://reviews.llvm.org/D107014
When we build with split dwarf in single mode the .o files that contain both "normal" debug sections and dwo sections, along with relocaiton sections for "normal" debug sections.
When we create DWARF context in DWARFObjInMemory we process relocations and store them in the map for .debug_info, etc section.
For DWO Context we also do it for non dwo dwarf sections. Which I believe is not necessary. This leads to a lot of memory being wasted. We observed 70GB extra memory being used.
I went with context sensitive approach, flag is passed in. I am not sure if it's always safe not to process relocations for regular debug sections if Obj contains .dwo sections.
If it is alternatvie might be just to scan, in constructor, sections and if there are .dwo sections not to process regular debug ones.
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D106624
Pulled out the OptimizationLevel class from PassBuilder in order to be able to access it from within the PassManager and avoid include conflicts.
Reviewed By: mtrofin
Differential Revision: https://reviews.llvm.org/D107025
The LC_SUB_FRAMEWORK, LC_SUB_UMBRELLA, LC_SUB_CLIENT, and LC_SUB_LIBRARY
are used to indicate related libraries, binaries or framework names.
Their only payload is the string with the name of the object. Adding
those commands to the list of ignored/skipped load commands will avoid
an error that stop the process of copying/stripping and will copy their
contents verbatim.
Additionally, in order to have a test for this case, `yaml2obj` now
allows those four commands to contain a `Content`.
Differential Revision: https://reviews.llvm.org/D106412
[[noreturn]] can be used since Oct 2016 when the minimum compiler requirement was bumped to GCC 4.8/MSVC 2015.
Note: the definition of LLVM_ATTRIBUTE_NORETURN is kept for now.
On AIX, the linker needs to check whether a given lto_module_t contains
any constructor/destructor functions, in order to implement the behavior
of the -bcdtors:all flag. See
https://www.ibm.com/docs/en/aix/7.2?topic=l-ld-command for the flag's
documentation.
In llvm IR, constructor (destructor) functions are added to a special
global array @llvm.global_ctors (@llvm.global_dtors).
However, because these two symbols are artificial, they are not visited
during the symbol traversal (using the
lto_module_get_[num_symbols|symbol_name|symbol_attribute] API).
This patch adds a new function to the libLTO interface that checks the
presence of one or both of these two symbols.
Reviewed By: steven_wu
Differential Revision: https://reviews.llvm.org/D106887
Wrapper function call and dispatch handler helpers are moved to
ExecutionSession, and existing EPC-based tools are re-written to take an
ExecutionSession argument instead.
Requiring an ExecutorProcessControl instance simplifies existing EPC based
utilities (which only need to take an ES now), and should encourage more
utilities to use the EPC interface. It also simplifies process termination,
since the session can automatically call ExecutorProcessControl::disconnect
(previously this had to be done manually, and carefully ordered with the
rest of JIT tear-down to work correctly).
These tests access private symbols in the backends, so they cannot link
against libLLVM.so and must be statically linked. Linking these tests
can be slow and with debug builds the resulting binaries use a lot of
disk space.
By merging them into a single test binary means we now only need to
statically link 1 test instead of 6, which helps reduce the build
times and saves disk space.
Reviewed By: courbet
Differential Revision: https://reviews.llvm.org/D106464
See [GRP_COMDAT group with STB_LOCAL signature](https://groups.google.com/g/generic-abi/c/2X6mR-s2zoc)
objcopy PR: https://sourceware.org/bugzilla/show_bug.cgi?id=27931
GRP_COMDAT deduplication is purely based on the signature symbol name in
ld.lld/GNU ld/gold. The local/global status is not part of the equation.
If the signature symbol is localized by --localize-hidden or
--keep-global-symbol, the intention is likely to make the group fully
localized. Drop GRP_COMDAT to suppress deduplication.
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D106782
The current implementation of displaying .stack_size information
presumes that each entry represents a single function but this is not
always the case. For example with the use of ICF multiple functions can
be represented with the same code, meaning that the address found in a
.stack_size entry corresponds to multiple function symbols.
This change allows multiple function names to be displayed when
appropriate.
Differential Revision: https://reviews.llvm.org/D105884
This matches what MS rc.exe allows in practice. I'm not aware of
any legal syntax case that are broken by allowing dashes as part
of what the tokenizer considers an Identifier - but I'm not
very well versed in the RC syntax either, can @amccarth think of
any case that would be broken by this?
This fixes downstream bug
https://github.com/msys2/MINGW-packages/issues/9180.
Additionally, rc.exe allows such resource name strings to be surrounded
by quotes, ending up with e.g.
Resource name (string): "QUOTEDNAME"
(i.e., the quotes end up as part of the string), which llvm-rc doesn't
support yet either. (I'm not aware of such cases in the wild though,
but resource string names with dashes do exist.)
This also allows including files with unquoted paths, with filenames
containing dashes (which fixes
https://github.com/msys2/MINGW-packages/issues/9130, which has been
worked around differently so far).
Differential Revision: https://reviews.llvm.org/D106598
Most modern tools only accept two-dash long options. Remove one-dash
long options which are not recognized by GNU style `getopt_long`.
This ensures long options cannot collide with grouped short options.
Note: llvm-symbolizer has `-demangle={true,false}` for pprof compatibility
(for a while). They are kept.
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D106377
When run the command in the llvm-mc-assemble-fuzzer document,
```
llvm-mc-fuzzer --triple=aarch64-linux-gnu --fuzzer-args -max_len=4
```
it triggers the following assertion:
```
llvm-mc-assemble-fuzzer:
llvm-project/llvm/lib/MC/MCTargetOptionsCommandFlags.cpp:38:
bool llvm::mc::getRelaxAll(): Assertion `RelaxAllView &&
"RegisterMCTargetOptionsFlags not created."' failed.
```
It is caused by no global RegisterMCTargetOptionsFlags object to initialize
the MC target options.
Differential Revision: https://reviews.llvm.org/D106417
Add support for all built-in text macros supported by ML64:
@Date, @Time, @FileName, @FileCur, and @CurSeg.
Reviewed By: thakis
Differential Revision: https://reviews.llvm.org/D104965
This patch allows iterating typed enum via the ADT/Sequence utility.
It also changes the original design to better separate concerns:
- `StrongInt` only deals with safe `intmax_t` operations,
- `SafeIntIterator` presents the iterator and reverse iterator
interface but only deals with safe `StrongInt` internally.
- `iota_range` only deals with `SafeIntIterator` internally.
This design ensures that operations are always valid. In particular,
"Out of bounds" assertions fire when:
- the `value_type` is not representable as an `intmax_t`
- iterator operations make internal computation underflow/overflow
- the internal representation cannot be converted back to `value_type`
Differential Revision: https://reviews.llvm.org/D106279
This is a step1, mechanical refactor, of moving the bulk of llvm-dwp functionality in to a library. This should allow other tools, like BOLT, to re-use some of the llvm-dwp functionality.
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D106198
In PGO, a C++ external linkage function `foo` has a private counter
`__profc_foo` and a private `__profd_foo` in a `comdat nodeduplicate`.
A `__attribute__((weak))` function `foo` has a weak hidden counter `__profc_foo`
and a private `__profd_foo` in a `comdat nodeduplicate`.
In `ld.lld a.o b.o`, say a.o defines an external linkage `foo` and b.o
defines a weak `foo`. Currently we treat `comdat nodeduplicate` as `comdat any`,
ld.lld will incorrectly consider `b.o:__profc_foo` non-prevailing. In the worst
case when `b.o:__profd_foo` is retained and `b.o:__profc_foo` isn't, there will
be dangling reference causing an `undefined hidden symbol` error.
Add SelectionKind to `Comdat` in IRSymtab and let linkers ignore nodeduplicate comdat.
Differential Revision: https://reviews.llvm.org/D106228
This diff changes llvm-ifs to use unified IFS file format
and perform other renaming changes in preparation for the
merging between elfabi/ifs.
Differential Revision: https://reviews.llvm.org/D99810
This change implements unified text stub format and command line
interface proposed in the elfabi/ifs merge plan.
Differential Revision: https://reviews.llvm.org/D99399
Adds support for MachO static initializers/deinitializers and eh-frame
registration via the ORC runtime.
This commit introduces cooperative support code into the ORC runtime and ORC
LLVM libraries (especially the MachOPlatform class) to support macho runtime
features for JIT'd code. This commit introduces support for static
initializers, static destructors (via cxa_atexit interposition), and eh-frame
registration. Near-future commits will add support for MachO native
thread-local variables, and language runtime registration (e.g. for Objective-C
and Swift).
The llvm-jitlink tool is updated to use the ORC runtime where available, and
regression tests for the new MachOPlatform support are added to compiler-rt.
Notable changes on the ORC runtime side:
1. The new macho_platform.h / macho_platform.cpp files contain the bulk of the
runtime-side support. This includes eh-frame registration; jit versions of
dlopen, dlsym, and dlclose; a cxa_atexit interpose to record static destructors,
and an '__orc_rt_macho_run_program' function that defines running a JIT'd MachO
program in terms of the jit- dlopen/dlsym/dlclose functions.
2. Replaces JITTargetAddress (and casting operations) with ExecutorAddress
(copied from LLVM) to improve type-safety of address management.
3. Adds serialization support for ExecutorAddress and unordered_map types to
the runtime-side Simple Packed Serialization code.
4. Adds orc-runtime regression tests to ensure that static initializers and
cxa-atexit interposes work as expected.
Notable changes on the LLVM side:
1. The MachOPlatform class is updated to:
1.1. Load the ORC runtime into the ExecutionSession.
1.2. Set up standard aliases for macho-specific runtime functions. E.g.
___cxa_atexit -> ___orc_rt_macho_cxa_atexit.
1.3. Install the MachOPlatformPlugin to scrape LinkGraphs for information
needed to support MachO features (e.g. eh-frames, mod-inits), and
communicate this information to the runtime.
1.4. Provide entry-points that the runtime can call to request initializers,
perform symbol lookup, and request deinitialiers (the latter is
implemented as an empty placeholder as macho object deinits are rarely
used).
1.5. Create a MachO header object for each JITDylib (defining the __mh_header
and __dso_handle symbols).
2. The llvm-jitlink tool (and llvm-jitlink-executor) are updated to use the
runtime when available.
3. A `lookupInitSymbolsAsync` method is added to the Platform base class. This
can be used to issue an async lookup for initializer symbols. The existing
`lookupInitSymbols` method is retained (the GenericIRPlatform code is still
using it), but is deprecated and will be removed soon.
4. JIT-dispatch support code is added to ExecutorProcessControl.
The JIT-dispatch system allows handlers in the JIT process to be associated with
'tag' symbols in the executor, and allows the executor to make remote procedure
calls back to the JIT process (via __orc_rt_jit_dispatch) using those tags.
The primary use case is ORC runtime code that needs to call bakc to handlers in
orc::Platform subclasses. E.g. __orc_rt_macho_jit_dlopen calling back to
MachOPlatform::rt_getInitializers using __orc_rt_macho_get_initializers_tag.
(The system is generic however, and could be used by non-runtime code).
The new ExecutorProcessControl::JITDispatchInfo struct provides the address
(in the executor) of the jit-dispatch function and a jit-dispatch context
object, and implementations of the dispatch function are added to
SelfExecutorProcessControl and OrcRPCExecutorProcessControl.
5. OrcRPCTPCServer is updated to support JIT-dispatch calls over ORC-RPC.
6. Serialization support for StringMap is added to the LLVM-side Simple Packed
Serialization code.
7. A JITLink::allocateBuffer operation is introduced to allocate writable memory
attached to the graph. This is used by the MachO header synthesis code, and will
be generically useful for other clients who want to create new graph content
from scratch.
If a file has no symbols, perhaps because it is a linked executable,
synthesize some symbols by walking the code section. Otherwise the
disassembler will try to treat the whole code section as a function,
which won't parse. Fixes https://bugs.llvm.org/show_bug.cgi?id=50957.
Differential Revision: https://reviews.llvm.org/D105539
llvm-readelf is a user-facing tool which emulates GNU readelf. Remove one-dash
long options which are not recognized by GNU style `getopt_long`. This ensures
long options cannot collide with grouped short options.
Note: the documentation (D63719)/help messages have recommended the double-dash
forms since LLVM 9.0.0.
llvm-readobj is intended as an internal tool which has some flexibility.
llvm-readelf/llvm-readobj use the same option parsing code and llvm-readobj's
one-dash long options aren't used after test migration.
Differential Revision: https://reviews.llvm.org/D106037
We can build it with -Werror=global-constructors now. This helps
in situation where libSupport is embedded as a shared library,
potential with dlopen/dlclose scenario, and when command-line
parsing or other facilities may not be involved. Avoiding the
implicit construction of these cl::opt can avoid double-registration
issues and other kind of behavior.
Reviewed By: lattner, jpienaar
Differential Revision: https://reviews.llvm.org/D105959
We can build it with -Werror=global-constructors now. This helps
in situation where libSupport is embedded as a shared library,
potential with dlopen/dlclose scenario, and when command-line
parsing or other facilities may not be involved. Avoiding the
implicit construction of these cl::opt can avoid double-registration
issues and other kind of behavior.
Reviewed By: lattner, jpienaar
Differential Revision: https://reviews.llvm.org/D105959
We can build it with -Werror=global-constructors now. This helps
in situation where libSupport is embedded as a shared library,
potential with dlopen/dlclose scenario, and when command-line
parsing or other facilities may not be involved. Avoiding the
implicit construction of these cl::opt can avoid double-registration
issues and other kind of behavior.
Reviewed By: lattner, jpienaar
Differential Revision: https://reviews.llvm.org/D105959
Details:
Switch all #includes to use <> because that is consistent with what happens in the cmake checks.
Otherwise, we could be in the situation where cmake checks see that headers exist at <perfmon/...>
but in llvm-exegesis code, we use "perfmon/...", which may not exist.
Related PR/revisions: D84076, PR51017+D105615
Differential Revision: https://reviews.llvm.org/D105861
The documentation and help messages have recommended the double-dash forms for
quite a while. Remove one-dash long options which are not recognized by GNU
style `getopt_long`.
`-arch` is kept as it is in the manpage of classic nm
https://keith.github.io/xcode-man-pages/nm.1.html
Note: the dyldinfo related options don't have a test.
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D105948
This patch make coroutine passes run by default in LLVM pipeline. Now
the clang and opt could handle IR inputs containing coroutine intrinsics
without special options.
It should be fine. On the one hand, the coroutine passes seems to be stable
since there are already many projects using coroutine feature.
On the other hand, the coroutine passes should do nothing for IR who doesn't
contain coroutine intrinsic.
Test Plan: check-llvm
Reviewed by: lxfind, aeubanks
Differential Revision: https://reviews.llvm.org/D105877
Summary:
Add support for the basic section stripping (and keeping) flags for wasm:
strip with no flags, --strip-all, --strip-debug,
--only-section, --keep-section, and --only-keep-debug.
Factor section removal into a function and use a predicate chain like
the ELF implementation.
Reviewers: jhenderson, sbc100
Differential Revision: https://reviews.llvm.org/D73820
The linker or post-link optimizer can create an ELF image with multiple executable segments each of which will be loaded separately at run time. This breaks the assumption of llvm-profgen that currently only supports one base load address. What it ends up with is that the subsequent mmap events will be treated as an overwrite of the first mmap event which will in turn screw up address mapping. While it is non-trivial to support multiple separate load addresses and given that on x64 those segments will always be loaded at consecutive addresses (though via separate mmap
sys calls), I'm adding an error checking logic to bail out if that's violated and keep using a single load address which is the address of the first executable segment.
Also changing the disassembly output from printing section offset to printing the virtual address instead, which matches the behavior of objdump.
Differential Revision: https://reviews.llvm.org/D103178
This is a follow up to https://reviews.llvm.org/D104080, and ca3bdb57fa (diff-e64a48fabe31db213a631fdc5f2acb51bdddf3f16a8fb2928784f4c579229585). The implementation of call graph profile was changed from a black box section to relocation approach. This was done to be compatible with post processing tools like strip/objcopy, and llvm equivalent. When they are invoked on object file before the final linking step with this new approach the symbol indices correctness is preserved.
The GNU binutils tools change the REL section to RELA section, unlike llvm tools. For example when strip -S is run on the ELF object files, as an intermediate step before linking. To preserve compatibility this patch extends implementation in LLD and ELFDumper to support both REL and RELA sections for call graph profile.
Reviewed By: MaskRay, jhenderson
Differential Revision: https://reviews.llvm.org/D105217
Applied clang-format to all files. Discarded BottleneckAnalysis.h
80-column width violation since it contains an example of report.
Caught some typos and minor style details.
Reviewed By: andreadb
Differential Revision: https://reviews.llvm.org/D105900
For Clang, `MCUseDwarfDirectory` is true by default for the majority cases
(-fintegrated-as or -gdwarf-5; most targets use -fintegrated-as by default).
Defaulting MCUseDwarfDirectory to true can reduce the differences between clang
and llc.
Reviewed By: #debug-info, dblaikie
Differential Revision: https://reviews.llvm.org/D105856
Users should generally observe no difference as long as they don't use
unintended option forms. Behavior changes:
* `-t=d` is removed. Use `-t d` instead.
* `--demangle=false` and `--demangle=0` cannot be used. Omit the option or use `--no-demangle`. Other flag-style options don't have `--no-` forms.
* `--help-list` is removed. This is a `cl::` specific option.
* llvm-readobj now supports grouped short options as well.
* `--color` is removed. This is generally not useful (only apply to errors/warnings) but was inherited from Support.
Some adjustment to the canonical forms
(usually from GNU readelf; currently llvm-readobj has too many redundant aliases):
* --dyn-syms is canonical. --dyn-symbols is a hidden alias
* --file-header is canonical. --file-headers is a hidden alias
* --histogram is canonical. --elf-hash-histogram is a hidden alias
* --relocs is canonical. --relocations is a hidden alias
* --section-groups is canonical. --elf-section-groups is a hidden alias
OptTable avoids global option collision if we decide to support multiplexing for binary utilities.
* Most one-dash long options are still supported. `-dt, -sd, -st, -sr` are dropped due to their conflict with grouped short options.
* `--section-mapping=false` (D57365) is strange but is kept for now.
* Many `cl::opt` variables were unnecessarily external. I added `static` whenever appropriate.
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D105532
Some users use a long list of fixed patterns (PR50404) and
O(|patterns|*|symbols|) can be too slow. Such usage typically does not use
--regex or --wildcard. We can use a DenseSet<CachedHashStringRef> to optimize
name lookups.
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D105218
This patch addresses the last remaining problems reported in PR51008.
Previous fixes for PR51008 worked under the wrong assumption that code regions
are always named (except maybe for the default region, which was automatically
named "main").
In reality, it is quite common for users to declare multiple anonymous regions.
So we cannot really use the region name as the key string of a JSON object. In
practice, code region names are completely optional.
Using "main" for the default region was also problematic because there can be
another region with that same name.
This patch fixes these issues by introducing a json::array of regions. Each
region has a "Name" field, which would default to the empty string for anonymous
regions.
Added a few more tests to verify that the JSON file format is still valid, and
that multiple anonymous regions all appear in the final output.
This patch renames object "Resources" to "TargetInfo".
Moved the getJSONTargetInfo method from class InstructionView to the
PipelinePrinter.
Removed uses of std::stringstream.
Removed unused method View::printViewJSON().
Part of https://lists.llvm.org/pipermail/llvm-dev/2021-July/151622.html
"Binary utilities: switch command line parsing from llvm::cl to OptTable"
* `--totals=false` and `--totals=0` cannot be used. Omit the option.
* `--help-list` is removed. This is a `cl::` specific option.
OptTable avoids global option collision if we decide to support multiplexing for binary utilities.
Note: because the tool is simple, and its long options are uncommon, I just drop
the one-dash forms except `-arch <value>` (Darwin style).
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D105598
Similar to D104889. The tool is very simple and its long options are uncommon,
so just drop the one-dash form in this patch.
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D105605
Instead of printing each region individually when using JSON format,
this patch creates a JSON object which is updated with the values of
each region, printing them at the end. New test is added for JSON output
with multiple regions.
Bug: https://bugs.llvm.org/show_bug.cgi?id=51008
Reviewed By: andreadb
Differential Revision: https://reviews.llvm.org/D105618
This matches what rc.exe tolerates in this type.
This fixes cases like this:
1 24
BEGIN
"<?xml version=""1.0""?>\n"
"<assembly>\n"
"</assembly>\n"
END
Differential Revision: https://reviews.llvm.org/D105621
There was an alias between 'simplifycfg' and 'simplify-cfg' in the
PassRegistry. That was the original reason for this patch, which
effectively removes the alias.
This patch also replaces all occurrances of 'simplify-cfg'
by 'simplifycfg'. Reason for choosing that form for the name is
that it matches the DEBUG_TYPE for the pass, and the legacy PM name
and also how it is spelled out in other passes such as
'loop-simplifycfg', and in other options such as
'simplifycfg-merge-cond-stores'.
I for some reason the name should be changed to 'simplify-cfg' in
the future, then I think such a renaming should be more widely done
and not only impacting the PassRegistry.
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D105627
C++23 will make these conversions ambiguous - so fix them to make the
codebase forward-compatible with C++23 (& a follow-up change I've made
will make this ambiguous/invalid even in <C++23 so we don't regress
this & it generally improves the code anyway)
This commit also makes some slight changes to the scheduling model for AMDGPU to set the RetireOOO flag for all scheduling classes.
This flag is only used by llvm-mca and allows instructions to retire out of order.
See the differential link below for a deeper explanation of everything.
Differential Revision: https://reviews.llvm.org/D104730
Part of https://lists.llvm.org/pipermail/llvm-dev/2021-July/151622.html
"Binary utilities: switch command line parsing from llvm::cl to OptTable"
Users should generally observe no difference as long as they only use intended
option forms. Behavior changes:
* `-t=d` is removed. Use `-t d` instead.
* `--demangle=0` cannot be used. Omit the option or use `--no-demangle` instead.
* `--help-list` is removed. This is a `cl::` specific option.
Note:
* `-t` diagnostic gets improved.
* This patch avoids cl::opt collision if we decide to support multiplexing for binary utilities
* One-dash long options are still supported.
* The `-s` collision (`-s segment section` for Mach-O) is unfortunate. `-s` means `--print-armap` in GNU nm.
* This patch removes the last `cl::multi_val` use case from the `llvm/lib/Support/CommandLine.cpp` library
`-M` (`--print-armap`), `-U` (`--defined-only`), and `-W` (`--no-weak`)
are now deprecated. They could conflict with future GNU nm options.
(--print-armap has an existing alias -s, so GNU will unlikely add a new one.
--no-weak (not in GNU nm) is rarely used anyway.)
`--just-symbol-name` is now deprecated in favor of
`--format=just-symbols` and `-j`.
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D105330
Some behavior changes:
* `-t=d` is removed. Use `-t d` instead.
* one-dash long options like `-all` are supported. Use `--all` instead.
* `--all=0` or `--all=false` cannot be used. (Note: `--all` is silently ignored anyway)
* `--help-list` is removed. This is a `cl::` specific option.
Nobody is likely leveraging any of the above.
Advantages:
* `-t` diagnostic gets improved.
* in the absence of `HideUnrelatedOptions`, `--help` will not list unrelated options if linking against libLLVM-13git.so or linker GC is not used.
* Decrease the probability of cl::opt collision if we do decide to support multiplexing
Note: because the tool is so simple, used more for forensics instead of a building
tool, and its long options are unlikely used in one-dash form, I just drop the
one-dash form in this patch.
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D104889
Summary: The patch adds the StringTable dumping to
llvm-readobj. Currently only XCOFF is supported.
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D104613
GNU and Apple `strip` implementations seems to support grouped options.
Enable the support for grouped options introduced in
https://reviews.llvm.org/D83639 for `llvm-strip` invocations.
Includes test that checks that both the grouped and non grouped
invocations produces the same result.
Reviewed By: alexander-shaposhnikov, MaskRay
Differential Revision: https://reviews.llvm.org/D105249
Based on the discussion in PR50922, minor changes have been done to properly
output a valid JSON. Removed "not implemented" keys.
Differential Revision: https://reviews.llvm.org/D105064
This is a first step towards consistently using the term 'executor' for the
process that executes JIT'd code. I've opted for 'executor' as the preferred
term over 'target' as target is already heavily overloaded ("the target
machine for the executor" is much clearer than "the target machine for the
target").
By using stable_sort.
Added a test case which previously failed when expensive checks were
enabled.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D105240
This is an ELF specific option which isn't supported for Windows/MinGW
targets, even if the MinGW linker otherwise uses an ld.bfd like linker
interface.
Differential Revision: https://reviews.llvm.org/D105148
The load command is currently specific to arm64 and holds information
for instruction rewriting, e.g. converting a GOT load to an ADR to
compute a local address.
(On ELF the information is usually conveyed by relocations, e.g.
R_X86_64_REX_GOTPCRELX, R_PPC64_TOC16_HA)
Reviewed By: alexander-shaposhnikov
Differential Revision: https://reviews.llvm.org/D104968
llvm-readobj is an internal testing tool for binary formats. Its output and
command line options do not need to be stable. It isn't supposed to be part of a
build process.
llvm-readelf was created as a user-facing utility and its interface intends to
be compatible with GNU readelf (unless there are good reasons not to).
The two tools have mostly compatible options. -s and -t are noticeable
exceptions due to history. I think the cost of keeping the inconsistency
overweighs the little history-compatible benefit and hinders transition from
cl::opt to OptTable, so let's change it.
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D105055
An ARM64_RELOC_ADDEND relocation reuses the symbol field for the addend value.
We should pass through such relocations.
Reviewed By: alexander-shaposhnikov
Differential Revision: https://reviews.llvm.org/D104967
The patch reuses the common code to print memory operand addresses as
instruction comments. This helps to align the comments and enables using
target-specific comment markers when `evaluateMemoryOperandAddress()` is
implemented for them.
Differential Revision: https://reviews.llvm.org/D104861
For now, the source variable locations are printed at about the same
space as the comments for disassembled code, which can make some ranges
for variables disappear if a line contains comments, for example:
┠─ bar = W1
0: add x0, x2, #2, lsl #12 // =8192┃
4: add z31.d, z31.d, #65280 // =0xff00
8: nop ┻
The patch shifts the report a bit to allow printing comments up to
approximately 16 characters without interferences.
Differential Revision: https://reviews.llvm.org/D104700
LLVM disassembler can generate comments for disassembled instructions.
The patch enables printing these comments for 'llvm-objdump -d'.
Differential Revision: https://reviews.llvm.org/D104699
When the default target arch isn't one that is supported as a
windows target, we want to set a suitable architecture (so that
Clang tests that run plain 'llvm-rc' succeed checks for e.g.
"#ifdef _WIN32" even for llvm builds that default to e.g. ppc64).
But if the default target architecture is usable, don't rewrite it.
(Rewriting it, by e.g. "T.setArch(T.getArch())", normalizes the
spelling of the architecture, e.g. changing i686 to i386. Such a
change can make clang unable to find the right sysroot.)
This can't, unfortunately, practically be tested very well because
it is entirely dependent on the default triple of the llvm build.
Differential Revision: https://reviews.llvm.org/D104589
... even on targets preferring RELA. The section is only consumed by ld.lld
which can handle REL.
Follow-up to D104080 as I explained in the review. There are two advantages:
* The D104080 code only handles RELA, so arm/i386/mips32 etc may warn for -fprofile-use=/-fprofile-sample-use= usage.
* Decrease object file size for RELA targets
While here, change the relocation to relocate weights, instead of 0,1,2,3,..
I failed to catch the issue during review.
This is a mechanical change. This actually also renames the
similarly named methods in the SmallString class, however these
methods don't seem to be used outside of the llvm subproject, so
this doesn't break building of the rest of the monorepo.
We don't want to start updating tests to use opaque pointers until we're
close to the opaque pointer transition. However, before the transition
we want to run tests as if pointers are opaque pointers to see if there
are any crashes.
At some point when we have a flag to only create opaque pointers in the
bitcode and textual IR readers, and when we have fixed all places that
try to read a pointee type, this flag will be useless. However, until
then, this can help us find issues more easily.
Since the cl::opt is read into LLVMContext, we need to make sure
LLVMContext is created after cl::ParseCommandLineOptions().
Previously ValueEnumerator would visit the value types of global values
via the pointer type, but with opaque pointers we have to manually visit
the value type.
Reviewed By: nikic, dexonsmith
Differential Revision: https://reviews.llvm.org/D103503
Currently when .llvm.call-graph-profile is created by llvm it explicitly encodes the symbol indices. This section is basically a black box for post processing tools. For example, if we run strip -s on the object files the symbol table changes, but indices in that section do not. In non-visible behavior indices point to wrong symbols. The visible behavior indices point outside of Symbol table: "invalid symbol index".
This patch changes the format by using R_*_NONE relocations to indicate the from/to symbols. The Frequency (Weight) will still be in the .llvm.call-graph-profile, but symbol information will be in relocation section. In LLD information from both sections is used to reconstruct call graph profile. Relocations themselves will never be applied.
With this approach post processing tools that handle relocations correctly work for this section also. Tools can add/remove symbols and as long as they handle relocation sections with this approach information stays correct.
Doing a quick experiment with clang-13.
The size went up from 107KB to 322KB, aggregate of all the input sections. Size of clang-13 binary is ~118MB. For users of -fprofile-use/-fprofile-sample-use the size of object files will go up slightly, it will not impact final binary size.
Reviewed By: jhenderson, MaskRay
Differential Revision: https://reviews.llvm.org/D104080
Change --max-timeline-cycles=0 to mean no limit on the number of cycles.
Use this in AMDGPU tests to show all instructions in the timeline view
instead of having it arbitrarily truncated.
Differential Revision: https://reviews.llvm.org/D104846
A ConstantStruct is renamed when the LLVM context sees a new one. This
makes global variable initializers appear different when they aren't.
Instead, check the ConstantStruct for equivalence.
Differential Revision: https://reviews.llvm.org/D104734
This is a similarity visualization tool that accepts a Module and
passes it to the IRSimilarityIdentifier. The resulting SimilarityGroups
are output in a JSON file.
Tests are found in test/tools/llvm-sim and check for the file not found,
a bad module, and that the JSON is created correctly.
Reviewers: paquette, jroelofs, MaskRay
Recommit of: 15645d044b to fix linking
errors and GN build system.
Differential Revision: https://reviews.llvm.org/D86974
Without this patch we're only showing a generic error message derived
from the error code to the end user.
rdar://79378794
Differential Revision: https://reviews.llvm.org/D104483
In a callback case, a return from internal code, say A, to external runtime can happen. The external runtime can then call back to another internal routine, say B. Making an artificial branch that looks like a return from A to B can confuse the unwinder to treat the instruction before B as the call instruction.
Reviewed By: wenlei, wmi
Differential Revision: https://reviews.llvm.org/D104546
Global initializers may be ConstantArrays. They need to be checked
explicitly, because different-yet-still-equivalent type names may be
used for each, and/or a GEP instruction may appear in one.
The only wrinkle is that we can't process the "blockaddress" arguments
of the callbr until the blocks have been equated. So we force them to be
"unified" before checking.
This was left out when the callbr instruction was added.
Differential Revision: https://reviews.llvm.org/D104606
0 latency instructions now get processed and retired properly within the in-order pipeline. Had to fix a bug within TimelineView.cpp as well that would show up when a 0 latency instruction was the first instruction in the source.
Differential Revision: https://reviews.llvm.org/D104675
Some APIs work with const variables while others don't. This can cause
conflicts when calling one from the other.
This is NFC.
Differential Revision: https://reviews.llvm.org/D104719
These serve as a convenient combination of consume_front/back and
startswith_lower/endswith_lower, consistent with other existing
case insensitive methods named <operation>_lower.
Differential Revision: https://reviews.llvm.org/D104218
Add a parameter of IsFSDiscriminator to function
getBaseDiscriminatorFromDiscriminator().
This function currently checks the internal flag of
--enable-fs-discriminator. This is not good because we might
change the default value of the internal flag.
Note that we have a default parameter. This is just
because create_afdo_tool has a call-site to it.
I will remove the default parameter in a later patch.
Differential Revision: https://reviews.llvm.org/D104584
The argument reduction pass shouldn't remove arguments of
intrinsics, because the resulting module is ill-formed, and so
inherently uninteresting.
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D103129
This makes it more convenient to get a text format profile.
Add an error for printing non-text format output to a terminal for instrumentation profile.
(It cannot be portably tested. For sample profile, raw_fd_ostream is hidden deeply so it's inconvenient to add a diagnostic.)
Reviewed By: davidxl
Differential Revision: https://reviews.llvm.org/D104600
As a follow-up to https://reviews.llvm.org/D104129, I'm cleaning up the danling probe related code in both the compiler and llvm-profgen.
I'm seeing a 5% size win for the pseudo_probe section for SPEC2017 and 10% for Ciner. Certain benchmark such as 602.gcc has a 20% size win. No obvious difference seen on build time for SPEC2017 and Cinder.
Reviewed By: wenlei
Differential Revision: https://reviews.llvm.org/D104477
We were using 0 as an indicator of invalid offset when computing disjoint ranges. In reality, 0 can be an valid code offset which stands for the first function in .text section. I'm using UINT64_MAX as an invalid code offset instead.
Reviewed By: wenlei
Differential Revision: https://reviews.llvm.org/D104497
If we have seen an inwards transition from external code to internal code, but not a following outwards transition, the inwards transition is likely due to interrupt which is usually unpaired. Ignore current and subsequent entries since they are likely from an unrelated pre-interrupt context.
LBR records from different interrupt context are unrelated and they should not be mixed together. Currenlty the OS does this for task-scheduling interrupt but not for all interrupts.
Reviewed By: wenlei, wlei
Differential Revision: https://reviews.llvm.org/D104276
We were using a `StringMap` object to store all profiles to be emitted. The object is basically an unordered hash table, therefore updating it in the process of trasvering it may cause issue since the underlying bucket array could change.
I'm also moving the `csspgo-preinliner` switch around so that no context tri will be constructed (by the constructor of `CSPreInliner`) when the switch is off.
Reviewed By: wenlei
Differential Revision: https://reviews.llvm.org/D104267
Put the dtor of mca::CustomBehaviour into the cpp file to avoid
undefined vtable when linking libLLVMMCACustomBehaviourAMDGPU as shared
library.
Differential Revision: https://reviews.llvm.org/D104401